Abstract:
In one embodiment, a technique is provided for distributing data and associated metadata within a distributed storage architecture. A set of hash tables that embody mappings of cluster-wide identifiers associated with storage locations are stored for write data of write requests organized into extents. A hash value is generated from a hash function applied to each extent. The hash value is overloaded and used for multiple purposes within the distributed storage architecture, including (i) a remainder computation on the hash value to select a bucket of a plurality of buckets representative of the extents, (ii) a hash table selector of the hash value to select a hash table from the set of hash tables, and (iii) a hash table index computed from the hash value to select an entry from a plurality of entries of the selected hash table having a cluster-wide identifier identifying a storage location for the extent.
Abstract:
A storage server receives a write request from a client system including new data and a location to store the new data. The storage server transmits a copy instruction to a storage subsystem to relocate old data at the location and transmits a write instruction to the storage subsystem to overwrite the old data with the new data. The storage subsystem includes fast stable storage in which the copy instruction and the write instruction are stored. After receiving each instruction, the storage subsystem sends an acknowledgement to the storage server. When both instructions have been acknowledged, the storage server sends an acknowledgement to the client system. The storage subsystem performs the instructions asynchronously from the client system's write request.
Abstract:
Described is a technique for managing the content of a nonvolatile solid-state memory data cache to improve cache performance while at the same time, and in a complementary manner, providing for automatic wear leveling. A modified circular first-in first-out (FIFO) log/algorithm is generally used to determine cache content replacement. The algorithm is used as the default mechanism for determining cache content to be replaced when the cache is full but is subject to modification in some instances. In particular, data are categorized according to different data classes prior to being written to the cache, based on usage. Once cached, data belonging to certain classes are treated differently than the circular FIFO replacement algorithm would dictate. Further, data belonging to each class are localized to designated regions within the cache.
Abstract:
A technique is described for improving throughput in a processing system, such as a network storage server. The technique provides multiple levels (e.g., a hierarchy) of parallelism of process execution within a single mutual exclusion domain, in a manner which allows certain operations on metadata to be parallelized as well as certain operations on user data. The specific parallelization scheme used in any given embodiment is based at least partly on the underlying metadata structures used by the processing system. Consequently, a high degree of parallelization possible, which improves the throughput of the processing system.
Abstract:
A storage system, such as a file server, receives a request to perform a write operation that affects a data block. In response, the storage system writes to a storage device the data block together with context information which uniquely identifies the write operation with respect to the data block. When the data block is subsequently read from the storage device together with the context information, the context information that was read with the data block is used to determine whether a previous write of the data block was lost.