Abstract:
Among other things, one or more techniques and/or systems are provided for storing data within a hybrid storage aggregate comprising a lower-latency storage tier and a higher-latency storage tier. In particular, frequently accessed data, randomly accessed data, and/or short-lived data may be stored (e.g., for read caching and/or write caching) within the lower-latency storage tier. Infrequently accessed data and/or sequentially accessed data may be stored within the higher-latency storage tier. Because the hybrid storage aggregate may comprise a single logical container derived from the higher-latency storage tier and the lower-latency storage tier, additional storage and/or file system functionality may be implemented across the storage tiers. For example, deduplication functionality, caching functionality, backup/restore functionality, and/or other functionality may be provided through a single file system (or other type of arrangement) and/or a cache map implemented within the hybrid storage aggregate.
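As a rough illustration, the following Python sketch shows how such a placement policy and cache map might route blocks between the two tiers; all class names, the access-frequency threshold, and the heuristics are hypothetical, not taken from the embodiment:

```python
from dataclasses import dataclass, field

@dataclass
class HybridAggregate:
    """Single logical container over two storage tiers (hypothetical sketch)."""
    ssd_tier: dict = field(default_factory=dict)   # lower-latency tier
    hdd_tier: dict = field(default_factory=dict)   # higher-latency tier
    cache_map: dict = field(default_factory=dict)  # block id -> tier name

    def place(self, block_id, data, *, access_freq, is_random, short_lived):
        # Frequently accessed, randomly accessed, or short-lived data is
        # stored (cached) in the lower-latency tier; the threshold of 10
        # accesses is an arbitrary illustrative choice.
        if access_freq > 10 or is_random or short_lived:
            self.ssd_tier[block_id] = data
            self.cache_map[block_id] = "ssd"
        else:
            # Infrequently or sequentially accessed data goes to the
            # higher-latency tier.
            self.hdd_tier[block_id] = data
            self.cache_map[block_id] = "hdd"

    def read(self, block_id):
        # One file system and one cache map span both tiers.
        tier = self.cache_map[block_id]
        return (self.ssd_tier if tier == "ssd" else self.hdd_tier)[block_id]

agg = HybridAggregate()
agg.place("blk-1", b"hot", access_freq=42, is_random=True, short_lived=False)
agg.place("blk-2", b"cold", access_freq=1, is_random=False, short_lived=False)
assert agg.cache_map == {"blk-1": "ssd", "blk-2": "hdd"}
```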
Abstract:
In one embodiment, an extent store layer of a storage input/output (I/O) stack executing on one or more nodes of a cluster manages efficient logging and checkpointing of metadata. The metadata managed by the extent store layer, i.e., the extent store metadata, resides in a memory (in-core) of each node and is illustratively organized as a key-value extent store embodied as one or more data structures, e.g., a set of hash tables. Changes to the set of hash tables are recorded as a continuous stream of changes to SSD, embodied as an extent store layer log. A separate log stream structure (e.g., an in-core buffer) may be associated respectively with each hash table such that changed (i.e., dirtied) slots of the hash table are recorded as entries in the log stream structure. The hash tables are written to SSD using a fuzzy checkpointing technique.
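A minimal Python sketch of the per-table log streams and a fuzzy checkpoint follows; the class and method names, the JSON checkpoint files standing in for SSD, and the single-threaded simplification are all assumptions for illustration:

```python
import json, os, tempfile

class ExtentStoreMetadata:
    """Sketch of in-core extent store metadata: a set of hash tables,
    each paired with its own log stream of dirtied slots."""

    def __init__(self, num_tables=4):
        self.tables = [dict() for _ in range(num_tables)]
        self.log_streams = [[] for _ in range(num_tables)]  # in-core buffers

    def put(self, key, value):
        idx = hash(key) % len(self.tables)
        self.tables[idx][key] = value
        # Record the dirtied slot as an entry in this table's log stream;
        # on a real system the streams flush to SSD as the extent store log.
        self.log_streams[idx].append((key, value))

    def fuzzy_checkpoint(self, path_prefix):
        # Persist the hash tables one at a time. The checkpoint need not
        # capture a single point in time ("fuzzy"): updates that arrive
        # mid-checkpoint stay in the log streams, and replaying the log
        # over the checkpointed tables restores a consistent state. In
        # this single-threaded sketch each stream can simply be cleared
        # once its table is persisted.
        for idx, table in enumerate(self.tables):
            with open(f"{path_prefix}.table{idx}.ckpt", "w") as f:
                json.dump(list(table.items()), f)
            self.log_streams[idx].clear()

store = ExtentStoreMetadata()
store.put("extent-key-1", {"ssd": 0, "offset": 4096})
store.fuzzy_checkpoint(os.path.join(tempfile.gettempdir(), "extent_store"))
```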
Abstract:
A storage server receives, from a client system, a write request that includes new data and a location at which to store the new data. The storage server transmits a copy instruction to a storage subsystem to relocate old data at the location, and transmits a write instruction to the storage subsystem to overwrite the old data with the new data. The storage subsystem includes fast stable storage in which the copy instruction and the write instruction are stored. After receiving each instruction, the storage subsystem sends an acknowledgement to the storage server. When both instructions have been acknowledged, the storage server sends an acknowledgement to the client system. The storage subsystem performs the instructions asynchronously with respect to the client system's write request.
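The acknowledgement ordering can be sketched as follows; the instruction tuples, class names, and the queue standing in for fast stable storage are illustrative assumptions:

```python
import queue, threading

class StorageSubsystem:
    """Sketch: instructions land in fast stable storage (modeled as a
    queue) and are acknowledged immediately, then applied asynchronously."""

    def __init__(self):
        self.stable_log = queue.Queue()   # stands in for fast stable storage
        self.disk = {}
        self.relocated = {}
        threading.Thread(target=self._apply_loop, daemon=True).start()

    def submit(self, instruction):
        self.stable_log.put(instruction)  # stored stably before acking
        return "ACK"

    def _apply_loop(self):
        while True:
            op, loc, payload = self.stable_log.get()
            if op == "COPY":        # relocate the old data at the location
                self.relocated[loc] = self.disk.get(loc)
            elif op == "WRITE":     # overwrite the old data with new data
                self.disk[loc] = payload
            self.stable_log.task_done()

class StorageServer:
    def __init__(self, subsystem):
        self.subsystem = subsystem

    def handle_client_write(self, location, new_data):
        acks = [
            self.subsystem.submit(("COPY", location, None)),
            self.subsystem.submit(("WRITE", location, new_data)),
        ]
        # Ack the client only once both instructions are acknowledged;
        # the subsystem applies them later, asynchronously.
        if all(a == "ACK" for a in acks):
            return "ACK"

server = StorageServer(StorageSubsystem())
assert server.handle_client_write("loc-7", b"new data") == "ACK"
server.subsystem.stable_log.join()  # wait for asynchronous application
assert server.subsystem.disk["loc-7"] == b"new data"
```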
Abstract:
In one embodiment, a layered file system of a storage input/output (I/O) stack executes on one or more nodes of a cluster. The layered file system includes a flash-optimized, log-structured layer configured to provide sequential storage of data and metadata (i.e., a log-structured layout) on solid state drives (SSDs) of storage arrays in the cluster to reduce write amplification, while leveraging a data de-duplication feature of the storage I/O stack. An extent store layer of the file system creates and maintains mappings of extent keys to SSD storage locations, while a volume layer of the file system creates and maintains mappings of LUN offset ranges to the extent keys. Separation of the mapping functions between the volume and extent store layers enables different volumes with different offset ranges to reference the same extent key (and thus the same extent).
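One way to picture the separation of mapping functions is the sketch below, with hypothetical class names and a truncated SHA-256 digest standing in for the extent key derivation (the abstract does not specify how keys are derived):

```python
import hashlib

class ExtentStoreLayer:
    """Maps extent keys -> SSD storage locations (sketch)."""
    def __init__(self):
        self.key_to_location = {}
        self._next_offset = 0

    def store(self, extent: bytes) -> str:
        key = hashlib.sha256(extent).hexdigest()[:16]  # illustrative key
        if key not in self.key_to_location:            # de-duplication hook
            self.key_to_location[key] = ("ssd0", self._next_offset)
            self._next_offset += len(extent)
        return key

class VolumeLayer:
    """Maps (LUN, offset range) -> extent key (sketch)."""
    def __init__(self, extent_store):
        self.extent_store = extent_store
        self.offset_to_key = {}

    def write(self, lun, offset, data):
        key = self.extent_store.store(data)
        self.offset_to_key[(lun, offset, offset + len(data))] = key

store = ExtentStoreLayer()
vol_a, vol_b = VolumeLayer(store), VolumeLayer(store)
vol_a.write("lun0", 0, b"shared extent")
vol_b.write("lun1", 8192, b"shared extent")
# Different volumes, different offset ranges, same extent key (same extent).
assert list(vol_a.offset_to_key.values()) == list(vol_b.offset_to_key.values())
assert len(store.key_to_location) == 1
```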
Abstract:
In one embodiment, storage arrays of solid state drives (SSDs) coupled to a node are organized as redundant array of independent disks (RAID) groups. Each storage array includes one or more segments. Each segment has contiguous free space on the SSDs. Data and metadata are organized on the SSDs with a sequential log-structured layout, with the data organized as variable-length extents of one or more logical units (LUNs). Segment cleaning is performed to clean a selected segment by moving the extents of the selected segment that contain valid data to one or more different segments so as to free the selected segment. Additional extents are written as a sequence of contiguous range write operations to the entire free segment with temporal locality to reduce data relocation within the SSDs as a result of the write operations.
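A simplified sketch of the segment-cleaning flow; the segment structure, the validity tracking, and the names are illustrative assumptions:

```python
class Segment:
    """One segment of contiguous free space on the SSDs (sketch)."""
    def __init__(self, seg_id):
        self.seg_id = seg_id
        self.extents = {}          # extent key -> data occupying the segment

class SegmentCleaner:
    """Sketch of segment cleaning in a log-structured layout: valid
    extents are moved out of a selected segment so the whole segment
    becomes free for contiguous writes."""

    def __init__(self, num_segments=3):
        self.segments = [Segment(i) for i in range(num_segments)]
        self.valid = set()         # keys still referenced by metadata

    def clean(self, selected: Segment, target: Segment):
        for key, data in list(selected.extents.items()):
            if key in self.valid:          # relocate only valid data
                target.extents[key] = data
            del selected.extents[key]      # selected segment is now free
        return selected

    def write_sequential(self, segment: Segment, extents):
        # Write new extents as one contiguous run into the freed segment,
        # preserving temporal locality (data written together stays together).
        for key, data in extents:
            segment.extents[key] = data
            self.valid.add(key)

cleaner = SegmentCleaner()
seg0, seg1 = cleaner.segments[0], cleaner.segments[1]
cleaner.write_sequential(seg0, [("k1", b"a"), ("k2", b"b")])
cleaner.valid.discard("k2")                # k2 overwritten elsewhere
freed = cleaner.clean(seg0, seg1)
assert freed.extents == {} and "k1" in seg1.extents and "k2" not in seg1.extents
```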
Abstract:
In one embodiment, a layered file system includes a volume layer and an extent store layer configured to provide a sequential log-structured layout of data and metadata on solid state drives (SSDs) of one or more storage arrays. The data is organized as variable-length extents of one or more logical units (LUNs). The metadata includes volume metadata mappings from offset ranges of a LUN to extent keys and extent metadata mappings of the extent keys to storage locations of the extents on the SSDs. The extent store layer, which maintains the extent metadata mappings, determines whether an extent is stored on a storage array and, in response to a determination that the extent is stored on the storage array, returns an extent key for the stored extent to the volume layer to enable global inline de-duplication that obviates writing a duplicate copy of the extent on the storage array.
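The inline de-duplication check might look like the sketch below; the SHA-256 key derivation and the names are illustrative assumptions rather than the claimed key scheme:

```python
import hashlib

class ExtentStore:
    """Sketch of global inline de-duplication: before writing, check
    whether the extent already exists; if so, return its key instead
    of writing a duplicate copy."""

    def __init__(self):
        self.extent_metadata = {}   # extent key -> SSD storage location
        self.writes = 0

    def put_extent(self, extent: bytes):
        key = hashlib.sha256(extent).hexdigest()
        if key in self.extent_metadata:
            return key, False       # duplicate: no write is issued
        self.extent_metadata[key] = ("ssd", self.writes)
        self.writes += 1            # only unique extents reach SSD
        return key, True

store = ExtentStore()
k1, wrote1 = store.put_extent(b"payload")
k2, wrote2 = store.put_extent(b"payload")  # second volume, same data
assert k1 == k2 and wrote1 and not wrote2 and store.writes == 1
```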
Abstract:
A technique is described for improving throughput in a processing system, such as a network storage server. The technique provides multiple levels (e.g., a hierarchy) of parallelism of process execution within a single mutual exclusion domain, in a manner which allows certain operations on metadata to be parallelized as well as certain operations on user data. The specific parallelization scheme used in any given embodiment is based at least partly on the underlying metadata structures used by the processing system. Consequently, a high degree of parallelization is possible, which improves the throughput of the processing system.
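As a toy two-level instance of such a hierarchy, the sketch below nests per-volume locks beneath a domain-wide lock; the per-volume subdivision is an illustrative assumption, since the actual scheme is chosen from the system's underlying metadata structures:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class HierarchicalDomain:
    """Sketch: one mutual exclusion domain subdivided into per-volume
    sub-locks. User-data operations on different volumes run in
    parallel; domain-wide metadata operations exclude everything."""

    def __init__(self, volumes):
        self.domain_lock = threading.RLock()             # whole domain
        self.volume_locks = {v: threading.Lock() for v in volumes}

    def run_user_data_op(self, volume, op):
        # Fine-grained level: only the target volume is serialized, so
        # ops against distinct volumes execute concurrently.
        with self.volume_locks[volume]:
            return op()

    def run_metadata_op(self, op):
        # Coarse-grained level: exclude all sub-operations by taking
        # every per-volume lock under the domain lock.
        with self.domain_lock:
            for lock in self.volume_locks.values():
                lock.acquire()
            try:
                return op()
            finally:
                for lock in self.volume_locks.values():
                    lock.release()

domain = HierarchicalDomain(["vol0", "vol1"])
with ThreadPoolExecutor() as pool:
    pool.submit(domain.run_user_data_op, "vol0", lambda: None)
    pool.submit(domain.run_user_data_op, "vol1", lambda: None)  # parallel
    pool.submit(domain.run_metadata_op, lambda: None)           # exclusive
```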
Abstract:
Described is a technique for managing the content of a nonvolatile solid-state memory data cache to improve cache performance while at the same time, and in a complementary manner, providing for automatic wear leveling. A modified circular first-in first-out (FIFO) log algorithm is generally used to determine cache content replacement. The algorithm is used as the default mechanism for determining cache content to be replaced when the cache is full, but is subject to modification in some instances. In particular, data are categorized according to different data classes prior to being written to the cache, based on usage. Once cached, data belonging to certain classes are treated differently than the circular FIFO replacement algorithm would dictate. Further, data belonging to each class are localized to designated regions within the cache.
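A stripped-down sketch of per-class regions each governed by a circular FIFO pointer follows; the class names, region sizes, and the uniform pointer advance (which spreads writes for wear leveling) are illustrative assumptions:

```python
class ClassedFifoCache:
    """Sketch: flash cache split into per-class regions; each region is
    a circular FIFO whose write pointer both picks the replacement
    victim and advances uniformly across slots, so writes are spread
    over the region (wear leveling by construction)."""

    def __init__(self, regions):
        # regions: data class -> number of slots in its designated region
        self.slots = {cls: [None] * n for cls, n in regions.items()}
        self.cursor = {cls: 0 for cls in regions}   # circular FIFO pointer

    def insert(self, data_class, key, value):
        region = self.slots[data_class]
        i = self.cursor[data_class]
        victim = region[i]                  # evicted when the region is full
        region[i] = (key, value)
        # Advance circularly: the oldest entry is the next victim, and
        # every slot is written in turn.
        self.cursor[data_class] = (i + 1) % len(region)
        return victim

cache = ClassedFifoCache({"metadata": 2, "user_data": 4})
cache.insert("metadata", "inode-1", b"...")
cache.insert("user_data", "blk-9", b"...")
evicted = [cache.insert("metadata", f"inode-{i}", b"...") for i in range(2, 5)]
assert evicted[1][0] == "inode-1"   # FIFO order: oldest entry replaced first
```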
Abstract:
In one embodiment, an extent hashing technique is used to efficiently distribute data and associated metadata substantially evenly among nodes of a cluster. The data may be write data associated with a write request issued by a host and received at a node of the cluster. The write data may be organized into one or more extents. A hash function may be applied to the extent to generate a result, which may be truncated or trimmed to generate a hash value. A hash space of the hash value may be divided into a plurality of buckets representative of the write data, i.e., the extents, and the associated metadata, i.e., extent metadata. A number of buckets may be assigned to each extent store instance of the nodes to distribute ownership of the buckets, along with their extents and extent metadata, across all of the extent store instances of the nodes.
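The hashing and bucket-assignment pipeline might look like the following sketch, where the SHA-256 hash, the 48-bit truncation, the bucket count, and the round-robin assignment to instances are illustrative assumptions:

```python
import hashlib

NUM_BUCKETS = 65536          # illustrative division of the hash space

def extent_hash(extent: bytes) -> int:
    # Apply a hash function to the extent, then truncate/trim the result
    # to a shorter hash value.
    digest = hashlib.sha256(extent).digest()
    return int.from_bytes(digest[:6], "big")   # trimmed 48-bit hash value

def bucket_of(hash_value: int) -> int:
    # Divide the hash space into buckets representing the extents and
    # their associated extent metadata.
    return hash_value % NUM_BUCKETS

def assign_buckets(num_instances: int):
    # Assign buckets round-robin to extent store instances so ownership
    # (extents plus extent metadata) spreads evenly across the nodes.
    return {b: b % num_instances for b in range(NUM_BUCKETS)}

ownership = assign_buckets(num_instances=4)
extent = b"write data from a host request"
owner = ownership[bucket_of(extent_hash(extent))]
assert 0 <= owner < 4   # extent routed to one extent store instance
```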