Abstract:
Described is a technique for managing the content of a nonvolatile solid-state memory data cache to improve cache performance while, in a complementary manner, providing automatic wear leveling. A modified circular first-in first-out (FIFO) algorithm generally determines cache content replacement: it is the default mechanism for selecting content to be replaced when the cache is full, but it is subject to modification in some instances. In particular, data are categorized into different data classes, based on usage, prior to being written to the cache. Once cached, data belonging to certain classes are treated differently than the circular FIFO replacement algorithm would dictate. Further, data belonging to each class are localized to designated regions within the cache.
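Below is a minimal Python sketch of the class-partitioned circular-FIFO replacement described above. The class names, region size, and the pinning rule that exempts certain entries from their FIFO turn are illustrative assumptions, not details taken from the abstract; the uniform circular sweep within each region is what yields the automatic wear leveling.

    from dataclasses import dataclass
    from enum import Enum
    from typing import Optional

    class DataClass(Enum):
        HOT = 0        # frequently rewritten user data
        COLD = 1       # rarely rewritten user data
        METADATA = 2   # example of a class given special treatment

    @dataclass
    class Entry:
        key: Optional[str] = None
        data: Optional[bytes] = None
        pinned: bool = False

    class ClassPartitionedFifoCache:
        def __init__(self, slots_per_region=4):
            # Each class is localized to its own designated region.
            self.regions = {c: [Entry() for _ in range(slots_per_region)]
                            for c in DataClass}
            self.heads = {c: 0 for c in DataClass}  # next slot to overwrite
            self.index = {}                         # key -> (class, slot)

        def put(self, key, data, cls, pinned=False):
            region = self.regions[cls]
            n, head = len(region), self.heads[cls]
            # Default policy: replace whatever sits at the head (circular
            # FIFO), so writes sweep every slot in the region uniformly,
            # wear-leveling the flash. Modification: pinned entries are
            # skipped and so outlive their FIFO turn; the scan is bounded
            # to avoid spinning on a fully pinned region.
            for _ in range(n):
                if not region[head].pinned:
                    break
                head = (head + 1) % n
            victim = region[head]
            if victim.key is not None:
                self.index.pop(victim.key, None)    # evict the oldest entry
            region[head] = Entry(key, data, pinned)
            self.index[key] = (cls, head)
            self.heads[cls] = (head + 1) % n        # advance circularly

        def get(self, key):
            loc = self.index.get(key)
            if loc is None:
                return None
            cls, slot = loc
            return self.regions[cls][slot].data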
Abstract:
A data access request to a file system is decomposed into a plurality of lower-level I/O tasks. A logical combination of physical storage components is represented as a hierarchical set of objects. A parent I/O task is generated from a first object in response to the data access request. A child I/O task is generated from a second object to implement a portion of the parent I/O task. The parent I/O task is suspended until the child I/O task completes. The child I/O task is executed in response to an occurrence of an event that a resource required by the child I/O task is available. The parent I/O task is resumed upon an event indicating completion of the child I/O task. Scheduling of any child I/O task is not conditional on execution of the parent I/O task, and a state diagram regulates the child I/O tasks.
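A minimal Python sketch of the decomposition and the suspend/resume protocol follows. The state names, the event-driven scheduler, and the two-disk write example are assumptions made for illustration; the abstract specifies only that children are scheduled on resource-availability events rather than on the parent's execution, that the parent suspends until its children complete, and that a state diagram regulates the child tasks.

    from enum import Enum, auto
    from collections import deque, defaultdict

    class State(Enum):
        PENDING = auto()    # child waiting for its required resource
        RUNNABLE = auto()
        SUSPENDED = auto()  # parent waiting on its children
        DONE = auto()

    class IoTask:
        def __init__(self, name, work, resource=None):
            self.name, self.work, self.resource = name, work, resource
            self.state = State.PENDING if resource else State.RUNNABLE
            self.parent = None
            self.pending_children = 0

    class Scheduler:
        def __init__(self):
            self.runnable = deque()
            self.waiting = defaultdict(list)    # resource -> pending children

        def submit(self, task):
            if task.state is State.PENDING:
                self.waiting[task.resource].append(task)
            else:
                self.runnable.append(task)

        def decompose(self, parent, children):
            # The parent suspends until every child completes; children
            # are scheduled on resource events, never on the parent.
            parent.state = State.SUSPENDED
            parent.pending_children = len(children)
            for child in children:
                child.parent = parent
                self.submit(child)

        def on_resource(self, resource):
            # Event: a required resource became available.
            for task in self.waiting.pop(resource, []):
                task.state = State.RUNNABLE
                self.runnable.append(task)

        def run(self):
            while self.runnable:
                task = self.runnable.popleft()
                task.work(task)
                task.state = State.DONE
                parent = task.parent
                if parent is not None:
                    parent.pending_children -= 1
                    if parent.pending_children == 0:   # completion event:
                        parent.state = State.RUNNABLE  # resume the parent
                        self.runnable.append(parent)

    # Usage: one volume-level write decomposed into two per-disk writes.
    sched = Scheduler()
    parent = IoTask("volume-write", lambda t: print("parent resumed"))
    sched.decompose(parent, [
        IoTask("disk-write-0", lambda t: print("stripe 0 written"), "disk0"),
        IoTask("disk-write-1", lambda t: print("stripe 1 written"), "disk1"),
    ])
    sched.on_resource("disk0")   # events arrive; children become runnable
    sched.on_resource("disk1")
    sched.run()                  # children run first, then the parent resumes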
Abstract:
A storage system, such as a file server, receives a request to perform a write operation that affects a data block. In response, the storage system writes to a storage device the data block together with context information which uniquely identifies the write operation with respect to the data block. When the data block is subsequently read from the storage device together with the context information, the context information that was read with the data block is used to determine whether a previous write of the data block was lost.
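The sketch below illustrates the lost-write check, assuming a (block number, monotonically increasing write generation) pair as the context and a separate metadata map holding the expected context; both are illustrative choices, not details from the abstract.

    class Device:
        """Toy storage device: each block slot holds (data, context)."""
        def __init__(self, nblocks):
            self.blocks = [(b"", None)] * nblocks
            self.drop_next_write = False  # fault injection: lose one write

        def write(self, bno, data, context):
            if self.drop_next_write:      # device acks but never persists
                self.drop_next_write = False
                return
            self.blocks[bno] = (data, context)

        def read(self, bno):
            return self.blocks[bno]

    class StorageSystem:
        def __init__(self, device):
            self.device = device
            self.generation = 0
            self.expected = {}            # bno -> context of the last write

        def write_block(self, bno, data):
            self.generation += 1
            context = (bno, self.generation)  # unique to this write of bno
            self.device.write(bno, data, context)
            self.expected[bno] = context      # kept apart from the block

        def read_block(self, bno):
            data, context = self.device.read(bno)
            # Compare the context read with the block against the context
            # recorded at write time; a mismatch means the write was lost.
            if context != self.expected.get(bno):
                raise IOError(f"lost write detected on block {bno}")
            return data

    # Usage: the second write is silently dropped; the next read catches it.
    dev = Device(8)
    fs = StorageSystem(dev)
    fs.write_block(3, b"v1")
    dev.drop_next_write = True
    fs.write_block(3, b"v2")     # acknowledged but lost
    try:
        fs.read_block(3)
    except IOError as e:
        print(e)                 # lost write detected on block 3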
Abstract:
A flash-optimized, log-structured layer of a file system of a storage input/output (I/O) stack executes on one or more nodes of a cluster. The log-structured layer of the file system provides sequential storage of data and metadata (i.e., a log-structured layout) on solid state drives (SSDs) of storage arrays in the cluster to reduce write amplification, while leveraging variable compression and variable-length data features of the storage I/O stack. The data may be organized as an arbitrary number of variable-length extents of one or more host-visible logical units (LUNs) served by the nodes. The metadata may include mappings from host-visible logical block address ranges (i.e., offset ranges) of a LUN to extent keys, as well as mappings of the extent keys to SSD storage locations of the extents. The storage location of an extent on SSD is effectively “virtualized” by its mapped extent key (i.e., extent store layer mappings) such that relocation of the extent on SSD does not require an update to volume layer metadata (i.e., the extent key sufficiently identifies the extent).
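A Python sketch of the two-level mapping follows. The key allocator and the location encoding are illustrative assumptions; the point shown is that relocating an extent touches only the extent store layer mapping, leaving volume layer metadata unchanged.

    import itertools

    class ExtentStore:
        """Extent store layer: extent key -> SSD location of the extent."""
        def __init__(self):
            self._keys = itertools.count(1)
            self.locations = {}                  # extent key -> (ssd, offset)

        def put(self, ssd, offset):
            key = next(self._keys)               # upper layers hold the key,
            self.locations[key] = (ssd, offset)  # never the raw location
            return key

        def relocate(self, key, ssd, offset):
            # e.g., after log-structured segment cleaning: only this
            # layer's mapping changes; the extent key above stays valid.
            self.locations[key] = (ssd, offset)

    class VolumeLayer:
        """Volume layer: (LUN, offset range) -> extent key."""
        def __init__(self, extent_store):
            self.store = extent_store
            self.map = {}                        # (lun, start, length) -> key

        def write(self, lun, start, length, ssd, ssd_offset):
            self.map[(lun, start, length)] = self.store.put(ssd, ssd_offset)

        def resolve(self, lun, start, length):
            return self.store.locations[self.map[(lun, start, length)]]

    # Usage: relocation on SSD is invisible to the volume layer metadata.
    store = ExtentStore()
    vol = VolumeLayer(store)
    vol.write(lun=0, start=0, length=4096, ssd=1, ssd_offset=8192)
    print(vol.resolve(0, 0, 4096))               # (1, 8192)
    store.relocate(vol.map[(0, 0, 4096)], ssd=2, offset=0)
    print(vol.resolve(0, 0, 4096))               # (2, 0), no volume update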
Abstract:
In one embodiment, one or more storage arrays of solid state drives (SSDs) that include a plurality of segments are organized as one or more redundant array of independent disks (RAID) groups, where the RAID groups provide data redundancy for the segments. A node executing a layered file system of a storage input/output (I/O) stack performs segment cleaning to clean the segments. It further initiates rebuild of a RAID configuration of the SSDs on a segment-by-segment basis in response to the segment cleaning. In such a configuration, each segment includes one or more RAID stripes that provide a level of data redundancy as well as RAID organization for the segment.
Abstract:
In one embodiment, a file system driven RAID rebuild technique is provided. A layered file system may organize storage of data as segments spanning one or more sets of storage devices, such as solid state drives (SSDs), of a storage array, wherein each set of SSDs may form a RAID group configured to provide data redundancy for a segment. The file system may then drive (i.e., initiate) rebuild of a RAID configuration of the SSDs on a segment-by-segment basis in response to cleaning of the segment (i.e., segment cleaning). Each segment may include one or more RAID stripes that provide a level of data redundancy (e.g., single parity RAID 5 or double parity RAID 6) as well as RAID organization (i.e., distribution of data and parity) for the segment. Notably, the level of data redundancy and RAID organization may differ among the segments of the array.
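The sketch below illustrates the technique of the two abstracts above using one XOR-parity (RAID 5 style) stripe per segment; the layout and the rebuild-by-rewrite flow are illustrative assumptions (the abstracts also allow double parity and per-segment variation). Cleaning a segment reads its live data, reconstructing via parity any block on a failed SSD, and rewrites the data into a fresh segment, so the rebuild proceeds segment by segment as a side effect of cleaning.

    from functools import reduce

    def xor(blocks):
        """Bytewise XOR of equal-length byte strings."""
        return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

    class Segment:
        """One RAID stripe per segment for brevity: N data blocks plus a
        parity block, each assumed to live on a different SSD of the
        segment's RAID group."""
        def __init__(self, data_blocks):
            self.data = list(data_blocks)
            self.parity = xor(self.data)

        def read_block(self, i, failed_ssd=None):
            if i != failed_ssd:
                return self.data[i]
            # Reconstruct the lost block from survivors plus parity.
            survivors = [b for j, b in enumerate(self.data) if j != i]
            return xor(survivors + [self.parity])

    def clean_and_rebuild(old_segments, failed_ssd):
        """Segment cleaning drives the rebuild: for each segment, read the
        live data (repairing the failed SSD's blocks) and write it into a
        new segment, which recomputes parity under the new configuration."""
        new_segments = []
        for seg in old_segments:
            live = [seg.read_block(i, failed_ssd)
                    for i in range(len(seg.data))]
            new_segments.append(Segment(live))  # one segment at a time
        return new_segments

    # Usage: SSD 1 fails; cleaning each segment restores full redundancy.
    segs = [Segment([b"\x01" * 4, b"\x02" * 4, b"\x04" * 4])]
    rebuilt = clean_and_rebuild(segs, failed_ssd=1)
    assert rebuilt[0].data[1] == b"\x02" * 4    # lost block recovered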
Abstract:
In one embodiment, a node is a member of a cluster having a plurality of nodes, where each node is coupled to one or more storage arrays of solid state drives (SSDs) that serve as main storage. The node executes a storage input/output (I/O) stack having a redundant array of independent disks (RAID) layer that organizes the SSDs within the one or more storage arrays as one or more RAID groups. Configuration information is stored in a cluster database. The configuration information identifies (i) one or more RAID groups associated with an extent store, (ii) the SSDs within each RAID group, and (iii) the node that owns the extent store. The cluster database is stored separate and apart from the main storage.
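A minimal sketch of such a configuration record follows. The field names, JSON encoding, and in-memory store are illustrative assumptions; only the three items of configuration information and the separation from main storage come from the abstract.

    import json
    from dataclasses import dataclass, field, asdict

    @dataclass
    class RaidGroup:
        group_id: str
        ssds: list           # identifiers of the member SSDs

    @dataclass
    class ExtentStoreConfig:
        extent_store_id: str
        owner_node: str      # the node that owns this extent store
        raid_groups: list = field(default_factory=list)

    class ClusterDatabase:
        """Held separate and apart from main storage, e.g., replicated on
        the nodes' local boot devices rather than on the data SSDs (an
        assumption; the abstract states only the separation)."""
        def __init__(self):
            self._records = {}

        def put(self, cfg: ExtentStoreConfig):
            self._records[cfg.extent_store_id] = json.dumps(asdict(cfg))

        def get(self, extent_store_id: str) -> ExtentStoreConfig:
            raw = json.loads(self._records[extent_store_id])
            raw["raid_groups"] = [RaidGroup(**g) for g in raw["raid_groups"]]
            return ExtentStoreConfig(**raw)

    # Usage
    db = ClusterDatabase()
    db.put(ExtentStoreConfig(
        extent_store_id="es-1",
        owner_node="node-a",
        raid_groups=[RaidGroup("rg-0", ["ssd-01", "ssd-02", "ssd-03"])],
    ))
    print(db.get("es-1").owner_node)   # node-a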