Abstract:
The embodiments described herein are directed to efficient merging of metadata managed by a volume layer of a storage input/output (I/O) stack executing on one or more nodes of a cluster. The metadata managed by the volume layer, i.e., the volume metadata, is illustratively organized as a multi-level dense tree metadata structure (dense tree), wherein each level of the dense tree includes volume metadata entries for storing the volume metadata. The volume metadata entries of an upper level of the dense tree are merged with the volume metadata entries of a next lower level of the dense tree when the upper level is full. The volume metadata entries of the merged levels are organized as metadata pages and stored as one or more files on solid state drives (SSDs).
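The merge-on-full behavior described in this abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the class name `DenseTreeSketch`, the per-level capacities, the growth factor, and the use of in-memory dictionaries keyed by offset. The organization of merged entries into metadata pages stored as files is not modeled.

```python
from typing import Dict, List, Optional


class DenseTreeSketch:
    """Hypothetical multi-level structure; level 0 is the top (newest) level."""

    def __init__(self, num_levels: int = 3, level0_capacity: int = 4, growth: int = 4):
        # Each level maps an offset to a metadata entry (assumed representation).
        self.levels: List[Dict[int, str]] = [dict() for _ in range(num_levels)]
        # Assumed sizing: each lower level holds `growth` times more entries.
        self.capacities = [level0_capacity * growth ** i for i in range(num_levels)]

    def insert(self, offset: int, entry: str) -> None:
        # New entries always land in the top level.
        self.levels[0][offset] = entry
        self._merge_full_levels()

    def _merge_full_levels(self) -> None:
        # When an upper level is full, merge it into the next lower level.
        for i in range(len(self.levels) - 1):
            if len(self.levels[i]) >= self.capacities[i]:
                merged = dict(self.levels[i + 1])
                merged.update(self.levels[i])  # entries from the upper level win
                self.levels[i + 1] = merged
                self.levels[i] = {}

    def lookup(self, offset: int) -> Optional[str]:
        # Search from the top level down so the newest entry is found first.
        for level in self.levels:
            if offset in level:
                return level[offset]
        return None


tree = DenseTreeSketch(num_levels=2, level0_capacity=2)
tree.insert(0, "key-a")
tree.insert(8, "key-b")  # level 0 is now full and is merged into level 1
tree.insert(0, "key-c")  # a newer entry for offset 0 lands in level 0
```

In this sketch a lookup for offset 0 finds the newer entry in level 0 before it reaches the older entry that was merged into level 1.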
Abstract:
In one embodiment, a flash-optimized, log-structured layer of a file system of a storage input/output (I/O) stack executes on one or more nodes of a cluster. The log-structured layer of the file system provides sequential storage of data and metadata on solid state drives (SSDs) to reduce write amplification, while leveraging variable compression and variable length data features of the storage I/O stack. The data may be organized as an arbitrary number of variable-length extents of one or more host-visible logical units (LUNs). The metadata may include mappings from host-visible logical block address ranges of a LUN to extent keys, as well as mappings of the extent keys to SSD storage locations of the extents. The storage location of an extent on SSD is effectively “virtualized” by its mapped extent key such that relocation of the extent on SSD does not require update to volume layer metadata.
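The two mappings in this abstract, and the way the extent key hides the SSD location from the volume layer, can be sketched as follows. The class names, the counter-based key assignment, and the `(ssd_id, byte_offset)` location tuple are hypothetical; the abstract does not specify how keys are generated or how locations are encoded.

```python
from typing import Dict, Tuple

Location = Tuple[int, int]  # assumed encoding: (SSD id, byte offset)


class ExtentStoreSketch:
    """Hypothetical holder of the extent key -> SSD location mapping."""

    def __init__(self) -> None:
        self._next_key = 1
        self.key_to_location: Dict[int, Location] = {}
        self.media: Dict[Location, bytes] = {}  # stands in for the SSDs

    def put(self, data: bytes, location: Location) -> int:
        key = self._next_key  # assumed: keys are opaque, counter-assigned
        self._next_key += 1
        self.key_to_location[key] = location
        self.media[location] = data
        return key

    def get(self, key: int) -> bytes:
        return self.media[self.key_to_location[key]]

    def relocate(self, key: int, new_location: Location) -> None:
        # Only this mapping changes when an extent moves on SSD.
        old_location = self.key_to_location[key]
        self.media[new_location] = self.media.pop(old_location)
        self.key_to_location[key] = new_location


class VolumeSketch:
    """Hypothetical holder of the LBA range -> extent key mapping."""

    def __init__(self, store: ExtentStoreSketch) -> None:
        self.store = store
        self.range_to_key: Dict[Tuple[int, int], int] = {}

    def write(self, lba: int, length: int, data: bytes, location: Location) -> None:
        self.range_to_key[(lba, length)] = self.store.put(data, location)

    def read(self, lba: int, length: int) -> bytes:
        return self.store.get(self.range_to_key[(lba, length)])


store = ExtentStoreSketch()
lun = VolumeSketch(store)
lun.write(lba=0, length=8, data=b"payload", location=(0, 4096))
volume_map_before = dict(lun.range_to_key)
store.relocate(volume_map_before[(0, 8)], new_location=(1, 0))
```

After the relocation the volume mapping is identical to `volume_map_before`, and a read of the same LBA range still returns the original data.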
Abstract:
A file system layout apportions an underlying physical volume into one or more virtual volumes (vvols) of a storage system. The underlying physical volume is an aggregate comprising one or more groups of disks, such as RAID groups, of the storage system. The aggregate has its own physical volume block number (pvbn) space and maintains metadata, such as block allocation structures, within that pvbn space. Each vvol has its own virtual volume block number (vvbn) space and maintains metadata, such as block allocation structures, within that vvbn space. Notably, the block allocation structures of a vvol are sized to the vvol, and not to the underlying aggregate, to thereby allow operations that manage data served by the storage system (e.g., snapshot operations) to efficiently work over the vvols. The file system layout extends the file system layout of a conventional write anywhere file layout system implementation, yet maintains performance properties of the conventional implementation.
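The separation of the two block number spaces can be sketched as below. This is an illustrative model only: the class names, the list-based allocation maps, the first-free allocation policy, and the `vvbn_to_pvbn` dictionary are assumptions, not details taken from the abstract.

```python
from typing import Dict, List


class AggregateSketch:
    """Hypothetical aggregate with an allocation map over its pvbn space."""

    def __init__(self, num_blocks: int) -> None:
        self.pvbn_in_use: List[bool] = [False] * num_blocks

    def allocate(self) -> int:
        pvbn = self.pvbn_in_use.index(False)  # raises ValueError when full
        self.pvbn_in_use[pvbn] = True
        return pvbn


class VVolSketch:
    """Hypothetical vvol whose allocation map is sized to the vvol itself."""

    def __init__(self, aggregate: AggregateSketch, num_blocks: int) -> None:
        self.aggregate = aggregate
        self.vvbn_in_use: List[bool] = [False] * num_blocks
        self.vvbn_to_pvbn: Dict[int, int] = {}

    def allocate(self) -> int:
        vvbn = self.vvbn_in_use.index(False)  # raises ValueError when full
        pvbn = self.aggregate.allocate()
        self.vvbn_in_use[vvbn] = True
        self.vvbn_to_pvbn[vvbn] = pvbn
        return vvbn


aggregate = AggregateSketch(num_blocks=1000)
vol_a = VVolSketch(aggregate, num_blocks=10)
vol_b = VVolSketch(aggregate, num_blocks=20)
a0 = vol_a.allocate()
b0 = vol_b.allocate()
a1 = vol_a.allocate()
```

An operation that walks `vol_a.vvbn_in_use` touches 10 entries rather than the 1000 entries of the aggregate map, which is the property the abstract attributes to sizing the structures to the vvol.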
Abstract:
A first plurality of block identifiers is sorted based, at least in part, on a measure of spatial locality. A second plurality of block identifiers is sorted based, at least in part, on the measure of spatial locality. At least the first plurality of block identifiers and the second plurality of block identifiers are incrementally merged into a third plurality of block identifiers based, at least in part, on the measure of spatial locality. A block of data corresponding to metadata associated with a plurality of block identifiers of the third plurality of block identifiers is updated.
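A minimal sketch of the sort-then-incrementally-merge flow, under stated assumptions: the measure of spatial locality is taken to be the block number itself, and a metadata block is assumed to cover a fixed run of `IDS_PER_METADATA_BLOCK` consecutive identifiers. Both choices, and all names, are hypothetical.

```python
import heapq
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple

IDS_PER_METADATA_BLOCK = 4  # assumed coverage of one metadata block


def locality(block_id: int) -> int:
    # Assumed measure of spatial locality: the block number itself.
    return block_id


def merged_by_metadata_block(*id_lists: Iterable[int]) -> Iterator[Tuple[int, List[int]]]:
    # Sort each plurality of identifiers independently.
    sorted_lists = [sorted(ids, key=locality) for ids in id_lists]
    # heapq.merge yields the merged sequence lazily, one identifier at a time.
    merged = heapq.merge(*sorted_lists, key=locality)
    # Identifiers covered by the same metadata block come out adjacent,
    # so each metadata block can be updated once for its whole group.
    for meta_block, group in groupby(merged, key=lambda b: b // IDS_PER_METADATA_BLOCK):
        yield meta_block, list(group)


first = [9, 1, 6]
second = [2, 10, 5]
updates = list(merged_by_metadata_block(first, second))
```

With these inputs, `updates` groups identifiers 1 and 2 under metadata block 0, 5 and 6 under block 1, and 9 and 10 under block 2.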
Abstract:
In one embodiment, storage arrays of solid state drives (SSDs) coupled to a node are organized as redundant array of independent disks (RAID) groups. Each storage array includes one or more segments. Each segment has contiguous free space on the SSDs. Data and metadata are organized on the SSDs with a sequential log-structured layout, with the data organized as variable-length extents of one or more logical units (LUNs). Segment cleaning is performed to clean a selected segment by moving the extents of the selected segment that contain valid data to one or more different segments so as to free the selected segment. Additional extents are written as a sequence of contiguous range write operations to the entire free segment with temporal locality to reduce data relocation within the SSDs as a result of the write operations.
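The segment cleaning step can be sketched as follows. The data classes, the byte-based capacity accounting, the `valid` flag, and the first-fit choice of destination segment are assumptions for illustration; RAID layout and the subsequent contiguous writes to the freed segment are not modeled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Extent:
    key: int
    data: bytes
    valid: bool = True  # assumed flag: False once the data is no longer referenced


@dataclass
class Segment:
    capacity: int
    extents: List[Extent] = field(default_factory=list)

    def free(self) -> int:
        return self.capacity - sum(len(e.data) for e in self.extents)

    def append(self, extent: Extent) -> None:
        if len(extent.data) > self.free():
            raise ValueError("extent does not fit in segment")
        self.extents.append(extent)


def clean_segment(victim: Segment, destinations: List[Segment]) -> None:
    """Move valid extents out of `victim`, then mark the whole segment free."""
    for extent in victim.extents:
        if not extent.valid:
            continue  # invalid data is simply dropped
        # Assumed policy: first destination segment with enough room.
        target = next((s for s in destinations if s.free() >= len(extent.data)), None)
        if target is None:
            raise RuntimeError("no destination segment has room")
        target.append(extent)
    victim.extents = []


victim = Segment(capacity=64, extents=[
    Extent(1, b"aaaa"),
    Extent(2, b"bbbbbbbb", valid=False),
    Extent(3, b"cc"),
])
destination = Segment(capacity=64)
clean_segment(victim, [destination])
```

After cleaning, the victim segment is entirely free and the destination holds only the two extents that contained valid data.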
Abstract:
A three-way merge technique efficiently updates metadata in accordance with a three-way merge operation managed by a volume layer of a storage input/output (I/O) stack executing on one or more nodes of a cluster. The metadata is embodied as mappings from logical block addresses (LBAs) of a logical unit (LUN) accessible by a host to durable extent keys, and is organized as a multi-level dense tree. The mappings are organized such that a higher level of the dense tree contains more recent mappings than a next lower level, i.e., the level immediately below. The three-way merge operation is an efficient (i.e., optimized) way of updating the volume metadata mappings of the dense tree by merging the mapping content of all three levels in a single iteration, as opposed to merging the content of the first level with the content of the second level in a first iteration of a two-way merge operation and then merging the results of the first iteration with the content of the third level in a second iteration of the operation.
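The difference between one three-way pass and two chained two-way passes can be sketched as below. For brevity each mapping is simplified to a single LBA paired with an extent key, and each level is a list sorted by LBA; the abstract does not prescribe this representation, and the function names are hypothetical.

```python
import heapq
from typing import List, Tuple

Mapping = Tuple[int, str]  # assumed simplification: (LBA, extent key)


def three_way_merge(level0: List[Mapping], level1: List[Mapping],
                    level2: List[Mapping]) -> List[Mapping]:
    """Merge three sorted levels in one pass; level0 holds the newest mappings."""
    # Tag each mapping with the age of its level so that, for equal LBAs,
    # the mapping from the most recent level sorts first.
    tagged = [
        [(lba, age, key) for lba, key in level]
        for age, level in enumerate((level0, level1, level2))
    ]
    merged: List[Mapping] = []
    last_lba = None
    for lba, _age, key in heapq.merge(*tagged):
        if lba != last_lba:  # keep only the newest mapping per LBA
            merged.append((lba, key))
            last_lba = lba
    return merged


def two_way_merge(upper: List[Mapping], lower: List[Mapping]) -> List[Mapping]:
    # A two-way merge is the same pass with an empty third level.
    return three_way_merge(upper, lower, [])


top = [(0, "c"), (16, "d")]
middle = [(0, "b"), (8, "e")]
bottom = [(0, "a"), (8, "f"), (24, "g")]

single_pass = three_way_merge(top, middle, bottom)
two_passes = two_way_merge(two_way_merge(top, middle), bottom)
```

Both approaches produce the same mappings here; the single pass avoids materializing the intermediate result of the first two-way merge.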
Abstract:
The embodiments described herein are directed to an organization of metadata managed by a volume layer of a storage input/output (I/O) stack executing on one or more nodes of a cluster. The metadata managed by the volume layer, i.e., the volume metadata, is illustratively embodied as mappings from addresses, i.e., logical block addresses (LBAs), of a logical unit (LUN) accessible by a host to durable extent keys maintained by an extent store layer of the storage I/O stack. In an embodiment, the volume layer organizes the volume metadata as a mapping data structure, i.e., a dense tree metadata structure, which represents successive points in time to enable efficient access to the metadata.
Abstract:
In one embodiment, a layered file system of a storage input/output (I/O) stack executes on one or more nodes of a cluster. The layered file system includes a flash-optimized, log-structured layer configured to provide sequential storage of data and metadata (i.e., a log-structured layout) on solid state drives (SSDs) of storage arrays in the cluster to reduce write amplification, while leveraging a data de-duplication feature of the storage I/O stack. An extent store layer of the file system maintains mappings of extent keys to SSD storage locations, while a volume layer of the file system maintains mappings of logical unit (LUN) offset ranges to the extent keys. Separation of the mapping functions between the volume and extent store layers enables different volumes with different offset ranges to reference a same extent key (and thus a same extent).
Abstract:
In one embodiment, a layered file system includes a volume layer and an extent store layer configured to provide sequential log-structured layout of data and metadata on solid state drives (SSDs) of one or more storage arrays. The data is organized as variable-length extents of one or more logical units (LUNs). The metadata includes volume metadata mappings from offset ranges of a LUN to extent keys and extent metadata mappings of the extent keys to storage locations of the extents on the SSDs. The extent store layer, which maintains the extent metadata mappings, determines whether an extent is already stored on a storage array and, in response to a determination that the extent is stored on the storage array, returns an extent key for the stored extent to the volume layer to enable global inline de-duplication that obviates writing a duplicate copy of the extent on the storage array.
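The inline de-duplication exchange between the two layers can be sketched as follows. One assumption is made loudly: the extent key is derived here from a SHA-256 hash of the extent content so that the store can detect an already-stored extent. The abstract does not state how keys are computed, and the class names are hypothetical.

```python
import hashlib
from typing import Dict, Tuple


class DedupExtentStore:
    """Hypothetical extent store that returns an existing key for known content."""

    def __init__(self) -> None:
        self.extents: Dict[str, bytes] = {}  # extent key -> stored extent
        self.physical_writes = 0

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()  # assumed key derivation
        if key not in self.extents:
            # Extent is not yet stored: write it once.
            self.extents[key] = data
            self.physical_writes += 1
        # Either way, the volume layer receives the key for the stored extent.
        return key


class DedupVolume:
    """Hypothetical volume layer mapping LUN offset ranges to extent keys."""

    def __init__(self, store: DedupExtentStore) -> None:
        self.store = store
        self.range_to_key: Dict[Tuple[int, int], str] = {}

    def write(self, offset: int, data: bytes) -> None:
        self.range_to_key[(offset, len(data))] = self.store.put(data)


shared_store = DedupExtentStore()
vol_1 = DedupVolume(shared_store)
vol_2 = DedupVolume(shared_store)
vol_1.write(0, b"same bytes")
vol_2.write(4096, b"same bytes")  # duplicate content at a different offset
```

Both volumes end up referencing the same extent key, and the store records a single physical write for the duplicated content.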