Abstract:
Partially overwriting a compression group without decompressing the compressed data can reduce the resources consumed by decompression. A storage server partially overwrites the compression group when a file block identifier of a client's write request resolves to the compression group. The compression group remains compressed while the partial overwriting is performed.
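The sketch below assumes a hypothetical layout in which each compression group covers a fixed number of file blocks. Overwritten blocks are recorded in a per-group overwrite map next to the untouched compressed payload. `CompressionGroup`, `handle_write`, `BLOCK_SIZE`, and `GROUP_BLOCKS` are illustrative names and do not come from the described storage server.

```python
import zlib
from dataclasses import dataclass, field

BLOCK_SIZE = 4096   # hypothetical block size in bytes
GROUP_BLOCKS = 8    # hypothetical number of file blocks per compression group


@dataclass
class CompressionGroup:
    """Hypothetical compression group: compressed payload plus an overwrite map."""
    compressed: bytes                               # never decompressed on overwrite
    overwrites: dict = field(default_factory=dict)  # block index -> new uncompressed block

    @classmethod
    def from_blocks(cls, blocks):
        return cls(zlib.compress(b"".join(blocks)))

    def partial_overwrite(self, index, data):
        # Record the new block; the compressed payload is left as-is.
        if not 0 <= index < GROUP_BLOCKS:
            raise IndexError("block index outside compression group")
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block")
        self.overwrites[index] = data

    def read_block(self, index):
        # Overwritten blocks are served directly; other blocks still need decompression.
        if index in self.overwrites:
            return self.overwrites[index]
        raw = zlib.decompress(self.compressed)
        return raw[index * BLOCK_SIZE:(index + 1) * BLOCK_SIZE]


def handle_write(groups, fbn, data):
    """Resolve a file block number (fbn) to its compression group and partially overwrite it."""
    group = groups[fbn // GROUP_BLOCKS]
    group.partial_overwrite(fbn % GROUP_BLOCKS, data)
```

The compressed bytes are identical before and after the write. Only a read of a non-overwritten block pays the decompression cost.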
Abstract:
In one embodiment, a node coupled to one or more storage devices executes a storage input/output (I/O) stack having a volume layer, a persistence layer and an administration layer that interact to create a copy of a parent volume associated with a storage container on the one or more storage devices. A copy create start message is received at the persistence layer from the administration layer. The persistence layer ensures that dirty data for the parent volume is incorporated into the copy of the parent volume. New data for the parent volume received at the persistence layer during creation of the copy of the parent volume is prevented from incorporation into the copy of the parent volume. A reply to the copy create start message is sent from the persistence layer to the administration layer to initiate the creation of the copy of the parent volume at the volume layer.
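The following minimal sketch shows the persistence-layer side of the copy create start exchange. It assumes that the volume layer can be modeled as a plain dict of per-volume LBA maps and that writes arriving during copy creation are held aside. `PersistenceLayer`, `copy_create_start`, and `copy_create_done` are hypothetical names, and the reply format is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class PersistenceLayer:
    """Hypothetical persistence layer holding dirty (not yet incorporated) writes."""
    dirty: dict = field(default_factory=dict)    # volume -> list of (lba, data)
    held: dict = field(default_factory=dict)     # volume -> writes received during copy create
    copying: set = field(default_factory=set)    # volumes with a copy in progress

    def write(self, volume, lba, data):
        # Writes that arrive while a copy is being created are held back
        # so that they are not incorporated into the copy.
        target = self.held if volume in self.copying else self.dirty
        target.setdefault(volume, []).append((lba, data))

    def copy_create_start(self, volume, volume_layer):
        # 1. Incorporate all dirty data for the parent volume before the copy is made.
        for lba, data in self.dirty.pop(volume, []):
            volume_layer[volume][lba] = data
        # 2. Fence later writes away from the copy.
        self.copying.add(volume)
        # 3. Reply to the administration layer so the volume layer can create the copy.
        return {"volume": volume, "status": "ready"}

    def copy_create_done(self, volume):
        # Return held writes to the normal path; they apply to the parent only.
        self.copying.discard(volume)
        self.dirty.setdefault(volume, []).extend(self.held.pop(volume, []))
```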
Abstract:
In one embodiment, a node coupled to one or more storage devices executes a storage input/output (I/O) stack having a volume layer that manages volume metadata. The volume metadata is organized as one or more dense tree metadata structures having a top level residing in memory and lower levels residing on the one or more storage devices. The dense tree metadata structures include a first dense tree metadata structure associated with a parent volume and a second dense tree metadata structure associated with a copy of the parent volume. The top level of the first dense tree metadata structure may be copied to the second dense tree metadata structure. The lower levels of the first dense tree metadata structure are initially shared with the second dense tree metadata structure. The shared lower levels may eventually be split as the parent volume diverges from the copy of the parent volume.
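The next sketch shows the sharing scheme under stated assumptions. The in-memory top level is copied, the lower levels are shared by reference with a reference count, and a shared level is split (copied on write) the first time either tree modifies it. `Level`, `DenseTree`, `clone`, and `modify_lower` are hypothetical names, and reference counting is one possible way to track sharing.

```python
from dataclasses import dataclass, field


@dataclass
class Level:
    """One on-disk level of a hypothetical dense tree; refcount tracks sharing."""
    entries: dict
    refcount: int = 1


@dataclass
class DenseTree:
    top: dict = field(default_factory=dict)    # in-memory top level, privately owned
    lower: list = field(default_factory=list)  # on-disk lower levels, possibly shared

    def clone(self):
        # Copy the top level; share every lower level by reference.
        for level in self.lower:
            level.refcount += 1
        return DenseTree(top=dict(self.top), lower=list(self.lower))

    def modify_lower(self, depth, lba, key):
        # Split a shared lower level (copy-on-write) before changing it.
        level = self.lower[depth]
        if level.refcount > 1:
            level.refcount -= 1
            level = Level(dict(level.entries))
            self.lower[depth] = level
        level.entries[lba] = key

    def lookup(self, lba):
        # Higher levels hold newer mappings, so search from the top down.
        if lba in self.top:
            return self.top[lba]
        for level in self.lower:
            if lba in level.entries:
                return level.entries[lba]
        return None
```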
Abstract:
In one embodiment, snapshots and/or clones of storage objects are created and managed by a volume layer of a storage input/output (I/O) stack executing on one or more nodes of a cluster. Illustratively, the snapshots and clones may be represented as independent volumes, and embodied as respective read-only copies (snapshots) and read-write copies (clones) of a parent volume. Volume metadata is illustratively organized as one or more multi-level dense tree metadata structures, wherein each level of the dense tree metadata structure (dense tree) includes volume metadata entries for storing the metadata. Each snapshot/clone may be derived from a dense tree of the parent volume (parent dense tree). Portions of the parent dense tree may be shared with the snapshot/clone.
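The following minimal sketch models snapshots and clones as independent volumes derived from a parent's levels. It assumes that each volume has a private top level and an immutable tuple of inherited levels, newest first. A snapshot rejects writes and a clone accepts them. The `Volume` class and its methods are hypothetical.

```python
from types import MappingProxyType


class Volume:
    """Hypothetical volume whose metadata is a private top level plus shared older levels."""

    def __init__(self, name, levels=None, read_only=False):
        self.name = name
        self.top = {}                      # private, most recent level
        self.shared = tuple(levels or ())  # levels inherited from the parent, never mutated
        self.read_only = read_only

    def write(self, lba, key):
        if self.read_only:
            raise PermissionError(f"{self.name} is a read-only snapshot")
        self.top[lba] = key

    def read(self, lba):
        # Search newest to oldest.
        for level in (self.top, *self.shared):
            if lba in level:
                return level[lba]
        return None

    def _derive(self, name, read_only):
        # Freeze a copy of the parent's top level and share it along with the older levels.
        frozen = MappingProxyType(dict(self.top))
        return Volume(name, (frozen, *self.shared), read_only)

    def snapshot(self, name):
        return self._derive(name, read_only=True)   # read-only copy

    def clone(self, name):
        return self._derive(name, read_only=False)  # read-write copy
```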
Abstract:
The embodiments described herein are directed to an organization of metadata managed by a volume layer of a storage input/output (I/O) stack executing on one or more nodes of a cluster. The metadata managed by the volume layer, i.e., the volume metadata, is illustratively embodied as mappings from addresses, i.e., logical block addresses (LBAs), of a logical unit (LUN) accessible by a host to durable extent keys maintained by an extent store layer of the storage I/O stack. In an embodiment, the volume layer organizes the volume metadata as a mapping data structure, i.e., a dense tree metadata structure, which represents successive points in time to enable efficient access to the metadata.
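The sketch below illustrates the mapping. It assumes that each volume metadata entry maps an LBA range `[offset, offset + length)` to an extent key and that entries within one level are sorted and non-overlapping. It also assumes that `levels[0]` is the most recent point in time, so a lookup stops at the first level that covers the LBA. `Entry`, `DenseTreeLevel`, and `lookup` are hypothetical names.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical volume metadata entry: LBA range -> extent key."""
    offset: int
    length: int
    extent_key: str


class DenseTreeLevel:
    def __init__(self, entries):
        # Entries are assumed non-overlapping within a level and are sorted by offset.
        self.entries = sorted(entries, key=lambda e: e.offset)
        self.offsets = [e.offset for e in self.entries]

    def find(self, lba):
        # Locate the last entry starting at or before lba, then check that it covers lba.
        i = bisect.bisect_right(self.offsets, lba) - 1
        if i >= 0:
            e = self.entries[i]
            if e.offset <= lba < e.offset + e.length:
                return e
        return None


def lookup(levels, lba):
    """Return the extent key for lba; levels[0] is the newest point in time."""
    for level in levels:
        entry = level.find(lba)
        if entry is not None:
            return entry.extent_key
    return None
```

A newer level that overwrites part of an older range shadows that part only. The rest of the older range remains visible.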
Abstract:
A flash-optimized, log-structured layer of a file system of a storage input/output (I/O) stack executes on one or more nodes of a cluster. The log-structured layer of the file system provides sequential storage of data and metadata (i.e., a log-structured layout) on solid state drives (SSDs) of storage arrays in the cluster to reduce write amplification, while leveraging variable compression and variable length data features of the storage I/O stack. The data may be organized as an arbitrary number of variable-length extents of one or more host-visible logical units (LUNs) served by the nodes. The metadata may include mappings from host-visible logical block address ranges (i.e., offset ranges) of a LUN to extent keys, as well as mappings of the extent keys to SSD storage locations of the extents. The storage location of an extent on SSD is effectively “virtualized” by its mapped extent key (i.e., extent store layer mappings) such that relocation of the extent on SSD does not require an update to volume layer metadata (i.e., the extent key sufficiently identifies the extent).
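The following minimal sketch illustrates the two mapping tiers. It assumes that the volume layer maps `(offset, length)` ranges to extent keys, while only the extent store maps keys to `(ssd_id, offset)` locations in an append-only log. Because the volume layer never stores an SSD location, relocating an extent touches the extent store map alone. `ExtentStore`, `VolumeLayer`, and the hash-based key derivation are hypothetical choices made for illustration.

```python
import hashlib


class ExtentStore:
    """Hypothetical extent store layer: extent key -> SSD location."""

    def __init__(self):
        self.locations = {}  # extent key -> (ssd_id, offset)
        self.log_tail = 0    # next free offset in the append-only log

    def put(self, data, ssd_id=0):
        key = hashlib.sha256(data).hexdigest()[:16]  # hypothetical key derivation
        self.locations[key] = (ssd_id, self.log_tail)
        self.log_tail += len(data)                   # sequential, log-structured placement
        return key

    def relocate(self, key, new_ssd_id, new_offset):
        # For example, during cleaning: only the extent store mapping changes.
        self.locations[key] = (new_ssd_id, new_offset)


class VolumeLayer:
    """Hypothetical volume layer: LBA range -> extent key (no SSD locations)."""

    def __init__(self, store):
        self.store = store
        self.mappings = {}   # (offset, length) -> extent key

    def write(self, offset, data):
        self.mappings[(offset, len(data))] = self.store.put(data)

    def locate(self, offset, length):
        return self.store.locations[self.mappings[(offset, length)]]
```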
Abstract:
An N-way merge technique efficiently updates metadata in accordance with an N-way merge operation managed by a volume layer of a storage input/output (I/O) stack executing on one or more nodes of a cluster. The metadata is embodied as mappings from logical block addresses (LBAs) of a logical unit (LUN) accessible by a host to durable extent keys, and is organized as a multi-level dense tree. The mappings are organized such that a higher level of the dense tree contains more recent mappings than a next lower level, i.e., the level immediately below. The N-way merge operation is an efficient (i.e., optimized) way of updating the volume metadata mappings of the dense tree: for a three-level dense tree, it merges the mapping content of all three levels in a single iteration, as opposed to merging the content of the first level with the content of the second level in a first iteration of a two-way merge operation and then merging the results of the first iteration with the content of the third level in a second iteration of the operation.
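The sketch below shows a single-pass merge that works for any number of levels. It assumes that each level is a list of `(lba, extent_key)` pairs sorted by LBA with no duplicate LBAs inside a level, and that `levels[0]` is the highest (most recent) level. Entries are tagged with their depth, so ties on an LBA sort newest first and the first occurrence wins. `n_way_merge` and `_tag` are hypothetical names. The block uses the standard-library `heapq.merge`.

```python
import heapq


def _tag(level, depth):
    # Attach the level depth so that ties on lba are ordered newest-first.
    for lba, key in level:
        yield (lba, depth, key)


def n_way_merge(levels):
    """Merge N sorted levels of (lba, extent_key) pairs in a single pass.

    levels[0] is the highest (most recent) level; when several levels map
    the same LBA, the mapping from the highest level wins.
    """
    streams = [_tag(level, depth) for depth, level in enumerate(levels)]
    merged = []
    last_lba = None
    for lba, _depth, key in heapq.merge(*streams):
        if lba != last_lba:          # first occurrence is the newest mapping
            merged.append((lba, key))
            last_lba = lba
    return merged
```

Applying the same function pairwise, first to levels 0 and 1 and then to that result with level 2, reproduces the two-iteration, two-way approach. It yields the same mappings but reads the intermediate result twice.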
Abstract:
The embodiments described herein are directed to efficient merging of metadata managed by a volume layer of a storage input/output (I/O) stack executing on one or more nodes of a cluster. The metadata managed by the volume layer, i.e., the volume metadata, is illustratively organized as a multi-level dense tree metadata structure, wherein each level of the dense tree metadata structure (dense tree) includes volume metadata entries for storing the volume metadata. The volume metadata entries of an upper level of the dense tree are merged with the volume metadata entries of the next lower level of the dense tree when the upper level is full. The volume metadata entries of the merged levels are organized as metadata pages and stored as one or more files on solid state drives (SSDs).
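The following minimal two-level sketch illustrates merge-when-full. It assumes point mappings (`lba -> extent key`), a tiny hypothetical upper-level capacity, and fixed-size metadata pages kept in a Python list that stands in for the files on SSD. Upper-level entries supersede lower-level entries for the same LBA. `TwoLevelDenseTree`, `LEVEL_CAPACITY`, and `PAGE_ENTRIES` are hypothetical, and rewriting every page on each merge is a simplification.

```python
PAGE_ENTRIES = 4     # hypothetical number of entries per metadata page
LEVEL_CAPACITY = 4   # hypothetical capacity of the upper level


class TwoLevelDenseTree:
    def __init__(self):
        self.upper = {}  # lba -> extent key (most recent mappings)
        self.lower = {}  # lba -> extent key (older, merged mappings)
        self.pages = []  # lower level serialized as fixed-size metadata pages

    def insert(self, lba, key):
        self.upper[lba] = key
        if len(self.upper) >= LEVEL_CAPACITY:  # upper level full: merge down
            self.merge()

    def merge(self):
        # Newer upper-level entries overwrite older lower-level entries.
        self.lower.update(self.upper)
        self.upper.clear()
        items = sorted(self.lower.items())
        self.pages = [items[i:i + PAGE_ENTRIES]
                      for i in range(0, len(items), PAGE_ENTRIES)]

    def lookup(self, lba):
        return self.upper.get(lba, self.lower.get(lba))
```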