Abstract:
A technique enables recovery of storage space trapped in an extent store from overlapping write requests associated with metadata describing volume logical storage addresses for data in the extent store. The metadata is organized as metadata entries in a multi-level dense tree metadata structure. When a level of the dense tree is full, the metadata entries of the level are merged with a next lower level of the dense tree in accordance with a dense tree merge operation. The technique may be invoked during the merge operation to process the metadata entries associated with the overlapping write requests involved in the merge operation. Processing of the overlapping write requests during the merge operation may partially overwrite extents which, in turn, may result in logical storage space being trapped in the extent store. The technique may perform read-modify-write (RMW) operations on the partially overwritten extents to recapture that trapped space.
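The overlap handling described above can be pictured with a small sketch. It is illustrative only: the `Entry` type, the `merge_overlap` helper, and the rule that a partially overwritten extent traps exactly the overwritten blocks are assumptions made for this example, not details taken from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical metadata entry: an LBA range mapped to one extent."""
    offset: int   # starting LBA of the mapped range
    length: int   # number of blocks mapped
    key: str      # hypothetical extent key


def merge_overlap(old: Entry, new: Entry):
    """Return (surviving pieces of `old`, number of trapped blocks).

    Assumption: if `new` only partially overwrites `old`, the old extent
    must stay in the extent store, so its overwritten blocks are trapped.
    If `new` covers `old` completely, the old extent can simply be freed
    and nothing is trapped.
    """
    old_end = old.offset + old.length
    new_end = new.offset + new.length
    lo = max(old.offset, new.offset)
    hi = min(old_end, new_end)
    if lo >= hi:                       # no overlap at all
        return [old], 0
    pieces = []
    if old.offset < lo:                # head of the old range survives
        pieces.append(Entry(old.offset, lo - old.offset, old.key))
    if hi < old_end:                   # tail of the old range survives
        pieces.append(Entry(hi, old_end - hi, old.key))
    trapped = (hi - lo) if pieces else 0
    return pieces, trapped


pieces, trapped = merge_overlap(Entry(0, 8, "k1"), Entry(4, 8, "k2"))
print(pieces, trapped)
```

Under these assumptions, an RMW step would read the surviving pieces, write them back as new, smaller extents, and release the original extent, which returns the trapped blocks to the extent store.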
Abstract:
A system can maintain multiple queues for deduplication requests of different priorities. The system can also designate priorities for storage units. The scheduling priority of a deduplication request is based on the priority of the storage unit indicated in the deduplication request and on the trigger for the deduplication request.
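One way to picture the queueing scheme is the minimal sketch below. The trigger names, the numeric weights in `TRIGGER_BOOST`, the additive combination rule, and the `DedupQueues` class are all hypothetical; the abstract states only that the scheduling priority depends on the storage unit's priority and on the trigger.

```python
from collections import deque

# Hypothetical trigger weights: lower numbers are served first.
TRIGGER_BOOST = {"manual": 0, "threshold": 1, "scheduled": 2}


class DedupQueues:
    """One FIFO queue per priority level; level 0 is the most urgent."""

    def __init__(self, unit_priority, levels=5):
        self.unit_priority = unit_priority          # storage unit -> priority
        self.queues = [deque() for _ in range(levels)]

    def submit(self, unit, trigger):
        # Assumed rule: combine unit priority and trigger weight by addition,
        # clamped to the lowest-priority queue.
        level = self.unit_priority[unit] + TRIGGER_BOOST[trigger]
        level = min(level, len(self.queues) - 1)
        self.queues[level].append((unit, trigger))
        return level

    def next_request(self):
        # Serve the first request found in the most urgent non-empty queue.
        for queue in self.queues:
            if queue:
                return queue.popleft()
        return None


sched = DedupQueues({"volA": 0, "volB": 2})
sched.submit("volB", "scheduled")
sched.submit("volA", "scheduled")
sched.submit("volB", "manual")
print(sched.next_request())
```

Keeping a separate FIFO per level preserves arrival order among requests of equal priority, which a single sorted structure would need an extra tie-breaker to achieve.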
Abstract:
A deferred refcount update technique efficiently frees storage space for metadata (associated with data) to be deleted during a merge operation managed by a volume layer of a node. The metadata is illustratively volume metadata embodied as mappings from logical block addresses (LBAs) of a logical unit (LUN) to extent keys maintained by an extent store layer of the node. One or more requests to delete (or overwrite) an LBA range within a LUN may be captured as page keys associated with metadata pages during the merge operation and the storage space associated with those metadata pages may be freed in an out-of-band fashion. The page keys of the metadata pages may be persistently recorded in a reference count (refcount) log to thereby allow the merge operation to complete without resolving deletion of the keys. A batch of page keys may be organized as one or more delete requests and, once the merge completes, the keys may be inserted into the refcount log. Subsequently, a deferred reference count update process may be spawned (instantiated) to walk through the page keys stored in the refcount log and delete each key, e.g., from the extent store layer, independently and out-of-band from the merge operation.
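The deferral described above might look roughly like the following sketch. The `ExtentStore` class, the in-memory `deque` standing in for the persistent refcount log, and both function names are assumptions made for illustration; a real log would be durable.

```python
from collections import deque


class ExtentStore:
    """Toy extent store layer that tracks a reference count per key."""

    def __init__(self):
        self.refcounts = {}

    def put(self, key):
        self.refcounts[key] = self.refcounts.get(key, 0) + 1

    def deref(self, key):
        self.refcounts[key] -= 1
        if self.refcounts[key] == 0:
            del self.refcounts[key]    # storage for this key is now free


def finish_merge(page_keys_to_delete, refcount_log):
    """Record the batch of page keys in the log once the merge completes.

    The merge does not wait for the keys to be deleted.
    """
    batch = list(page_keys_to_delete)
    refcount_log.extend(batch)
    return len(batch)


def deferred_refcount_update(refcount_log, store):
    """Walk the log out-of-band and delete each recorded key."""
    while refcount_log:
        store.deref(refcount_log.popleft())


store = ExtentStore()
for key in ("p1", "p2", "p2"):
    store.put(key)
log = deque()                          # stand-in for the persistent log
finish_merge(["p1", "p2"], log)
deferred_refcount_update(log, store)
print(store.refcounts)
```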
Abstract:
Systems for deduplicating one or more storage units of a storage system provide a scheduler, which is operable to select at least one storage unit (e.g., a storage volume) for deduplication and perform a deduplication process, which removes duplicate data blocks from the selected storage volume. The systems are operable to determine the state of one or more storage units and manage deduplication requests based in part on state information. The system is further operable to manage user-generated requests and manage deduplication requests based in part on user input information. The system may include a rules engine which prioritizes system operations, including determining an order in which to gather state information and an order in which to perform deduplication. The system is further operable to determine the order in which storage units are processed.
Abstract:
In one embodiment, a node coupled to one or more storage devices executes a storage input/output (I/O) stack having a volume layer, a persistence layer and an administration layer that interact to create a copy of a parent volume associated with a storage container on the one or more storage devices. A copy create start message is received at the persistence layer from the administration layer. The persistence layer ensures that dirty data for the parent volume is incorporated into the copy of the parent volume. New data for the parent volume received at the persistence layer during creation of the copy of the parent volume is prevented from incorporation into the copy of the parent volume. A reply to the copy create start message is sent from the persistence layer to the administration layer to initiate the creation of the copy of the parent volume at the volume layer.
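The message exchange above can be sketched as a small state machine. Everything here is hypothetical scaffolding: the `PersistenceLayer` class, its list-based buffers, the reply string, and in particular the `on_copy_create_done` step, which the abstract does not describe.

```python
class PersistenceLayer:
    """Toy persistence layer that fences writes around copy creation."""

    def __init__(self):
        self.dirty = []        # logged writes not yet given to the volume layer
        self.held = []         # writes that arrived while a copy is being made
        self.flushed = []      # stand-in for data handed to the volume layer
        self.copy_in_progress = False

    def write(self, data):
        # New data received during copy creation is kept out of the copy.
        target = self.held if self.copy_in_progress else self.dirty
        target.append(data)

    def on_copy_create_start(self):
        # Ensure existing dirty data is incorporated before the copy is taken.
        self.copy_in_progress = True
        self.flushed.extend(self.dirty)
        self.dirty.clear()
        return "copy_create_start_reply"   # sent back to the administration layer

    def on_copy_create_done(self):
        # Assumed follow-up step: release writes held during the copy.
        self.copy_in_progress = False
        self.dirty.extend(self.held)
        self.held.clear()


layer = PersistenceLayer()
layer.write("w1")
reply = layer.on_copy_create_start()
layer.write("w2")
print(reply, layer.flushed, layer.held)
```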
Abstract:
In one embodiment, a node coupled to one or more storage devices executes a storage input/output (I/O) stack having a volume layer that manages volume metadata. The volume metadata is organized as one or more dense tree metadata structures having a top level residing in memory and lower levels residing on the one or more storage devices. The dense tree metadata structures include a first dense tree metadata structure associated with a parent volume and a second dense tree metadata structure associated with a copy of the parent volume. The top level of the first dense tree metadata structure may be copied to the second dense tree metadata structure. The lower levels of the first dense tree metadata structure are initially shared with the second dense tree metadata structure. The shared lower levels may eventually be split as the parent volume diverges from the copy of the parent volume.
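A minimal sketch of the copy-and-share behavior follows, assuming each level can be modeled as a dictionary from LBA to extent key. The `DenseTree` class, `create_copy`, and `lookup` are illustrative names and are not drawn from the abstract.

```python
class DenseTree:
    """Toy dense tree: one in-memory top level plus on-disk lower levels."""

    def __init__(self, top, lower):
        self.top = top        # dict: LBA -> extent key (level 0, in memory)
        self.lower = lower    # list of dicts, one per lower level


def create_copy(parent):
    # The top level is copied; the lower level objects are shared by reference.
    return DenseTree(dict(parent.top), list(parent.lower))


def lookup(tree, lba):
    # Search the top level first, then each lower level in order.
    if lba in tree.top:
        return tree.top[lba]
    for level in tree.lower:
        if lba in level:
            return level[lba]
    return None


parent = DenseTree({1: "a"}, [{2: "b"}])
copy = create_copy(parent)
parent.top[1] = "c"                      # parent diverges after the copy
print(lookup(parent, 1), lookup(copy, 1), lookup(copy, 2))
```

Because `create_copy` builds a new list that holds the same level objects, either tree could later replace one of its lower levels with a private version without affecting the other, which is one way to model the eventual split.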
Abstract:
In one embodiment, snapshots and/or clones of storage objects are created and managed by a volume layer of a storage input/output (I/O) stack executing on one or more nodes of a cluster. Illustratively, the snapshots and clones may be represented as independent volumes, and embodied as respective read-only copies (snapshots) and read-write copies (clones) of a parent volume. Volume metadata is illustratively organized as one or more multi-level dense tree metadata structures, wherein each level of the dense tree metadata structure (dense tree) includes volume metadata entries for storing the metadata. Each snapshot/clone may be derived from a dense tree of the parent volume (parent dense tree). Portions of the parent dense tree may be shared with the snapshot/clone.
Abstract:
A method performed in a system that has a plurality of volumes stored to storage hardware, the method including generating, for each of the volumes, a respective space saving potential iteratively over time and scheduling space saving operations among the plurality of volumes by analyzing each of the volumes for space saving potential and assigning priority of resources based at least in part on space saving potential.
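As a rough illustration of the method, the sketch below refreshes a per-volume estimate over time and ranks volumes by it. The smoothing update in `update_potential`, the `alpha` parameter, and the idea of a fixed number of resource `slots` are assumptions chosen for the example; the abstract does not say how the potential is computed or how resources are divided.

```python
def update_potential(previous, sample, alpha=0.5):
    """Blend the previous estimate with a new sample (assumed smoothing rule)."""
    return (1 - alpha) * previous + alpha * sample


def schedule(potentials, slots):
    """Give the available slots to the volumes with the highest potential."""
    ranked = sorted(potentials, key=potentials.get, reverse=True)
    return ranked[:slots]


# Hypothetical estimates: fraction of each volume that could be reclaimed.
potentials = {"vol1": 0.1, "vol2": 0.6, "vol3": 0.3}
potentials["vol1"] = update_potential(potentials["vol1"], 0.9)
print(schedule(potentials, 2))
```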
Abstract:
The embodiments described herein are directed to an organization of metadata managed by a volume layer of a storage input/output (I/O) stack executing on one or more nodes of a cluster. The metadata managed by the volume layer, i.e., the volume metadata, is illustratively embodied as mappings from addresses, i.e., logical block addresses (LBAs), of a logical unit (LUN) accessible by a host to durable extent keys maintained by an extent store layer of the storage I/O stack. In an embodiment, the volume layer organizes the volume metadata as a mapping data structure, i.e., a dense tree metadata structure, which represents successive points in time to enable efficient access to the metadata.
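The mapping from LBAs to extent keys can be sketched as a lookup over levels ordered from newest to oldest. The `Level` class, the `(offset, length, key)` entry layout, and the assumption that entries within one level do not overlap are all choices made for this illustration.

```python
import bisect


class Level:
    """One level of a toy dense tree, holding non-overlapping range entries."""

    def __init__(self, entries):
        # Each entry is (offset, length, extent_key), kept sorted by offset.
        self.entries = sorted(entries)
        self.offsets = [entry[0] for entry in self.entries]

    def find(self, lba):
        # Locate the last entry whose offset is <= lba, then check its range.
        i = bisect.bisect_right(self.offsets, lba) - 1
        if i >= 0:
            offset, length, key = self.entries[i]
            if lba < offset + length:
                return key
        return None


def lookup(levels, lba):
    """Search levels newest first so that recent writes shadow older ones."""
    for level in levels:
        key = level.find(lba)
        if key is not None:
            return key
    return None


levels = [Level([(8, 4, "k_new")]), Level([(0, 16, "k_old")])]
print(lookup(levels, 9), lookup(levels, 2), lookup(levels, 20))
```

Searching the newest level first is what lets the structure represent successive points in time: an older mapping stays in place and is simply shadowed by a more recent one.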
Abstract:
Systems for deduplicating one or more storage units of a storage system provide a scheduler, which is operable to select at least one storage unit (e.g., a storage volume) for deduplication and perform a deduplication process, which removes duplicate data blocks from the selected storage volume. The systems are operable to determine the state of one or more storage units and manage deduplication requests based in part on state information. The system is further operable to manage user-generated requests and manage deduplication requests based in part on user input information. The system may include a rules engine which prioritizes system operations, including determining an order in which to gather state information and an order in which to perform deduplication. The system is further operable to determine the order in which storage units are processed.