Abstract:
A technique quantifies logical storage space trapped in an extent store due to overlapping write requests associated with volume metadata managed by a volume layer. The volume metadata is illustratively organized as a multi-level dense tree metadata structure, wherein each level of the dense tree metadata structure (dense tree) includes volume metadata entries for storing the volume metadata. When a level of the dense tree is full, the volume metadata entries of the level are merged with a next lower level of the dense tree in accordance with a merge operation. Illustratively, the technique may be invoked during the merge operation to examine the volume metadata entries at each level of the dense tree involved in the merge and determine the LBA range overlap of the entries. To that end, the technique may include an algorithm configured to calculate the overlapping space per level and then aggregate the overlapping space of all levels involved in the merge operation to arrive at a result that quantifies the logical storage space trapped in the extent store.
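The per-level calculation and aggregation described above can be illustrated with a minimal sketch. It is one plausible reading, not the claimed algorithm: entries are assumed to be hypothetical `(start_lba, length)` pairs, `levels[0]` is assumed to be the newest level, entries within one level are assumed not to overlap each other, and the overlap of a level is taken to be the number of its blocks already covered by newer levels. The names `merge_ranges`, `overlap_blocks`, and `trapped_space` are illustrative only.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (start_lba, length_in_blocks); hypothetical entry shape


def merge_ranges(ranges: List[Range]) -> List[Tuple[int, int]]:
    """Coalesce ranges into sorted, non-overlapping [start, end) intervals."""
    intervals = sorted((start, start + length) for start, length in ranges)
    merged: List[Tuple[int, int]] = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlap_blocks(entry: Range, newer: List[Tuple[int, int]]) -> int:
    """Count the blocks of `entry` covered by the coalesced newer intervals."""
    start, end = entry[0], entry[0] + entry[1]
    return sum(max(0, min(end, n_end) - max(start, n_start))
               for n_start, n_end in newer)


def trapped_space(levels: List[List[Range]]) -> Tuple[List[int], int]:
    """Return (overlap per level, aggregate overlap); levels[0] is newest (assumed)."""
    per_level: List[int] = []
    newer_ranges: List[Range] = []
    for level in levels:
        newer = merge_ranges(newer_ranges)
        per_level.append(sum(overlap_blocks(entry, newer) for entry in level))
        newer_ranges.extend(level)
    return per_level, sum(per_level)
```

For example, `trapped_space([[(0, 8)], [(4, 8), (20, 4)], [(0, 16)]])` reports the blocks of each older level that newer writes have overwritten, along with their sum.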
Abstract:
A layout of a transaction log enables efficient logging of metadata into entries of the log, as well as efficient reclamation and recovery of the log entries by a volume layer of a storage input/output (I/O) stack executing on one or more nodes of a cluster. The transaction log is illustratively a two-stage, append-only logging structure, wherein the first stage is non-volatile random access memory (NVRAM) embodied as an NVlog and the second stage is disk, e.g., solid state drive (SSD). The layout of the logging structure facilitates steady-state logging of metadata managed by the volume layer and crash recovery. Steady-state logging of metadata into the log entries occurs while the storage I/O stack of a node actively processes I/O requests, while crash recovery of the log entries occurs after an unexpected shutdown of the node.
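A minimal sketch of a two-stage, append-only log follows. It rests on assumptions the abstract does not state: that the first stage spills to the second when it fills, and that a fixed entry count bounds the first stage. The class `TwoStageLog` and its fields are hypothetical, and plain in-memory lists stand in for NVRAM and SSD.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TwoStageLog:
    nvlog_capacity: int  # hypothetical first-stage size, in entries
    nvlog: List[bytes] = field(default_factory=list)     # stage 1 (stands in for NVRAM)
    disk_log: List[bytes] = field(default_factory=list)  # stage 2 (stands in for SSD)

    def append(self, entry: bytes) -> None:
        """Append-only write; spill stage 1 to stage 2 first if stage 1 is full."""
        if len(self.nvlog) >= self.nvlog_capacity:
            self.flush()
        self.nvlog.append(entry)

    def flush(self) -> None:
        """Move all stage-1 entries to the end of stage 2, preserving order."""
        self.disk_log.extend(self.nvlog)
        self.nvlog.clear()
```

Entries are never modified in place in either stage; they are only appended and, in the first stage, cleared once they have been moved to the second.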
Abstract:
A layout of a transaction log enables efficient logging of metadata into entries of the log, as well as efficient reclamation and recovery of the log entries by a volume layer of a storage input/output (I/O) stack executing on one or more nodes of a cluster. The transaction log is illustratively a two-stage, append-only logging structure, wherein the first stage is non-volatile random access memory (NVRAM) embodied as an NVlog and the second stage is disk, e.g., solid state drive (SSD). During crash recovery, the log entries are examined for consistency and scanned to identify those entries that have completed and those that are still active and therefore require replay. The log entries are walked from oldest to newest (using sequence numbers) in search of the highest sequence number. Partially complete log entries (e.g., log entries in progress when a crash occurs) may be discarded for failing a checksum (e.g., a CRC error). Old value/new value logs may be used to implement roll-forward or roll-back semantics to replay the log entries and fix any on-disk data structures, first from NVRAM and then from on-disk logs.
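The recovery scan described above can be sketched as follows. This is an illustration under stated assumptions, not the patented layout: the `LogEntry` fields, the `completed` flag, and the choice to skip a corrupt entry rather than stop at it are all hypothetical. The checksum uses `zlib.crc32` from the Python standard library as a stand-in for whatever CRC the log actually uses.

```python
import zlib
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LogEntry:
    seq: int         # sequence number (assumed monotonically increasing)
    payload: bytes   # logged metadata
    crc: int         # checksum recorded when the entry was written
    completed: bool  # hypothetical flag: the logged operation finished


def make_entry(seq: int, payload: bytes, completed: bool = False) -> LogEntry:
    """Build an entry with a checksum that matches its payload."""
    return LogEntry(seq, payload, zlib.crc32(payload), completed)


def scan_log(entries: List[LogEntry]) -> Tuple[int, List[LogEntry]]:
    """Return (highest valid sequence number, active entries needing replay)."""
    highest = -1
    to_replay: List[LogEntry] = []
    # Walk from oldest to newest by sequence number.
    for entry in sorted(entries, key=lambda e: e.seq):
        if zlib.crc32(entry.payload) != entry.crc:
            continue  # partially written entry: discard on checksum failure
        highest = max(highest, entry.seq)
        if not entry.completed:
            to_replay.append(entry)
    return highest, to_replay
```

A caller would then replay the returned entries in order, rolling each one forward or back depending on the old and new values it carries.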
Abstract:
A technique recovers from a low space condition associated with storage space reserved in an extent store to accommodate write requests received from a host and associated metadata managed by a layered file system of a storage input/output (I/O) stack executing on one or more nodes of a cluster. The write requests, including user data, are persistently recorded on non-volatile random access memory (NVRAM) by a persistence layer of the storage I/O stack before an acknowledgement is returned to the host. Volume metadata managed by a volume layer of the layered file system is embodied as mappings from logical block addresses (LBAs) of a logical unit (LUN) accessible by the host to extent keys maintained by an extent store layer of the layered file system. Extent store metadata managed by the extent store layer is embodied as mappings from the extent keys to the storage locations of the extents on storage devices of storage arrays coupled to the nodes of the cluster. The space recovery technique accounts for storage space consumed in the extent store by user operations, i.e., write operations for the user data stored on the NVRAM at the persistence layer as well as the associated volume and extent store metadata, to ensure that the user data and associated metadata can be safely and reliably persisted in the extent store even during a low space condition.
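The two levels of mapping described above (LBA to extent key, then extent key to storage location) can be illustrated with a minimal sketch. Plain dictionaries stand in for the volume and extent store metadata, extent keys are treated as opaque strings, and the `Location` type and `resolve` function are hypothetical names introduced only for this example.

```python
from typing import Dict, NamedTuple, Optional


class Location(NamedTuple):
    device: str  # hypothetical storage device identifier
    offset: int  # hypothetical offset of the extent on that device


def resolve(lba: int,
            volume_map: Dict[int, str],
            extent_map: Dict[str, Location]) -> Optional[Location]:
    """Follow LBA -> extent key (volume layer) -> location (extent store layer)."""
    extent_key = volume_map.get(lba)
    if extent_key is None:
        return None  # LBA has never been written
    return extent_map.get(extent_key)
```

Because the volume layer stores only extent keys, an extent can be relocated by updating the extent store metadata alone, without touching the volume metadata.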