Abstract:
Embodiments herein are directed to efficient crash recovery of persistent metadata managed by a volume layer of a storage input/output (I/O) stack executing on one or more nodes of a cluster. Volume metadata managed by the volume layer is organized as a multi-level dense tree, wherein each level of the dense tree includes volume metadata entries for storing the volume metadata. When a level of the dense tree is full, the volume metadata entries of the level are merged with the next lower level of the dense tree. During a merge operation, two generation IDs may be used in accordance with a double buffer arrangement: a first generation ID for the append buffer that is full (i.e., a merge staging buffer) and a second, incremented generation ID for the append buffer that accepts new volume metadata entries. Upon completion of the merge operation, the lower level (e.g., level 1) to which the merge is directed is assigned the generation ID of the merge staging buffer.
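The double buffer arrangement described above can be illustrated with a minimal in-memory sketch. It is not the claimed implementation: the class names (`AppendBuffer`, `DenseTreeSketch`), the `capacity` parameter, the use of a dictionary for level 1, and the initial generation ID of 0 are all assumptions made for illustration, and persistence and crash recovery are omitted.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AppendBuffer:
    """Hypothetical append buffer tagged with a generation ID."""
    generation_id: int
    entries: list = field(default_factory=list)


@dataclass
class DenseTreeSketch:
    """Illustrative two-level tree: an append buffer feeding level 1."""
    capacity: int  # assumed number of entries that makes the buffer "full"
    active: AppendBuffer = field(default_factory=lambda: AppendBuffer(0))
    staging: Optional[AppendBuffer] = None
    level1: dict = field(default_factory=dict)
    level1_generation_id: int = -1  # assumed marker for "no merge completed yet"

    def insert(self, key, value):
        # New entries always go to the buffer that accepts new entries.
        self.active.entries.append((key, value))
        # Simplification: if a merge is already in flight, keep appending.
        if len(self.active.entries) >= self.capacity and self.staging is None:
            self.begin_merge()

    def begin_merge(self):
        # The full buffer becomes the merge staging buffer and keeps its ID;
        # the replacement buffer receives the incremented generation ID.
        self.staging = self.active
        self.active = AppendBuffer(self.staging.generation_id + 1)

    def complete_merge(self):
        # Fold the staged entries into level 1, then stamp level 1 with
        # the generation ID of the merge staging buffer.
        for key, value in self.staging.entries:
            self.level1[key] = value
        self.level1_generation_id = self.staging.generation_id
        self.staging = None


tree = DenseTreeSketch(capacity=2)
tree.insert("a", 1)
tree.insert("b", 2)   # buffer is full: a merge begins
tree.insert("c", 3)   # lands in the buffer with the incremented ID
tree.complete_merge()
```

In this sketch, comparing the generation ID recorded on level 1 with the ID of a leftover staging buffer would indicate whether a merge had completed before a crash. That comparison is an inference about how the IDs might be used, not a detail stated in the abstract.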
Abstract:
An offset range striping technique increases concurrency of operation execution directed to metadata managed by a volume layer of a storage input/output (I/O) stack, while reducing contention among resources of one or more nodes of a cluster. A logical unit (LUN) may be apportioned into multiple volumes, each of which may be partitioned into multiple regions, wherein each region is represented by a dense tree. The technique increases concurrency of operation execution (e.g., modifications to the metadata at the offset ranges), while reducing contention among the resources (e.g., CPUs and NVLogs) by distributing the offset range operations among the regions and mapping the regions to services and NVLogs. Such increased concurrency and reduction of contention may be achieved by implementation of the technique to (i) apportion each region into disjoint chunks (i.e., stripes) of contiguous offset ranges; (ii) organize a plurality of regions into one or more zones and populate a first zone before allocating a second zone; and (iii) stagger the mapping of services to starting regions of the volumes.
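The three elements of the technique can be sketched as simple index arithmetic. The abstract does not give the mapping functions, so everything here is an assumption made for illustration: the constants, the function names `locate` and `service_for`, the round-robin placement of stripes across the regions of a zone, and the modulo stagger of services.

```python
# All constants are hypothetical values chosen to keep the example small.
STRIPE_SIZE = 4          # offsets per stripe (chunk of contiguous offsets)
REGIONS_PER_ZONE = 3     # regions grouped into one zone
STRIPES_PER_REGION = 2   # stripes a region holds before its zone is full
NUM_SERVICES = 3         # services available to handle regions


def locate(offset):
    """Map a volume offset to (zone, region) under the assumed layout."""
    stripe = offset // STRIPE_SIZE
    stripes_per_zone = REGIONS_PER_ZONE * STRIPES_PER_REGION
    # A zone is filled completely before the next zone is used.
    zone = stripe // stripes_per_zone
    # Within a zone, consecutive stripes rotate across its regions.
    region_in_zone = stripe % REGIONS_PER_ZONE
    return zone, zone * REGIONS_PER_ZONE + region_in_zone


def service_for(volume_index, region):
    """Stagger services so each volume's starting region uses a different one."""
    return (volume_index + region) % NUM_SERVICES


for offset in (0, 4, 8, 12, 24):
    print(offset, locate(offset))
for volume_index in range(3):
    print(volume_index, service_for(volume_index, 0))
```

Under these assumptions, adjacent stripes of a volume fall in different regions, and region 0 of each volume maps to a different service, so operations on nearby offsets or on the start of several volumes are not all directed at the same service.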