Abstract:
In one embodiment, snapshots and/or clones of storage objects are created and managed by a volume layer of a storage input/output (I/O) stack executing on one or more nodes of a cluster. Illustratively, the snapshots and clones may be represented as independent volumes, and embodied as respective read-only copies (snapshots) and read-write copies (clones) of a parent volume. Volume metadata is illustratively organized as one or more multi-level dense tree metadata structures, wherein each level of the dense tree metadata structure (dense tree) includes volume metadata entries for storing the metadata. Each snapshot/clone may be derived from a dense tree of the parent volume (parent dense tree). Portions of the parent dense tree may be shared with the snapshot/clone.
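The sharing of parent dense tree portions with a snapshot or clone can be illustrated with a minimal sketch. It is not the embodiment's implementation: the `Volume` class, its `top` and `shared` fields, and the use of plain dictionaries as tree levels are all assumptions made for illustration. A snapshot is modeled as a read-only volume and a clone as a read-write volume, each holding references to the parent's frozen levels rather than copies.

```python
from types import MappingProxyType


class Volume:
    """Hypothetical volume: one private writable level plus shared frozen levels."""

    def __init__(self, name, shared_levels=None, read_only=False):
        self.name = name
        self.read_only = read_only
        self.top = {}                            # private level: LBA -> extent key
        self.shared = list(shared_levels or [])  # frozen levels, newest first

    def write(self, lba, extent_key):
        if self.read_only:
            raise PermissionError(f"{self.name} is a read-only snapshot")
        self.top[lba] = extent_key

    def read(self, lba):
        # Search the private level first, then shared levels from newest to oldest.
        if lba in self.top:
            return self.top[lba]
        for level in self.shared:
            if lba in level:
                return level[lba]
        return None

    def _freeze_top(self):
        # Turn the current private level into an immutable, shareable level.
        if self.top:
            self.shared.insert(0, MappingProxyType(self.top))
            self.top = {}

    def snapshot(self, name):
        """Read-only copy that shares the parent's frozen levels by reference."""
        self._freeze_top()
        return Volume(name, self.shared, read_only=True)

    def clone(self, name):
        """Read-write copy that shares the parent's frozen levels by reference."""
        self._freeze_top()
        return Volume(name, self.shared, read_only=False)
```

In this sketch, writes to the parent or to a clone after the copy is taken land in that volume's private level, so neither volume observes the other's later changes while the older levels remain shared.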
Abstract:
The embodiments described herein are directed to an organization of metadata managed by a volume layer of a storage input/output (I/O) stack executing on one or more nodes of a cluster. The metadata managed by the volume layer, i.e., the volume metadata, is illustratively embodied as mappings from addresses, i.e., logical block addresses (LBAs), of a logical unit (LUN) accessible by a host to durable extent keys maintained by an extent store layer of the storage I/O stack. In an embodiment, the volume layer organizes the volume metadata as a mapping data structure, i.e., a dense tree metadata structure, which represents successive points in time to enable efficient access to the metadata.
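The mapping from LBAs to extent keys can be sketched as follows. This is an illustrative model only: the `Entry` and `DenseTree` names, the list-per-level layout, the linear search, and the `merge_down` operation are assumptions, and the reading of "successive points in time" as levels ordered from newest to oldest is one possible interpretation rather than a statement of the embodiment.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Entry:
    offset: int      # starting LBA of the mapped range
    length: int      # number of blocks covered
    extent_key: str  # opaque key handed out by the extent store layer


class DenseTree:
    """Hypothetical multi-level mapping; levels[0] holds the newest entries."""

    def __init__(self, levels: int = 3):
        self.levels: List[List[Entry]] = [[] for _ in range(levels)]

    def insert(self, offset: int, length: int, extent_key: str) -> None:
        # New mappings always enter at the top (newest) level.
        self.levels[0].append(Entry(offset, length, extent_key))

    def lookup(self, lba: int) -> Optional[str]:
        # Search newest level first; within a level, later entries win.
        for level in self.levels:
            for entry in reversed(level):
                if entry.offset <= lba < entry.offset + entry.length:
                    return entry.extent_key
        return None

    def merge_down(self, level: int = 0) -> None:
        # Move a level's entries into the next (older) level. They are appended
        # after the existing entries so that they still shadow older mappings.
        self.levels[level + 1].extend(self.levels[level])
        self.levels[level] = []
```

A lookup stops at the first level containing a matching range, so a recent overwrite of part of an LBA range shadows the older mapping without modifying it.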
Abstract:
In one embodiment, a parallel (e.g., tiered) logging technique is provided to deliver low latency acknowledgements of input/output (I/O) requests, such as write requests, while avoiding loss of data. Write data may be stored (copied) as a log in a portion of a dynamic random access memory and a non-volatile random access memory (NVRAM). The NVRAM may be configured as, e.g., a persistent write-back cache of the node, while parameters of the request may be stored in another portion of the NVRAM configured as the log (NVLog). The write data may be organized into separate variable length blocks or extents and “written back” out-of-order from the write-back cache to storage devices, such as solid state drives (SSDs), organized, e.g., into a data container (the intended destination of the write request). The write data may be preserved in the NVLog until each extent is safely stored on SSD.
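The logging flow above can be sketched in a few lines. This is a simplified model under stated assumptions, not the described implementation: the `Node` class, its dictionary-backed `nvlog`, `cache`, and `ssd` stores, and the fixed `extent_size` chunking (the text describes variable length extents) are all hypothetical. The sketch shows a write being acknowledged once its data and parameters are recorded, extents being written back in any order, and the log entry being retired only after every extent of the request is on SSD.

```python
class Node:
    """Hypothetical node with an NVLog, a write-back cache, and SSD storage."""

    def __init__(self):
        self.nvlog = {}  # request id -> request parameters and pending extents
        self.cache = {}  # extent id -> write data awaiting write-back
        self.ssd = {}    # extent id -> durably stored data
        self._next_id = 0

    def write(self, lba, data, extent_size=4):
        """Record the request, then return its id as the acknowledgement."""
        rid = self._next_id
        self._next_id += 1
        extent_ids = []
        # Assumption: fixed-size chunks stand in for variable length extents.
        for start in range(0, len(data), extent_size):
            eid = (rid, start)
            self.cache[eid] = data[start:start + extent_size]
            extent_ids.append(eid)
        self.nvlog[rid] = {
            "lba": lba,
            "length": len(data),
            "pending": set(extent_ids),
        }
        return rid

    def flush_extent(self, eid):
        """Write one extent back to SSD; extents may be flushed in any order."""
        self.ssd[eid] = self.cache.pop(eid)
        entry = self.nvlog[eid[0]]
        entry["pending"].discard(eid)
        if not entry["pending"]:
            # Every extent of the request is on SSD, so the log entry can go.
            del self.nvlog[eid[0]]
```

For example, after `rid = node.write(0, b"abcdefgh")`, flushing `(rid, 4)` before `(rid, 0)` leaves the log entry in place until the second flush completes.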