Abstract:
In one embodiment, a layered file system of a storage input/output (I/O) stack executes on one or more nodes of a cluster. The layered file system includes a flash-optimized, log-structured layer configured to provide sequential storage of data and metadata (i.e., a log-structured layout) on solid state drives (SSDs) of storage arrays in the cluster to reduce write amplification, while leveraging a data de-duplication feature of the storage I/O stack. An extent store layer of the file system performs and maintains mappings of extent keys to SSD storage locations, while a volume layer of the file system performs and maintains mappings of LUN offset ranges to the extent keys. Separation of the mapping functions between the volume and extent store layers enables different volumes with different offset ranges to reference the same extent key (and thus the same extent).
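A minimal Python sketch of the mapping separation described above, assuming simplified in-memory structures: the extent store layer maps extent keys to hypothetical SSD locations, the volume layer maps LUN offset ranges to extent keys, and two volumes writing identical data end up referencing the same extent key. All class, function, and device names are illustrative and not taken from the described system.

```python
# Illustrative sketch only; not the patented implementation.
import hashlib

class ExtentStoreLayer:
    """Maps extent keys to (hypothetical) SSD storage locations."""
    def __init__(self):
        self.key_to_location = {}
        self.next_location = 0

    def put(self, extent: bytes) -> str:
        key = hashlib.sha256(extent).hexdigest()[:16]  # illustrative extent key
        if key not in self.key_to_location:            # de-duplication: store each extent once
            self.key_to_location[key] = ("ssd0", self.next_location)
            self.next_location += len(extent)
        return key

class VolumeLayer:
    """Maps (volume, LUN offset range) to extent keys."""
    def __init__(self, extent_store: ExtentStoreLayer):
        self.extent_store = extent_store
        self.offset_to_key = {}  # (volume, offset, length) -> extent key

    def write(self, volume: str, offset: int, data: bytes) -> str:
        key = self.extent_store.put(data)
        self.offset_to_key[(volume, offset, len(data))] = key
        return key

store = ExtentStoreLayer()
volumes = VolumeLayer(store)
k1 = volumes.write("vol-A", 0, b"hello world")
k2 = volumes.write("vol-B", 4096, b"hello world")  # same data, different volume/offset
assert k1 == k2                                    # both offset ranges reference one extent
```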
Abstract:
In one embodiment, a node of a cluster executing a storage input/output (I/O) stack having a volume layer stores a multi-level dense tree metadata structure. Each level of the dense tree metadata structure includes volume metadata entries for storing volume metadata. One or more non-volatile logs (NVLogs) are updated, including a volume layer log configured to record changes to the volume metadata; volume metadata entries inserted into the top level of the dense tree metadata structure are recorded in the volume layer log. The node writes volume metadata entries from the volume layer log to one or more storage devices to be stored as extents.
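The following sketch illustrates, under assumed simplifications, how inserts into the top level of a dense tree might also be recorded in a volume layer log and later written out as an extent; the classes, field names, and on-disk format are hypothetical stand-ins, not the patented structures.

```python
# Illustrative sketch only; an in-memory list stands in for the NVLog.
from dataclasses import dataclass, field

@dataclass
class VolumeMetadataEntry:
    offset: int       # LUN offset
    length: int
    extent_key: str   # key resolving to an extent in the extent store

@dataclass
class DenseTree:
    levels: list = field(default_factory=lambda: [[], [], []])  # level 0 is the top level
    nvlog: list = field(default_factory=list)                   # stand-in for the volume layer log

    def insert(self, entry: VolumeMetadataEntry) -> None:
        self.nvlog.append(entry)       # record the change in the volume layer log
        self.levels[0].append(entry)   # insert into the top level of the dense tree

    def flush_log(self) -> bytes:
        """Serialize logged entries as a single (illustrative) extent blob."""
        blob = b"".join(
            f"{e.offset}:{e.length}:{e.extent_key}\n".encode() for e in self.nvlog
        )
        self.nvlog.clear()
        return blob

tree = DenseTree()
tree.insert(VolumeMetadataEntry(offset=0, length=4096, extent_key="k1"))
extent = tree.flush_log()   # would be written to a storage device as an extent
print(len(extent))
```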
Abstract:
In one embodiment, an extent hashing technique is used to efficiently distribute data and associated metadata substantially evenly among nodes of a cluster. The data may be write data associated with a write request issued by a host and received at a node of the cluster. The write data may be organized into one or more extents. A hash function may be applied to each extent to generate a result, which may be truncated or trimmed to generate a hash value. A hash space of the hash value may be divided into a plurality of buckets representative of the write data (i.e., the extents) and the associated metadata (i.e., extent metadata). A number of buckets may be assigned to each extent store instance of the nodes to distribute ownership of the buckets, along with their extents and extent metadata, across all of the extent store instances of the nodes.
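A rough Python sketch of the distribution idea, with arbitrary choices for the trim width, bucket count, and number of extent store instances (none of these values come from the embodiment): hash each extent, trim the digest to a fixed-width hash value, map it into a bucket space, and look up the instance that owns that bucket.

```python
# Illustrative sketch only; parameters and instance names are invented.
import hashlib

NUM_BUCKETS = 65536
EXTENT_STORE_INSTANCES = ["node0-es", "node1-es", "node2-es", "node3-es"]

# Distribute ownership of buckets across all extent store instances (round-robin here).
bucket_owner = {
    b: EXTENT_STORE_INSTANCES[b % len(EXTENT_STORE_INSTANCES)]
    for b in range(NUM_BUCKETS)
}

def extent_hash(extent: bytes) -> int:
    digest = hashlib.sha256(extent).digest()
    return int.from_bytes(digest[:6], "big")   # truncate/trim to a 48-bit hash value

def owner_of(extent: bytes) -> str:
    bucket = extent_hash(extent) % NUM_BUCKETS  # map the hash value into the bucket space
    return bucket_owner[bucket]

print(owner_of(b"some write data organized into an extent"))
```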
Abstract:
In one embodiment, a technique is provided for distributing data and associated metadata within a distributed storage architecture. A set of hash tables embodying mappings of cluster-wide identifiers associated with storage locations is stored for write data of write requests organized into extents. A hash value is generated from a hash function applied to each extent. The hash value is overloaded and used for multiple purposes within the distributed storage architecture, including (i) a remainder computation on the hash value to select a bucket from a plurality of buckets representative of the extents, (ii) a hash table selector portion of the hash value to select a hash table from the set of hash tables, and (iii) a hash table index computed from the hash value to select an entry, from a plurality of entries of the selected hash table, having a cluster-wide identifier that identifies a storage location for the extent.
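The sketch below shows one way a single hash value could be overloaded as described, using illustrative field widths and table sizes that are not taken from the embodiment: a remainder computation selects the bucket, one slice of the value selects a hash table, and another slice indexes an entry in that table.

```python
# Illustrative sketch only; field widths, table counts, and sizes are invented.
import hashlib

NUM_BUCKETS = 1024
NUM_TABLES = 16
TABLE_SIZE = 4096

hash_tables = [dict() for _ in range(NUM_TABLES)]  # stand-in for the set of hash tables

def hash_value(extent: bytes) -> int:
    return int.from_bytes(hashlib.sha256(extent).digest()[:8], "big")

def locate(extent: bytes):
    hv = hash_value(extent)
    bucket = hv % NUM_BUCKETS             # (i) remainder computation -> bucket
    selector = (hv >> 10) % NUM_TABLES    # (ii) hash table selector -> which table
    index = (hv >> 14) % TABLE_SIZE       # (iii) hash table index -> which entry
    return bucket, selector, index

def insert(extent: bytes, storage_location: tuple) -> None:
    _, selector, index = locate(extent)
    # In the described system the entry would hold a cluster-wide identifier that
    # resolves to the extent's storage location; a location tuple stands in here.
    hash_tables[selector][index] = storage_location

insert(b"extent payload", ("ssd3", 123456))
print(locate(b"extent payload"))
```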
Abstract:
A file system layout apportions an underlying physical volume into one or more virtual volumes (vvols) of a storage system. The underlying physical volume is an aggregate comprising one or more groups of disks, such as RAID groups, of the storage system. The aggregate has its own physical volume block number (pvbn) space and maintains metadata, such as block allocation structures, within that pvbn space. Each vvol has its own virtual volume block number (vvbn) space and maintains metadata, such as block allocation structures, within that vvbn space. Notably, the block allocation structures of a vvol are sized to the vvol, and not to the underlying aggregate, thereby allowing operations that manage data served by the storage system (e.g., snapshot operations) to work efficiently over the vvols. The file system layout extends that of a conventional write anywhere file layout system implementation, yet maintains the performance properties of the conventional implementation.
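As a simplified illustration of the dual block-number spaces, the sketch below models an aggregate that allocates from its own pvbn space and a vvol whose allocation bitmap is sized to the vvol and mapped onto aggregate blocks; the class names, sizes, and bitmap representation are assumptions for illustration only.

```python
# Illustrative sketch only; not the described file system layout.
class Aggregate:
    def __init__(self, total_pvbns: int):
        self.alloc_bitmap = [False] * total_pvbns   # block allocation structure in pvbn space

    def allocate_pvbn(self) -> int:
        pvbn = self.alloc_bitmap.index(False)
        self.alloc_bitmap[pvbn] = True
        return pvbn

class VirtualVolume:
    def __init__(self, aggregate: Aggregate, size_in_blocks: int):
        self.aggregate = aggregate
        self.alloc_bitmap = [False] * size_in_blocks  # sized to the vvol, not the aggregate
        self.vvbn_to_pvbn = {}                        # vvol block -> underlying aggregate block

    def allocate_block(self) -> int:
        vvbn = self.alloc_bitmap.index(False)
        self.alloc_bitmap[vvbn] = True
        self.vvbn_to_pvbn[vvbn] = self.aggregate.allocate_pvbn()
        return vvbn

aggr = Aggregate(total_pvbns=1_000_000)
vvol = VirtualVolume(aggr, size_in_blocks=10_000)
print(vvol.allocate_block(), vvol.vvbn_to_pvbn[0])
```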