Abstract:
Techniques for opportunistic data storage are described. In one embodiment, for example, an apparatus may comprise a data storage device and a storage management module. The storage management module may be operative to receive a request to store a set of data in the data storage device, the request indicating that the set of data is to be stored with opportunistic retention, to select, based on allocation information, storage locations of the data storage device for opportunistic storage of the set of data, and to write the set of data to the selected storage locations. Other embodiments are described and claimed.
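As a concrete illustration, the following Python sketch models a storage management module that honors an opportunistic-retention flag when selecting locations from allocation information. All names, the block-level granularity, and the reclamation policy are assumptions for illustration; the abstract does not specify them.

class StorageManager:
    """Hypothetical storage management module over a block device."""

    def __init__(self, num_blocks):
        # Allocation information: None = free, "opportunistic" = reclaimable,
        # "committed" = pinned until explicitly deleted.
        self.allocation = [None] * num_blocks
        self.blocks = [b""] * num_blocks

    def store(self, data_chunks, opportunistic=False):
        """Select locations based on allocation information and write the data."""
        candidates = [i for i, s in enumerate(self.allocation) if s is None]
        if len(candidates) < len(data_chunks) and not opportunistic:
            # Assumed policy: committed writes may reclaim locations that hold
            # only opportunistically retained data.
            candidates += [i for i, s in enumerate(self.allocation)
                           if s == "opportunistic"]
        if len(candidates) < len(data_chunks):
            raise IOError("insufficient storage locations")
        tag = "opportunistic" if opportunistic else "committed"
        selected = candidates[:len(data_chunks)]
        for loc, chunk in zip(selected, data_chunks):
            self.allocation[loc] = tag
            self.blocks[loc] = chunk
        return selected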
Abstract:
A cache affinity and processor utilization technique efficiently load balances work in a storage input/output (I/O) stack among a plurality of processors and associated processor cores of a node. The storage I/O stack employs one or more non-blocking messaging kernel (MK) threads that execute non-blocking message handlers (i.e., non-blocking services). The technique load balances work among the processor cores sharing a last level cache (LLC) (i.e., intra-LLC processor load balancing) and among the processors having separate LLCs (i.e., inter-LLC processor load balancing). The technique may allocate a predetermined number of logical processors for use by an MK scheduler to schedule the non-blocking services within the storage I/O stack, as well as allocate a remaining number of logical processors for use by blocking services, e.g., scheduled by an operating system kernel scheduler.
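A minimal Python sketch of the two balancing levels follows. The topology encoding, the reservation split, and the least-loaded heuristic are assumptions; the abstract describes policy goals, not an algorithm.

from collections import defaultdict

class MKScheduler:
    """Hypothetical scheduler for non-blocking messaging kernel (MK) services."""

    def __init__(self, topology, reserved_for_mk):
        # topology: {llc_id: [cpu_id, ...]} -- logical CPUs sharing a last level cache.
        self.topology = topology
        self.load = defaultdict(int)  # cpu_id -> queued service count
        all_cpus = [c for cpus in topology.values() for c in cpus]
        # A predetermined number of logical processors for MK (non-blocking)
        # services; the remainder is left to the OS kernel scheduler for
        # blocking services.
        self.mk_cpus = set(all_cpus[:reserved_for_mk])
        self.os_cpus = set(all_cpus[reserved_for_mk:])

    def schedule(self, preferred_cpu):
        """Place a non-blocking service, preferring cache affinity."""
        llc = next(l for l, cpus in self.topology.items() if preferred_cpu in cpus)
        # Intra-LLC balancing: least-loaded MK core sharing the preferred LLC.
        local = [c for c in self.topology[llc] if c in self.mk_cpus]
        pool = local if local else self.mk_cpus  # inter-LLC fallback
        best = min(pool, key=lambda c: self.load[c])
        self.load[best] += 1
        return best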
Abstract:
An exactly once semantics (EOS) system of a storage input/output (I/O) stack implements a technique ensuring that non-idempotent operations occur exactly once in a storage system embodied as a node of a cluster. Illustratively, a first layer of the storage I/O stack may act as a client issuing a non-idempotent operation to a second layer of the stack, which may act as a server. According to the technique, the EOS system may wrap (i.e., encapsulate) the non-idempotent operation within a transaction embodied as an EOS transaction data structure having a transaction identifier that uniquely identifies the transaction. The server may complete the transaction and reply with a result to the client, which may acknowledge receipt of the reply. In response to a crash and subsequent recovery of the node, the EOS system may determine whether the transaction had completed prior to the crash. If so, the EOS system ensures that the transaction is not re-played (re-executed). Otherwise, the EOS system allows execution of the transaction such that the transaction occurs exactly once.
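The replay check at recovery is the heart of the technique. Below is a minimal, self-contained Python sketch of the client-visible behavior; the in-memory completion table stands in for the persistent transaction state a real node would keep across crashes, and the API names are hypothetical.

import uuid

class EOSSystem:
    """Hypothetical wrapper ensuring a non-idempotent operation runs exactly once."""

    def __init__(self):
        # txn_id -> saved result; persisted (e.g., in NVRAM) in a real node so
        # it survives a crash and subsequent recovery.
        self.completed = {}

    def run(self, txn_id, operation, *args):
        if txn_id in self.completed:
            # Transaction completed before a crash: do not re-play, just reply.
            return self.completed[txn_id]
        result = operation(*args)        # execute the non-idempotent operation
        self.completed[txn_id] = result  # record completion with the result
        return result

# Usage: re-issuing the same transaction after a (simulated) retry is a no-op.
counter = {"n": 0}
def increment():
    counter["n"] += 1
    return counter["n"]

eos = EOSSystem()
txn = str(uuid.uuid4())  # transaction identifier uniquely identifying the txn
assert eos.run(txn, increment) == eos.run(txn, increment)
assert counter["n"] == 1  # executed exactly once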
Abstract:
In one embodiment, a technique is provided for distributing data and associated metadata within a distributed storage architecture. A set of hash tables that embody mappings of cluster-wide identifiers associated with storage locations is stored for write data of write requests organized into extents. A hash value is generated from a hash function applied to each extent. The hash value is overloaded and used for multiple purposes within the distributed storage architecture, including (i) a remainder computation on the hash value to select a bucket of a plurality of buckets representative of the extents, (ii) a hash table selector within the hash value to select a hash table from the set of hash tables, and (iii) a hash table index computed from the hash value to select an entry from a plurality of entries of the selected hash table, the entry having a cluster-wide identifier identifying a storage location for the extent.
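A Python sketch of the three derivations from one hash value follows. The hash function, bit-field layout, and table counts are assumptions chosen for illustration; only the three roles of the value come from the abstract.

import hashlib

NUM_BUCKETS = 65536      # buckets representative of the extents
NUM_TABLES = 1024        # hash tables in the set
TABLE_ENTRIES = 1 << 20  # entries per hash table

def decompose(extent: bytes):
    """Overload one hash value for bucket, table, and index selection."""
    h = int.from_bytes(hashlib.sha256(extent).digest()[:8], "big")
    bucket = h % NUM_BUCKETS              # (i) remainder computation
    table = (h >> 16) % NUM_TABLES        # (ii) hash table selector
    index = (h >> 26) % TABLE_ENTRIES     # (iii) hash table index
    return bucket, table, index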
Abstract:
In one embodiment, a node coupled to one or more solid state drives (SSDs) executes a storage input/output (I/O) stack having a plurality of layers, including a persistence layer. The node includes a non-volatile random access memory (NVRAM). A portion of the NVRAM is configured as a write-back cache to store write data associated with one or more write requests. The persistence layer is configured to organize the write data into extents that are written back to the one or more SSDs in any order. The write data is preserved in the write-back cache until each extent is safely and successfully stored on the one or more SSDs, so that the data survives in the event of a power loss.
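The key invariant is that the NVRAM copy outlives the write acknowledgement and is released only once the extent is durable on SSD. A minimal Python sketch follows; the SSD model and all names are hypothetical.

class SSD:
    """Toy stand-in for a solid state drive."""
    def __init__(self):
        self._stable = {}
    def store(self, extent_id, data):
        self._stable[extent_id] = data
    def is_durable(self, extent_id):
        return extent_id in self._stable

class PersistenceLayer:
    def __init__(self, ssd):
        self.write_back_cache = {}  # models the NVRAM portion
        self.ssd = ssd

    def write(self, extent_id, data):
        self.write_back_cache[extent_id] = data  # durable in NVRAM
        return "ack"  # safe to acknowledge before the SSD write

    def flush(self):
        # Extents may be written back in any order.
        for extent_id in list(self.write_back_cache):
            self.ssd.store(extent_id, self.write_back_cache[extent_id])
            if self.ssd.is_durable(extent_id):
                del self.write_back_cache[extent_id]  # only now release NVRAM

    def recover(self):
        # After a power loss, any unflushed extents are still in the cache.
        return dict(self.write_back_cache)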
Abstract:
In one embodiment, a node coupled to solid state drives (SSDs) of a plurality of storage arrays executes a storage input/output (I/O) stack having a plurality of layers. The node includes a non-volatile random access memory (NVRAM). A first portion of the NVRAM is configured as a write-back cache to store write data associated with a write request and a second portion of the NVRAM is configured as one or more non-volatile logs (NVLogs) to record metadata associated with the write request. The write data is passed from the write-back cache over a first path of the storage I/O stack for storage on a first storage array and the metadata is passed from the one or more NVLogs over a second path of the storage I/O stack for storage on a second storage array, wherein the first path is different from the second path.
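The following Python sketch shows the split NVRAM layout and the two distinct flush paths. The array abstractions and names are assumptions; the abstract specifies only that write data and metadata take different paths to different storage arrays.

class Node:
    def __init__(self, data_array, metadata_array):
        self.write_back_cache = {}   # first NVRAM portion: write data
        self.nvlogs = []             # second NVRAM portion: metadata records
        self.data_array = data_array          # first storage array
        self.metadata_array = metadata_array  # second storage array

    def write(self, key, data, metadata):
        self.write_back_cache[key] = data    # cache the write data
        self.nvlogs.append((key, metadata))  # log the associated metadata

    def flush(self):
        # First path: write data flows from the cache to the first array.
        for key, data in self.write_back_cache.items():
            self.data_array[key] = data
        self.write_back_cache.clear()
        # Second, separate path: metadata flows from the NVLogs to the second array.
        for key, metadata in self.nvlogs:
            self.metadata_array[key] = metadata
        self.nvlogs.clear()

# Usage with plain dicts standing in for the two storage arrays.
node = Node(data_array={}, metadata_array={})
node.write("extent-1", b"payload", {"len": 7})
node.flush()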
Abstract:
In one embodiment, an extent key reconstruction technique is provided for use with a set of hash tables embodying metadata. The metadata includes an extent key associated with a storage location on storage devices for write data of one or more write requests organized into an extent. Each hash table has a plurality of entries, and each entry includes a plurality of slots. A first field of the extent key is recreated implicitly from the entry's position in a first address space portion of a hash table. A second field of the extent key is stored in a slot of the entry, and a third field of the extent key is also stored in the slot. A fourth field of the extent key is recreated implicitly from which hash table of the set of hash tables contains the entry.
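The space saving comes from never storing the first and fourth fields: they are implied by where the slot lives. A Python sketch follows, with field roles and layout as illustrative assumptions only.

def store_key(tables, key):
    """Store only two of the four fields; the other two are positional."""
    first, second, third, fourth = key
    # fourth selects the hash table, first selects the entry within it.
    tables[fourth][first] = (second, third)

def reconstruct_key(tables, table_id, entry_index):
    """Recreate all four fields of the extent key."""
    second, third = tables[table_id][entry_index]  # explicitly stored in the slot
    first = entry_index   # implicit: entry's position in the table's address space
    fourth = table_id     # implicit: which hash table of the set holds the entry
    return (first, second, third, fourth)

tables = {0: {}, 1: {}}
store_key(tables, (42, 7, 9, 1))
assert reconstruct_key(tables, 1, 42) == (42, 7, 9, 1)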
Abstract:
In one embodiment, a cluster uses an extent store layer and a set of hash tables having a plurality of slots embodying extent metadata that describes write data of one or more write requests organized into one or more extents. One or more non-volatile logs (NVLogs) are maintained in the cluster. The one or more NVLogs include an extent store layer log maintained by the extent store layer. The extent store layer log records changes to the set of hash tables as a plurality of log stream structures, where each log stream structure is associated with a hash table. One or more storage devices of the cluster are organized as a plurality of log streams, where each log stream is associated with a corresponding log stream structure of the extent store layer log.
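A minimal Python sketch of the per-table log streams follows; the change-record format and flush policy are assumptions beyond what the abstract states.

from collections import defaultdict

class ExtentStoreLog:
    """Hypothetical extent store layer log with one stream per hash table."""

    def __init__(self, num_tables):
        # In-core log stream structures, one per hash table.
        self.streams = defaultdict(list)
        # On-device log streams, each tied to a corresponding structure.
        self.device_streams = {t: [] for t in range(num_tables)}

    def record_change(self, table_id, slot_key, slot_value):
        # Record a change to hash table `table_id` in its own stream.
        self.streams[table_id].append((slot_key, slot_value))

    def flush(self, table_id):
        # Append buffered changes to the table's on-device log stream.
        self.device_streams[table_id].extend(self.streams[table_id])
        self.streams[table_id].clear()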
Abstract:
An extent-based storage architecture is implemented by a storage server receiving a read request for an extent from a client, wherein the extent includes a group of contiguous blocks and the read request includes a file block number. The storage server retrieves an extent identifier from a first sorted data structure, wherein the storage server uses the received file block number to traverse the first sorted data structure to the extent identifier. The storage server retrieves a reference to the extent from a second sorted data structure, wherein the storage server uses the retrieved extent identifier to traverse the second sorted data structure to the reference, and wherein the second sorted data structure is global across a plurality of volumes. The storage server retrieves the extent from a storage device using the reference and returns the extent to the client.
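A Python sketch of the two-lookup read path follows. Sorted lists with binary search stand in for the two sorted data structures (the abstract does not name a tree type), and all identifiers are hypothetical.

from bisect import bisect_right

class StorageServer:
    def __init__(self):
        # First sorted structure, per volume: parallel sorted lists mapping a
        # starting file block number (FBN) to an extent identifier.
        self.volume_maps = {}   # volume -> (sorted_fbns, extent_ids)
        # Second sorted structure, global across volumes:
        # extent identifier -> reference to the extent on a storage device.
        self.extent_refs = {}
        self.device = {}        # reference -> extent (group of contiguous blocks)

    def read(self, volume, fbn):
        fbns, extent_ids = self.volume_maps[volume]
        i = bisect_right(fbns, fbn) - 1          # traverse first structure by FBN
        extent_id = extent_ids[i]
        reference = self.extent_refs[extent_id]  # traverse second structure
        return self.device[reference]            # retrieve extent, return to client

# Usage: one volume whose FBNs starting at 0 map to a single extent.
server = StorageServer()
server.volume_maps["vol1"] = ([0], ["ext-17"])
server.extent_refs["ext-17"] = 0xABC
server.device[0xABC] = b"8 contiguous blocks"
assert server.read("vol1", 3) == b"8 contiguous blocks"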