Abstract:
Techniques are provided for implementing a persistent key-value store for caching client data, journaling, and/or crash recovery. The persistent key-value store may be hosted as a primary cache that provides read and write access to key-value record pairs stored within the persistent key-value store. The key-value record pairs are stored within multiple chains in the persistent key-value store. Journaling is provided for the persistent key-value store such that incoming key-value record pairs are stored within active chains, and data within frozen chains is written in a distributed manner across distributed storage of a distributed cluster of nodes. If there is a failure within the distributed cluster of nodes, then the persistent key-value store may be reconstructed and used for crash recovery.
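The split between active and frozen chains can be illustrated with a small in-memory sketch. It is not the described implementation: the class name `ChainedKVStore`, the `chain_capacity` freeze threshold, and the `flushed` dictionary standing in for distributed storage are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


class ChainedKVStore:
    """Toy in-memory stand-in for a key-value store organized into chains."""

    def __init__(self, chain_capacity: int = 2) -> None:
        self.chain_capacity = chain_capacity      # hypothetical freeze threshold
        self.active: Dict[str, bytes] = {}        # chain accepting incoming pairs
        self.frozen: List[Dict[str, bytes]] = []  # full chains awaiting write-out
        self.flushed: Dict[str, bytes] = {}       # stand-in for distributed storage

    def put(self, key: str, value: bytes) -> None:
        # Incoming pairs always land in the active chain.
        self.active[key] = value
        if len(self.active) >= self.chain_capacity:
            # Freeze the full chain and start a fresh active chain.
            self.frozen.append(self.active)
            self.active = {}

    def get(self, key: str) -> Optional[bytes]:
        # Newest data wins: active chain, frozen chains newest-first, then flushed data.
        if key in self.active:
            return self.active[key]
        for chain in reversed(self.frozen):
            if key in chain:
                return chain[key]
        return self.flushed.get(key)

    def flush_frozen(self) -> int:
        """Write frozen chains out oldest-first; return how many chains were flushed."""
        count = len(self.frozen)
        for chain in self.frozen:
            self.flushed.update(chain)
        self.frozen.clear()
        return count
```

Reads consult the active chain first so that a newer value for a key shadows an older value still sitting in a frozen chain or in flushed storage.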
Abstract:
Methods and systems for co-locating journaling and data storage are provided. Separate journal and volume partitions may be maintained within each logical storage unit (e.g., Logical Unit Number (LUN)) of a distributed storage system. Journaling of metadata associated with write requests received from one or more clients may be distributed by identifying a destination logical storage unit in which data associated with a given write request is to be stored and causing the data and metadata to be persisted to disk by journaling the metadata and the data to respective portions of an active log within the journal partition of the destination logical storage unit. Using the same logical storage unit both for journaling write requests and for writing the data associated with those write requests avoids the bottleneck that arises when a single device or storage unit handles all metadata for all write requests.
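A minimal sketch of distributing journaling across logical storage units follows. The abstract does not say how the destination unit is identified, so the modulo placement rule in `pick_destination`, the `volume_id` parameter, and the class and function names are hypothetical.

```python
from typing import List, Tuple


class LogicalStorageUnit:
    """Toy logical storage unit that carries its own journal partition."""

    def __init__(self, name: str) -> None:
        self.name = name
        # Journal partition: (metadata, data) records for writes destined for this unit.
        self.journal: List[Tuple[bytes, bytes]] = []


def pick_destination(units: List[LogicalStorageUnit], volume_id: int) -> LogicalStorageUnit:
    # Hypothetical placement rule (an assumption): map the volume id onto a unit by modulo.
    return units[volume_id % len(units)]


def handle_write(units: List[LogicalStorageUnit], volume_id: int,
                 metadata: bytes, data: bytes) -> str:
    # Journal the write in the unit that will also hold its data,
    # so no single unit journals every write request.
    unit = pick_destination(units, volume_id)
    unit.journal.append((metadata, data))
    return unit.name
```

Because each write is journaled on its own destination unit, journal traffic spreads across units in the same proportion as the data itself.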
Abstract:
Techniques are provided for implementing a journal using a block storage device for a plurality of clients. A journal may be hosted as a primary cache for a node, where I/O operations of a plurality of clients are logged within the journal. The node may be part of a distributed cluster of nodes hosted within a container orchestration platform. The journal may be stored in a storage device comprising a block storage device and a cache. Adaptive caching may be implemented to store some journal data of the journal in the cache. For example, a first set of journal data may be stored in the block storage device without storing the first set of journal data in the cache. A second set of journal data may be stored in the block storage device and the cache.
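The adaptive caching behavior can be sketched as follows. The abstract does not state which journal data is cached, so the size-based rule (`cache_max_record_bytes`) is purely an assumption, as are the class name and the dictionaries standing in for the block storage device and the cache.

```python
from typing import Dict


class AdaptiveJournal:
    """Toy journal backed by a 'block device' dict plus an optional cache dict."""

    def __init__(self, cache_max_record_bytes: int = 8) -> None:
        # Hypothetical policy knob: only records up to this size are also cached.
        self.cache_max_record_bytes = cache_max_record_bytes
        self.block_device: Dict[int, bytes] = {}  # stand-in for the block storage device
        self.cache: Dict[int, bytes] = {}         # stand-in for the cache
        self.next_seq = 0

    def append(self, record: bytes) -> int:
        seq = self.next_seq
        self.next_seq += 1
        # Every record is written to the block storage device.
        self.block_device[seq] = record
        # Only some records are additionally stored in the cache.
        if len(record) <= self.cache_max_record_bytes:
            self.cache[seq] = record
        return seq

    def read(self, seq: int) -> bytes:
        # Serve from the cache when possible, otherwise fall back to the block device.
        if seq in self.cache:
            return self.cache[seq]
        return self.block_device[seq]
```

Any other admission rule (for example, one based on access frequency) could replace the size check without changing the two storage paths.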
Abstract:
A system and method for handling multi-node failures in a disaster recovery cluster are provided. In the event of an error condition, a switchover operation occurs from the failed nodes to one or more surviving nodes. Data stored in non-volatile random access memory is recovered by the surviving nodes to bring storage objects (e.g., disks, aggregates, and/or volumes) into a consistent state.
Abstract:
Failover methods and systems for a networked storage environment are provided. In one aspect, a read request associated with a first storage object is received during a replay of entries of a log stored in a non-volatile memory of a second storage node, the replay being part of a failover operation initiated in response to a failure at a first storage node. The second storage node operates as a partner node of the first storage node. The read request is processed using a filtering data structure that is generated from the log prior to the replay and identifies each log entry. The read request is processed when the log does not have an entry associated with the read request; when the filtering data structure includes an entry associated with the read request, the requested data is located in the non-volatile memory.
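The read path during replay can be sketched as below. This is an illustration only: the filter is modeled as an exact set of `(object_id, block)` keys, and the `LogEntry` layout, the function names, and the `disk` dictionary standing in for persistent storage are assumptions rather than details from the abstract.

```python
from typing import Dict, List, NamedTuple, Set, Tuple

Key = Tuple[str, int]  # (storage object id, block number); a hypothetical keying scheme


class LogEntry(NamedTuple):
    object_id: str
    block: int
    data: bytes


def build_filter(log: List[LogEntry]) -> Set[Key]:
    # Built once, before replay starts; identifies every entry in the log.
    return {(entry.object_id, entry.block) for entry in log}


def read_during_replay(object_id: str, block: int, log: List[LogEntry],
                       log_filter: Set[Key], disk: Dict[Key, bytes]) -> bytes:
    key = (object_id, block)
    if key not in log_filter:
        # No log entry touches this block, so persistent storage is current.
        return disk[key]
    # The filter matched: locate the data in the log, newest entry first.
    for entry in reversed(log):
        if (entry.object_id, entry.block) == key:
            return entry.data
    raise KeyError(key)
```

Checking the filter first means reads for untouched blocks never have to scan the log while the replay is in progress.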
Abstract:
Techniques are provided for a recovery process with selective ordering and concurrent operations in order to recover from a failure. Representations of active log structures are rebuilt within memory according to ordering values assigned to I/O operations logged within the active log structures. Representations of certain active log structures may be rebuilt concurrently when those active log structures comprise I/O operations that are non-overlapping within a distributed file system, have no dependencies, relate to different services, and/or target independent files. Representations of stale log structures are rebuilt concurrently within memory. While the log structures are rebuilt and the I/O operations are executed, a key-value map is concurrently rebuilt within the memory for locating data of the I/O operations. Concurrent operations during the recovery process reduce the time to complete the recovery process, and thus reduce client downtime.
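A minimal sketch of ordered yet concurrent rebuilding is shown below. It assumes the log structures have already been judged independent (here, they touch disjoint keys); the `Op` tuple layout, the function names, and the use of a thread pool are illustrative choices, not details from the abstract.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Op = Tuple[int, str, bytes]  # (ordering value, key, data); a hypothetical record layout


def rebuild_log(ops: List[Op]) -> Dict[str, bytes]:
    """Rebuild one log structure by applying its operations in ordering-value order."""
    state: Dict[str, bytes] = {}
    for _, key, data in sorted(ops, key=lambda op: op[0]):
        state[key] = data  # later ordering values overwrite earlier ones
    return state


def recover(independent_logs: List[List[Op]]) -> Dict[str, bytes]:
    """Rebuild independent logs concurrently, then merge them into one key-value map."""
    with ThreadPoolExecutor() as pool:
        partial_states = list(pool.map(rebuild_log, independent_logs))
    kv_map: Dict[str, bytes] = {}
    for state in partial_states:
        # Assumption: the logs touch disjoint keys, so merge order does not matter.
        kv_map.update(state)
    return kv_map
```

Ordering is enforced only within each log structure, which is what allows independent structures to be processed in parallel.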
Abstract:
A method and system for co-locating journaling and data storage based on write requests are provided. A write request that includes metadata and data is received from a client. A logical storage unit for storing the metadata and the data is identified. The logical storage unit is divided into a journal partition and a volume partition. The journal partition includes a first log and a second log, one of which is identified as the active log and the other as the inactive log. The metadata is recorded in a first location in the active log and the data is recorded in a second location in the active log during a single I/O operation. A reply is sent to the client after the metadata and the data are recorded in the journal partition.
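Recording metadata and data at separate locations of the active log in one operation can be sketched as follows. The record layout (a length header followed by metadata and then data), the class and function names, and the use of a single `bytearray` append to stand in for a single I/O operation are all assumptions.

```python
import struct
from typing import List, Tuple

# Hypothetical record header: metadata length and data length, little-endian.
HEADER = struct.Struct("<II")


class JournalPartition:
    """Toy journal partition holding two logs, one of which is active."""

    def __init__(self) -> None:
        self.logs: List[bytearray] = [bytearray(), bytearray()]
        self.active_index = 0  # the other log is the inactive one

    @property
    def active_log(self) -> bytearray:
        return self.logs[self.active_index]

    @property
    def inactive_log(self) -> bytearray:
        return self.logs[1 - self.active_index]

    def journal_write(self, metadata: bytes, data: bytes) -> str:
        # Metadata and data occupy distinct locations within one record, and the
        # record is appended in one call, standing in for a single I/O operation.
        record = HEADER.pack(len(metadata), len(data)) + metadata + data
        self.active_log.extend(record)
        return "ack"  # the reply is sent only after the record is in the journal


def parse_log(log: bytes) -> List[Tuple[bytes, bytes]]:
    """Walk a log and return its (metadata, data) records in order."""
    records: List[Tuple[bytes, bytes]] = []
    offset = 0
    while offset < len(log):
        meta_len, data_len = HEADER.unpack_from(log, offset)
        offset += HEADER.size
        metadata = bytes(log[offset:offset + meta_len])
        offset += meta_len
        data = bytes(log[offset:offset + data_len])
        offset += data_len
        records.append((metadata, data))
    return records
```

The length header lets a reader walk the log and recover where each record's metadata ends and its data begins.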
Abstract:
Failover methods and systems for a networked storage environment are provided. A filtering data structure and a metadata data structure are generated before starting a replay of a log stored in a non-volatile memory of a second storage node, during a failover operation initiated in response to a failure at a first storage node. The second storage node operates as a partner node of the first storage node and mirrors, in the log, one or more write requests received by the first storage node prior to the failure, along with data associated with those write requests. The filtering data structure identifies each log entry, and the metadata data structure stores a metadata attribute of each log entry. The filtering data structure and the metadata data structure are used to provide access to a logical storage object from the second storage node during the log replay.
Abstract:
Methods, systems, and computer program products for providing deferred replication of recovery information at site switchover are disclosed. A computer-implemented method may include receiving a first copy of logged data for storage volumes of a disaster recovery (DR) partner at a remote site from the DR partner, receiving a request to perform a site switchover from the remote site to the local site, receiving a second copy of logged data for the storage volumes from a local high availability (HA) partner in response to the switchover, and recovering the storage volumes locally by applying one or more of the copies of logged data to corresponding mirrored storage volumes at the local site.