Abstract:
Techniques are provided for remote object store error handling. A storage system may store data within one or more tiers of storage, such as a local storage tier (e.g., solid state storage and disks maintained by the storage system), a remote object store (e.g., storage provided by a third party storage provider), and/or other storage tiers. Because the remote object store may not provide the same data consistency and guarantees that the storage system provides for clients such as through the local storage tier, additional validation is provided by the storage system for the remote object store. For example, when data is put into an object of the remote object store, a verification get operation is performed to read and validate information within a header of the object. Other verifications and checks are performed such as using a locally stored metafile to detect corrupt or lost metadata and/or objects.
Abstract:
Techniques are provided for multi-tier write allocation. A storage system may store data within a multi-tier storage environment comprising a first storage tier (e.g., storage devices maintained by the storage system), a second storage tier (e.g., a remote object store provided by a third party storage provider), and/or other storage tiers. A determination is made that data (e.g., data of a write request received by the storage system) is to be stored within the second storage tier. The data is stored into a staging area of the first storage tier. A second storage tier location identifier, for referencing the data according to a format utilized by the second storage tier, is assigned to the data and provided to a file system hosting the data. The data is then destaged from the staging area into the second storage tier, such as within an object stored within the remote object store.
Abstract:
A method, non-transitory computer readable medium, and computing device that receives metadata for a block associated with an object from a source storage node. The metadata comprises a source object identifier and the object is associated with a source volume of a source aggregate owned by the source storage node. A determination is made when another block associated with the object has been previously received. A destination object identifier is obtained based on the source object identifier, when the determining indicates that the other block associated with the object has been previously received. A new aggregate block number is assigned to the block based on the destination object identifier and another portion of the metadata. Ownership of the source volume is transferred upon receipt of an indication of a cutover from the source storage node in order to migrate the source volume to a destination volume of a destination aggregate.
Abstract:
Presented herein are mass data storage systems, file system protocols, non-transitory machine readable devices, and methods for storing data blocks in data file systems. Methods for compressing snapshot data in a data file system are disclosed which include: loading a snapshot file with one or more data blocks, the snapshot representing a state of the data file system at a point in time; determining if at least one of the snapshot data blocks is less than a predetermined byte value; responsive to a snapshot data block having a size that is less than the predetermined byte value, identifying a packed block configured to store data chunks from plural distinct snapshots and having available sufficient storage space to store the snapshot data block; and adding to the packed block the snapshot data block and lost-write context information corresponding to the snapshot data block.
Abstract:
It is determined that a first data block contains the same data as a second data block. The first data block is associated with a first extent and the second data block is associated with a second extent. In response to determining that the first data block contains the same data as the second data block, the second data block is associated with the first extent and the first data block is disassociated with the second extent.
Abstract:
Techniques are provided for utilizing a log to free pages from persistent memory. A log is maintained to comprise a list of page block numbers of pages within persistent memory of a node to free. A page block number, of a page, within the log is identified for processing. A reference count, corresponding to a number of references to the page block number, is identified. In response to the reference count being greater than 1, the reference count is decremented and the page block number is removed from the log. In response to the reference count being 1, the page is freed from the persistent memory and the page block number is removed from the log.
Abstract:
Techniques are provided for multi-tier write allocation. A storage system may store data within a multi-tier storage environment comprising a first storage tier (e.g., storage devices maintained by the storage system), a second storage tier (e.g., a remote object store provided by a third party storage provider), and/or other storage tiers. A determination is made that data (e.g., data of a write request received by the storage system) is to be stored within the second storage tier. The data is stored into a staging area of the first storage tier. A second storage tier location identifier, for referencing the data according to a format utilized by the second storage tier, is assigned to the data and provided to a file system hosting the data. The data is then destaged from the staging area into the second storage tier, such as within an object stored within the remote object store.
Abstract:
Systems and methods for scaling application and/or storage system functions of a distributed storage system based on a heterogeneous resource pool are provided. According to one embodiment, the distributed storage system has a composable, service-based architecture that provides scalability, resiliency, and load balancing. The distributed storage system includes a cluster of nodes each potentially having differing capabilities in terms of processing, memory, and/or storage. The distributed storage system takes advantage of different types of nodes by selectively instating appropriate services (e.g., file and volume services and/or block and storage management services) on the nodes based on their respective capabilities. Furthermore, disaggregation of these services, facilitated by interposing a frictionless layer (e.g., in the form of one or more globally accessible logical disks) therebetween, enables independent and on-demand scaling of either or both of application and storage system functions within the cluster while making use of the heterogeneous resource pool.
Abstract:
Techniques are provided for coordinating snapshot operations across multiple file systems. A notification may be received that a snapshot of data stored across a persistent memory file system and a storage file system is to be generated. Forwarding, of modify operations from a persistent memory tier to a file system tier for execution through the storage file system, may be enabled. Framing may be initiated to notify the storage file system of blocks within the persistent memory file system that comprise more up-to-date data than corresponding blocks within the storage file system. In response to the framing completing, a consistency point operation is performed to create the snapshot and to create a snapshot image as part of the snapshot.
Abstract:
Techniques are provided for persistent memory file system reconciliation. As part of the persistent memory file system reconciliation, high level file system metadata associated with a persistent memory file system of persistent memory is reconciled. Client access to the persistent memory file system is inaccessible until reconciliation of the high level file system metadata has completed. A first scanner is executed to traverse pages of the persistent memory in order to fix local inconsistencies associated with the pages. A local inconsistency of a first set of metadata or data of a page is fixed using a second set of metadata or data of the page. The first scanner is executed asynchronously in parallel with processing client I/O directed to the persistent memory file system.