Abstract:
It is determined that a first data unit is to be written to a storage device and that the first data unit is associated with a first attribute. In response to determining that the first data unit is associated with the first attribute, a first identifier is selected from a first identifier space and associated with the first data unit. It is determined that a second data unit is to be written to the storage device and that the second data unit is associated with a second attribute. In response to determining that the second data unit is associated with the second attribute, a second identifier is selected from a second identifier space and associated with the second data unit.
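The following is a minimal sketch of the allocation rule above. It keeps one counter-based identifier space per attribute. The attribute values ("first", "second"), the range sizes, and the class names `IdentifierSpace` and `IdentifierAllocator` are hypothetical. The abstract does not say what the attributes are or how an identifier is chosen within a space, so sequential allocation is only an assumed policy.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable


@dataclass
class IdentifierSpace:
    """A contiguous identifier range [start, end) handed out in order."""
    start: int
    end: int
    next_id: int = field(init=False)

    def __post_init__(self) -> None:
        self.next_id = self.start

    def allocate(self) -> int:
        if self.next_id >= self.end:
            raise RuntimeError("identifier space exhausted")
        ident = self.next_id
        self.next_id += 1
        return ident


class IdentifierAllocator:
    """Picks an identifier space based on the attribute of the data unit."""

    def __init__(self, spaces: Dict[Hashable, IdentifierSpace]) -> None:
        self.spaces = spaces
        self.assigned: Dict[str, int] = {}  # data unit name -> identifier

    def write(self, unit: str, attribute: Hashable) -> int:
        # Select the space that matches the unit's attribute, draw an
        # identifier from it, and associate that identifier with the unit.
        ident = self.spaces[attribute].allocate()
        self.assigned[unit] = ident
        return ident


allocator = IdentifierAllocator({
    "first": IdentifierSpace(0, 1_000),       # hypothetical first identifier space
    "second": IdentifierSpace(1_000, 2_000),  # hypothetical second identifier space
})
print(allocator.write("unit-a", "first"))   # 0
print(allocator.write("unit-b", "second"))  # 1000
print(allocator.write("unit-c", "first"))   # 1
```

Because the ranges are disjoint, an identifier alone reveals which attribute its data unit had.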
Abstract:
A storage server is configured to receive, from a client, a request to store a data block. The storage server services the request by compressing the data block into a compression group, which includes a number of compressed data blocks. The storage server stores the compression group in non-volatile memory. When a consistency point is reached, it flushes the compression group from the non-volatile memory to a physical storage device. Compressing data that is to be stored in the system memory of a storage server increases the amount of data a data storage system can process in a given time period. This performance increase also comes at lower cost, because it avoids the expense of additional physical system memory modules.
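The write path described above can be modeled roughly as follows. The sketch assumes zlib as the compressor, a fixed number of compressed blocks per compression group, and a consistency point triggered by a count of staged groups. These parameters and the class name `CompressingStorageServer` are illustrative assumptions, not the patented design. The abstract does not state what triggers a consistency point.

```python
import zlib
from typing import List


class CompressingStorageServer:
    """Toy model: compress blocks into groups, stage in NVRAM, flush at a CP."""

    def __init__(self, group_size: int = 8, cp_groups: int = 4) -> None:
        self.group_size = group_size   # compressed blocks per group (assumed)
        self.cp_groups = cp_groups     # staged groups that trigger a CP (assumed)
        self.open_group: List[bytes] = []
        self.nvram: List[List[bytes]] = []  # complete groups awaiting a CP
        self.disk: List[List[bytes]] = []   # groups on the physical device

    def store_block(self, block: bytes) -> None:
        self.open_group.append(zlib.compress(block))
        if len(self.open_group) == self.group_size:
            self.nvram.append(self.open_group)  # stage the full group in NVRAM
            self.open_group = []
            if len(self.nvram) >= self.cp_groups:
                self.consistency_point()

    def consistency_point(self) -> None:
        # Flush every staged compression group to the physical device.
        self.disk.extend(self.nvram)
        self.nvram.clear()


server = CompressingStorageServer(group_size=2, cp_groups=2)
for i in range(4):
    server.store_block(bytes([i]) * 4096)  # four highly compressible 4 KiB blocks
print(len(server.disk), len(server.nvram))  # 2 0
```

Grouping several compressed blocks lets them be staged and flushed together as one unit.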
Abstract:
A network adapter receives a request to store a data block. The data block is sent from the network adapter to a compression module, which generates a compressed data block from it. The compressed data block, or a reference to it, is stored in a buffer cache, and the compressed data block is stored in nonvolatile memory. It is determined that the compressed data block should be flushed to a storage device. In response to that determination, the compressed data block is flushed from the nonvolatile memory to the storage device.
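The sketch below follows this sequence with one method per stage. `receive` stands in for the network adapter and `compress` for the compression module. A byte threshold stands in for the unspecified rule that decides when to flush. The buffer cache holds an index into the NVRAM log, which illustrates the "reference to the compressed data block" option. The class name, the zlib compressor, and the threshold policy are all hypothetical.

```python
import zlib
from typing import Dict, List, Tuple


class WritePath:
    """Toy model of the write path: adapter -> compression -> cache/NVRAM -> disk."""

    def __init__(self, nvram_capacity: int) -> None:
        self.nvram_capacity = nvram_capacity      # flush trigger in bytes (assumed)
        self.nvram: List[Tuple[int, bytes]] = []  # (block number, compressed block)
        self.nvram_bytes = 0
        self.buffer_cache: Dict[int, int] = {}    # block number -> index into self.nvram
        self.device: Dict[int, bytes] = {}        # block number -> compressed block

    def receive(self, block_no: int, data: bytes) -> None:
        """Entry point standing in for the network adapter."""
        compressed = self.compress(data)
        self.buffer_cache[block_no] = len(self.nvram)  # a reference, not a copy
        self.nvram.append((block_no, compressed))
        self.nvram_bytes += len(compressed)
        if self.should_flush():
            self.flush()

    @staticmethod
    def compress(data: bytes) -> bytes:
        """Stand-in for the compression module."""
        return zlib.compress(data)

    def should_flush(self) -> bool:
        return self.nvram_bytes >= self.nvram_capacity

    def flush(self) -> None:
        for block_no, compressed in self.nvram:  # later writes overwrite earlier ones
            self.device[block_no] = compressed
        self.nvram.clear()
        self.nvram_bytes = 0
        self.buffer_cache.clear()  # indices into NVRAM are stale after the flush

    def read(self, block_no: int) -> bytes:
        if block_no in self.buffer_cache:
            _, compressed = self.nvram[self.buffer_cache[block_no]]
        else:
            compressed = self.device[block_no]
        return zlib.decompress(compressed)


path = WritePath(nvram_capacity=1 << 20)
path.receive(7, b"hello" * 1000)
print(7 in path.buffer_cache, 7 in path.device)  # True False
path.flush()
print(7 in path.buffer_cache, 7 in path.device)  # False True
```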
Abstract:
Systems and methods are provided for preserving storage efficiency during restoration of data from the cloud. In one embodiment, a CBMAP is maintained that maps cloud block numbers (CBNs) to the corresponding block numbers of a volume of a data storage system on which the data of a previously restored file has been stored. Using the CBMAP during the restore process may avoid storing duplicate file data blocks on the volume. Instead, the current file being restored shares a reference to the corresponding file data block that was previously stored on the volume for the previously restored file. Besides preserving storage efficiency, the CBMAP helps avoid repeated GET operations for data associated with CBNs that were already retrieved from the cloud and stored to the volume. This reduces both data access costs and the latency of the restore operation.
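The in-memory sketch below makes the CBMAP idea concrete. It assumes a dictionary for the CBMAP, a list for the volume, and a caller-supplied `cloud_get` function in place of a real object-store GET. Reference counts stand in for whatever block-sharing mechanism the file system actually uses. The class and attribute names are hypothetical.

```python
from typing import Callable, Dict, List


class CloudRestore:
    """Toy restore path that consults a CBMAP before issuing cloud GETs."""

    def __init__(self, cloud_get: Callable[[int], bytes]) -> None:
        self.cloud_get = cloud_get          # fetches one cloud block by CBN (assumed)
        self.cbmap: Dict[int, int] = {}     # CBN -> volume block number
        self.volume: List[bytes] = []       # volume blocks, indexed by block number
        self.refcount: List[int] = []       # file references to each volume block
        self.gets = 0                       # GET operations issued to the cloud

    def restore_file(self, cbns: List[int]) -> List[int]:
        """Restore one file; return the volume block numbers it references."""
        file_blocks = []
        for cbn in cbns:
            vbn = self.cbmap.get(cbn)
            if vbn is None:                 # first time this CBN is restored
                data = self.cloud_get(cbn)
                self.gets += 1
                vbn = len(self.volume)
                self.volume.append(data)
                self.refcount.append(0)
                self.cbmap[cbn] = vbn
            self.refcount[vbn] += 1         # share the block instead of duplicating it
            file_blocks.append(vbn)
        return file_blocks


cloud = {10: b"alpha", 11: b"beta", 12: b"gamma"}  # hypothetical cloud contents
restore = CloudRestore(cloud.__getitem__)
print(restore.restore_file([10, 11]))  # [0, 1]
print(restore.restore_file([11, 12]))  # [1, 2]
print(restore.gets, len(restore.volume))  # 3 3
```

Restoring the second file reuses volume block 1 from the first file. As a result, four block references cost only three GETs and three stored blocks.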
Abstract:
A system and method for providing a substantially constant-time copy operation for file system objects managed by a storage server begins by generating a snapshot of at least a portion of a data set managed by the storage server. The system then performs a copy operation in the storage server to generate a copy of the data set, separate from the snapshot, on a set of block locations containing a predetermined reference value. While this copy operation is in progress, the system can receive from a requester a first read request directed to the copy of the data set that the copy operation is to generate. In response to the first read request, the storage server provides data from the snapshot to the requester.
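The read path of such a copy can be sketched as below. The sketch assumes an object sentinel `REF` as the predetermined reference value and a `copy_step` method that a background task would call repeatedly. A read of a location that still holds `REF` is redirected to the snapshot, so the copy is readable before any block has been copied. The class `LazyCopy` and its methods are hypothetical, and initializing a Python list is itself O(n). A real system would presumably mark the locations through metadata instead.

```python
from typing import List, cast

REF = object()  # hypothetical predetermined reference value: "not yet copied"


class LazyCopy:
    def __init__(self, data_set: List[bytes]) -> None:
        self.snapshot = tuple(data_set)  # point-in-time image of the data set
        # Every target block location starts out holding the reference value.
        self.copy: List[object] = [REF] * len(self.snapshot)
        self.cursor = 0                  # next block for the background copier

    def read(self, index: int) -> bytes:
        block = self.copy[index]
        if block is REF:                 # not copied yet: serve from the snapshot
            return self.snapshot[index]
        return cast(bytes, block)

    def write(self, index: int, data: bytes) -> None:
        self.copy[index] = data          # new data for the copy; snapshot untouched

    def copy_step(self) -> bool:
        """Copy one block from the snapshot; return False once finished."""
        if self.cursor >= len(self.copy):
            return False
        if self.copy[self.cursor] is REF:  # do not clobber newer writes
            self.copy[self.cursor] = self.snapshot[self.cursor]
        self.cursor += 1
        return True


lc = LazyCopy([b"a", b"b", b"c"])
print(lc.read(1))  # b'b' (served from the snapshot)
lc.write(2, b"z")
while lc.copy_step():
    pass
print(lc.copy)     # [b'a', b'b', b'z']
```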
Abstract:
A technique for organizing data to facilitate data deduplication includes dividing a block-based set of data into multiple “chunks”. The chunk boundaries are independent of the block boundaries because they are determined by a hashing algorithm. Metadata of the data set, such as block pointers for locating the data, is stored in a tree structure (a buffer tree) that includes multiple levels, each containing at least one node. The lowest level of the tree includes multiple nodes that each contain chunk metadata relating to the chunks of the data set. In each of these lowest-level nodes, the chunk metadata identifies at least one of the chunks. The chunks (user-level data) are stored in one or more system files that are separate from the buffer tree and not visible to the user.
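The sketch below pairs content-defined chunking with a model of the lowest tree level. It assumes a Rabin-Karp style polynomial rolling hash for the boundaries, SHA-256 chunk fingerprints, and a dict standing in for the hidden system file. Boundaries depend only on the data bytes, so a fixed block size never enters the computation. The hash parameters, size limits, fanout, and all names are illustrative choices, not the technique's actual values.

```python
import hashlib
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple

BASE, MOD = 257, (1 << 31) - 1  # polynomial rolling-hash parameters (assumed)


def chunk_boundaries(data: bytes, window: int = 16, mask: int = 0x3F,
                     min_size: int = 32, max_size: int = 256) -> List[Tuple[int, int]]:
    """Content-defined chunking: cut where the rolling hash of the last `window`
    bytes has its low bits equal to zero, keeping sizes in [min_size, max_size]."""
    top = pow(BASE, window - 1, MOD)  # weight of the byte leaving the window
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i - start >= window:
            h = (h - data[i - window] * top) % MOD
        h = (h * BASE + byte) % MOD
        size = i - start + 1
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            chunks.append((start, size))
            start, h = i + 1, 0
    if start < len(data):
        chunks.append((start, len(data) - start))
    return chunks


@dataclass
class ChunkRef:
    offset: int  # position of the chunk in the data set
    length: int
    digest: str  # key of the chunk in the hidden chunk store


def build_tree(data: bytes, store: Dict[str, bytes], fanout: int = 4) -> List[List[ChunkRef]]:
    """Return the lowest tree level: nodes holding up to `fanout` chunk records."""
    refs = []
    for offset, length in chunk_boundaries(data):
        chunk = data[offset:offset + length]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # each distinct chunk is stored once
        refs.append(ChunkRef(offset, length, digest))
    return [refs[i:i + fanout] for i in range(0, len(refs), fanout)]


rng = random.Random(0)
data = bytes(rng.getrandbits(8) for _ in range(5000))
store: Dict[str, bytes] = {}
leaves = build_tree(data, store)
rebuilt = b"".join(store[r.digest] for leaf in leaves for r in leaf)
print(rebuilt == data)  # True
```

Writing the same data again produces the same chunks and fingerprints, so the chunk store does not grow.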
Abstract:
Systems, methods, and non-transitory machine-readable media for determining block characteristics include one or more processors, a memory storing instructions for the one or more processors, persistent storage, and a file system implemented in the persistent storage that stores data using a plurality of blocks. When the stored instructions are executed, the one or more processors are configured to perform four steps. First, they traverse the plurality of blocks. Second, they read the contents of a first block selected from the plurality of blocks. Third, they determine one or more characteristics of the first block from metadata within that block. Fourth, based on those characteristics, they selectively perform or skip a storage operation on the first block. In some embodiments, the storage operation is a replication operation or a deduplication operation.
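One way to picture the selective behavior is a deduplication pass that reads a type code from each block's own header and skips everything that is not user data. The one-byte header layout, the `BlockType` codes, and SHA-256 payload matching are assumptions made for illustration. The abstract does not specify which metadata fields are read.

```python
import hashlib
from enum import IntEnum
from typing import Dict, List


class BlockType(IntEnum):
    """Hypothetical block-type codes kept in a one-byte header in each block."""
    FREE = 0
    USER_DATA = 1
    METADATA = 2


HEADER_SIZE = 1  # assumed header layout: byte 0 is the BlockType


def block_type(block: bytes) -> BlockType:
    return BlockType(block[0])  # characteristic read from metadata inside the block


def dedup_candidates(blocks: List[bytes]) -> Dict[int, int]:
    """Traverse all blocks and map each duplicate user-data block number to the
    first block with the same payload; other block types are skipped."""
    first_seen: Dict[str, int] = {}
    duplicates: Dict[int, int] = {}
    for bno, block in enumerate(blocks):
        if block_type(block) != BlockType.USER_DATA:
            continue  # characteristic says: do not perform the operation
        digest = hashlib.sha256(block[HEADER_SIZE:]).hexdigest()
        if digest in first_seen:
            duplicates[bno] = first_seen[digest]
        else:
            first_seen[digest] = bno
    return duplicates


blocks = [
    bytes([BlockType.USER_DATA]) + b"x" * 8,
    bytes([BlockType.METADATA]) + b"x" * 8,   # same payload, but metadata
    bytes([BlockType.USER_DATA]) + b"x" * 8,  # duplicate of block 0
    bytes([BlockType.FREE]) + b"x" * 8,
    bytes([BlockType.USER_DATA]) + b"y" * 8,
]
print(dedup_candidates(blocks))  # {2: 0}
```

The same pattern applies to replication: replace the hashing step with a copy, and keep the type check that decides whether to act.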