Abstract:
Presented herein are mass data storage systems, file system protocols, non-transitory machine readable devices, and methods for storing data blocks in data file systems. Methods for compressing snapshot data in a data file system are disclosed which include: loading a snapshot file with one or more data blocks, the snapshot representing a state of the data file system at a point in time; determining if at least one of the snapshot data blocks is less than a predetermined byte value; responsive to a snapshot data block having a size that is less than the predetermined byte value, identifying a packed block configured to store data chunks from plural distinct snapshots and having available sufficient storage space to store the snapshot data block; and adding to the packed block the snapshot data block and lost-write context information corresponding to the snapshot data block.
Abstract:
Techniques for a method of estimating deduplication potential are disclosed herein. The method includes steps of selecting randomly a plurality of data blocks from a data set as a sample of the data set, collecting fingerprints of the plurality of data blocks of the sample, identifying duplicates of fingerprints of the sample from the fingerprints of the plurality of data blocks, estimating a total number of unique fingerprints of the data set depending on a total number of the duplicates of fingerprints of the sample based on a probability of fingerprints from the data set colliding in the sample, and determining a total number of duplicates of fingerprints of the data set depending on the total number of the unique fingerprints of the data set.
Abstract:
A first plurality of block identifiers is sorted based, at least in part, on a measure of spatial locality. A second plurality of block identifiers is sorted based, at least in part, on the measure of spatial locality. At least the first plurality of block identifiers and the second plurality of block identifiers are incrementally merged into a third plurality of block identifiers based, at least in part, on the measure of spatial locality. A block of data corresponding to metadata associated with a plurality of block identifiers of the third plurality of block identifiers is updated.
Abstract:
A first plurality of block identifiers is sorted based, at least in part, on a measure of spatial locality. A second plurality of block identifiers is sorted based, at least in part, on the measure of spatial locality. At least the first plurality of block identifiers and the second plurality of block identifiers are incrementally merged into a third plurality of block identifiers based, at least in part, on the measure of spatial locality. A block of data corresponding to metadata associated with a plurality of block identifiers of the third plurality of block identifiers is updated.
Abstract:
A system and method for data replication is described. A destination storage system receives a message from a source storage system as part of a replication process. The message includes an identity of a first file, information about where the first file is stored in the source storage system, a name of a first data being used by the first file and stored at a first location of the source storage system, and a fingerprint of the first data. The destination storage system determines that a mapping database is unavailable or inaccurate, and accesses a fingerprint database using the fingerprint of the first data received with the message to determine whether data stored in the destination storage system has a fingerprint identical to the fingerprint of the first data.
Abstract:
Presented herein are mass data storage systems, file system protocols, non-transitory machine readable devices, and methods for storing data blocks in data file systems. Methods for compressing snapshot data in a data file system are disclosed which include: loading a snapshot file with one or more data blocks, the snapshot representing a state of the data file system at a point in time; determining if at least one of the snapshot data blocks is less than a predetermined byte value; responsive to a snapshot data block having a size that is less than the predetermined byte value, identifying a packed block configured to store data chunks from plural distinct snapshots and having available sufficient storage space to store the snapshot data block; and adding to the packed block the snapshot data block and lost-write context information corresponding to the snapshot data block.
Abstract:
Techniques to clone a writeable data object in non-persistent memory are disclosed. The writeable data object is stored in a storage structure in non-persistent memory that corresponds to a portion of a persistent storage. The techniques enable cloning of the writeable data object without having to wait until the writeable data object is saved to the persistent storage and without needing to quiesce incoming operations (e.g., reads and writes) to the writeable data object.
Abstract:
A first plurality of block identifiers is sorted based, at least in part, on a measure of spatial locality. A second plurality of block identifiers is sorted based, at least in part, on the measure of spatial locality. At least the first plurality of block identifiers and the second plurality of block identifiers are incrementally merged into a third plurality of block identifiers based, at least in part, on the measure of spatial locality. A block of data corresponding to metadata associated with a plurality of block identifiers of the third plurality of block identifiers is updated.
Abstract:
A system and method for logically organizing compressed data. In one aspect, a destination storage server receives a write request that includes multiple data blocks and specifies corresponding file block numbers. An extent-based file system executing on the storage server accesses intermediate block entries that each associates one of the file block numbers with a respective extent block number. The file system, in cooperation with a compression engine, compresses the data blocks into a set of one or more compressed data blocks. The file system stores the compressed data blocks at physical locations corresponding to physical block numbers and allocates, within an extent map, pointers from an extent ID to the extent block numbers, and pointers from the extent ID to the physical block numbers.
Abstract:
A system and method for data replication is described. A destination storage system receives a message from a source storage system as part of a replication process. The message includes an identity of a first file, information about where the first file is stored in the source storage system, a name of a first data being used by the first file and stored at a first location of the source storage system, and a fingerprint of the first data. The destination storage system determines that a mapping database is unavailable or inaccurate, and accesses a fingerprint database using the fingerprint of the first data received with the message to determine whether data stored in the destination storage system has a fingerprint identical to the fingerprint of the first data.