Abstract:
Overwriting part of compressed data without decompressing the on-disk compressed data is performed by receiving, at a storage server, a write request for a block of data in a compression group from a client, wherein the compression group comprises a group of data blocks that is compressed, and wherein the block of data is uncompressed. The storage server partially overwrites the compression group, wherein the compression group remains compressed while the partial overwriting is performed. The storage server determines whether the partially overwritten compression group including the uncompressed block of data should be compressed. The storage server defers compression of the partially overwritten compression group if the partially overwritten compression group should not be compressed. The storage server compresses the partially overwritten compression group if the partially overwritten compression group should be compressed.
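The partial-overwrite flow in this abstract can be illustrated with a minimal sketch. It assumes a 4 KiB block size, an in-memory overlay that holds the uncompressed replacement blocks, and a simple count threshold as the policy for deciding when to recompress. None of these details come from the abstract; the class name `CompressionGroup` and all of its methods are hypothetical.

```python
import zlib

BLOCK_SIZE = 4096  # assumed block size, not stated in the abstract


class CompressionGroup:
    """Hypothetical model: a group of blocks compressed together as one blob."""

    def __init__(self, blocks):
        self.num_blocks = len(blocks)
        self.compressed = zlib.compress(b"".join(blocks))
        # Uncompressed blocks that logically replace blocks inside the blob.
        self.overlay = {}

    def overwrite(self, index, block):
        """Partially overwrite the group; the compressed blob is not touched."""
        if not 0 <= index < self.num_blocks:
            raise IndexError(index)
        if len(block) != BLOCK_SIZE:
            raise ValueError("block must be exactly BLOCK_SIZE bytes")
        self.overlay[index] = block

    def read(self, index):
        """Return the current contents of one block."""
        if index in self.overlay:
            return self.overlay[index]
        data = zlib.decompress(self.compressed)
        return data[index * BLOCK_SIZE:(index + 1) * BLOCK_SIZE]

    def maybe_compress(self, threshold=2):
        """Recompress only if the (assumed) policy says so; otherwise defer."""
        if len(self.overlay) < threshold:
            return False  # compression deferred
        data = bytearray(zlib.decompress(self.compressed))
        for index, block in self.overlay.items():
            data[index * BLOCK_SIZE:(index + 1) * BLOCK_SIZE] = block
        self.compressed = zlib.compress(bytes(data))
        self.overlay.clear()
        return True
```

In this sketch a write only adds an overlay entry, so the group stays compressed during the overwrite, and the cost of decompressing and recompressing is paid only when `maybe_compress` decides it is worthwhile.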
Abstract:
A system and method for data replication is described. A destination storage system receives a message from a source storage system as part of a replication process. The message includes an identity of a first file, information about where the first file is stored in the source storage system, a name of a first data being used by the first file and stored at a first location of the source storage system, and a fingerprint of the first data. The destination storage system determines that a mapping database is unavailable or inaccurate, and accesses a fingerprint database using the fingerprint of the first data received with the message to determine whether data stored in the destination storage system has a fingerprint identical to the fingerprint of the first data.
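The fallback from the mapping database to the fingerprint database might look like the following sketch. It assumes SHA-256 as the fingerprint function, plain dictionaries as the two databases, and a mapping entry that counts as inaccurate when it points at a missing location or at data with a different fingerprint. The `Destination` class, the message field names, and the location strings are all hypothetical.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    # Assumed fingerprint function; the abstract does not name one.
    return hashlib.sha256(data).hexdigest()


class Destination:
    """Hypothetical destination storage system."""

    def __init__(self):
        self.store = {}         # local location -> data
        self.fingerprints = {}  # fingerprint -> local location
        self.mapping = {}       # source data name -> local location (None if unavailable)

    def put(self, location, data):
        self.store[location] = data
        self.fingerprints[fingerprint(data)] = location

    def resolve(self, message):
        """Return a local location already holding the data, or None."""
        if self.mapping is not None:
            location = self.mapping.get(message["data_name"])
            if (location in self.store
                    and fingerprint(self.store[location]) == message["fingerprint"]):
                return location
        # Mapping database unavailable or inaccurate: use the fingerprint database.
        return self.fingerprints.get(message["fingerprint"])
```

When `resolve` returns a location, the destination can reference the existing data instead of transferring it again; when it returns `None`, the data would have to be sent by the source.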
Abstract:
Presented herein are mass data storage networks, file system protocols, non-transitory machine readable devices, and methods for storing data blocks in mass data storage systems. Methods for storing data blocks in a file system are disclosed which include: receiving, by a storage controller of the data storage system, a request to write a data file to a system storage module; determining whether the data file includes a sub-K data chunk that is less than approximately four kilobytes; identifying a packed block that stores a plurality of sub-K data chunks and has sufficient storage space available to store the sub-K data chunk; and placing, by the storage controller in the packed block, the sub-K data chunk and a corresponding data length and a respective offset identifying a location of the sub-K data chunk in the packed block.
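A minimal sketch of the packing step follows. It assumes a 4096-byte packed block, a first-fit search for a block with enough free space, and a per-chunk `(offset, length)` record kept alongside the data. For simplicity it does not charge the block for the space those records would occupy on disk. `PackedBlock` and `place_chunk` are hypothetical names.

```python
BLOCK_SIZE = 4096  # assumed size of a packed block


class PackedBlock:
    """Hypothetical packed block holding several sub-4K chunks."""

    def __init__(self):
        self.data = bytearray()
        self.entries = []  # one (offset, length) record per stored chunk

    def free_space(self):
        return BLOCK_SIZE - len(self.data)

    def add(self, chunk):
        """Append a chunk and record its offset and length; return its slot."""
        if len(chunk) > self.free_space():
            raise ValueError("chunk does not fit in this packed block")
        offset = len(self.data)
        self.data += chunk
        self.entries.append((offset, len(chunk)))
        return len(self.entries) - 1

    def get(self, slot):
        offset, length = self.entries[slot]
        return bytes(self.data[offset:offset + length])


def place_chunk(packed_blocks, chunk):
    """First-fit placement; returns (packed block index, slot)."""
    if len(chunk) >= BLOCK_SIZE:
        raise ValueError("not a sub-4K chunk")
    for index, block in enumerate(packed_blocks):
        if block.free_space() >= len(chunk):
            return index, block.add(chunk)
    packed_blocks.append(PackedBlock())
    return len(packed_blocks) - 1, packed_blocks[-1].add(chunk)
```

The recorded offset and length are what let a later read return exactly one chunk from a block that is shared by many.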
Abstract:
A first plurality of block identifiers is sorted based, at least in part, on a measure of spatial locality. A second plurality of block identifiers is sorted based, at least in part, on the measure of spatial locality. At least the first plurality of block identifiers and the second plurality of block identifiers are incrementally merged into a third plurality of block identifiers based, at least in part, on the measure of spatial locality. A block of data corresponding to metadata associated with a plurality of block identifiers of the third plurality of block identifiers is updated.
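The sort-then-merge step can be sketched as follows, assuming that numeric closeness of block identifiers stands in for the measure of spatial locality and that each metadata block covers a fixed run of 1024 identifiers. Both are assumptions made for illustration, and `merged_updates` is a hypothetical helper.

```python
import heapq
from itertools import groupby

IDS_PER_METADATA_BLOCK = 1024  # assumed coverage of one metadata block


def metadata_block(block_id):
    """Index of the metadata block that describes this block identifier."""
    return block_id // IDS_PER_METADATA_BLOCK


def merged_updates(*batches):
    """Yield (metadata block, block ids) pairs in locality order.

    Each batch is sorted on its own, then heapq.merge combines the sorted
    batches lazily, so the merged sequence is produced incrementally.
    """
    sorted_batches = [sorted(batch) for batch in batches]
    merged = heapq.merge(*sorted_batches)
    for meta, ids in groupby(merged, key=metadata_block):
        yield meta, list(ids)
```

For example, `list(merged_updates([2050, 5, 1030], [7, 1025, 3]))` groups the six identifiers into three metadata blocks, so each metadata block is updated once rather than once per identifier.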
Abstract:
Systems, methods, and computer program products implementing hybrid file structures for data storage are provided. One embodiment of a method performed in a computer-based storage system includes writing a file as data blocks in an array of storage devices. The method includes associating the data blocks with metadata related to at least one location in the array of storage devices for later access to the data blocks. The file is represented as a hierarchical data structure having a plurality of nodes. A first portion of nodes has a first span type, and a second portion of nodes has a second span type. The data structure includes a buftree. The first span type includes a fixed-span type. The second span type includes a variable-span type.
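One way to picture a tree that mixes the two span types is sketched below. The interpretation is an assumption: a fixed-span node is taken to mean that every child covers the same number of file blocks, so a child is found by arithmetic, and a variable-span node is taken to mean that children cover differing lengths, so a child is found by searching sorted start offsets. The classes `Extent`, `FixedSpanNode`, and `VariableSpanNode` are hypothetical and are not drawn from the abstract.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Extent:
    """Leaf: a contiguous run of physical blocks."""
    start_pbn: int
    length: int

    def lookup(self, offset):
        if not 0 <= offset < self.length:
            raise KeyError(offset)
        return self.start_pbn + offset


@dataclass
class FixedSpanNode:
    """Every child covers exactly `span` file blocks."""
    span: int
    children: list

    def lookup(self, offset):
        slot, rest = divmod(offset, self.span)
        if not 0 <= slot < len(self.children):
            raise KeyError(offset)
        return self.children[slot].lookup(rest)


@dataclass
class VariableSpanNode:
    """Children cover differing lengths; `starts` holds their sorted start offsets."""
    starts: list
    children: list

    def lookup(self, offset):
        slot = bisect_right(self.starts, offset) - 1
        if slot < 0:
            raise KeyError(offset)
        return self.children[slot].lookup(offset - self.starts[slot])
```

A root `FixedSpanNode` can hold both kinds of child, so one file can use fixed-span indexing for some ranges and extent-style variable spans for others.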
Abstract:
Techniques to clone a writeable data object in non-persistent memory are disclosed. The writeable data object is stored in a storage structure in non-persistent memory that corresponds to a portion of a persistent storage. The techniques enable cloning of the writeable data object without having to wait until the writeable data object is saved to the persistent storage and without needing to quiesce incoming operations (e.g., reads and writes) to the writeable data object.
Abstract:
A request is received to remove duplicate data. A log data container associated with a storage volume in a storage server is accessed. The log data container includes a plurality of entries. Each entry is identified by an extent identifier in a data structure stored in a volume associated with the storage server. For each entry in the log data container, a determination is made whether the entry matches another entry in the log data container. If the entry matches another entry in the log data container, a donor extent and a recipient extent are determined. If an external reference count associated with the recipient extent equals a first predetermined value, block sharing is performed for the donor extent and the recipient extent. A determination is made whether the reference count of the donor extent equals a second predetermined value. If the reference count of the donor extent equals the second predetermined value, the donor extent is freed.
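The matching and eligibility checks can be sketched as below. The sketch assumes that two log entries match when their fingerprints are equal, that the first extent seen with a given fingerprint acts as the donor, and that the first predetermined value is 1. All three are illustrative assumptions, and `plan_sharing` is a hypothetical helper. The final step, freeing the donor when its reference count reaches the second predetermined value, is left out because the abstract does not say how block sharing changes the counts.

```python
def plan_sharing(log_entries, external_refs, share_value=1):
    """Return (donor, recipient) pairs that qualify for block sharing.

    log_entries: iterable of (extent_id, fingerprint) pairs from the log.
    external_refs: dict mapping extent_id to its external reference count.
    share_value: assumed first predetermined value.
    """
    first_seen = {}  # fingerprint -> extent id of the assumed donor
    plan = []
    for extent_id, fp in log_entries:
        donor = first_seen.setdefault(fp, extent_id)
        if donor == extent_id:
            continue  # first entry with this fingerprint; nothing to match yet
        if external_refs.get(extent_id) == share_value:
            plan.append((donor, extent_id))
    return plan
```

Recipients whose external reference count differs from `share_value` are skipped, which mirrors the conditional in the abstract.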