Abstract:
Delta compression after identity deduplication is disclosed. A first data segment is determined to be identical to a first previous data segment. A second data segment, not determined to be identical to a second previous data segment, is then determined to be similar to a third previous data segment.
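The two-stage check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the SHA-256 fingerprint, the max-CRC32 `sketch` feature, and the `classify` helper are all hypothetical choices made for the example.

```python
import hashlib
import zlib


def fingerprint(segment: bytes) -> str:
    """Identity check: equal SHA-256 digests are treated as identical segments."""
    return hashlib.sha256(segment).hexdigest()


def sketch(segment: bytes, window: int = 8) -> int:
    """Hypothetical similarity feature: the largest CRC32 over all sliding windows."""
    last_start = max(1, len(segment) - window + 1)
    return max(zlib.crc32(segment[i:i + window]) for i in range(last_start))


def classify(segments):
    """Label each segment 'identical', 'similar' (delta candidate), or 'new'."""
    seen_fingerprints = set()
    base_by_sketch = {}  # sketch value -> fingerprint of the first segment seen with it
    results = []
    for segment in segments:
        fp = fingerprint(segment)
        # Stage 1: identity deduplication.
        if fp in seen_fingerprints:
            results.append(("identical", fp, None))
            continue
        seen_fingerprints.add(fp)
        # Stage 2: only non-identical segments are tested for similarity.
        base_fp = base_by_sketch.setdefault(sketch(segment), fp)
        if base_fp != fp:
            results.append(("similar", fp, base_fp))  # would be delta-encoded against base_fp
        else:
            results.append(("new", fp, None))
    return results
```

A segment labeled `similar` carries the fingerprint of the earlier segment it could be delta-encoded against; the encoding itself is left out of this sketch.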
Abstract:
Stream locality delta compression is disclosed. A previous stream-indicated locale of data segments is selected. A first data segment is then determined to be similar to a data segment in the stream-indicated locale.
Abstract:
Data replication with delta compression is disclosed. A primary system and a replica system are determined to both have an identical first data segment that is similar to a second data segment. The second data segment is encoded, wherein the encoding refers to the first data segment.
Abstract:
A method and apparatus collect file recipes from deduplicated data storage systems; each file recipe consists of a list of fingerprints of the data chunks of a file. Detailed meta-data for each unique data chunk is also collected. In an offline process, research and analysis can be performed either on the meta-data itself or on a full meta-data trace reconstructed by matching recipe fingerprints to the corresponding meta-data. The method and system can generate the full meta-data trace efficiently in an on-line or off-line process. Because typical deduplicated storage systems achieve 10× or higher deduplication rates, collecting the meta-data is faster than processing all of the original files and produces compact meta-data that is smaller to store.
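The reconstruction step, matching recipe fingerprints to per-chunk meta-data, can be sketched as a simple lookup. The record layout, the sample file names, and the `reconstruct_trace` and `deduplication_ratio` helpers below are assumptions made for illustration and do not come from the abstract.

```python
def reconstruct_trace(recipes, chunk_metadata):
    """Expand each file recipe into one trace record per chunk reference."""
    trace = []
    for file_name, fingerprints in recipes.items():
        for position, fp in enumerate(fingerprints):
            # Meta-data is stored once per unique chunk and looked up by fingerprint.
            trace.append({"file": file_name, "position": position,
                          "fingerprint": fp, **chunk_metadata[fp]})
    return trace


def deduplication_ratio(trace):
    """Logical bytes referenced by all recipes divided by unique bytes stored."""
    logical = sum(record["size"] for record in trace)
    unique = {record["fingerprint"]: record["size"] for record in trace}
    return logical / sum(unique.values())


# Hypothetical sample data: two backups that share two of their three chunks.
recipes = {
    "monday.tar": ["f1", "f2", "f3"],
    "tuesday.tar": ["f1", "f2", "f4"],
}
chunk_metadata = {
    "f1": {"size": 4096},
    "f2": {"size": 8192},
    "f3": {"size": 4096},
    "f4": {"size": 4096},
}
trace = reconstruct_trace(recipes, chunk_metadata)
```

Only the four unique meta-data records and the two short recipes need to be collected; the six-record trace is regenerated on demand.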
Abstract:
A method for storing data in a data storage system is disclosed. The data is partitioned into a plurality of data chunks, and representative data is generated for each chunk by applying a predetermined algorithm to that chunk. Subsequently, the representative data is compared and sorted. Representative data for base data chunks, and representative data for other data chunks that can be stored relative to the base data chunks, are identified by evaluating the sorted set of representative data. Finally, each of the other data chunks identified as storable relative to a base data chunk is stored in the data storage system as the difference between that data chunk and its base data chunk.
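A rough sketch of the sort-then-delta flow follows. Everything concrete here is an assumption for illustration: the max-CRC32 `representative` function stands in for the unspecified predetermined algorithm, and the copy/insert delta built with Python's `difflib` stands in for whatever difference encoding a real system would use.

```python
import difflib
import zlib


def representative(chunk: bytes, window: int = 8) -> int:
    """Hypothetical representative data: the largest CRC32 over all sliding windows."""
    last_start = max(1, len(chunk) - window + 1)
    return max(zlib.crc32(chunk[i:i + window]) for i in range(last_start))


def make_delta(base: bytes, target: bytes):
    """Encode target as copy ranges from base plus literal inserts."""
    ops = []
    matcher = difflib.SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("insert", target[j1:j2]))
        # "delete" contributes nothing to the target.
    return ops


def apply_delta(base: bytes, ops) -> bytes:
    out = bytearray()
    for op in ops:
        out += base[op[1]:op[2]] if op[0] == "copy" else op[1]
    return bytes(out)


def plan_storage(chunks):
    """Sort by representative data; the first chunk of each run is the base."""
    order = sorted(range(len(chunks)), key=lambda i: (representative(chunks[i]), i))
    plan, base_index, base_rep = {}, None, None
    for i in order:
        rep = representative(chunks[i])
        if rep == base_rep:
            plan[i] = ("delta", base_index, make_delta(chunks[base_index], chunks[i]))
        else:
            base_index, base_rep = i, rep
            plan[i] = ("base", chunks[i])
    return plan


def restore(plan, i) -> bytes:
    entry = plan[i]
    if entry[0] == "base":
        return entry[1]
    return apply_delta(plan[entry[1]][1], entry[2])
```

Sorting brings chunks with equal representative data next to each other, so a single pass over the sorted order is enough to pick bases and assign the remaining chunks to them.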
Abstract:
Techniques for evaluating deduplication effectiveness of data chunks in a storage system are described herein. In one embodiment, metadata of first data chunks associated with a deduplicated storage system is examined, where the first data chunks have been partitioned according to a first chunk size. A second chunk size is calculated based on the examination of the metadata of the first data chunks. Metadata of the first data chunks is merged according to the second chunk size to represent second data chunks into which the first data chunks would have been merged. A deduplication rate of the second data chunks is determined based on the merged metadata.
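The merge step can be approximated using metadata alone, as in the sketch below. It assumes metadata is an ordered list of `(fingerprint, size)` pairs, takes the second chunk size as a given parameter rather than calculating it, and derives each merged fingerprint by hashing the constituent fingerprints; all of these are illustrative assumptions.

```python
import hashlib


def merge_metadata(chunks, target_size):
    """Merge consecutive (fingerprint, size) records until each reaches target_size."""
    merged, fingerprints, size = [], [], 0

    def flush():
        # The merged fingerprint is derived from the constituent fingerprints only,
        # so no chunk data has to be read.
        digest = hashlib.sha256("|".join(fingerprints).encode()).hexdigest()
        merged.append((digest, size))

    for fp, chunk_size in chunks:
        fingerprints.append(fp)
        size += chunk_size
        if size >= target_size:
            flush()
            fingerprints, size = [], 0
    if fingerprints:
        flush()
    return merged


def deduplication_rate(chunks):
    """Logical bytes divided by the bytes of distinct fingerprints."""
    logical = sum(size for _, size in chunks)
    unique = {}
    for fp, size in chunks:
        unique.setdefault(fp, size)
    return logical / sum(unique.values())


# Hypothetical stream of 4-byte chunks: a b a b a c.
small = [("a", 4), ("b", 4), ("a", 4), ("b", 4), ("a", 4), ("c", 4)]
large = merge_metadata(small, target_size=8)
```

In this sample the rate falls from 2.0 at the original chunk size to 1.5 after merging, because the merged pair `a|c` no longer matches the repeated pair `a|b`.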
Abstract:
Techniques for searching data in a storage system are described herein. In one embodiment, in response to a request for searching target data in a storage system, first representative data for the target data being searched are generated by applying a predetermined algorithm to at least a portion of the target data. The first representative data are searched and compared with second representative data representing one or more data sets stored in the storage system. Based on the search and comparison, a likelihood that the target data or similar content has been found in the storage system is indicated.
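One way to picture the comparison of representative data is a feature-set overlap score, sketched below. The bottom-k CRC32 features and the Jaccard-style `likelihood` are assumptions chosen for the example; the abstract does not say which algorithm or scoring rule is used.

```python
import zlib


def features(data: bytes, window: int = 8, k: int = 16) -> frozenset:
    """Hypothetical representative data: the k smallest CRC32 values over sliding windows."""
    last_start = max(1, len(data) - window + 1)
    hashes = {zlib.crc32(data[i:i + window]) for i in range(last_start)}
    return frozenset(sorted(hashes)[:k])


def likelihood(target: bytes, stored: bytes) -> float:
    """Overlap of the two feature sets, from 0.0 (nothing shared) to 1.0 (all shared)."""
    a, b = features(target), features(stored)
    return len(a & b) / len(a | b)


def search(target: bytes, data_sets: dict):
    """Rank stored data sets by how likely they contain the target or similar content."""
    scored = ((likelihood(target, data), name) for name, data in data_sets.items())
    return sorted(scored, reverse=True)
```

In practice the features of stored data sets would be computed once and indexed, so that a search only has to derive features for the target data.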
Abstract:
Techniques for replicating data chunks in a storage system are described herein. In one embodiment, in response to a request for replicating data chunks of a source storage system having a first average chunk size to a target storage system having a second average chunk size, a new chunk size is determined based on metadata of the data chunks in view of an average chunk size of the target storage system. The data chunks are resized based on the new chunk size to generate resized data chunks. The resized data chunks are transmitted from the source storage system to the target storage system for replication.