Abstract:
A computer-implemented method for compressing a data set, the method comprising receiving a first data block of the data set; automatically selecting, by a compression management module, a compression module from a plurality of compression modules to apply to the first data block based on projected compression efficacy or resource utilization; and compressing the first data block with the selected compression module to generate a first compressed data block.
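The following is a minimal Python sketch of the selection step. It assumes the "compression modules" are Python's standard-library codecs (`zlib`, `bz2`, `lzma`). It estimates projected efficacy by trial-compressing a prefix sample of the block, and it models resource utilization with a fixed, assumed per-module cost table. `MODULES`, `CPU_COST`, `SAMPLE_SIZE`, `select_module`, and `compress_block` are hypothetical names and are not part of the described method.

```python
import bz2
import lzma
import zlib

# Hypothetical "compression modules": name -> compress function (stdlib codecs).
MODULES = {"zlib": zlib.compress, "bz2": bz2.compress, "lzma": lzma.compress}

# Hypothetical relative CPU cost of each module (assumed, not measured).
CPU_COST = {"zlib": 1.0, "bz2": 3.0, "lzma": 5.0}

SAMPLE_SIZE = 4096  # bytes of the block used to project efficacy (assumption)


def select_module(block: bytes, cost_weight: float = 0.0) -> str:
    """Return the module with the lowest projected score for this block.

    Efficacy is projected by compressing only a prefix sample of the block;
    score = projected compressed/original ratio + cost_weight * CPU cost.
    """
    sample = block[:SAMPLE_SIZE] or b"\0"  # avoid dividing by zero on empty input

    def score(name: str) -> float:
        ratio = len(MODULES[name](sample)) / len(sample)
        return ratio + cost_weight * CPU_COST[name]

    return min(MODULES, key=score)


def compress_block(block: bytes, cost_weight: float = 0.0) -> tuple[str, bytes]:
    """Select a module automatically, then compress the whole block with it."""
    name = select_module(block, cost_weight)
    return name, MODULES[name](block)
```

Raising `cost_weight` shifts the choice toward cheaper modules. At zero, the choice depends only on the projected ratio.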
Abstract:
Techniques for evaluating the deduplication effectiveness of data chunks in a storage system are described herein. In one embodiment, metadata of first data chunks associated with a deduplicated storage system is examined, where the first data chunks have been partitioned according to a first chunk size. A second chunk size is calculated based on the examination of the metadata of the first data chunks. The metadata of the first data chunks is then merged according to the second chunk size to represent the second data chunks into which the first data chunks would have been merged. A deduplication rate of the second data chunks is determined based on the merged metadata.
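The metadata-only evaluation can be sketched in Python under three assumptions. First, each metadata record is a `(fingerprint, size)` pair in stream order, with fixed-length fingerprints. Second, the second chunk size is a hypothetical multiple (`factor`) of the average first-chunk size. Third, a merged chunk's fingerprint is a SHA-1 hash of its members' fingerprints. The function names are illustrative only.

```python
import hashlib

# Each metadata record is (fingerprint, size) for one first-size chunk.
# Fingerprints are assumed to be fixed-length digests, listed in stream order.


def second_chunk_size(metadata, factor=2):
    """Hypothetical rule: factor x the average first-chunk size."""
    total = sum(size for _, size in metadata)
    return factor * total // len(metadata)


def merge_metadata(metadata, target_size):
    """Group consecutive records until a group reaches target_size.

    The merged fingerprint is a hash of the member fingerprints, so the
    chunk data itself never has to be read again.
    """
    merged, fps, size = [], [], 0
    for fp, chunk_size in metadata:
        fps.append(fp)
        size += chunk_size
        if size >= target_size:
            merged.append((hashlib.sha1(b"".join(fps)).digest(), size))
            fps, size = [], 0
    if fps:  # trailing partial group
        merged.append((hashlib.sha1(b"".join(fps)).digest(), size))
    return merged


def dedup_rate(metadata):
    """Fraction of logical bytes that duplicate an earlier chunk."""
    seen, total, dup = set(), 0, 0
    for fp, size in metadata:
        total += size
        if fp in seen:
            dup += size
        seen.add(fp)
    return dup / total if total else 0.0
```

Consider four 8-byte chunks with fingerprints `a, b, a, b`. Merging them in pairs preserves a 0.5 duplicate rate. The order `a, b, b, a` instead yields two distinct merged chunks and a rate of 0, even though both orders have a 0.5 rate at the original chunk size.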
Abstract:
A method for storing data in a data storage system partitions the data into a plurality of data chunks and generates representative data for each chunk by applying a predetermined algorithm to it. The representative data is then compared and sorted. By evaluating the sorted set of representative data, the method identifies representative data for base data chunks and representative data for other data chunks that can be stored relative to those base data chunks. Finally, each of the other data chunks identified as storable relative to a base data chunk is stored in the data storage system as the difference between that data chunk and its base data chunk.
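The Python sketch below illustrates the sort-and-delta idea. Several choices are assumptions made here:

- The "predetermined algorithm" is the minimum MD5 hash over 8-byte shingles, a min-hash style feature.
- Chunks with equal representative data are treated as similar.
- The delta is a toy common-prefix/common-suffix encoding, not a production delta format.

All names are hypothetical.

```python
import hashlib
from itertools import groupby

SHINGLE = 8  # bytes per shingle (assumption)


def representative(chunk: bytes) -> bytes:
    """Hypothetical predetermined algorithm: minimum MD5 over 8-byte shingles.

    Similar chunks share most shingles, so they often share the same minimum.
    """
    starts = range(max(1, len(chunk) - SHINGLE + 1))
    return min(hashlib.md5(chunk[i:i + SHINGLE]).digest() for i in starts)


def delta(base: bytes, chunk: bytes):
    """Toy delta: common-prefix length, common-suffix length, differing middle."""
    limit = min(len(base), len(chunk))
    p = 0
    while p < limit and base[p] == chunk[p]:
        p += 1
    s = 0
    while s < limit - p and base[-1 - s] == chunk[-1 - s]:
        s += 1
    return p, s, chunk[p:len(chunk) - s]


def store(chunks):
    """Sort chunks by representative data; the first of each equal run is a base."""
    reps = [representative(c) for c in chunks]
    order = sorted(range(len(chunks)), key=lambda i: reps[i])
    stored = {}
    for _, run in groupby(order, key=lambda i: reps[i]):
        base, *others = run
        stored[base] = ("base", chunks[base])
        for i in others:
            stored[i] = ("delta", base, delta(chunks[base], chunks[i]))
    return stored


def load(stored, i):
    """Rebuild chunk i, applying its delta to the base chunk when needed."""
    entry = stored[i]
    if entry[0] == "base":
        return entry[1]
    _, base_idx, (p, s, middle) = entry
    base = stored[base_idx][1]
    return base[:p] + middle + base[len(base) - s:]
```

Sorting places chunks with equal representative data next to each other, so a single linear pass over the sorted order finds every base/other grouping.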
Abstract:
Techniques for detecting unwanted data are described herein. In one embodiment, a request to store a data object in a storage system is received from a client over a network, where the request includes first representative data representing the data object but not the actual content of the data object. Whether the data object contains unwanted content is detected by comparing the first representative data with second representative data representing the unwanted content, without accessing the actual content of the data object. A response is transmitted to the client over the network indicating whether, based on the comparison of the first and second representative data, the data object is likely to contain the unwanted content.
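Here is a minimal Python sketch of the client/server split. It assumes the representative data is a list of SHA-256 fingerprints of fixed 4 KiB chunks, and it uses a hypothetical match-fraction threshold to decide what counts as "likely". The server-side check sees only fingerprints, never the object's bytes. Real systems may use content-defined chunking instead, which this sketch does not model.

```python
import hashlib

CHUNK = 4096  # fixed chunk size for fingerprinting (assumption)


def fingerprints(data: bytes) -> list[str]:
    """Client side: representative data = SHA-256 of each fixed-size chunk."""
    return [hashlib.sha256(data[i:i + CHUNK]).hexdigest()
            for i in range(0, len(data), CHUNK)]


def likely_unwanted(request_fps, unwanted_fps, threshold=0.5):
    """Server side: compare fingerprints only; content is never accessed.

    Returns True when at least `threshold` of the request's fingerprints
    match fingerprints of known unwanted content (hypothetical policy).
    """
    if not request_fps:
        return False
    hits = sum(fp in unwanted_fps for fp in request_fps)
    return hits / len(request_fps) >= threshold
```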
Abstract:
A cost function is determined for assigning first deduplicating storage units of a first storage system for replication onto second deduplicating storage units of a second storage system. One or more of the first storage units in the first storage system are assigned to one or more of the second storage units in the second storage system based on a minimized cost resulting from the cost function.
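The abstract does not specify the cost function, so the Python sketch below invents one for illustration. Placing a source unit on a target unit costs the source's size times the fraction of its data assumed not to deduplicate against that target. `overlap` is a hypothetical input giving that fraction. A large penalty is added for exceeding any target's capacity. The minimum is found by exhaustive search, which is only practical for a handful of units.

```python
from itertools import product


def assignment_cost(assign, sizes, capacity, overlap):
    """Hypothetical cost: bytes expected not to deduplicate on the chosen
    target, plus a large penalty for exceeding any target's capacity.

    assign[src] is the index of the target unit chosen for source unit src.
    """
    cost = 0.0
    used = [0.0] * len(capacity)
    for src, dst in enumerate(assign):
        cost += sizes[src] * (1.0 - overlap[src][dst])
        used[dst] += sizes[src]
    overflow = sum(max(0.0, u - c) for u, c in zip(used, capacity))
    return cost + 1e6 * overflow


def best_assignment(sizes, capacity, overlap):
    """Exhaustively minimise the cost over every source->target mapping."""
    candidates = product(range(len(capacity)), repeat=len(sizes))
    return min(candidates,
               key=lambda a: assignment_cost(a, sizes, capacity, overlap))
```

With ample capacity, both sources in the test data land on the target they overlap with most. Tightening capacity forces one of them onto its second-best target.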
Abstract:
Techniques for searching data in a storage system are described herein. In one embodiment, in response to a request to search for target data in a storage system, first representative data for the target data are generated by applying a predetermined algorithm to at least a portion of the target data. The first representative data are searched and compared with second representative data representing one or more data sets stored in the storage system. Based on the search and comparison, a likelihood that the target data or similar content has been found in the storage system is indicated.
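A hedged Python sketch of the search flow follows. It assumes the "predetermined algorithm" hashes overlapping 16-byte shingles with SHA-1 and keeps roughly one in four of those hashes as a sample. The reported likelihood is the fraction of the target's sampled hashes that also appear in each stored data set. `sketch`, `SketchIndex`, and the constants are hypothetical.

```python
import hashlib
from collections import defaultdict

SHINGLE = 16    # bytes per shingle (assumption)
SAMPLE_MOD = 4  # keep roughly 1 in 4 shingle hashes (assumption)


def sketch(data: bytes) -> set[int]:
    """Hypothetical predetermined algorithm: sampled hashes of overlapping shingles."""
    out = set()
    for i in range(max(1, len(data) - SHINGLE + 1)):
        h = int.from_bytes(hashlib.sha1(data[i:i + SHINGLE]).digest()[:8], "big")
        if h % SAMPLE_MOD == 0:
            out.add(h)
    return out


class SketchIndex:
    """Maps sampled shingle hashes to the ids of stored data sets containing them."""

    def __init__(self):
        self.index = defaultdict(set)

    def add(self, dataset_id, data: bytes):
        for h in sketch(data):
            self.index[h].add(dataset_id)

    def search(self, target: bytes):
        """Return {dataset_id: likelihood}, where likelihood is the share of the
        target's sampled hashes that also occur in that data set."""
        fps = sketch(target)
        counts = defaultdict(int)
        for h in fps:
            for ds in self.index.get(h, ()):
                counts[ds] += 1
        return {ds: n / len(fps) for ds, n in counts.items()}
```

Sampling keeps the index smaller. Because the same sampling rule is applied to stored data and to the target, any shingle the two share is sampled consistently on both sides.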
Abstract:
Techniques for replicating data chunks in a storage system are described herein. In one embodiment, in response to a request to replicate data chunks of a source storage system having a first average chunk size to a target storage system having a second average chunk size, a new chunk size is determined based on metadata of the data chunks in view of the second average chunk size of the target storage system. The data chunks are resized based on the new chunk size to generate resized data chunks, which are then transmitted from the source storage system to the target storage system for replication.
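The resizing step might look like the Python sketch below. It assumes the target's average chunk size is at least the source's, so resizing means merging `k` consecutive source chunks, with `k` derived only from chunk-size metadata. Splitting chunks for a target with a smaller average is not shown. `merge_factor`, `resize`, `replicate`, and the `send` callback are hypothetical.

```python
def merge_factor(source_sizes, target_avg):
    """Number of consecutive source chunks to merge, from size metadata only."""
    source_avg = sum(source_sizes) / len(source_sizes)
    return max(1, round(target_avg / source_avg))


def resize(chunks, k):
    """Merge every k consecutive chunks, keeping source boundaries aligned."""
    return [b"".join(chunks[i:i + k]) for i in range(0, len(chunks), k)]


def replicate(chunks, target_avg, send):
    """Resize to approximate the target's average chunk size, then transmit."""
    k = merge_factor([len(c) for c in chunks], target_avg)
    for chunk in resize(chunks, k):
        send(chunk)
```

Merging whole source chunks means every resized boundary coincides with an existing one, so no source chunk is cut in the middle.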
Abstract:
A request to allocate a storage unit of a storage system for backing up data of one or more clients is received. The storage system includes multiple storage units, and each storage unit stores data that is deduplicated within that storage unit. In response to the request, one or more of the storage units are selected based on the amount of deduplicated data that would be stored in each storage unit after storing the data of the one or more clients. The selected storage units are then allocated to the one or more clients to back up their data.
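The selection criterion can be sketched in Python under the following assumptions. Each storage unit is summarized by its set of chunk fingerprints, its used (deduplicated) bytes, and its capacity. The client's backup data arrives as `(fingerprint, size)` pairs. The policy shown picks the unit whose deduplicated footprint would grow least among units with room. That is one plausible reading of the criterion, not necessarily the claimed one, and the names are hypothetical.

```python
def stored_after(unit_fps, unit_used, client_chunks):
    """Deduplicated bytes the unit would hold after adding the client's chunks.

    Chunks already present in the unit, or repeated within the client's
    own data, are counted only once.
    """
    new = {fp: size for fp, size in client_chunks if fp not in unit_fps}
    return unit_used + sum(new.values())


def select_unit(units, client_chunks):
    """units: {name: (fingerprint_set, used_bytes, capacity_bytes)}.

    Returns the name of the unit that would grow least while staying within
    capacity, or None if no unit has room (hypothetical policy).
    """
    best, best_growth = None, None
    for name, (fps, used, cap) in units.items():
        after = stored_after(fps, used, client_chunks)
        if after > cap:
            continue
        growth = after - used
        if best is None or growth < best_growth:
            best, best_growth = name, growth
    return best
```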