Abstract:
Techniques for a method of estimating deduplication potential are disclosed herein. The method includes steps of selecting randomly a plurality of data blocks from a data set as a sample of the data set, collecting fingerprints of the plurality of data blocks of the sample, identifying duplicates of fingerprints of the sample from the fingerprints of the plurality of data blocks, estimating a total number of unique fingerprints of the data set depending on a total number of the duplicates of fingerprints of the sample based on a probability of fingerprints from the data set colliding in the sample, and determining a total number of duplicates of fingerprints of the data set depending on the total number of the unique fingerprints of the data set.
Abstract:
A method performed in a system that has a plurality of volumes stored to storage hardware, the method including generating, for each of the volumes, a respective space saving potential iteratively over time and scheduling space saving operations among the plurality of volumes by analyzing each of the volumes for space saving potential and assigning priority of resources based at least in part on space saving potential.