Range-based data deduplication using a hash table with entries replaced based on address alignment information
Abstract:
Deduplicated data storage is provided by presenting a virtual volume mapped by a translation table to a physical volume of a physical data storage system. The translation table maps sets of ranges of duplicate data blocks of the virtual volume to corresponding individual ranges of shared data blocks of the physical volume. A hash table for identifying duplicate data is indexed by a portion of a hash value calculated from newly written data blocks, and has entries each identifying an address alignment of the corresponding data block. In operation, existing entries are replaced with new entries for colliding data blocks having better address alignment, promoting wider address-space separation of the entries. Upon occurrence of a hit in the hash table, for a given data block in a range of newly written data blocks, data blocks of the range are compared to corresponding blocks in a range identified by the hit to maximize a size of a region to be identified by the translation table as duplicate data.
Public/Granted literature
Information query
Patent Agency Ranking
0/0