Publication No.: US20110246741A1
Publication date: 2011-10-06
Application No.: US12752308
Filing date: 2010-04-01
CPC classification: G06F17/30097, G06F11/1453, G06F17/30159, G06F17/30348
Abstract: A data deduplication method using a small hash digest dictionary in fast-access memory. The method includes receiving customer data, dividing the data into smaller chunks, and assigning a hash value to each chunk. For each chunk, the method performs a lookup for a duplicate chunk by accessing a small dictionary in memory with the chunk's hash value. When no entry is found, the small dictionary is updated to include the hash value, so that the dictionary fills with the earliest received data. When an entry is found, the entry's hash value is compared with the lookup value; if they match, reference data is returned and an entry counter is incremented. If they do not match, additional accesses are attempted, for example with additional indexes calculated from the hash value. Collisions may trigger an entry replacement, so that some initially entered entries are replaced when they are determined, for example based on their counter values, not to be among the most frequently repeating values.
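The lookup flow described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the names `SmallDictionary`, `Entry`, `lookup_or_insert`, and `deduplicate`, the fixed 4-byte chunk size, the use of SHA-256, the way additional indexes are derived from successive digest bytes, and the rule that only an entry with a zero counter may be replaced are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple, Union

CHUNK_SIZE = 4  # hypothetical fixed chunk size, kept tiny for demonstration


@dataclass
class Entry:
    digest: bytes   # hash value stored for the chunk
    ref: int        # reference data (here: offset of the first occurrence)
    count: int = 0  # number of times this entry was matched again


class SmallDictionary:
    """Fixed-size in-memory table indexed by values derived from the digest."""

    def __init__(self, slots: int = 1024, probes: int = 3) -> None:
        # Assumption: digests are at least 4 * probes bytes long.
        self.slots = slots
        self.probes = probes
        self.table: List[Optional[Entry]] = [None] * slots

    def _indexes(self, digest: bytes) -> Iterator[int]:
        # Assumed scheme: each successive 4-byte slice gives one more index.
        for i in range(self.probes):
            yield int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.slots

    def lookup_or_insert(self, digest: bytes, ref: int) -> Optional[int]:
        """Return stored reference data on a match, otherwise None."""
        candidates = list(self._indexes(digest))
        for idx in candidates:
            entry = self.table[idx]
            if entry is None:
                # No entry: fill the dictionary with the earliest data seen.
                self.table[idx] = Entry(digest, ref)
                return None
            if entry.digest == digest:
                # Match: count the repeat and return the reference data.
                entry.count += 1
                return entry.ref
            # Mismatch: fall through to the next calculated index.
        # Collision on every index. Assumed policy: replace the least
        # repeated candidate, but only if it has never been matched.
        victim = min(candidates, key=lambda i: self.table[i].count)
        if self.table[victim].count == 0:
            self.table[victim] = Entry(digest, ref)
        return None


Token = Tuple[str, Union[int, bytes]]


def deduplicate(data: bytes, dictionary: SmallDictionary) -> List[Token]:
    """Split data into chunks; emit ("ref", offset) or ("raw", chunk)."""
    out: List[Token] = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).digest()
        ref = dictionary.lookup_or_insert(digest, offset)
        out.append(("raw", chunk) if ref is None else ("ref", ref))
    return out
```

For example, `deduplicate(b"AAAAAAAABBBB", SmallDictionary())` yields a raw `AAAA` chunk, a reference back to offset 0 for the repeated `AAAA`, and a raw `BBBB` chunk. Keeping the table at a fixed size is what lets it stay in fast-access memory; the price is that a chunk whose entry has been replaced, or never stored, is treated as new data.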