- Patent title: Reducing hash collisions in large scale data deduplication
- Application number: US16048710
- Filing date: 2018-07-30
- Publication number: US10762051B1
- Publication date: 2020-09-01
- Inventors: Ravishankar Bhagavandas, Aaron B. Fernandes, Zehua Wang, Jiabin Li, Huihui Li
- Applicant: Amazon Technologies, Inc.
- Applicant address: Seattle, WA, US
- Patentee: Amazon Technologies, Inc.
- Current patentee: Amazon Technologies, Inc.
- Current patentee address: Seattle, WA, US
- Agency: Lee & Hayes, P.C.
- Main classification: G06F7/00
- IPC classifications: G06F7/00; G06F16/174; G06F16/11; G06F16/13

Abstract:
A system obtains a first data chunk and a second data chunk of a plurality of data chunks associated with a first data snapshot of a computing system. A hash function is assigned to a data chunk, and used to create a hash value that is written to a first lookup table. The hash function is selected from a plurality of hash functions. The system creates a first archive by saving the plurality of data chunks and the first lookup table to a datastore. The system writes a second hash record for the individual data chunks to a second lookup table using the same hash functions that were used for the first lookup table. Dissimilar hash values between the first lookup table and the second lookup table are identified, and a second archive that includes data chunks with different data from the corresponding data chunks from the first data snapshot is created based on the data chunks with the dissimilar hash values.
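
The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the pool of hash functions (`HASH_FUNCTIONS`), the round-robin selection rule in `assign_hash_function`, the in-memory list used as a lookup table, and all function names are assumptions made for illustration. The abstract only states that a hash function is selected from a plurality of hash functions and that the same functions are reused for the second lookup table.

```python
import hashlib

# Hypothetical pool of hash functions; the abstract does not name any.
HASH_FUNCTIONS = {
    "blake2b": lambda data: hashlib.blake2b(data).hexdigest(),
    "sha256": lambda data: hashlib.sha256(data).hexdigest(),
    "sha512": lambda data: hashlib.sha512(data).hexdigest(),
}


def assign_hash_function(chunk_index):
    """Assumed selection rule: rotate through the pool by chunk position."""
    names = sorted(HASH_FUNCTIONS)
    return names[chunk_index % len(names)]


def build_lookup_table(chunks, assignments=None):
    """Return one (function name, hash value) record per chunk.

    If `assignments` is given, reuse those function names so that a later
    snapshot is hashed with the same functions as the earlier one.
    """
    table = []
    for i, chunk in enumerate(chunks):
        name = assignments[i] if assignments is not None else assign_hash_function(i)
        table.append((name, HASH_FUNCTIONS[name](chunk)))
    return table


def build_second_archive(first_table, second_chunks):
    """Hash the second snapshot and keep only chunks whose hash differs."""
    assignments = [name for name, _ in first_table]
    second_table = build_lookup_table(second_chunks, assignments)
    changed = {
        i: second_chunks[i]
        for i, (old, new) in enumerate(zip(first_table, second_table))
        if old[1] != new[1]
    }
    return second_table, changed


# First snapshot: store all chunks plus the first lookup table.
first_snapshot = [b"alpha", b"beta", b"gamma"]
first_table = build_lookup_table(first_snapshot)

# Second snapshot: only the middle chunk has changed.
second_snapshot = [b"alpha", b"BETA", b"gamma"]
second_table, second_archive = build_second_archive(first_table, second_snapshot)
```

In this sketch the second archive holds only the chunk at index 1, because that is the only position where the two lookup tables disagree. The sketch assumes both snapshots contain the same number of chunks in the same order.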