Patent application
US20050091234A1 System and method for dividing data into predominantly fixed-sized chunks so that duplicate data chunks may be identified
Status: In force
- Title: System and method for dividing data into predominantly fixed-sized chunks so that duplicate data chunks may be identified
- Application number: US10693284
- Filing date: 2003-10-23
- Publication number: US20050091234A1
- Publication date: 2005-04-28
- Inventors: Windsor Hsu, Shauchi Ong
- Applicants: Windsor Hsu, Shauchi Ong
- Applicant address: Armonk, NY, US
- Assignee: International Business Machines Corporation
- Current assignee: International Business Machines Corporation
- Current assignee address: Armonk, NY, US
- Main classification: G06F7/00
- IPC classification: G06F7/00; G06F17/30
Abstract:
A data chunking system divides data into predominantly fixed-sized chunks such that duplicate data may be identified. The data chunking system may be used to reduce the data storage and save network bandwidth by allowing storage or transmission of primarily unique data chunks. The system may also be used to increase reliability in data storage and network transmission, by allowing an error affecting a data chunk to be repaired with an identified duplicate chunk. The data chunking system chunks data by selecting a chunk of fixed size, then moving a window along the data until a match to existing data is found. As the window moves across the data, unique chunks predominantly of fixed size are formed in the data passed over. Several embodiments provide alternate methods of determining whether a selected chunk matches existing data and methods by which the window is moved through the data. To locate duplicate data, the data chunking system remembers data by computing a mathematical function of a data chunk and inserting the computed value into a hash table.
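The sliding-window procedure described in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation: the function name `chunk_data`, the use of SHA-256 as the "mathematical function", and a Python `set` standing in for the hash table are all assumptions made for illustration. A practical system would likely use a cheaper rolling function so the window does not have to be rehashed in full at every byte offset.

```python
import hashlib


def chunk_data(data: bytes, chunk_size: int, table: set[bytes]) -> list[tuple[str, bytes]]:
    """Split data into mostly fixed-size chunks, labelling each "new" or "duplicate".

    `table` holds digests of chunks seen so far (hypothetical stand-in for the
    hash table mentioned in the abstract) and is updated in place.
    """
    chunks: list[tuple[str, bytes]] = []

    def emit_new(chunk: bytes) -> None:
        # Remember the chunk by its digest so later data can match it.
        table.add(hashlib.sha256(chunk).digest())
        chunks.append(("new", chunk))

    start = 0  # first byte not yet assigned to any chunk
    pos = 0    # left edge of the sliding window
    while pos + chunk_size <= len(data):
        window = data[pos:pos + chunk_size]
        if hashlib.sha256(window).digest() in table:
            # Match found: data passed over before it becomes a short unique chunk.
            if pos > start:
                emit_new(data[start:pos])
            chunks.append(("duplicate", window))
            pos += chunk_size
            start = pos
        else:
            # No match: slide the window forward by one byte.
            pos += 1
            if pos - start == chunk_size:
                # A full fixed-size chunk has been passed over without a match.
                emit_new(data[start:pos])
                start = pos

    # Leftover bytes at the end are cut into fixed-size pieces (the last may be short).
    # Assumption: tail pieces are stored as new without a further lookup.
    for i in range(start, len(data), chunk_size):
        emit_new(data[i:i + chunk_size])
    return chunks


table: set[bytes] = set()
first = chunk_data(bytes(range(16)), 4, table)
# Two bytes inserted at the front shift every later chunk boundary.
second = chunk_data(b"XY" + bytes(range(16)), 4, table)
```

In this example, the second call emits the two inserted bytes as one short new chunk and then recognises the four original chunks as duplicates, which is why the chunks are only "predominantly" fixed-sized: a pure fixed-offset split would miss every match after the insertion.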