System and method for dividing data into predominantly fixed-sized chunks so that duplicate data chunks may be identified

Patent application

US20050091234A1 System and method for dividing data into predominantly fixed-sized chunks so that duplicate data chunks may be identified (status: in force)

Title: System and method for dividing data into predominantly fixed-sized chunks so that duplicate data chunks may be identified
Application number: US10693284

Filing date: 2003-10-23
Publication number: US20050091234A1

Publication date: 2005-04-28
Inventors: Windsor Hsu, Shauchi Ong
Applicants: Windsor Hsu, Shauchi Ong
Applicant address: Armonk, NY, US
Assignee: International Business Machines Corporation
Current assignee: International Business Machines Corporation
Current assignee address: Armonk, NY, US
Main classification: G06F7/00
IPC classification: G06F7/00; G06F17/30

Abstract:

A data chunking system divides data into predominantly fixed-sized chunks such that duplicate data may be identified. The data chunking system may be used to reduce data storage and save network bandwidth by allowing storage or transmission of primarily unique data chunks. The system may also be used to increase reliability in data storage and network transmission by allowing an error affecting a data chunk to be repaired with an identified duplicate chunk. The data chunking system chunks data by selecting a chunk of fixed size, then moving a window along the data until a match to existing data is found. As the window moves across the data, unique chunks, predominantly of fixed size, are formed in the data passed over. Several embodiments provide alternate methods of determining whether a selected chunk matches existing data and alternate methods by which the window is moved through the data. To locate duplicate data, the data chunking system remembers data by computing a mathematical function of a data chunk and inserting the computed value into a hash table.
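The windowed chunking described in the abstract can be illustrated with a minimal Python sketch. This is one possible reading under stated assumptions, not the patented embodiments: the function names `fingerprint` and `chunk_data`, the use of SHA-1 as the "mathematical function", a Python `set` standing in for the hash table, the one-byte window step, and the 8-byte chunk size are all illustrative choices that do not come from the source.

```python
import hashlib


def fingerprint(piece: bytes) -> bytes:
    # Assumption: SHA-1 stands in for the "mathematical function" of a chunk.
    return hashlib.sha1(piece).digest()


def chunk_data(data: bytes, seen: set, size: int = 8):
    """Split data into (chunk, is_duplicate) pairs, mostly `size` bytes long.

    `seen` plays the role of the hash table of remembered chunk values.
    """
    chunks = []
    start = 0  # first byte not yet assigned to a chunk
    pos = 0    # left edge of the sliding window

    def emit_unique(piece: bytes) -> None:
        seen.add(fingerprint(piece))  # remember the new chunk
        chunks.append((piece, False))

    while pos + size <= len(data):
        if pos - start == size:
            # The window has passed over a full chunk of unmatched data.
            emit_unique(data[start:pos])
            start = pos
        window = data[pos:pos + size]
        if fingerprint(window) in seen:
            if pos > start:
                # Short unique chunk formed by the bytes passed over.
                emit_unique(data[start:pos])
            chunks.append((window, True))
            pos += size
            start = pos
        else:
            pos += 1  # assumption: slide the window one byte at a time

    while start < len(data):
        # Tail shorter than a full window: emit what is left.
        piece = data[start:start + size]
        duplicate = fingerprint(piece) in seen
        if not duplicate:
            seen.add(fingerprint(piece))
        chunks.append((piece, duplicate))
        start += len(piece)
    return chunks


# Two inserted bytes ("xy") shift the third copy of the block,
# yet the sliding window still finds it as a duplicate.
result = chunk_data(b"ABCDEFGH" * 2 + b"xy" + b"ABCDEFGH", set())
print(result)
```

In this sketch, most chunks come out at the fixed size. Shorter chunks appear only where the window finds a match partway through a chunk, or at the end of the data.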

Publication/grant documents

US07281006B2 System and method for dividing data into predominantly fixed-sized chunks so that duplicate data chunks may be identified. Publication/grant date: 2007-10-09

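IPC classification:

G	Physics
G06	Computing; calculating or counting
G06F	Electric digital data processing (computer systems based on specific computational models: see G06N)
G06F7/00	Methods or arrangements for processing data by operating upon the order or content of the data handled (logic circuits: see H03K19/00)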