Method and system for efficiently handling small files in a single instance storage data store
    1.
    Granted invention patent (status: in force)

    Publication number: US08572055B1

    Publication date: 2013-10-29

    Application number: US12164284

    Filing date: 2008-06-30

    IPC classification: G06F7/00 G06F17/00

    Abstract: A method, system and apparatus are provided for efficiently storing small files in a segment-based deduplication scheme by allocating multiple small files to a single data segment. A mechanism is also provided for distinguishing between large files (e.g., files on the order of the size of a segment or larger) and smaller files, and for starting a new segment at the beginning of each large file. Further provided is a file attribute-based system that identifies the small file at which a new segment should begin, then allocates subsequent small files to that segment (and to contiguous segments) until another small file with the appropriate attribute is encountered, which begins the next segment. In one aspect of the present invention, a filename hash is used for the file attribute analysis to determine when a new segment should begin. With this mechanism, multiple small files can be allocated to a data segment while large files continue to be stored efficiently in separate data segments. The file attribute analysis also increases the deduplication rate for subsequently provided copies of the small files (e.g., in a backup), since segment boundaries remain constant despite file additions or deletions.
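
    The allocation rule described in the abstract can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patented implementation: the target segment size (SEGMENT_SIZE), the large-file threshold, and the anchor rule (a small file starts a new segment when an integer taken from a SHA-1 hash of its name is divisible by the hypothetical ANCHOR_DIVISOR) are all hypothetical choices. Small files are kept whole; a file that does not fit in the open segment continues in a contiguous new segment.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass

# Hypothetical parameters; the abstract does not specify concrete values.
SEGMENT_SIZE = 64 * 1024              # target segment size in bytes
LARGE_FILE_THRESHOLD = SEGMENT_SIZE   # "on the order of a segment or larger"
ANCHOR_DIVISOR = 4                    # on average 1 in 4 small files is an anchor


@dataclass
class FileEntry:
    name: str
    size: int  # bytes


def is_anchor(name: str) -> bool:
    """Hypothetical filename-hash test: True if this small file starts a segment."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % ANCHOR_DIVISOR == 0


def allocate(files: list[FileEntry]) -> list[list[str]]:
    """Group files into segments; each segment is a list of file names.

    A large file is listed once per segment it spans.
    """
    segments: list[list[str]] = []
    current: list[str] | None = None  # open segment that still accepts small files
    used = 0                          # bytes already placed in `current`

    for f in files:
        if f.size >= LARGE_FILE_THRESHOLD:
            # Large file: begins a fresh segment and gets segments of its own.
            count = -(-f.size // SEGMENT_SIZE)  # ceiling division
            segments.extend([f.name] for _ in range(count))
            current = None  # the next small file must open a new segment
            continue
        # Small file: an anchor (or having no open segment) starts a new segment;
        # a file that would overflow the open segment continues in a contiguous one.
        if current is None or is_anchor(f.name) or used + f.size > SEGMENT_SIZE:
            current, used = [], 0
            segments.append(current)
        current.append(f.name)
        used += f.size
    return segments
```

    Because the anchor test depends only on each file's own name, adding or deleting a small file changes at most the segments between the surrounding anchors (or large files); every other segment keeps its exact contents and can still deduplicate against the previous backup.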

    Method and apparatus to recover from interrupted data streams in a deduplication system
    2.
    Granted invention patent (status: in force)

    Publication number: US08074043B1

    Publication date: 2011-12-06

    Application number: US12363207

    Filing date: 2009-01-30

    Applicant: Michael John Zeis

    Inventor: Michael John Zeis

    IPC classification: G06F12/00

    CPC classification: G06F17/30156 G06F11/1453

    Abstract: Detection and proper deduplication of a re-started data stream in a segmentation analysis-based deduplication system are provided by retaining information about a previous data stream and using that information when segmenting the re-started data stream. The retained information includes the segment size associated with the last data object received in the previous data stream and a record of how much data was present in the last segment associated with that stream. The retained segment size is used to set the segment size for the first data object of the re-started data stream, and the retained last-segment size is used to determine how much data to place in the first segment of the re-started data stream, so that the remaining segments for that first data object stay properly aligned for deduplication.
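
    The alignment arithmetic can be sketched as below. This Python fragment is a hypothetical illustration, not the patented method: it assumes fixed-size segments within a data object and assumes the re-started stream resumes at the byte immediately after the last one received. RetainedState and restart_segment_lengths are invented names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RetainedState:
    """Hypothetical record kept when a data stream is interrupted."""
    segment_size: int        # segment size used for the last data object
    last_segment_bytes: int  # bytes held in the (possibly partial) last segment


def restart_segment_lengths(remaining: int, state: RetainedState) -> list[int]:
    """Segment lengths for the first data object of a re-started stream.

    Assumes the new stream resumes exactly after the last byte received,
    so `remaining` bytes of that object are still to come.
    """
    size = state.segment_size
    # Fill only the gap left by the partial last segment, so every later
    # boundary falls where it would have without the interruption.
    gap = (size - state.last_segment_bytes) % size
    lengths: list[int] = []
    if gap and remaining:
        first = min(gap, remaining)
        lengths.append(first)
        remaining -= first
    # The rest of the object is cut into full-size segments plus a final tail.
    while remaining > 0:
        lengths.append(min(size, remaining))
        remaining -= lengths[-1]
    return lengths
```

    For example, with a 128-byte segment size and an interruption after 1,000 bytes, the previous stream's last segment held 104 bytes, so the first segment of the re-started stream receives 24 bytes and the next boundary falls at offset 1,024, where it would have fallen without the interruption.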