Similarity matching
    1.
    Granted patent

    Publication No.: US11093151B1

    Publication date: 2021-08-17

    Application No.: US16780210

    Filing date: 2020-02-03

    IPC classification: G06F12/00 G06F3/06

    Abstract: A method, a system and a computer program product for deduplicating data. A data stream having a plurality of data zones is received. One or more data storage locations in a plurality of data storage locations for deduplicating one or more zones in the plurality of zones are identified. Each data storage location stores its respective deduplicated data zones. A data storage location for deduplicating a first data zone is selected. The first data zone is deduplicated using the selected data storage location.
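    The abstract leaves open how a storage location is selected. Below is a minimal Python sketch of one possible approach, offered only as an illustration and not as the patented method: every location keeps a similarity signature for each zone it already stores, and an incoming zone is routed to the location holding the most similar zone. The bottom-k min-hash signature, Jaccard scoring, the 0.3 threshold, the least-populated fallback, and the names `zone_signature`, `StorageLocation` and `select_location` are all assumptions.

```python
import hashlib
from dataclasses import dataclass, field


def zone_signature(zone: bytes, shingle: int = 8, keep: int = 16) -> frozenset[int]:
    # Hypothetical signature: the `keep` smallest 64-bit hashes of all
    # `shingle`-byte windows (a bottom-k min-hash sketch of the zone).
    hashes = {
        int.from_bytes(hashlib.sha1(zone[i:i + shingle]).digest()[:8], "big")
        for i in range(max(1, len(zone) - shingle + 1))
    }
    return frozenset(sorted(hashes)[:keep])


def similarity(a: frozenset[int], b: frozenset[int]) -> float:
    # Jaccard similarity of two signatures.
    union = a | b
    return len(a & b) / len(union) if union else 1.0


@dataclass
class StorageLocation:
    name: str
    signatures: list[frozenset[int]] = field(default_factory=list)

    def best_match(self, sig: frozenset[int]) -> float:
        # Similarity of `sig` to the closest zone already stored here.
        return max((similarity(sig, s) for s in self.signatures), default=0.0)


def select_location(zone: bytes, locations: list[StorageLocation],
                    threshold: float = 0.3) -> StorageLocation:
    """Choose where to deduplicate `zone`, then record its signature there."""
    sig = zone_signature(zone)
    best = max(locations, key=lambda loc: loc.best_match(sig))
    if best.best_match(sig) < threshold:
        # Nothing similar anywhere: fall back to the least-populated location.
        best = min(locations, key=lambda loc: len(loc.signatures))
    best.signatures.append(sig)
    return best
```

    Routing similar zones to the same location keeps them where a later byte-level deduplication pass can find their shared content; the fallback simply spreads unrelated data across locations. Both choices are illustrative.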

    Delta Compression
    2.
    Patent application
    Status: pending (published)

    Publication No.: US20190251189A1

    Publication date: 2019-08-15

    Application No.: US15893163

    Filing date: 2018-02-09

    IPC classification: G06F17/30 G06F11/14

    Abstract: Delta compression method, system and computer program product. Portions of source and target data files are hashed using a hashing function. A target data file is compared against the source data file to determine at least one delta difference between the files. A source data file hashing table is generated. The table includes hashed portions of the source and target data files stored in corresponding source file offset locations and corresponding target file offset locations, respectively. Portions of the source and target files are compared using corresponding source and target file offset locations. At least one common sequence of characters in the portions of the source and target files is determined based on the comparison. A patch file is generated based on the determined sequence of characters.
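    The steps in this abstract (hash portions of the source, index them by offset, look up target portions, extend the common sequences, emit a patch) can be illustrated with a minimal greedy delta encoder. This is a generic sketch, not the patented method: the 4-byte window, Adler-32 as the hashing function, indexing every source offset, and the `("copy", offset, length)` / `("insert", bytes)` patch format are assumptions, and `build_table`, `make_patch` and `apply_patch` are hypothetical names.

```python
import zlib

Patch = list[tuple]


def build_table(source: bytes, block: int) -> dict[int, int]:
    # Map the hash of every `block`-byte window of the source to its first offset.
    table: dict[int, int] = {}
    for off in range(len(source) - block + 1):
        table.setdefault(zlib.adler32(source[off:off + block]), off)
    return table


def make_patch(source: bytes, target: bytes, block: int = 4) -> Patch:
    table = build_table(source, block)
    ops: Patch = []
    literal = bytearray()
    t = 0
    while t < len(target):
        window = target[t:t + block]
        s = table.get(zlib.adler32(window)) if len(window) == block else None
        # Confirm the hit byte-for-byte, since different windows can share a hash.
        if s is not None and source[s:s + block] == window:
            length = block
            # Extend the common sequence as far as both files agree.
            while (s + length < len(source) and t + length < len(target)
                   and source[s + length] == target[t + length]):
                length += 1
            if literal:
                ops.append(("insert", bytes(literal)))
                literal.clear()
            ops.append(("copy", s, length))
            t += length
        else:
            # No common sequence here: carry this target byte literally.
            literal.append(target[t])
            t += 1
    if literal:
        ops.append(("insert", bytes(literal)))
    return ops


def apply_patch(source: bytes, ops: Patch) -> bytes:
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, off, length = op
            out += source[off:off + length]
        else:
            out += op[1]
    return bytes(out)
```

    For example, `apply_patch(source, make_patch(source, target))` reconstructs `target`. Indexing every source offset keeps the sketch short; practical encoders usually index fewer positions and use a rolling hash so the target window's hash can be updated in constant time.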

    Next-level multi-level deduplication

    Publication No.: US10067946B2

    Publication date: 2018-09-04

    Application No.: US15482376

    Filing date: 2017-04-07

    IPC classification: G06F17/30 G06F11/14

    Abstract: A method, a system, and a computer program product for performing next level multi-level deduplication. A first zone stamp for a first data zone is generated and compared to a second zone stamp representing a second data zone, where the zones are first level data zones. The first and second data zones are deduplicated when the first zone stamp matches the second zone stamp. A second-level first zone stamp is selected when there is no match between first and second zone stamps. The second-level first zone stamp, representing a second-level first data zone in the first data zone, is compared to the second zone stamp and/or a second-level second zone stamp representing a second-level second data zone. The second-level first zone and one of the second data zone and the second-level second zone are deduplicated when the second-level first zone stamp matches one of the second zone stamp and the second-level second zone stamp.
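    A minimal sketch of the two-level comparison described above, under two assumptions: a zone stamp is a truncated SHA-256 digest compared for exact equality (the patent's stamps may support similarity matching, which this simplifies away), and a first-level zone is cut into fixed-size second-level zones. `stamp`, `split` and `next_level_dedup` are hypothetical names.

```python
import hashlib


def stamp(zone: bytes) -> str:
    # Hypothetical zone stamp: truncated SHA-256, so "match" means identical bytes.
    return hashlib.sha256(zone).hexdigest()[:16]


def split(zone: bytes, parts: int = 2) -> list[bytes]:
    # Assumed fixed-size split of a zone into its second-level zones.
    size = max(1, -(-len(zone) // parts))  # ceiling division
    return [zone[i:i + size] for i in range(0, len(zone), size)]


def next_level_dedup(first: bytes, second: bytes,
                     parts: int = 2) -> list[tuple[str, str]]:
    """Return (duplicate, reference) pairs describing what was deduplicated."""
    second_stamp = stamp(second)
    # First level: the whole first zone matches the whole second zone.
    if stamp(first) == second_stamp:
        return [("first", "second")]
    # No first-level match: compare each second-level first zone against the
    # whole second zone and against the second zone's own second-level zones.
    second_subs = {stamp(z): j for j, z in enumerate(split(second, parts))}
    matches = []
    for i, sub in enumerate(split(first, parts)):
        s = stamp(sub)
        if s == second_stamp:
            matches.append((f"first[{i}]", "second"))
        elif s in second_subs:
            matches.append((f"first[{i}]", f"second[{second_subs[s]}]"))
    return matches
```

    For example, with `first = b"abcXYZ"` and `second = b"abcdef"` the zones differ at the first level, but the second-level zone `b"abc"` matches the second zone's first second-level zone, so the result is `[("first[0]", "second[0]")]`.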

    ADAPTIVE BANDWIDTH MANAGER
    4.
    Patent application
    Status: pending (published)

    Publication No.: US20170060696A1

    Publication date: 2017-03-02

    Application No.: US14829885

    Filing date: 2015-08-19

    IPC classification: G06F11/14

    Abstract: A system, a method, and a computer program product for adaptively managing the bandwidth of a deduplication system are disclosed. A bandwidth policy for replication of data from a first deduplication location to a second deduplication location is determined. The bandwidth policy allocates a predetermined bandwidth for the replication of data. The deduplication locations are communicatively coupled via a network. Using the determined bandwidth policy, data from the first deduplication location is replicated to the second deduplication location based on the allocated bandwidth.
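    The abstract specifies a policy that allocates a predetermined bandwidth but not how replication is held to it. A minimal sketch, assuming a hypothetical policy made of a default rate with per-hour overrides, and enforcing that rate with a token bucket, a standard rate-limiting technique swapped in here for illustration rather than taken from the patent. `BandwidthPolicy`, `TokenBucket` and `replicate` are hypothetical names; the clock and sleep functions are passed in so the throttling can be checked without real waiting.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class BandwidthPolicy:
    """Hypothetical policy: a default rate plus per-hour overrides (bytes/s)."""
    default_bps: int
    schedule: dict[int, int] = field(default_factory=dict)

    def allocated_bps(self, hour: int) -> int:
        return self.schedule.get(hour, self.default_bps)


class TokenBucket:
    """Token bucket: refills at `rate` bytes/s up to `capacity` bytes."""

    def __init__(self, rate: float, capacity: float, now: float) -> None:
        self.rate, self.capacity = rate, capacity
        self.tokens, self.last = capacity, now

    def delay_for(self, nbytes: int, now: float) -> float:
        # Refill for the elapsed time, then reserve nbytes (tokens may go negative).
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        self.tokens -= nbytes
        # A deficit must be repaid at `rate` before the send is allowed.
        return max(0.0, -self.tokens / self.rate)


def replicate(chunks: Iterable[bytes], send: Callable[[bytes], None],
              policy: BandwidthPolicy, hour: int,
              clock: Callable[[], float], sleep: Callable[[float], None]) -> None:
    """Send chunks to the second location without exceeding the allocated rate."""
    rate = policy.allocated_bps(hour)
    bucket = TokenBucket(rate, capacity=rate, now=clock())
    for chunk in chunks:
        wait = bucket.delay_for(len(chunk), clock())
        if wait > 0:
            sleep(wait)
        send(chunk)
```

    In real use one would pass `time.monotonic` and `time.sleep`, with `send` writing to the network connection to the second deduplication location. An adaptive manager could re-read the policy between chunks; this sketch fixes the rate for the whole run.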

    Systems and Methods for Version Chain Clustering
    5.
    Patent application
    Status: pending (published)

    Publication No.: US20130066868A1

    Publication date: 2013-03-14

    Application No.: US13273080

    Filing date: 2011-10-13

    IPC classification: G06F17/30

    Abstract: A system, a method and a computer program product for storing data, which include receiving a data stream having a plurality of transactions that include at least one portion of data, determining whether at least one portion of data within at least one transaction is substantially similar to at least another portion of data within at least one transaction, clustering together at least one portion of data and at least another portion of data within at least one transaction, selecting one of at least one portion of data and at least another portion of data as a representative of at least one portion of data and at least another portion of data in the received data stream, and storing each representative of a portion of data from each transaction in the plurality of transactions, wherein a plurality of representatives is configured to form a chain representing the received data stream.
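    A minimal sketch of the clustering step, assuming portions are compared by Jaccard similarity of their 4-byte shingles, a portion joins the most similar existing cluster when the score reaches 0.5, and each cluster's first member serves as its representative. The threshold, the first-member rule and the names `shingles`, `jaccard` and `cluster_versions` are assumptions; the abstract does not say how similarity is measured or how the representative is chosen.

```python
def shingles(data: bytes, k: int = 4) -> set[bytes]:
    # All k-byte windows of a portion (a portion shorter than k yields itself).
    return {data[i:i + k] for i in range(max(1, len(data) - k + 1))}


def jaccard(a: set[bytes], b: set[bytes]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_versions(transactions: list[list[bytes]], threshold: float = 0.5):
    """Greedily cluster similar portions across a stream of transactions.

    Returns (representatives, clusters, chain): one representative per cluster
    (its first member), the (transaction, portion) members of each cluster, and
    the cluster id of every portion in stream order.
    """
    reps: list[bytes] = []
    rep_shingles: list[set[bytes]] = []
    clusters: list[list[tuple[int, int]]] = []
    chain: list[int] = []
    for t, portions in enumerate(transactions):
        for p, data in enumerate(portions):
            sh = shingles(data)
            best, best_score = None, -1.0
            for cid, rs in enumerate(rep_shingles):
                score = jaccard(sh, rs)
                if score >= threshold and score > best_score:
                    best, best_score = cid, score
            if best is None:  # nothing similar enough: start a new cluster
                best = len(reps)
                reps.append(data)
                rep_shingles.append(sh)
                clusters.append([])
            clusters[best].append((t, p))
            chain.append(best)
    return reps, clusters, chain
```

    Mapping `chain` through the representatives gives the ordered sequence of representatives for the stream. Non-representative members are only recorded by position here; a store would also have to keep their content, for example as deltas against their representatives, which this sketch omits.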

    Systems and methods for version chain clustering

    Publication No.: US11336295B2

    Publication date: 2022-05-17

    Application No.: US16698140

    Filing date: 2019-11-27

    IPC classification: H03M7/30 G06F16/174

    Abstract: A system, a method and a computer program product for storing data, which include receiving a data stream having a plurality of transactions that include at least one portion of data, determining whether at least one portion of data within at least one transaction is substantially similar to at least another portion of data within at least one transaction, clustering together at least one portion of data and at least another portion of data within at least one transaction, selecting one of at least one portion of data and at least another portion of data as a representative of at least one portion of data and at least another portion of data in the received data stream, and storing each representative of a portion of data from each transaction in the plurality of transactions, wherein a plurality of representatives is configured to form a chain representing the received data stream.

    SIMILARITY MATCHING
    8.
    Patent application

    Publication No.: US20210240377A1

    Publication date: 2021-08-05

    Application No.: US16780210

    Filing date: 2020-02-03

    IPC classification: G06F3/06

    Abstract: A method, a system and a computer program product for deduplicating data. A data stream having a plurality of data zones is received. One or more data storage locations in a plurality of data storage locations for deduplicating one or more zones in the plurality of zones are identified. Each data storage location stores its respective deduplicated data zones. A data storage location for deduplicating a first data zone is selected. The first data zone is deduplicated using the selected data storage location.

    Multi-level deduplication
    9.
    Granted patent

    Publication No.: US10452617B2

    Publication date: 2019-10-22

    Application No.: US15620246

    Filing date: 2017-06-12

    Abstract: A method, a system, and a computer-implemented method for performing multi-level deduplication of data are disclosed. A zone stamp is generated for each zone in a plurality of zones contained in at least one data stream. The zone stamp is compared to another zone stamp. The zone stamp and another zone stamp represent zones in the plurality of zones. The comparison is performed for zones at corresponding zone levels based on a determination that a zone stamp of a zone of a preceding zone level is not similar to another zone stamp of another preceding zone level. The zone at the preceding zone level includes at least one zone of a next zone level having a size smaller than or equal to a size of the zone of the preceding zone level. The zone and another zone are deduplicated based on a determination that the zone stamp is similar to another zone stamp.
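    A minimal recursive sketch of the level-by-level comparison, under two assumptions: a zone stamp is an 8-byte BLAKE2b digest matched by equality (the abstract speaks of similarity, which is simplified away here), and each next-level zone is half of its parent, down to a hypothetical `min_size`. Descending to the next level only when the current level finds no match follows the abstract; `stamp`, `multilevel_dedup` and the `index` set of known stamps are illustrative names.

```python
import hashlib


def stamp(zone: bytes) -> bytes:
    # Hypothetical zone stamp: 8-byte BLAKE2b digest, compared for equality.
    return hashlib.blake2b(zone, digest_size=8).digest()


def multilevel_dedup(zone: bytes, index: set[bytes], min_size: int = 4,
                     offset: int = 0) -> list[tuple[int, int, str]]:
    """Return (offset, length, "dup" | "new") for each piece of `zone`."""
    s = stamp(zone)
    if s in index:  # matched at this level: smaller zones need not be examined
        return [(offset, len(zone), "dup")]
    if len(zone) <= max(min_size, 1):  # smallest level reached: store as new
        index.add(s)
        return [(offset, len(zone), "new")]
    # No match: descend to the next level, whose zones are at most half as large.
    mid = len(zone) // 2
    result = (multilevel_dedup(zone[:mid], index, min_size, offset)
              + multilevel_dedup(zone[mid:], index, min_size, offset + mid))
    index.add(s)  # remember this zone so an identical one matches at this level
    return result
```

    Calling it twice on the same 16-byte zone stores four 4-byte zones the first time and reports a single 16-byte duplicate the second time, because each level's stamp is added to the index after its smaller zones have been processed.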