Accelerated and memory efficient similarity matching

    Publication No.: US11204907B2

    Publication date: 2021-12-21

    Application No.: US16704798

    Filing date: 2019-12-05

    Abstract: A method, a system, and a computer program product for performing accelerated and memory efficient similarity matching. A data stream having a plurality of data zones is received. Each zone includes a zone identifier. A plurality of hashing values for each zone are generated. Each hashing value is generated based on a portion of a zone. A storage structure having a plurality of storage containers is generated. Each storage container stores one or more hashing values associated with each respective storage container and a plurality of zone identifiers referencing the associated hashing values. At least one storage container includes a listing of zone identifiers stored in each storage container. Using the storage structure, the received data stream is deduplicated.
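
    The storage structure described in this abstract can be pictured as an inverted index from hashing values to zone identifiers. The sketch below is only an illustration under assumptions that are not in the patent text: the window size, the rule of keeping the smallest hash values, the BLAKE2b hash, and the names `zone_hashes` and `SimilarityIndex` are all hypothetical.

```python
import hashlib
from collections import Counter, defaultdict


def zone_hashes(zone: bytes, window: int = 8, keep: int = 4) -> set:
    """Hash every `window`-byte portion of a zone; keep the `keep` smallest values.

    Assumption: selecting the smallest values is just one simple way to pick
    a few representative hashes per zone.
    """
    last_start = max(1, len(zone) - window + 1)
    values = {
        int.from_bytes(
            hashlib.blake2b(zone[i:i + window], digest_size=8).digest(), "big"
        )
        for i in range(last_start)
    }
    return set(sorted(values)[:keep])


class SimilarityIndex:
    """One container per hashing value, each listing the zones that produced it."""

    def __init__(self):
        self.containers = defaultdict(list)  # hashing value -> [zone_id, ...]

    def add(self, zone_id: str, zone: bytes) -> None:
        for value in zone_hashes(zone):
            self.containers[value].append(zone_id)

    def best_match(self, zone: bytes):
        """Return the stored zone sharing the most hashing values, or None."""
        votes = Counter()
        for value in zone_hashes(zone):
            votes.update(self.containers.get(value, ()))
        return votes.most_common(1)[0][0] if votes else None


index = SimilarityIndex()
index.add("zone-a", b"the quick brown fox jumps over the lazy dog")
index.add("zone-b", b"0123456789" * 5)
match = index.best_match(b"the quick brown fox jumps over the lazy dog")
```

    A zone that matches a stored zone this way would then be a candidate for delta compression against it, which is how such an index could support deduplication.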

    Parallelizing and deduplicating backup data

    Publication No.: US11182345B2

    Publication date: 2021-11-23

    Application No.: US16410613

    Filing date: 2019-05-13

    Abstract: A method, a system, and a computer program product for performing a backup of data are disclosed. A grid server in a plurality of grid servers is selected for deduplicating a segment of data in a plurality of segments of data contained within a data stream. The segment of data is forwarded to the selected grid server for deduplication. A zone contained within the forwarded segment of data is deduplicated using the selected server. The deduplication is performed based on a listing of a plurality of zone stamps. Each zone stamp in the plurality of zone stamps represents a zone in a plurality of zones deduplicated by at least one server in the plurality of grid servers.

    Delta compression
    Type: Granted patent

    Publication No.: US11126594B2

    Publication date: 2021-09-21

    Application No.: US15893163

    Filing date: 2018-02-09

    IPC classification: G06F16/174 G06F11/14 H03M7/30

    Abstract: Delta compression method, system and computer program product. Portions of source and target data files are hashed using a hashing function. A target data file is compared against the source data file to determine at least one delta difference between the files. A source data file hashing table is generated. The table includes hashed portions of the source and target data files stored in corresponding source file offset locations and corresponding target file offset locations, respectively. Portions of the source and target files are compared using corresponding source and target file offset locations. At least one common sequence of characters in the portions of the source and target files is determined based on the comparison. A patch file is generated based on the determined sequence of characters.
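
    The general idea of hash-table-based delta compression can be sketched as follows. This is a minimal illustration, not the patented method: a Python dict keyed by fixed-size source blocks stands in for the hashing table, the block size is arbitrary, and the names `make_patch` and `apply_patch` and the copy/insert patch format are assumptions.

```python
def make_patch(source: bytes, target: bytes, block: int = 4) -> list:
    """Build a patch of ("copy", offset, length) and ("insert", data) instructions."""
    # Hashing table: block contents -> first source offset where the block occurs.
    table = {}
    for offset in range(len(source) - block + 1):
        table.setdefault(source[offset:offset + block], offset)

    patch, literal, pos = [], bytearray(), 0
    while pos < len(target):
        src = table.get(target[pos:pos + block])
        if src is None:
            # No common sequence starts here: carry the byte literally.
            literal.append(target[pos])
            pos += 1
            continue
        # Extend the common sequence as far as both files agree.
        length = block
        while (src + length < len(source) and pos + length < len(target)
               and source[src + length] == target[pos + length]):
            length += 1
        if literal:
            patch.append(("insert", bytes(literal)))
            literal = bytearray()
        patch.append(("copy", src, length))
        pos += length
    if literal:
        patch.append(("insert", bytes(literal)))
    return patch


def apply_patch(source: bytes, patch: list) -> bytes:
    """Rebuild the target file from the source file and the patch."""
    out = bytearray()
    for op in patch:
        if op[0] == "copy":
            _, offset, length = op
            out += source[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


source = b"the quick brown fox jumps over the lazy dog"
target = b"the quick red fox jumps over the lazy cat"
patch = make_patch(source, target)
restored = apply_patch(source, patch)  # equals target
```

    Only the literal bytes and the short copy instructions need to be stored, so the patch is smaller than the target whenever the two files share long sequences.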

    PARALLELIZING AND DEDUPLICATING BACKUP DATA
    Type: Patent application
    Status: Under examination (published)

    Publication No.: US20170046360A1

    Publication date: 2017-02-16

    Application No.: US14825322

    Filing date: 2015-08-13

    IPC classification: G06F17/30

    Abstract: A method, a system, and a computer program product for performing a backup of data are disclosed. A grid server in a plurality of grid servers is selected for deduplicating a segment of data in a plurality of segments of data contained within a data stream. The segment of data is forwarded to the selected grid server for deduplication. A zone contained within the forwarded segment of data is deduplicated using the selected server. The deduplication is performed based on a listing of a plurality of zone stamps. Each zone stamp in the plurality of zone stamps represents a zone in a plurality of zones deduplicated by at least one server in the plurality of grid servers.

    Systems and methods for managing delta version chains

    Publication No.: US09430546B2

    Publication date: 2016-08-30

    Application No.: US14082921

    Filing date: 2013-11-18

    IPC classification: G06F17/30

    CPC classification: G06F17/30575 G06F17/30162

    Abstract: A system, a method, and a computer program product for managing delta version chains are provided. A version chain having a plurality of versions of data is provided. A first delta-compressed version and a second delta-compressed version corresponding to a first version of data in the version chain and a second version of data in the version chain, respectively, are selected. A third delta-compressed version configured to be independent of at least one of the first delta-compressed version and the second delta-compressed version and further configured to contain at least one third instruction determined based on at least one of the following: the first insert instruction, the second insert instruction, the first copy instruction, and the second copy instruction, is generated.
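
    One way to read this abstract is as delta composition: two chained deltas are merged into a third whose copy and insert instructions refer directly to the base data, so the intermediate version is no longer needed. The sketch below assumes a forward chain (v0 -> v1 -> v2) and a simple instruction format; the names `apply_delta`, `slice_delta`, and `compose` are hypothetical and the patent may arrange the chain differently.

```python
def apply_delta(base: bytes, delta: list) -> bytes:
    """Rebuild a version from ("copy", offset, length) / ("insert", data) instructions."""
    out = bytearray()
    for op in delta:
        out += base[op[1]:op[1] + op[2]] if op[0] == "copy" else op[1]
    return bytes(out)


def op_length(op) -> int:
    """Number of output bytes an instruction produces."""
    return op[2] if op[0] == "copy" else len(op[1])


def slice_delta(delta: list, start: int, length: int) -> list:
    """Instructions of `delta` that produce output bytes [start, start + length)."""
    out, pos, end = [], 0, start + length
    for op in delta:
        n = op_length(op)
        a, b = max(start, pos), min(end, pos + n)  # overlap with this instruction
        if a < b:
            if op[0] == "copy":
                out.append(("copy", op[1] + (a - pos), b - a))
            else:
                out.append(("insert", op[1][a - pos:b - pos]))
        pos += n
    return out


def compose(delta1: list, delta2: list) -> list:
    """Merge delta1 (v0 -> v1) and delta2 (v1 -> v2) into one delta (v0 -> v2)."""
    out = []
    for op in delta2:
        if op[0] == "insert":
            out.append(op)  # literal data does not depend on v1
        else:
            out.extend(slice_delta(delta1, op[1], op[2]))  # rewrite copy against v0
    return out


v0 = b"abcdefghij"
delta1 = [("copy", 0, 3), ("insert", b"XY"), ("copy", 5, 5)]  # v1 = b"abcXYfghij"
delta2 = [("copy", 2, 4), ("insert", b"!"), ("copy", 0, 2)]   # v2 = b"cXYf!ab"
delta3 = compose(delta1, delta2)
v2 = apply_delta(v0, delta3)
```

    Because `delta3` references only `v0`, the first delta-compressed version could be dropped from the chain without losing the ability to restore `v2`.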

    Adaptive scheduled periodic caching
    Type: Granted patent
    Status: In force

    Publication No.: US09223812B2

    Publication date: 2015-12-29

    Application No.: US14084136

    Filing date: 2013-11-19

    IPC classification: G06F17/30

    Abstract: A system, a method, and a computer program product for adaptive scheduled periodic caching are disclosed. A data stream is received. The data stream contains a plurality of versions of data arranged in a plurality of data clusters. Each data cluster includes an anchor version having a plurality of versions of data dependent on the anchor version. A size of each anchor version of each data cluster is determined. A number of versions of data dependent on each anchor version is also determined. For each anchor version, a ratio of the determined number of dependent versions of data to the determined size of each anchor is computed. At least one anchor version for storing in a memory location is selected based on the computed ratio.
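
    The ratio-based selection can be illustrated with a short sketch. The abstract only says that anchors are selected based on the ratio of dependent versions to anchor size; the greedy, highest-ratio-first policy, the cache capacity limit, and the names `Anchor` and `select_for_cache` below are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Anchor:
    name: str
    size_bytes: int   # determined size of the anchor version (assumed > 0)
    dependents: int   # number of versions that depend on this anchor

    @property
    def ratio(self) -> float:
        """Dependent versions per byte of anchor."""
        return self.dependents / self.size_bytes


def select_for_cache(anchors: list, capacity_bytes: int) -> list:
    """Greedily pick the highest-ratio anchors that still fit in the cache."""
    chosen, used = [], 0
    for anchor in sorted(anchors, key=lambda a: a.ratio, reverse=True):
        if used + anchor.size_bytes <= capacity_bytes:
            chosen.append(anchor)
            used += anchor.size_bytes
    return chosen


anchors = [
    Anchor("A", size_bytes=100, dependents=10),  # ratio 0.10
    Anchor("B", size_bytes=50, dependents=20),   # ratio 0.40
    Anchor("C", size_bytes=200, dependents=10),  # ratio 0.05
]
cached = [a.name for a in select_for_cache(anchors, capacity_bytes=150)]
```

    With a 150-byte budget the sketch caches `B` and then `A`; the large anchor `C` serves few versions per byte and is left out.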

    Method and apparatus for content-aware and adaptive deduplication
    Type: Granted patent
    Status: In force

    Publication No.: US09223794B2

    Publication date: 2015-12-29

    Application No.: US14444700

    Filing date: 2014-07-28

    Abstract: A method, a system, an apparatus, and a computer readable medium for transmission of data across a network are disclosed. The method includes receiving a data stream, analyzing the received data stream to determine a starting location and an ending location of each zone within the received data stream, based on the starting and ending locations, generating a zone stamp identifying the zone, the zone stamp includes a sequence of contiguous characters representing at least a portion of data in the zone, wherein the order of characters in the zone stamp corresponds to the order of data in the zone, comparing the zone stamp with another zone stamp of another zone in any data stream received, determining whether the zone is substantially similar to another zone by detecting that the zone stamp is substantially similar to another zone stamp, delta-compressing zones within any data stream received that have been determined to have substantially similar zone stamps, thereby deduplicating zones having substantially similar zone stamps within any data stream received, and transmitting the deduplicated zones across the network from one storage location to another storage location.
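
    The abstract does not say how a zone stamp is computed or what counts as "substantially similar", so the following is only a rough sketch of the concept under invented assumptions: a character is emitted wherever a window hash meets an arbitrary condition (which keeps stamp order aligned with data order), and two stamps are compared with `difflib.SequenceMatcher` against an arbitrary 0.8 threshold. The names `zone_stamp`, `similar`, and `ALPHABET` are hypothetical.

```python
import difflib
import hashlib

ALPHABET = "ABCDEFGHIJKLMNOP"


def zone_stamp(zone: bytes, window: int = 16, divisor: int = 8) -> str:
    """Build a short character sequence that follows the order of data in the zone."""
    chars = []
    for i in range(len(zone) - window + 1):
        digest = hashlib.blake2b(zone[i:i + window], digest_size=4).digest()
        h = int.from_bytes(digest, "big")
        if h % divisor == 0:  # assumed selection rule: roughly 1 window in `divisor`
            chars.append(ALPHABET[(h // divisor) % len(ALPHABET)])
    return "".join(chars)


def similar(stamp_a: str, stamp_b: str, threshold: float = 0.8) -> bool:
    """Treat two zones as similar when their stamps mostly agree."""
    matcher = difflib.SequenceMatcher(None, stamp_a, stamp_b, autojunk=False)
    return matcher.ratio() >= threshold


zone = bytes((i * 31 + i // 7) % 256 for i in range(2000))
edited = zone[:-1] + b"\x00"          # change only the final byte
stamp, edited_stamp = zone_stamp(zone), zone_stamp(edited)
candidates_for_delta = similar(stamp, edited_stamp)
```

    Changing the last byte touches only the final window, so the two stamps differ by at most one character. Zones whose stamps pass the comparison would then be delta-compressed against each other.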

    Delta Version Clustering and Re-Anchoring
    Type: Patent application
    Status: Under examination (published)

    Publication No.: US20140052700A1

    Publication date: 2014-02-20

    Application No.: US13961259

    Filing date: 2013-08-07

    IPC classification: G06F17/30

    CPC classification: G06F17/30153 G06F17/30162

    Abstract: A system, a method, and a computer program product for delta version clustering and re-anchoring are provided. A first anchor having a plurality of delta-compressed versions of data dependent on the first anchor is generated. The first anchor and the plurality of delta-compressed versions form a cluster. A second anchor is generated. The first anchor is replaced with the second anchor. The replacing includes re-computing at least one delta-compressed version in the plurality of delta-compressed versions to be dependent on the second anchor. The second anchor replaces the first anchor as an anchor of the cluster.
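
    Re-anchoring can be sketched as restoring each dependent version through the old anchor and re-computing its delta against the new one. In the illustration below, `difflib.SequenceMatcher` stands in for whatever delta compressor is actually used, keeping the old anchor as a dependent version is an assumption, and the names `Cluster`, `make_delta`, and `apply_delta` are hypothetical.

```python
import difflib


def make_delta(anchor: bytes, version: bytes) -> list:
    """Encode `version` as copy/insert instructions against `anchor`."""
    matcher = difflib.SequenceMatcher(None, anchor, version, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2 - i1))
        elif j2 > j1:  # "replace" or "insert": carry the new bytes literally
            delta.append(("insert", version[j1:j2]))
    return delta


def apply_delta(anchor: bytes, delta: list) -> bytes:
    """Restore a version from its anchor and its delta."""
    out = bytearray()
    for op in delta:
        out += anchor[op[1]:op[1] + op[2]] if op[0] == "copy" else op[1]
    return bytes(out)


class Cluster:
    """An anchor stored in full plus delta-compressed versions that depend on it."""

    def __init__(self, anchor: bytes):
        self.anchor = anchor
        self.deltas = {}  # version name -> delta against the current anchor

    def add_version(self, name: str, data: bytes) -> None:
        self.deltas[name] = make_delta(self.anchor, data)

    def get_version(self, name: str) -> bytes:
        return apply_delta(self.anchor, self.deltas[name])

    def reanchor(self, old_anchor_name: str, new_anchor: bytes) -> None:
        """Replace the anchor and re-compute every delta against the new one."""
        versions = {name: self.get_version(name) for name in self.deltas}
        versions[old_anchor_name] = self.anchor  # assumed: old anchor is kept as a delta
        self.anchor = new_anchor
        self.deltas = {n: make_delta(new_anchor, d) for n, d in versions.items()}


cluster = Cluster(b"report v1: alpha beta gamma")
cluster.add_version("v2", b"report v2: alpha beta gamma delta")
cluster.reanchor("v1", b"report v3: alpha gamma delta epsilon")
restored_v2 = cluster.get_version("v2")  # still the original v2 bytes
```

    After `reanchor`, every stored delta depends only on the second anchor, so the first anchor no longer has to be kept in full.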
