FINGERPRINTS DATASTORE AND STALE FINGERPRINT REMOVAL IN DE-DUPLICATION ENVIRONMENTS
    1.
    Invention application
    Status: under examination (published)

    Publication No.: US20150046409A1

    Publication date: 2015-02-12

    Application No.: US14523773

    Filing date: 2014-10-24

    Applicant: NetApp, Inc.

    Abstract: A storage server is coupled to a storage device that stores blocks of data, and generates a fingerprint for each data block stored on the storage device. The storage server creates a fingerprints datastore that is divided into a primary datastore and a secondary datastore. The primary datastore comprises a single entry for each unique fingerprint and the secondary datastore comprises an entry having an identical fingerprint as an entry in the primary datastore. The storage server merges entries in a changelog with the entries in the primary datastore to identify duplicate data blocks in the storage device and frees the identified duplicate data blocks in the storage device. The storage server stores the entries that correspond to the freed data blocks to a third datastore and overwrites the primary datastore with the entries from the merged data that correspond to the unique fingerprints to create an updated primary datastore.
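
The merge step described in this abstract can be pictured with a small sketch. The Python below is illustrative only and is not taken from the patent: the `Entry` tuple, the `merge_changelog` function, and the integer block IDs are assumptions. It also treats a fingerprint match as proof of duplication, which a real system would normally confirm by comparing block contents, and it omits the secondary datastore for brevity.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    fingerprint: str  # hash of the block contents (hypothetical string form)
    block_id: int     # hypothetical block address on the storage device


def merge_changelog(primary, changelog):
    """Merge changelog entries into the primary datastore.

    Returns (updated_primary, freed). `freed` plays the role of the
    "third datastore": entries whose blocks were found to be duplicates.
    """
    # The primary datastore holds exactly one entry per unique fingerprint.
    by_fingerprint = {e.fingerprint: e for e in primary}
    freed = []
    for entry in sorted(changelog):
        keeper = by_fingerprint.get(entry.fingerprint)
        if keeper is None:
            # First time this fingerprint is seen: it becomes the unique entry.
            by_fingerprint[entry.fingerprint] = entry
        elif keeper.block_id != entry.block_id:
            # Same fingerprint, different block: treat as a duplicate to free.
            freed.append(entry)
    # Overwrite the primary with the unique entries, sorted by fingerprint.
    updated_primary = sorted(by_fingerprint.values())
    return updated_primary, freed


primary = [Entry("aa", 1), Entry("bb", 2)]
changelog = [Entry("bb", 7), Entry("cc", 8), Entry("cc", 9)]
updated_primary, freed = merge_changelog(primary, changelog)
```

In this sketch the entry already present in the primary datastore always survives, so blocks 7 and 9 end up in `freed` while block 8 becomes the unique holder of fingerprint `cc`.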

    SYSTEMS, METHODS, AND COMPUTER PROGRAM PRODUCTS PROVIDING CHANGE LOGGING IN A DEDUPLICATION PROCESS
    3.
    Invention application
    Status: granted (in force)

    Publication No.: US20150026424A1

    Publication date: 2015-01-22

    Application No.: US14509892

    Filing date: 2014-10-08

    Applicant: NetApp, Inc.

    CPC classification number: G06F3/0641 G06F3/0619 G06F3/067 G06F11/1453

    Abstract: A method performed in a network storage system, the method including receiving a plurality of data blocks at a secondary storage subsystem from a primary storage subsystem, generating a first log that includes a first plurality of entries, one entry for each of the data blocks, in which each entry of the first plurality of entries includes a name for a respective data block and a fingerprint of the respective data block, receiving metadata at the secondary storage subsystem from the primary storage subsystem, the metadata describing relationships between the plurality of blocks and a plurality of files, generating a second log that includes a second plurality of entries, and merging the first log with the second log to generate a change log.
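
One way to picture the two-log merge is the sketch below. It is only an illustration under stated assumptions: the abstract does not say what the second log's entries contain, so the `(block name, file ID, offset)` layout, the `build_change_log` function, and the choice to sort the result by fingerprint are all hypothetical.

```python
def build_change_log(first_log, second_log):
    """Join the two logs on block name to produce a change log.

    first_log:  list of (block_name, fingerprint), one entry per received block.
    second_log: list of (block_name, file_id, offset), derived from the
                block-to-file metadata (assumed layout).
    """
    fingerprint_by_name = dict(first_log)
    change_log = [
        (fingerprint_by_name[name], file_id, offset, name)
        for name, file_id, offset in second_log
        if name in fingerprint_by_name  # skip blocks with no fingerprint entry
    ]
    # Sorting by fingerprint places candidate duplicates next to each other.
    change_log.sort()
    return change_log


first_log = [("blk7", "f1"), ("blk9", "f0")]
second_log = [("blk9", "fileA", 0), ("blk7", "fileA", 1), ("blk3", "fileB", 0)]
change_log = build_change_log(first_log, second_log)
```

Here `blk3` is dropped because it has no entry in the first log; a real implementation might instead record it for later handling.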

    Removing overlapping ranges from a flat sorted data structure

    Publication No.: US09720928B2

    Publication date: 2017-08-01

    Application No.: US14518403

    Filing date: 2014-10-20

    Applicant: NetApp, Inc.

    Abstract: A system efficiently removes ranges of entries that represent stale fingerprints from a flat sorted data structure. As part of fingerprint verification during deduplication, the system performs an attributes intersect range calculation (AIRC) procedure on the stale fingerprint data structure to compute a set of non-overlapping and latest consistency point (CP) ranges. During the AIRC procedure, an inode associated with a data container is selected and the FBN tuple of each deleted data block in the file is sorted in a predefined FBN order. The AIRC procedure then identifies the most recent fingerprint associated with a deleted data block. The set of non-overlapping and latest CP ranges is then used to remove stale fingerprints associated with that deleted block from the fingerprint database. A single pass through the fingerprint database identifies the set of non-overlapping and latest CP ranges, thereby improving efficiency of the storage system.

    Techniques for using a bloom filter in a duplication operation
    5.
    Granted invention patent
    Status: granted (in force)

    Publication No.: US09298726B1

    Publication date: 2016-03-29

    Application No.: US13632892

    Filing date: 2012-10-01

    Applicant: NetApp, Inc.

    CPC classification number: G06F17/30159 G06F3/0608 G06F3/0641 G06F3/067

    Abstract: Techniques for using a bloom filter in deduplication are described herein. A change log comprising a plurality of data blocks may be received. Values associated with the data blocks may be hashed and compared with a bloom filter. The comparison with the bloom filter identifies data blocks from the change log as unique data blocks or potential duplicate data blocks. A bit-by-bit comparison of the potential duplicate data blocks and previous data blocks may be performed to determine whether any of the potential duplicate data blocks are identical to any of the previous data blocks. Data blocks of the change log that are identified as identical may be deduplicated.
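
The two-stage check in this abstract (a cheap bloom filter test, then an exact comparison) can be sketched as follows. This is a minimal illustration rather than the patented method: the `BloomFilter` class, its sizing, the SHA-256-based hash positions, and the `classify` function are assumptions, and the sketch hashes whole block contents where the abstract speaks more generally of "values associated with the data blocks".

```python
import hashlib


class BloomFilter:
    """Tiny bloom filter; the sizes are arbitrary choices for illustration."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, data):
        # Derive several bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, data):
        for p in self._positions(data):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, data):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(data))


def classify(change_log_blocks, previous_blocks):
    """Split change-log blocks into (unique, duplicates)."""
    bloom = BloomFilter()
    for block in previous_blocks:
        bloom.add(block)
    unique, duplicates = [], []
    for block in change_log_blocks:
        # Only potential duplicates pay for the exact comparison.
        if bloom.might_contain(block) and any(block == prev for prev in previous_blocks):
            duplicates.append(block)
        else:
            unique.append(block)
    return unique, duplicates


unique, duplicates = classify([b"beta", b"gamma"], [b"alpha", b"beta"])
```

Because a bloom filter can report false positives but never false negatives, the exact comparison is what finally decides whether a block is deduplicated; the filter only saves work on blocks that are certainly new.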

    SYSTEM AND METHOD FOR REMOVING OVERLAPPING RANGES FROM A FLAT SORTED DATA STRUCTURE
    6.
    Invention application
    Status: granted (in force)

    Publication No.: US20150039572A1

    Publication date: 2015-02-05

    Application No.: US14518403

    Filing date: 2014-10-20

    Applicant: NetApp, Inc.

    Abstract: A system and method efficiently remove ranges of entries from a flat sorted data structure, such as a fingerprint database, of a storage system. The ranges of entries represent fingerprints that have become stale, i.e., are not representative of current states of corresponding blocks in the file system, due to various file system operations such as, e.g., deletion of a data block without overwriting its contents. A deduplication module of a file system executing on the storage system performs a fingerprint verification procedure to remove the stale fingerprints from the fingerprint database. As part of the fingerprint verification procedure, the deduplication module performs an attributes intersect range calculation (AIRC) procedure on the stale fingerprint data structure to compute a set of non-overlapping and latest consistency point (CP) ranges. During the AIRC procedure, an inode associated with a data container, e.g., a file, is selected and the FBN tuple of each deleted data block in the file is sorted in a predefined, e.g., increasing, FBN order. The AIRC procedure then identifies the most recent fingerprint associated with a deleted data block. The output from the AIRC procedure, i.e., the set of non-overlapping and latest CP ranges, is then used to remove stale fingerprints associated with that deleted block (as well as every other deleted data block) from the fingerprint database. Notably, only a single pass through the fingerprint database is required to identify the set of non-overlapping and latest CP ranges, thereby improving efficiency of the storage system.
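
The idea of reducing overlapping deletion ranges to non-overlapping ranges that carry the latest CP, and then filtering the database in one pass, can be sketched as below. This is a simplified reading and not the AIRC procedure itself: it handles a single inode, and the tuple layouts, the function names `latest_cp_ranges` and `remove_stale`, and the rule that a fingerprint is stale when its CP is at or before the range's CP are all assumptions.

```python
def latest_cp_ranges(ranges):
    """Collapse overlapping (start_fbn, end_fbn, cp) ranges (end inclusive)
    into sorted, non-overlapping ranges that keep the highest CP per FBN."""
    # Every start and every end+1 cuts the FBN axis into elementary segments.
    cuts = sorted({s for s, _, _ in ranges} | {e + 1 for _, e, _ in ranges})
    out = []
    for lo, hi in zip(cuts, cuts[1:]):
        cps = [cp for s, e, cp in ranges if s <= lo and hi - 1 <= e]
        if not cps:
            continue  # gap: no deletion range covers this segment
        cp = max(cps)
        if out and out[-1][1] == lo - 1 and out[-1][2] == cp:
            out[-1] = (out[-1][0], hi - 1, cp)  # extend the previous range
        else:
            out.append((lo, hi - 1, cp))
    return out


def remove_stale(fingerprint_db, ranges):
    """Single pass over fingerprint_db, a list of (fbn, cp, fingerprint)
    sorted by fbn; `ranges` is the output of latest_cp_ranges."""
    kept = []
    i = 0
    for fbn, cp, fp in fingerprint_db:
        # Advance past ranges that end before this FBN; never move backwards.
        while i < len(ranges) and ranges[i][1] < fbn:
            i += 1
        in_range = i < len(ranges) and ranges[i][0] <= fbn
        if in_range and cp <= ranges[i][2]:
            continue  # assumed staleness rule: older than or equal to the range CP
        kept.append((fbn, cp, fp))
    return kept


stale_ranges = latest_cp_ranges([(0, 9, 5), (5, 14, 8), (20, 24, 3)])
db = [(2, 4, "a"), (7, 6, "b"), (7, 9, "c"), (16, 1, "d"), (22, 3, "e")]
kept = remove_stale(db, stale_ranges)
```

Because both the database and the computed ranges are sorted by FBN, the range index only moves forward, which is what allows the filtering to finish in a single pass.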
