Data deduplication utilizing extent ID database

    Publication No.: US09659047B2

    Publication date: 2017-05-23

    Application No.: US14559317

    Filing date: 2014-12-03

    Applicant: NetApp, Inc.

    Abstract: An extent map (EMAP) database may include one or more extent map entries configured to map extent IDs to PVBNs. Each extent ID may be apportioned into a most significant bit (MSB) portion, i.e., checksum bits, and a least significant bit (LSB) portion, i.e., duplicate bits. A hash may be applied to the data of the extent to calculate the checksum bits, which illustratively represent a fingerprint of the data. The duplicate bits may be configured to denote any reoccurrence of the checksum bits in the EMAP database, i.e., whether there is an existing extent with potentially identical data in a volume of the aggregate. Each extent map entry may be inserted on a node having one or more key/value pairs, wherein the key is the extent ID and the value is the PVBN. The EMAP database may be scanned and utilized to perform data deduplication.
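    The abstract gives no field widths and names no hash function, so the following is only a minimal sketch of how an extent ID could be composed. It assumes a 64-bit extent ID with 56 checksum bits (MSB) and 8 duplicate bits (LSB), uses SHA-256 as the fingerprint, and reads the duplicate bits as a per-checksum counter. ExtentMap, make_extent_id and split_extent_id are hypothetical names.

```python
from __future__ import annotations

import hashlib

# Hypothetical layout: the abstract does not give field widths.
EXTENT_ID_BITS = 64
DUP_BITS = 8                                # LSB portion: duplicate bits
CHECKSUM_BITS = EXTENT_ID_BITS - DUP_BITS   # MSB portion: checksum bits


def checksum_bits(data: bytes) -> int:
    """Fingerprint the extent data (assumed: SHA-256 truncated to CHECKSUM_BITS)."""
    digest = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return digest >> (256 - CHECKSUM_BITS)


def make_extent_id(checksum: int, dup: int) -> int:
    """Pack checksum bits into the MSBs and duplicate bits into the LSBs."""
    return (checksum << DUP_BITS) | dup


def split_extent_id(extent_id: int) -> tuple[int, int]:
    """Return (checksum bits, duplicate bits)."""
    return extent_id >> DUP_BITS, extent_id & ((1 << DUP_BITS) - 1)


class ExtentMap:
    """Toy EMAP: extent ID (key) -> PVBN (value)."""

    def __init__(self) -> None:
        self.entries: dict[int, int] = {}
        self.next_dup: dict[int, int] = {}  # checksum bits -> next unused duplicate value

    def insert(self, data: bytes, pvbn: int) -> int:
        checksum = checksum_bits(data)
        dup = self.next_dup.get(checksum, 0)  # > 0 means the checksum reoccurs
        if dup >= (1 << DUP_BITS):
            raise OverflowError("duplicate bits exhausted for this checksum")
        self.next_dup[checksum] = dup + 1
        extent_id = make_extent_id(checksum, dup)
        self.entries[extent_id] = pvbn
        return extent_id
```

    Inserting the same data twice yields two extent IDs with identical checksum bits and duplicate bits 0 and 1, which is the signal a later scan of the EMAP database can act on.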

    TWO-STAGE FRONT END FOR EXTENT MAP DATABASE
    Patent application

    Publication No.: US20170255624A1

    Publication date: 2017-09-07

    Application No.: US15601388

    Filing date: 2017-05-22

    Applicant: NetApp, Inc.

    IPC classification: G06F17/30

    Abstract: Multiple key-value stores may be employed to smooth out random updates (based on the extent ID) to the EMAP database. The updates to the EMAP database occur in a two-stage manner: (i) using an append-only log store for the first stage and (ii) using an on-disk hash store for the second stage. The append-only log store is used to convert the random updates to sequential write operations on the EMAP database. Once full, the contents of the log store are sorted and moved to the on-disk hash store, which holds the updates for a transient period of time to enable batching of the updates. Once a sufficient batch of extent map entries has accumulated, those entries are sorted and moved to the EMAP database. Thereafter, the EMAP database can be scanned to find extent map entries having identical checksum bits to perform data deduplication.
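    A minimal in-memory sketch of the two-stage flow, assuming Python lists and dicts stand in for the on-disk log store, hash store and EMAP database. The class TwoStageFrontEnd and its log_capacity and batch_threshold knobs are hypothetical; the abstract gives no sizes or flush policy.

```python
from __future__ import annotations


class TwoStageFrontEnd:
    """In-memory stand-ins for the log store, hash store, and EMAP database."""

    def __init__(self, log_capacity: int = 4, batch_threshold: int = 8) -> None:
        self.log: list[tuple[int, int]] = []   # stage 1: append-only (extent ID, PVBN)
        self.hash_store: dict[int, int] = {}   # stage 2: transient batching area
        self.emap: list[tuple[int, int]] = []  # EMAP, kept sorted by extent ID
        self.log_capacity = log_capacity
        self.batch_threshold = batch_threshold

    def update(self, extent_id: int, pvbn: int) -> None:
        """Accept an update in any extent-ID order as a sequential append."""
        self.log.append((extent_id, pvbn))
        if len(self.log) >= self.log_capacity:
            self._drain_log()

    def _drain_log(self) -> None:
        """Log full: sort it by extent ID and move the entries to the hash store."""
        # Stable sort: for a repeated extent ID the later update is applied last.
        for extent_id, pvbn in sorted(self.log, key=lambda entry: entry[0]):
            self.hash_store[extent_id] = pvbn
        self.log.clear()
        if len(self.hash_store) >= self.batch_threshold:
            self._flush_batch()

    def _flush_batch(self) -> None:
        """Enough entries batched: sort them and merge them into the EMAP."""
        merged = dict(self.emap)
        merged.update(sorted(self.hash_store.items()))
        self.emap = sorted(merged.items())
        self.hash_store.clear()
```

    Sorting the log by extent ID only (a stable sort) rather than by the whole (extent ID, PVBN) pair keeps the most recent update for a repeated extent ID.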

    FINGERPRINTS DATASTORE AND STALE FINGERPRINT REMOVAL IN DE-DUPLICATION ENVIRONMENTS
    Patent application (status: pending, published)

    Publication No.: US20150046409A1

    Publication date: 2015-02-12

    Application No.: US14523773

    Filing date: 2014-10-24

    Applicant: NetApp, Inc.

    IPC classification: G06F17/30

    Abstract: A storage server is coupled to a storage device that stores blocks of data, and generates a fingerprint for each data block stored on the storage device. The storage server creates a fingerprints datastore that is divided into a primary datastore and a secondary datastore. The primary datastore comprises a single entry for each unique fingerprint and the secondary datastore comprises an entry having an identical fingerprint as an entry in the primary datastore. The storage server merges entries in a changelog with the entries in the primary datastore to identify duplicate data blocks in the storage device and frees the identified duplicate data blocks in the storage device. The storage server stores the entries that correspond to the freed data blocks to a third datastore and overwrites the primary datastore with the entries from the merged data that correspond to the unique fingerprints to create an updated primary datastore.
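    A minimal sketch of the changelog merge described above, assuming in-memory dicts for the datastores and SHA-256 as the fingerprint (the abstract names neither). It compares block contents before freeing, leaves out the secondary datastore, and uses hypothetical names (fingerprint, dedup_merge).

```python
from __future__ import annotations

import hashlib


def fingerprint(data: bytes) -> str:
    """Assumed fingerprint function (the abstract does not name one)."""
    return hashlib.sha256(data).hexdigest()


def dedup_merge(
    primary: dict[str, int],
    changelog: list[tuple[str, int]],
    blocks: dict[int, bytes],
) -> tuple[dict[str, int], list[tuple[str, int]], dict[int, int]]:
    """Merge changelog entries into the primary datastore and free duplicates.

    primary:   fingerprint -> block number, one entry per unique fingerprint
    changelog: (fingerprint, block number) for newly written blocks
    blocks:    block number -> data; freed duplicate blocks are deleted from it
    Returns (updated primary, stale entries for freed blocks, freed -> donor map).
    """
    merged = dict(primary)
    stale: list[tuple[str, int]] = []  # "third datastore": entries of freed blocks
    remap: dict[int, int] = {}         # freed block -> surviving donor block
    for fp, blk in sorted(changelog):
        donor = merged.get(fp)
        if donor is None:
            merged[fp] = blk                        # first block with this fingerprint
        elif donor != blk and blocks[donor] == blocks[blk]:
            del blocks[blk]                         # free the verified duplicate
            remap[blk] = donor
            stale.append((fp, blk))
    # Overwrite the primary datastore with the unique-fingerprint entries.
    return dict(sorted(merged.items())), stale, remap
```

    The remap result stands in for updating block references to the surviving donor; how the storage server does that is not described in the abstract.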

    Two-stage front end for extent map database

    Publication No.: US10353884B2

    Publication date: 2019-07-16

    Application No.: US15601388

    Filing date: 2017-05-22

    Applicant: NetApp, Inc.

    Abstract: Multiple key-value stores may be employed to smooth out random updates (based on the extent ID) to the EMAP database. The updates to the EMAP database occur in a two-stage manner: (i) using an append-only log store for the first stage and (ii) using an on-disk hash store for the second stage. The append-only log store is used to convert the random updates to sequential write operations on the EMAP database. Once full, the contents of the log store are sorted and moved to the on-disk hash store, which holds the updates for a transient period of time to enable batching of the updates. Once a sufficient batch of extent map entries has accumulated, those entries are sorted and moved to the EMAP database. Thereafter, the EMAP database can be scanned to find extent map entries having identical checksum bits to perform data deduplication.
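    Because both the batched entries and the EMAP database are ordered by extent ID, moving a batch into the EMAP can be a single sequential merge pass. A sketch using Python's heapq.merge, assuming in-memory sorted lists of (extent ID, PVBN) pairs and a hypothetical merge_batch helper:

```python
import heapq
from typing import Iterable, Iterator, Optional, Tuple

Entry = Tuple[int, int]  # (extent ID, PVBN)


def merge_batch(emap: Iterable[Entry], batch: Iterable[Entry]) -> Iterator[Entry]:
    """Merge a sorted batch of updates into a sorted EMAP in one sequential pass.

    `emap` must already be sorted by extent ID; both inputs must hold unique
    extent IDs. When an ID appears in both, the batch (newer) value wins.
    """
    old = ((eid, 0, pvbn) for eid, pvbn in emap)           # source 0: existing EMAP
    new = ((eid, 1, pvbn) for eid, pvbn in sorted(batch))  # source 1: batched updates
    pending: Optional[Entry] = None
    for eid, _source, pvbn in heapq.merge(old, new):
        if pending is not None and pending[0] != eid:
            yield pending
        pending = (eid, pvbn)  # for equal IDs, source 1 arrives after source 0
    if pending is not None:
        yield pending
```

    Tagging each entry with its source keeps the merge deterministic: for an extent ID present on both sides, the existing EMAP entry is seen first and the batched update replaces it.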

    SYSTEM AND METHOD FOR DATA DEDUPLICATION UTILIZING EXTENT ID DATABASE
    Patent application (status: in force)

    Publication No.: US20160162207A1

    Publication date: 2016-06-09

    Application No.: US14559317

    Filing date: 2014-12-03

    Applicant: NetApp, Inc.

    IPC classification: G06F3/06 G06F17/30

    Abstract: An extent map (EMAP) database may include one or more extent map entries configured to map extent IDs to PVBNs. Each extent ID may be apportioned into a most significant bit (MSB) portion, i.e., checksum bits, and a least significant bit (LSB) portion, i.e., duplicate bits. A hash may be applied to the data of the extent to calculate the checksum bits, which illustratively represent a fingerprint of the data. The duplicate bits may be configured to denote any reoccurrence of the checksum bits in the EMAP database, i.e., whether there is an existing extent with potentially identical data in a volume of the aggregate. Each extent map entry may be inserted on a node having one or more key/value pairs, wherein the key is the extent ID and the value is the PVBN. The EMAP database may be scanned and utilized to perform data deduplication.
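    Since the checksum bits occupy the most significant portion of the extent ID, sorting entries by extent ID places entries with identical checksum bits next to each other, so one scan can collect deduplication candidates. A minimal sketch, assuming 8 duplicate bits and a dict-based EMAP; dedup_candidates is a hypothetical name. Matching checksum bits only indicate potentially identical data, so the extents would still need to be compared before blocks are shared.

```python
from __future__ import annotations

from itertools import groupby

DUP_BITS = 8  # hypothetical width of the LSB duplicate-bits field


def dedup_candidates(emap: dict[int, int]) -> list[list[int]]:
    """Scan the EMAP in extent-ID order and group PVBNs sharing checksum bits."""
    groups: list[list[int]] = []
    ordered = sorted(emap.items())  # equal checksum bits become adjacent
    for _checksum, run in groupby(ordered, key=lambda item: item[0] >> DUP_BITS):
        pvbns = [pvbn for _extent_id, pvbn in run]
        if len(pvbns) > 1:          # checksum reoccurs: potential duplicates
            groups.append(pvbns)
    return groups
```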

    SYSTEMS, METHODS, AND COMPUTER PROGRAM PRODUCTS FOR DETERMINING BLOCK CHARACTERISTICS IN A COMPUTER DATA STORAGE SYSTEM
    Patent application (status: pending, published)

    Publication No.: US20140344538A1

    Publication date: 2014-11-20

    Application No.: US13894337

    Filing date: 2013-05-14

    Applicant: NetApp, Inc.

    IPC classification: G06F3/06

    Abstract: Systems, methods, and non-transitory machine readable media for determining block characteristics include one or more processors, a memory for storing instructions for the one or more processors, persistent storage, and a file system implemented in the persistent storage and storing data in the persistent storage using a plurality of blocks. When the stored instructions are executed by the one or more processors, the one or more processors are configured to traverse the plurality of blocks, read contents of a first block selected from the plurality of blocks, determine one or more characteristics of the first block from metadata within the block, and selectively perform or not perform a storage operation with respect to the first data block in response to determining the one or more characteristics. In some embodiments, the storage operation is a replication operation or a deduplication operation.
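    A minimal sketch of the traverse, inspect, and selectively operate loop. It assumes a hypothetical block layout in which byte 0 is a flags header carrying the block's metadata; the flag values, the trait names, and the traverse and block_characteristics functions are illustrative only. The operation callback stands in for a replication or deduplication step.

```python
from __future__ import annotations

from typing import Callable, Iterable

# Hypothetical block layout: byte 0 is a flags header; the rest is payload.
FLAG_COMPRESSED = 0x01
FLAG_ENCRYPTED = 0x02


def block_characteristics(block: bytes) -> set[str]:
    """Decode characteristics from the metadata header stored inside the block."""
    flags = block[0]
    traits: set[str] = set()
    if flags & FLAG_COMPRESSED:
        traits.add("compressed")
    if flags & FLAG_ENCRYPTED:
        traits.add("encrypted")
    return traits


def traverse(
    blocks: Iterable[bytes],
    skip_if: set[str],
    operation: Callable[[int, bytes], None],
) -> list[int]:
    """Visit every block; perform `operation` unless it has a trait in `skip_if`."""
    performed: list[int] = []
    for index, block in enumerate(blocks):
        if block_characteristics(block) & skip_if:
            continue  # characteristic found: do not perform the operation
        operation(index, block)
        performed.append(index)
    return performed
```

    For example, traverse(blocks, {"encrypted"}, op) hands only blocks without the encrypted flag to op.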

    TWO-STAGE FRONT END FOR EXTENT MAP DATABASE
    Patent application

    Publication No.: US20190324954A1

    Publication date: 2019-10-24

    Application No.: US16459852

    Filing date: 2019-07-02

    Applicant: NetApp, Inc.

    Abstract: Multiple key-value stores may be employed to smooth out random updates (based on the extent ID) to the EMAP database. The updates to the EMAP database occur in a two-stage manner: (i) using an append-only log store for the first stage and (ii) using an on-disk hash store for the second stage. The append-only log store is used to convert the random updates to sequential write operations on the EMAP database. Once full, the contents of the log store are sorted and moved to the on-disk hash store, which holds the updates for a transient period of time to enable batching of the updates. Once a sufficient batch of extent map entries has accumulated, those entries are sorted and moved to the EMAP database. Thereafter, the EMAP database can be scanned to find extent map entries having identical checksum bits to perform data deduplication.