Asynchronous semi-inline deduplication

    Publication number: US11068182B2

    Publication date: 2021-07-20

    Application number: US16683466

    Filing date: 2019-11-14

    Applicant: NetApp Inc.

    IPC classification: G06F3/00 G06F3/06

    Abstract: Techniques are provided for asynchronous semi-inline deduplication. A multi-tiered storage arrangement comprises a first storage tier, a second storage tier, etc. An in-memory change log of data recently written to the first storage tier is evaluated to identify a fingerprint of a data block recently written to the first storage tier. A donor data store, comprising fingerprints of data blocks already stored within the first storage tier, is queried using the fingerprint. If the fingerprint is found, then deduplication is performed for the data block to create deduplicated data based upon a potential donor data block within the first storage tier. The deduplicated data is moved from the first storage tier to the second storage tier, such as in response to a determination that the deduplicated data has not been recently accessed. The deduplication is performed before cold data is moved from the first storage tier to the second storage tier.
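
    The flow in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the class `TieredStore`, its method names, the `("ref", donor_id)` marker, and the use of SHA-256 as the fingerprint are all assumptions made for the example. Overwrites, reference counting, and donor tracking across tiers are deliberately left out.

```python
import hashlib
from dataclasses import dataclass, field


def fingerprint(data: bytes) -> str:
    # SHA-256 is an assumption; the abstract does not name a hash function.
    return hashlib.sha256(data).hexdigest()


@dataclass
class TieredStore:
    """Hypothetical two-tier store with an in-memory change log."""
    first_tier: dict = field(default_factory=dict)   # block_id -> bytes or ("ref", donor_id)
    second_tier: dict = field(default_factory=dict)  # same layout, holds cold blocks
    change_log: list = field(default_factory=list)   # (block_id, fingerprint) of recent writes
    donors: dict = field(default_factory=dict)       # fingerprint -> donor block_id

    def write(self, block_id, data):
        """Write path: store the block and log it; no deduplication happens here."""
        self.first_tier[block_id] = data
        self.change_log.append((block_id, fingerprint(data)))

    def dedup_pass(self):
        """Asynchronous pass: drain the change log and query the donor store."""
        while self.change_log:
            block_id, fp = self.change_log.pop(0)
            donor_id = self.donors.get(fp)
            if donor_id is None or donor_id == block_id:
                self.donors[fp] = block_id  # first block with this fingerprint
                continue
            # A fingerprint hit only marks a potential donor, so compare bytes.
            if self.first_tier.get(donor_id) == self.first_tier[block_id]:
                self.first_tier[block_id] = ("ref", donor_id)

    def tier_down(self, cold_ids):
        """Deduplicate first, then move the cold blocks to the second tier."""
        self.dedup_pass()
        for block_id in cold_ids:
            self.second_tier[block_id] = self.first_tier.pop(block_id)
```

    In this sketch the write path never waits on a donor lookup, and `tier_down` always runs `dedup_pass` first, so a cold block reaches the second tier in deduplicated form.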

    INLINE DEDUPLICATION
    2.
    Patent application

    Publication number: US20200159432A1

    Publication date: 2020-05-21

    Application number: US16774127

    Filing date: 2020-01-28

    Applicant: NetApp Inc.

    IPC classification: G06F3/06

    Abstract: One or more techniques and/or computing devices are provided for inline deduplication. For example, a checksum hash table and/or a block number hash table may be maintained within memory (e.g., a storage controller may maintain the hash tables in-core). The checksum hash table may be utilized for inline deduplication to identify potential donor blocks that may comprise the same data as an incoming storage operation. Data within an in-core buffer cache is eligible as potential donor blocks so that inline deduplication may be performed using data from the in-core buffer cache, which may mitigate disk access to underlying storage for which the in-core buffer cache is used for caching. The block number hash table may be used for updating or removing entries from the hash tables, such as for blocks that are no longer eligible as potential donor blocks (e.g., deleted blocks, blocks evicted from the in-core buffer cache, etc.).
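
    One way to picture the two hash tables is the sketch below. It is illustrative only: the class `InlineDedupCache`, its method names, and the choice of CRC-32 as the checksum are assumptions, and a plain dictionary stands in for the in-core buffer cache.

```python
import zlib


class InlineDedupCache:
    """Hypothetical in-memory donor index over a buffer cache."""

    def __init__(self):
        self.buffer_cache = {}  # block number -> data (stand-in for the in-core cache)
        self.by_checksum = {}   # checksum hash table: checksum -> block number
        self.by_block = {}      # block number hash table: block number -> checksum

    def add_donor(self, block_no, data):
        """Register a cached block as a potential donor."""
        self.evict(block_no)  # drop any stale entry for a rewritten block
        checksum = zlib.crc32(data)  # CRC-32 is an assumption for this sketch
        self.buffer_cache[block_no] = data
        self.by_checksum[checksum] = block_no
        self.by_block[block_no] = checksum

    def evict(self, block_no):
        """Remove a deleted or evicted block, located via the block number table."""
        checksum = self.by_block.pop(block_no, None)
        self.buffer_cache.pop(block_no, None)
        if checksum is not None and self.by_checksum.get(checksum) == block_no:
            del self.by_checksum[checksum]

    def find_donor(self, data):
        """Return a donor block number for incoming data, or None."""
        block_no = self.by_checksum.get(zlib.crc32(data))
        # A checksum match is only a hint; confirm against the cached bytes,
        # which avoids reading the underlying storage.
        if block_no is not None and self.buffer_cache[block_no] == data:
            return block_no
        return None
```

    The block number table exists so that eviction does not need the block's data: the checksum to remove is looked up by block number alone.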

    Two-stage front end for extent map database

    Publication number: US10353884B2

    Publication date: 2019-07-16

    Application number: US15601388

    Filing date: 2017-05-22

    Applicant: NetApp, Inc.

    Abstract: Multiple key-value stores may be employed to smooth out random updates (based on the extent ID) to the EMAP database. The updates to the EMAP database occur in a two-stage manner: (i) using an append-only log store for the first stage and (ii) using an on-disk hash store for the second stage. The append-only log store is used to convert the random updates to sequential write operations on the EMAP database. Once full, the contents of the log store are sorted and moved to the on-disk hash store, which holds the updates for a transient period of time to enable batching of the updates. Once a sufficient batch of extent map entries has accumulated, those entries are sorted and moved to the EMAP database. Thereafter, the EMAP database can be scanned to find extent map entries having identical checksum bits to perform data deduplication.
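
    The two-stage update path can be sketched as below. This is a simplified model under stated assumptions: the class `TwoStageFrontEnd` and the thresholds `log_capacity` and `batch_size` are hypothetical, and in-memory Python containers stand in for the append-only log store, the on-disk hash store, and the EMAP database.

```python
class TwoStageFrontEnd:
    """Hypothetical front end that batches random EMAP updates."""

    def __init__(self, log_capacity, batch_size):
        self.log_capacity = log_capacity  # assumed "log store is full" threshold
        self.batch_size = batch_size      # assumed "sufficient batching" threshold
        self.log_store = []               # stage 1: append-only log
        self.hash_store = {}              # stage 2: stand-in for the on-disk hash store
        self.emap = {}                    # stand-in for the EMAP database

    def update(self, extent_id, pvbn):
        """Random updates become sequential appends to the log."""
        self.log_store.append((extent_id, pvbn))
        if len(self.log_store) >= self.log_capacity:
            self._flush_log()

    def _flush_log(self):
        # Stable sort on extent ID keeps append order for equal keys,
        # so the latest update to an extent wins.
        for extent_id, pvbn in sorted(self.log_store, key=lambda e: e[0]):
            self.hash_store[extent_id] = pvbn
        self.log_store.clear()
        if len(self.hash_store) >= self.batch_size:
            self._flush_hash_store()

    def _flush_hash_store(self):
        # Entries reach the EMAP stand-in in sorted extent ID order.
        for extent_id in sorted(self.hash_store):
            self.emap[extent_id] = self.hash_store[extent_id]
        self.hash_store.clear()
```

    With `log_capacity=2` and `batch_size=3`, for example, four updates trigger two log flushes, and the second flush pushes the accumulated batch into `emap` in sorted order.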

    SYSTEM AND METHOD FOR DATA DEDUPLICATION UTILIZING EXTENT ID DATABASE
    4.
    Patent application
    Status: in force

    Publication number: US20160162207A1

    Publication date: 2016-06-09

    Application number: US14559317

    Filing date: 2014-12-03

    Applicant: NetApp, Inc.

    IPC classification: G06F3/06 G06F17/30

    Abstract: An extent map (EMAP) database may include one or more extent map entries configured to map extent IDs to PVBNs. Each extent ID may be apportioned into a most significant bit (MSB) portion, i.e., checksum bits, and a least significant bit (LSB) portion, i.e., duplicate bits. A hash may be applied to the data of the extent to calculate the checksum bits, which illustratively represent a fingerprint of the data. The duplicate bits may be configured to denote any reoccurrence of the checksum bits in the EMAP database, i.e., whether there is an existing extent with potentially identical data in a volume of the aggregate. Each extent map entry may be inserted on a node having one or more key/value pairs, wherein the key is the extent ID and the value is the PVBN. The EMAP database may be scanned and utilized to perform data deduplication.
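
    The extent ID layout can be illustrated with a short sketch. The bit widths, the use of CRC-32 for the checksum bits, and the function names below are assumptions chosen for the example; the abstract does not specify them. A dictionary stands in for the EMAP database.

```python
import zlib

DUP_BITS = 8  # assumed width of the duplicate (LSB) portion


def make_extent_id(checksum, dup):
    """Pack checksum bits (MSB portion) and duplicate bits (LSB portion)."""
    return (checksum << DUP_BITS) | dup


def split_extent_id(extent_id):
    """Return (checksum bits, duplicate bits)."""
    return extent_id >> DUP_BITS, extent_id & ((1 << DUP_BITS) - 1)


def insert_extent(emap, data, pvbn):
    """Insert an extent map entry, bumping the duplicate bits on reoccurrence."""
    checksum = zlib.crc32(data)  # CRC-32 is an assumption for this sketch
    for dup in range(1 << DUP_BITS):
        extent_id = make_extent_id(checksum, dup)
        if extent_id not in emap:
            emap[extent_id] = pvbn  # key is the extent ID, value is the PVBN
            return extent_id
    raise RuntimeError("duplicate bits exhausted for this checksum")


def dedup_candidates(emap):
    """Scan the map and group PVBNs whose extent IDs share checksum bits."""
    groups = {}
    for extent_id in sorted(emap):
        checksum, _ = split_extent_id(extent_id)
        groups.setdefault(checksum, []).append(emap[extent_id])
    return [pvbns for pvbns in groups.values() if len(pvbns) > 1]
```

    Because the checksum occupies the most significant bits, extents with potentially identical data sort next to each other, which is what makes a sequential scan for deduplication candidates cheap. Matching checksum bits only indicate potentially identical data, so a real system would still compare the extents before sharing them.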


    ASYNCHRONOUS SEMI-INLINE DEDUPLICATION

    Publication number: US20210342082A1

    Publication date: 2021-11-04

    Application number: US17373820

    Filing date: 2021-07-13

    Applicant: NetApp Inc.

    IPC classification: G06F3/06

    Abstract: Techniques are provided for asynchronous semi-inline deduplication. A multi-tiered storage arrangement comprises a first storage tier, a second storage tier, etc. An in-memory change log of data recently written to the first storage tier is evaluated to identify a fingerprint of a data block recently written to the first storage tier. A donor data store, comprising fingerprints of data blocks already stored within the first storage tier, is queried using the fingerprint. If the fingerprint is found, then deduplication is performed for the data block to create deduplicated data based upon a potential donor data block within the first storage tier. The deduplicated data is moved from the first storage tier to the second storage tier, such as in response to a determination that the deduplicated data has not been recently accessed. The deduplication is performed before cold data is moved from the first storage tier to the second storage tier.

    Inline deduplication
    6.
    Granted patent

    Publication number: US11010078B2

    Publication date: 2021-05-18

    Application number: US16774127

    Filing date: 2020-01-28

    Applicant: NetApp Inc.

    IPC classification: G06F3/06

    Abstract: One or more techniques and/or computing devices are provided for inline deduplication. For example, a checksum hash table and/or a block number hash table may be maintained within memory (e.g., a storage controller may maintain the hash tables in-core). The checksum hash table may be utilized for inline deduplication to identify potential donor blocks that may comprise the same data as an incoming storage operation. Data within an in-core buffer cache is eligible as potential donor blocks so that inline deduplication may be performed using data from the in-core buffer cache, which may mitigate disk access to underlying storage for which the in-core buffer cache is used for caching. The block number hash table may be used for updating or removing entries from the hash tables, such as for blocks that are no longer eligible as potential donor blocks (e.g., deleted blocks, blocks evicted from the in-core buffer cache, etc.).

    SYSTEM AND METHOD FOR RETAINING DEDUPLICATION IN A STORAGE OBJECT AFTER A CLONE SPLIT OPERATION
    7.
    Patent application
    Status: under examination (published)

    Publication number: US20160077756A1

    Publication date: 2016-03-17

    Application number: US14952947

    Filing date: 2015-11-26

    Applicant: NetApp Inc.

    Inventor(s): Bipul Raj Alok Sharma

    IPC classification: G06F3/06

    Abstract: Described herein is a system and method for retaining deduplication of data blocks of a resulting storage object (e.g., a flexible volume) from a split operation of a clone of a base storage object. The clone may comprise data blocks that are shared with at least one data block of the base storage object and at least one data block that is not shared with at least one data block of the base storage object. The data blocks of the clone that are shared with the base storage object may be indicated to receive a write allocation that may comprise assigning a new pointer to an indicated data block. Each data block may comprise a plurality of pointers comprising a virtual address pointer and a physical address pointer. As such, data blocks of the clone comprising the same virtual address pointer may be assigned a single physical address pointer. Thus, a new physical address pointer is assigned or allocated once to a given virtual address pointer of data blocks of a clone.
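
    The "allocate once per virtual address pointer" idea can be sketched as follows. This is a loose illustration, not the described system: the function `split_clone`, the dictionary keys `virt` and `phys`, the `shared_virtual` set, and the integer allocator are all assumptions introduced for the example.

```python
from itertools import count


def split_clone(clone_blocks, shared_virtual, allocator):
    """Give shared clone blocks new physical pointers, one per virtual pointer.

    clone_blocks   -- list of dicts with 'virt' and 'phys' pointers (hypothetical layout)
    shared_virtual -- virtual pointers of blocks still shared with the base object
    allocator      -- iterator yielding fresh physical addresses
    """
    new_phys = {}  # virtual pointer -> newly allocated physical pointer
    for block in clone_blocks:
        virt = block["virt"]
        if virt not in shared_virtual:
            continue  # block is already private to the clone
        if virt not in new_phys:
            new_phys[virt] = next(allocator)  # allocate only once per virtual pointer
        block["phys"] = new_phys[virt]  # deduplicated blocks keep sharing one address
    return new_phys
```

    For instance, two clone blocks that both carry virtual pointer 1 end up with the same new physical pointer after the split, so the deduplication between them survives, while blocks that were never shared with the base keep their existing pointer.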


    Asynchronous semi-inline deduplication

    Publication number: US11620064B2

    Publication date: 2023-04-04

    Application number: US17373820

    Filing date: 2021-07-13

    Applicant: NetApp Inc.

    IPC classification: G06F3/00 G06F3/06

    Abstract: Techniques are provided for asynchronous semi-inline deduplication. A multi-tiered storage arrangement comprises a first storage tier, a second storage tier, etc. An in-memory change log of data recently written to the first storage tier is evaluated to identify a fingerprint of a data block recently written to the first storage tier. A donor data store, comprising fingerprints of data blocks already stored within the first storage tier, is queried using the fingerprint. If the fingerprint is found, then deduplication is performed for the data block to create deduplicated data based upon a potential donor data block within the first storage tier. The deduplicated data is moved from the first storage tier to the second storage tier, such as in response to a determination that the deduplicated data has not been recently accessed. The deduplication is performed before cold data is moved from the first storage tier to the second storage tier.

    ASYNCHRONOUS SEMI-INLINE DEDUPLICATION
    9.
    Patent application

    Publication number: US20180173449A1

    Publication date: 2018-06-21

    Application number: US15386544

    Filing date: 2016-12-21

    Applicant: NetApp Inc.

    IPC classification: G06F3/06

    Abstract: Techniques are provided for asynchronous semi-inline deduplication. A multi-tiered storage arrangement comprises a first storage tier, a second storage tier, etc. An in-memory change log of data recently written to the first storage tier is evaluated to identify a fingerprint of a data block recently written to the first storage tier. A donor data store, comprising fingerprints of data blocks already stored within the first storage tier, is queried using the fingerprint. If the fingerprint is found, then deduplication is performed for the data block to create deduplicated data based upon a potential donor data block within the first storage tier. The deduplicated data is moved from the first storage tier to the second storage tier, such as in response to a determination that the deduplicated data has not been recently accessed. The deduplication is performed before cold data is moved from the first storage tier to the second storage tier.

    Data deduplication utilizing extent ID database

    Publication number: US09659047B2

    Publication date: 2017-05-23

    Application number: US14559317

    Filing date: 2014-12-03

    Applicant: NetApp, Inc.

    Abstract: An extent map (EMAP) database may include one or more extent map entries configured to map extent IDs to PVBNs. Each extent ID may be apportioned into a most significant bit (MSB) portion, i.e., checksum bits, and a least significant bit (LSB) portion, i.e., duplicate bits. A hash may be applied to the data of the extent to calculate the checksum bits, which illustratively represent a fingerprint of the data. The duplicate bits may be configured to denote any reoccurrence of the checksum bits in the EMAP database, i.e., whether there is an existing extent with potentially identical data in a volume of the aggregate. Each extent map entry may be inserted on a node having one or more key/value pairs, wherein the key is the extent ID and the value is the PVBN. The EMAP database may be scanned and utilized to perform data deduplication.