ASYNCHRONOUS SEMI-INLINE DEDUPLICATION
    11.
    Patent application

    Publication number: US20200081643A1

    Publication date: 2020-03-12

    Application number: US16683466

    Filing date: 2019-11-14

    Applicant: NetApp Inc.

    Abstract: Techniques are provided for asynchronous semi-inline deduplication. A multi-tiered storage arrangement comprises a first storage tier, a second storage tier, etc. An in-memory change log of data recently written to the first storage tier is evaluated to identify a fingerprint of a data block recently written to the first storage tier. A donor data store, comprising fingerprints of data blocks already stored within the first storage tier, is queried using the fingerprint. If the fingerprint is found, then deduplication is performed for the data block to create deduplicated data based upon a potential donor data block within the first storage tier. The deduplicated data is moved from the first storage tier to the second storage tier, such as in response to a determination that the deduplicated data has not been recently accessed. The deduplication is performed before cold data is moved from the first storage tier to the second storage tier.
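
The flow in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the class `FirstTier`, its method names, the use of SHA-256 as the fingerprint, and the dictionary standing in for the second tier are all assumptions made for the example.

```python
import hashlib


class FirstTier:
    """Hypothetical first (fast) storage tier with an in-memory change log."""

    def __init__(self):
        self.blocks = {}      # physical block number -> data
        self.refs = {}        # logical block number -> physical block number
        self.donors = {}      # fingerprint -> physical block number (donor data store)
        self.change_log = []  # (logical, physical, fingerprint) for recent writes
        self._next_pbn = 0

    def write(self, lbn, data):
        """Store the block immediately; deduplication happens later, off the write path."""
        pbn = self._next_pbn
        self._next_pbn += 1
        self.blocks[pbn] = data
        self.refs[lbn] = pbn
        self.change_log.append((lbn, pbn, hashlib.sha256(data).hexdigest()))

    def dedupe_recent_writes(self):
        """Walk the change log and share blocks whose fingerprint has a donor."""
        shared = 0
        for lbn, pbn, fp in self.change_log:
            if self.refs.get(lbn) != pbn:
                continue  # the logical block was overwritten after this log entry
            donor = self.donors.get(fp)
            if donor is None:
                self.donors[fp] = pbn  # first copy becomes a potential donor
            elif donor != pbn and self.blocks[donor] == self.blocks[pbn]:
                self.refs[lbn] = donor  # byte compare guards against collisions
                del self.blocks[pbn]
                shared += 1
        self.change_log.clear()
        return shared

    def move_cold(self, second_tier, hot_lbns):
        """Move every physical block not referenced by a hot logical block."""
        hot_pbns = {self.refs[lbn] for lbn in hot_lbns if lbn in self.refs}
        for pbn in [p for p in self.blocks if p not in hot_pbns]:
            second_tier[pbn] = self.blocks.pop(pbn)
        # Blocks that left the first tier can no longer serve as donors here.
        self.donors = {fp: p for fp, p in self.donors.items() if p in self.blocks}
```

Calling `dedupe_recent_writes()` before `move_cold()` mirrors the ordering the abstract stresses: duplicates are collapsed while the data is still in the first tier, so only one copy is moved.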

    Inline deduplication
    12.
    Granted patent

    Publication number: US10585611B2

    Publication date: 2020-03-10

    Application number: US15138435

    Filing date: 2016-04-26

    Applicant: NetApp Inc.

    Abstract: One or more techniques and/or computing devices are provided for inline deduplication. For example, a checksum hash table and/or a block number hash table may be maintained within memory (e.g., a storage controller may maintain the hash tables in-core). The checksum hash table may be utilized for inline deduplication to identify potential donor blocks that may comprise the same data as an incoming storage operation. Data within an in-core buffer cache is eligible as potential donor blocks so that inline deduplication may be performed using data from the in-core buffer cache, which may mitigate disk access to underlying storage for which the in-core buffer cache is used for caching. The block number hash table may be used for updating or removing entries from the hash tables, such as for blocks that are no longer eligible as potential donor blocks (e.g., deleted blocks, blocks evicted from the in-core buffer cache, etc.).
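
A rough sketch of the two in-core hash tables described in this abstract is given below. The class `InlineDedup`, its methods, and the choice of Adler-32 as the checksum are assumptions for illustration; the abstract does not name a checksum algorithm or an API.

```python
import zlib


class InlineDedup:
    """Hypothetical in-core buffer cache with the two hash tables."""

    def __init__(self):
        self.buffer_cache = {}  # block number -> data held in memory
        self.by_checksum = {}   # checksum hash table: checksum -> donor block numbers
        self.by_block = {}      # block number hash table: block number -> checksum
        self._next_block = 0

    def write(self, data):
        """Return an existing donor block if the data is already cached, else store it."""
        checksum = zlib.adler32(data)
        for donor in self.by_checksum.get(checksum, ()):
            if self.buffer_cache[donor] == data:  # compare bytes, checksums can collide
                return donor
        block = self._next_block
        self._next_block += 1
        self.buffer_cache[block] = data
        self.by_checksum.setdefault(checksum, set()).add(block)
        self.by_block[block] = checksum
        return block

    def evict(self, block):
        """Drop a block that is no longer eligible as a donor (deleted or evicted)."""
        self.buffer_cache.pop(block, None)
        checksum = self.by_block.pop(block, None)
        if checksum is None:
            return
        donors = self.by_checksum[checksum]
        donors.discard(block)
        if not donors:
            del self.by_checksum[checksum]
```

The block number table exists so that `evict` can find the checksum entry to clean up without rescanning the checksum table, which is the role the abstract assigns to it.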

    Mapping logical identifiers using multiple identifier spaces

    Publication number: US10515055B2

    Publication date: 2019-12-24

    Application number: US14859009

    Filing date: 2015-09-18

    Applicant: NetApp, Inc.

    Abstract: It is determined that a first data unit is to be written to a storage device and that the first data unit is associated with a first attribute. In response to determining that the first data unit is associated with the first attribute, a first identifier is selected from a first identifier space and the first identifier is associated with the first data unit. It is determined that a second data unit is to be written to the storage device and that the second data unit is associated with the second attribute. In response to determining that the second data unit is associated with the second attribute, a second identifier is selected from a second identifier space and the second identifier is associated with the second data unit.
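
One way to picture attribute-driven identifier selection is the sketch below. The attribute names, the numeric ranges, and the class `IdentifierSpaces` are hypothetical; the abstract does not say what the attributes are or how the spaces are laid out.

```python
class IdentifierSpaces:
    """Hypothetical allocator that hands out identifiers from per-attribute ranges."""

    def __init__(self, ranges):
        # ranges maps an attribute to a half-open interval (start, stop).
        self._ranges = dict(ranges)
        self._next = {attr: start for attr, (start, _stop) in self._ranges.items()}

    def allocate(self, attribute):
        """Select the next free identifier from the space tied to the attribute."""
        _start, stop = self._ranges[attribute]
        ident = self._next[attribute]
        if ident >= stop:
            raise RuntimeError(f"identifier space for {attribute!r} is exhausted")
        self._next[attribute] = ident + 1
        return ident

    def attribute_of(self, ident):
        """Recover the attribute from the identifier alone."""
        for attr, (start, stop) in self._ranges.items():
            if start <= ident < stop:
                return attr
        raise KeyError(ident)
```

With disjoint ranges, the identifier itself encodes which attribute a data unit was written with, so no separate lookup is needed to tell the two kinds of data unit apart.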

    Asynchronous semi-inline deduplication

    Publication number: US10001942B1

    Publication date: 2018-06-19

    Application number: US15386544

    Filing date: 2016-12-21

    Applicant: NetApp Inc.

    Abstract: Techniques are provided for asynchronous semi-inline deduplication. A multi-tiered storage arrangement comprises a first storage tier, a second storage tier, etc. An in-memory change log of data recently written to the first storage tier is evaluated to identify a fingerprint of a data block recently written to the first storage tier. A donor data store, comprising fingerprints of data blocks already stored within the first storage tier, is queried using the fingerprint. If the fingerprint is found, then deduplication is performed for the data block to create deduplicated data based upon a potential donor data block within the first storage tier. The deduplicated data is moved from the first storage tier to the second storage tier, such as in response to a determination that the deduplicated data has not been recently accessed. The deduplication is performed before cold data is moved from the first storage tier to the second storage tier.

    INLINE DEDUPLICATION
    15.
    Patent application

    Publication number: US20170308320A1

    Publication date: 2017-10-26

    Application number: US15138435

    Filing date: 2016-04-26

    Applicant: NetApp Inc.

    Abstract: One or more techniques and/or computing devices are provided for inline deduplication. For example, a checksum hash table and/or a block number hash table may be maintained within memory (e.g., a storage controller may maintain the hash tables in-core). The checksum hash table may be utilized for inline deduplication to identify potential donor blocks that may comprise the same data as an incoming storage operation. Data within an in-core buffer cache is eligible as potential donor blocks so that inline deduplication may be performed using data from the in-core buffer cache, which may mitigate disk access to underlying storage for which the in-core buffer cache is used for caching. The block number hash table may be used for updating or removing entries from the hash tables, such as for blocks that are no longer eligible as potential donor blocks (e.g., deleted blocks, blocks evicted from the in-core buffer cache, etc.).

    TWO-STAGE FRONT END FOR EXTENT MAP DATABASE
    16.
    Patent application

    Publication number: US20170255624A1

    Publication date: 2017-09-07

    Application number: US15601388

    Filing date: 2017-05-22

    Applicant: NetApp, Inc.

    Abstract: Multiple key-value stores may be employed to smooth out random updates (based on the extent ID) to the EMAP database. The updates to the EMAP database occur in a two-stage manner: (i) using an append-only log store for the first stage and (ii) using an on-disk hash store for the second stage. The append-only log store is used to convert the random updates to sequential write operations on the EMAP database. Once full, the contents of the log store are sorted and moved to the on-disk hash store, which holds the updates for a transient period of time to enable batching of the updates. Once a sufficient batch of extent map entries has accumulated, those entries are sorted and moved to the EMAP database. Thereafter, the EMAP database can be scanned to find extent map entries having identical checksum bits to perform data deduplication.
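
The two-stage update path can be sketched as below. Everything here is an in-memory stand-in: the class `TwoStageFrontEnd`, the `log_capacity` and `batch_threshold` parameters, and the use of a plain checksum string as the extent map entry are assumptions for illustration, and dictionaries replace the on-disk stores.

```python
class TwoStageFrontEnd:
    """Hypothetical front end that batches random extent-ID updates."""

    def __init__(self, log_capacity, batch_threshold):
        self.log = []         # stage 1: append-only log of (extent_id, checksum)
        self.hash_store = {}  # stage 2: transient store that accumulates a batch
        self.emap = {}        # stand-in for the EMAP database
        self.log_capacity = log_capacity
        self.batch_threshold = batch_threshold

    def update(self, extent_id, checksum):
        """Random updates become sequential appends to the log."""
        self.log.append((extent_id, checksum))
        if len(self.log) >= self.log_capacity:
            self._drain_log()

    def _drain_log(self):
        # sorted() is stable, so the latest update for an extent ID wins.
        for extent_id, checksum in sorted(self.log, key=lambda kv: kv[0]):
            self.hash_store[extent_id] = checksum
        self.log.clear()
        if len(self.hash_store) >= self.batch_threshold:
            self._flush_batch()

    def _flush_batch(self):
        # Apply the batch to the EMAP stand-in in extent-ID order.
        for extent_id in sorted(self.hash_store):
            self.emap[extent_id] = self.hash_store[extent_id]
        self.hash_store.clear()

    def find_duplicates(self):
        """Scan the EMAP stand-in for entries that share a checksum."""
        groups = {}
        for extent_id, checksum in self.emap.items():
            groups.setdefault(checksum, []).append(extent_id)
        return {c: sorted(ids) for c, ids in groups.items() if len(ids) > 1}
```

Matching checksums only flag candidates; a real system would still need to confirm the extents hold identical data before sharing them.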

    FRAGMENTATION CONTROL FOR PERFORMING DEDUPLICATION OPERATIONS
    17.
    Patent application
    Status: under examination (published)

    Publication number: US20150254268A1

    Publication date: 2015-09-10

    Application number: US14686426

    Filing date: 2015-04-14

    Applicant: NetApp, Inc.

    CPC classification number: G06F16/1748 G06F16/2272 G06F16/951

    Abstract: The techniques introduced here provide for enabling deduplication operations for a file system without significantly affecting read performance of the file system due to fragmentation of the data sets in the file system. The techniques include determining, by a storage server that hosts the file system, a level of fragmentation that would be introduced to a data set stored in the file system as a result of performing a deduplication operation on the data set. The storage server then compares the level of fragmentation with a threshold value and determines whether to perform the deduplication operation based on a result of comparing the level of fragmentation with the threshold value. The threshold value represents an acceptable level of fragmentation in the data sets of the file system.
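
The threshold check in the fragmentation-control abstract can be illustrated with a small sketch. The fragmentation metric used here (the fraction of adjacent logical blocks that are not physically adjacent) and the function names are assumptions; the abstract does not define how the level of fragmentation is measured.

```python
def fragmentation(layout):
    """Fraction of adjacent logical blocks whose physical blocks are not contiguous."""
    if len(layout) < 2:
        return 0.0
    breaks = sum(1 for a, b in zip(layout, layout[1:]) if b != a + 1)
    return breaks / (len(layout) - 1)


def should_deduplicate(layout, index, donor_block, threshold):
    """Project the layout after sharing one block, then compare with the threshold."""
    projected = list(layout)
    projected[index] = donor_block  # the block would now point at the donor
    return fragmentation(projected) <= threshold
```

For a contiguous layout `[100, 101, 102, 103]`, redirecting a middle block to a distant donor introduces two breaks, while redirecting the last block introduces one, so the same threshold can allow one operation and reject the other.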

    FINGERPRINTS DATASTORE AND STALE FINGERPRINT REMOVAL IN DE-DUPLICATION ENVIRONMENTS
    18.
    Patent application
    Status: under examination (published)

    Publication number: US20150046409A1

    Publication date: 2015-02-12

    Application number: US14523773

    Filing date: 2014-10-24

    Applicant: NetApp, Inc.

    Abstract: A storage server is coupled to a storage device that stores blocks of data, and generates a fingerprint for each data block stored on the storage device. The storage server creates a fingerprints datastore that is divided into a primary datastore and a secondary datastore. The primary datastore comprises a single entry for each unique fingerprint and the secondary datastore comprises an entry having an identical fingerprint as an entry in the primary datastore. The storage server merges entries in a changelog with the entries in the primary datastore to identify duplicate data blocks in the storage device and frees the identified duplicate data blocks in the storage device. The storage server stores the entries that correspond to the freed data blocks to a third datastore and overwrites the primary datastore with the entries from the merged data that correspond to the unique fingerprints to create an updated primary datastore.
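
The merge step of the fingerprints datastore abstract can be sketched as follows. The function `merge_changelog`, the `(fingerprint, block)` tuple layout, and the rule that an existing primary entry is kept over a changelog entry are assumptions for illustration.

```python
def merge_changelog(primary, changelog):
    """Merge changelog entries into the primary datastore.

    Both inputs are lists of (fingerprint, block_number). Returns the updated
    primary datastore (one entry per unique fingerprint) and the entries whose
    blocks are duplicates and can be freed (the "third datastore").
    """
    # Tag entries so that, for equal fingerprints, primary entries sort first.
    tagged = [(fp, 0, blk) for fp, blk in primary]
    tagged += [(fp, 1, blk) for fp, blk in changelog]
    tagged.sort()

    updated_primary, freed = [], []
    for fp, _source, blk in tagged:
        if updated_primary and updated_primary[-1][0] == fp:
            freed.append((fp, blk))            # duplicate of the kept entry
        else:
            updated_primary.append((fp, blk))  # first entry for this fingerprint
    return updated_primary, freed
```

A matching fingerprint is only evidence of a duplicate; a production system would compare block contents before freeing anything.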

    SYSTEM AND METHOD FOR RETAINING DEDUPLICATION IN A STORAGE OBJECT AFTER A CLONE SPLIT OPERATION
    19.
    Patent application
    Status: granted, in force

    Publication number: US20140351539A1

    Publication date: 2014-11-27

    Application number: US14457332

    Filing date: 2014-08-12

    Applicant: NETAPP, INC.

    Abstract: Described herein are a system and method for retaining deduplication of data blocks of a resulting storage object (e.g., a flexible volume) from a split operation of a clone of a base storage object. The clone may comprise data blocks that are shared with at least one data block of the base storage object and at least one data block that is not shared with at least one data block of the base storage object. The data blocks of the clone that are shared with the base storage object may be indicated to receive a write allocation that may comprise assigning a new pointer to an indicated data block. Each data block may comprise a plurality of pointers comprising a virtual address pointer and a physical address pointer. As such, data blocks of the clone comprising the same virtual address pointer may be assigned a single physical address pointer. Thus, a new physical address pointer is assigned or allocated once to a given virtual address pointer of data blocks of a clone.
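
The "allocate once per virtual address pointer" idea in the clone-split abstract can be sketched as below. The function `split_clone`, the dictionary representation of a data block, and the `allocate` callback are hypothetical names chosen for the example.

```python
def split_clone(clone_blocks, shared_virtuals, allocate):
    """Give shared clone blocks new physical pointers without losing deduplication.

    clone_blocks:    list of dicts with "virtual" and "physical" pointers
    shared_virtuals: virtual pointers of blocks shared with the base object
    allocate:        callable returning a fresh physical pointer
    """
    new_physical = {}  # virtual pointer -> physical pointer allocated for it
    for block in clone_blocks:
        virtual = block["virtual"]
        if virtual not in shared_virtuals:
            continue  # block already belongs only to the clone
        if virtual not in new_physical:
            new_physical[virtual] = allocate()  # allocate once per virtual pointer
        block["physical"] = new_physical[virtual]
    return new_physical
```

Because the mapping is keyed by the virtual address pointer, two clone blocks that were deduplicated before the split still share one physical block afterwards.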

    TWO-STAGE FRONT END FOR EXTENT MAP DATABASE
    20.
    Patent application

    Publication number: US20190324954A1

    Publication date: 2019-10-24

    Application number: US16459852

    Filing date: 2019-07-02

    Applicant: NetApp Inc.

    Abstract: Multiple key-value stores may be employed to smooth out random updates (based on the extent ID) to the EMAP database. The updates to the EMAP database occur in a two-stage manner: (i) using an append-only log store for the first stage and (ii) using an on-disk hash store for the second stage. The append-only log store is used to convert the random updates to sequential write operations on the EMAP database. Once full, the contents of the log store are sorted and moved to the on-disk hash store, which holds the updates for a transient period of time to enable batching of the updates. Once a sufficient batch of extent map entries has accumulated, those entries are sorted and moved to the EMAP database. Thereafter, the EMAP database can be scanned to find extent map entries having identical checksum bits to perform data deduplication.
