TRANSACTION LOG LAYOUT FOR EFFICIENT RECLAMATION AND RECOVERY

    Publication No.: US20170097771A1

    Publication date: 2017-04-06

    Application No.: US14872793

    Filing date: 2015-10-01

    Applicant: NetApp, Inc.

    Abstract: A layout of a transaction log enables efficient logging of metadata into entries of the log, as well as efficient reclamation and recovery of the log entries by a volume layer of a storage input/output (I/O) stack executing on one or more nodes of a cluster. The transaction log is illustratively a two-stage, append-only logging structure, wherein the first stage is non-volatile random access memory (NVRAM) embodied as an NV log and the second stage is disk, e.g., a solid state drive (SSD). The layout of the logging structure facilitates steady-state logging of metadata managed by the volume layer and crash recovery. Steady-state logging of metadata into the log entries occurs while the storage I/O stack of a node actively processes I/O requests, whereas crash recovery of the log entries occurs after an unexpected shutdown of the node.
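
    Illustrative sketch (not from the patent): the two-stage, append-only structure described in the abstract might be modeled as below. The class name TwoStageLog, the nvram_capacity limit, and the flush and reclaim methods are assumptions made for illustration; in-memory lists stand in for NVRAM and SSD.

```python
from dataclasses import dataclass, field


@dataclass
class TwoStageLog:
    """Append-only log: stage 1 models NVRAM, stage 2 models disk (SSD)."""
    nvram_capacity: int = 4                      # hypothetical stage-1 entry limit
    nvram: list = field(default_factory=list)    # stage 1: (seq, metadata) pairs
    disk: list = field(default_factory=list)     # stage 2: (seq, metadata) pairs
    next_seq: int = 0

    def append(self, metadata: str) -> int:
        """Steady-state logging: add an entry to stage 1, spilling first if full."""
        if len(self.nvram) >= self.nvram_capacity:
            self.flush()
        seq = self.next_seq
        self.nvram.append((seq, metadata))
        self.next_seq += 1
        return seq

    def flush(self) -> None:
        """Move stage-1 entries to stage 2 in order, so both stages stay append-only."""
        self.disk.extend(self.nvram)
        self.nvram.clear()

    def reclaim(self, up_to_seq: int) -> int:
        """Free stage-2 entries with seq <= up_to_seq; return the number freed."""
        before = len(self.disk)
        self.disk = [entry for entry in self.disk if entry[0] > up_to_seq]
        return before - len(self.disk)


log = TwoStageLog(nvram_capacity=2)
for item in ["a", "b", "c", "d", "e"]:
    log.append(item)
```

    With a capacity of two, the five appends above leave sequence numbers 0 to 3 in the disk stage and entry 4 in the NVRAM stage.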

    LOW-OVERHEAD RESTARTABLE MERGE OPERATION WITH EFFICIENT CRASH RECOVERY
    Invention application (under examination, published)

    Publication No.: US20160070714A1

    Publication date: 2016-03-10

    Application No.: US14483012

    Filing date: 2014-09-10

    Applicant: NetApp, Inc.

    CPC classification number: G06F16/1748 G06F11/1471 G06F16/2246

    Abstract: A low-overhead merge technique enables restart of a merge operation with minimal logging of state information relating to progress of the merge operation by a volume layer of a storage input/output (I/O) stack executing on one or more nodes of a cluster. The technique enables restart of the merge operation by ensuring that metadata, i.e., metadata pages, generated during the merge operation is not subject to de-duplication by providing a unique value in each metadata page that distinguishes the page, i.e., renders the page distinct or “unique”, from other metadata pages in an extent store. In addition, the technique ensures that a reference count on each metadata page is a value denoting a lack of de-duplication. To that end, the extent store layer is configured to not increment the reference count for a metadata page if, during the merge operation, the page is identical (and thus subject to deduplication) to an existing metadata page in the extent store.
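
    Illustrative sketch (not from the patent): one way to picture the two rules in the abstract is a content-addressed store in which each metadata page carries a unique value, and in which a repeated metadata page never raises the reference count. ExtentStore, make_metadata_page, the SHA-256 key, and the 8-byte prefix are all assumptions chosen for the example.

```python
import hashlib


class ExtentStore:
    """Toy content-addressed store; the key is a hash of the extent contents."""

    def __init__(self):
        self.extents = {}    # key -> bytes
        self.refcount = {}   # key -> reference count

    def put(self, data: bytes, is_metadata: bool = False) -> str:
        key = hashlib.sha256(data).hexdigest()
        if key in self.extents:
            # Identical user data is de-duplicated by bumping the refcount.
            # An identical metadata page (e.g. rewritten by a restarted merge)
            # leaves the refcount unchanged.
            if not is_metadata:
                self.refcount[key] += 1
            return key
        self.extents[key] = data
        self.refcount[key] = 1
        return key


def make_metadata_page(payload: bytes, unique_value: int) -> bytes:
    """Prefix the payload with a unique value so no two pages hash alike."""
    return unique_value.to_bytes(8, "big") + payload


store = ExtentStore()
k1 = store.put(make_metadata_page(b"mappings", 1), is_metadata=True)
k2 = store.put(make_metadata_page(b"mappings", 2), is_metadata=True)
k1_again = store.put(make_metadata_page(b"mappings", 1), is_metadata=True)
```

    Here k1 and k2 differ even though the payloads match, and writing the first page a second time returns the same key with its reference count still at one.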

    BOTTOM-UP DENSE TREE REPAIR TECHNIQUE
    Invention application

    Publication No.: US20170212919A1

    Publication date: 2017-07-27

    Application No.: US15005593

    Filing date: 2016-01-25

    Applicant: NetApp, Inc.

    CPC classification number: G06F16/2246 G06F13/1668 G06F13/4068 G06F16/2379

    Abstract: A bottom-up technique repairs a data structure, e.g., a multi-level dense tree, used to organize volume metadata as metadata entries managed by a volume layer of a storage input/output (I/O) stack executing on one or more nodes of a cluster. The bottom-up repair technique implements a progressive repair algorithm that initially involves traversing each level of the dense tree to determine consistency of metadata entries by ensuring that the entries, e.g., (i) monotonically increase, (ii) do not overlap and (iii), if appropriate, reference (point to) existing entries of a lower level. The technique detects and corrects inconsistencies by, e.g., deleting out-of-order and overlapping entries, and adjusting the range of an index entry to reference the corresponding lower level entry. The technique then examines whether metadata entries at a lower level of the tree are referenced (pointed to) by corresponding index entries in an upper (parent) level. If there is no index entry at the upper level pointing to a lower level entry (i.e., a gap in offset range), the upper level is fixed (repaired) by employing a gap analysis procedure.
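
    Illustrative sketch (not from the patent): the per-level checks and the gap analysis named in the abstract could look roughly like this. Entries are assumed to be (offset, length) pairs, and the function names repair_level and find_gaps are hypothetical.

```python
def repair_level(entries):
    """Drop out-of-order and overlapping entries from one level.

    Entries are (offset, length) pairs with length > 0. An entry is kept only
    if it starts at or after the end of the last kept entry, which makes the
    kept offsets strictly increasing and the kept ranges disjoint.
    """
    kept = []
    prev_end = 0
    for offset, length in entries:
        if offset < prev_end:
            continue  # out of order, or overlaps the previously kept range
        kept.append((offset, length))
        prev_end = offset + length
    return kept


def find_gaps(index_entries, lower_entries):
    """Return lower-level entries that no upper-level index range covers."""
    gaps = []
    for offset, length in lower_entries:
        covered = any(
            i_off <= offset and offset + length <= i_off + i_len
            for i_off, i_len in index_entries
        )
        if not covered:
            gaps.append((offset, length))
    return gaps


level0 = repair_level([(0, 4), (2, 4), (8, 2), (6, 1), (10, 5)])
missing = find_gaps([(0, 8)], level0)
```

    In this run the entries at offsets 2 and 6 are deleted, and the gap analysis reports the two lower-level entries beyond offset 8 as lacking an index entry in the parent level.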

    EXACTLY ONCE SEMANTICS
    Invention application (under examination, published)

    Publication No.: US20160246522A1

    Publication date: 2016-08-25

    Application No.: US14631408

    Filing date: 2015-02-25

    Applicant: NetApp, Inc.

    Abstract: An exactly once semantics (EOS) system of a storage input/output (I/O) stack implements a technique ensuring that non-idempotent operations occur exactly once in a storage system embodied as a node of a cluster. Illustratively, a first layer of the storage I/O stack may act as a client issuing a non-idempotent operation to a second layer of the stack, which may act as a server. According to the technique, the EOS system may wrap (i.e., encapsulate) the non-idempotent operation within a transaction embodied as an EOS transaction data structure having a transaction identifier that uniquely identifies the transaction. The server may complete the transaction and reply with a result to the client, which may acknowledge receipt of the reply. In response to a crash and subsequent recovery of the node, the EOS system may determine whether the transaction had completed prior to the crash. If so, the EOS system ensures that the transaction is not replayed (re-executed). Otherwise, the EOS system allows execution of the transaction such that the transaction occurs exactly once.
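
    Illustrative sketch (not from the patent): the replay check can be pictured as a table of completed transaction identifiers consulted before a non-idempotent operation is applied. EosServer, its counter, and the completed table are hypothetical; the table is assumed to be persistent and updated atomically with the operation, which a plain dictionary does not provide.

```python
class EosServer:
    """Hypothetical server layer that applies each transaction at most once."""

    def __init__(self):
        self.counter = 0      # state changed by a non-idempotent operation
        self.completed = {}   # txn_id -> result; assumed to survive a crash

    def execute(self, txn_id: str, increment: int) -> int:
        """Apply the increment unless this transaction already completed."""
        if txn_id in self.completed:
            # Replay after recovery: return the saved result, do not re-execute.
            return self.completed[txn_id]
        self.counter += increment
        self.completed[txn_id] = self.counter
        return self.counter

    def acknowledge(self, txn_id: str) -> None:
        """Client confirmed receipt of the reply; the record can be dropped."""
        self.completed.pop(txn_id, None)


server = EosServer()
first = server.execute("txn-1", 5)
replayed = server.execute("txn-1", 5)   # e.g. client retries after a crash
```

    Both calls return the same result and the counter is incremented once, which is the behavior the abstract describes for a transaction that had completed before the crash.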

    DEFERRED REFERENCE COUNT UPDATE TECHNIQUE FOR LOW OVERHEAD VOLUME METADATA
    Invention application (under examination, published)

    Publication No.: US20160077744A1

    Publication date: 2016-03-17

    Application No.: US14484061

    Filing date: 2014-09-11

    Applicant: NetApp, Inc.

    Abstract: A deferred refcount update technique efficiently frees storage space for metadata (associated with data) to be deleted during a merge operation managed by a volume layer of a node. The metadata is illustratively volume metadata embodied as mappings from logical block addresses (LBAs) of a logical unit (LUN) to extent keys maintained by an extent store layer of the node. One or more requests to delete (or overwrite) an LBA range within a LUN may be captured as page keys associated with metadata pages during the merge operation and the storage space associated with those metadata pages may be freed in an out-of-band fashion. The page keys of the metadata pages may be persistently recorded in a reference count (refcount) log to thereby allow the merge operation to complete without resolving deletion of the keys. A batch of page keys may be organized as one or more delete requests and, once the merge completes, the keys may be inserted into the refcount log. Subsequently, a deferred reference count update process may be spawned (instantiated) to walk through the page keys stored in the refcount log and delete each key, e.g., from the extent store layer, independently and out-of-band from the merge operation.
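
    Illustrative sketch (not from the patent): the flow in the abstract (capture page keys during the merge, record them in a refcount log when the merge completes, delete them later out-of-band) might be modeled as follows. DeferredRefcount and its method names are hypothetical, and a deque stands in for the persistent refcount log.

```python
from collections import deque


class DeferredRefcount:
    """Toy model of deferring page-key deletion until after a merge."""

    def __init__(self, refcounts):
        self.refcounts = refcounts     # page key -> refcount (extent store stand-in)
        self.pending = []              # keys captured while the merge runs
        self.refcount_log = deque()    # stand-in for the persistent refcount log

    def capture_delete(self, page_key: str) -> None:
        """During the merge: note the key, but do not touch the extent store."""
        self.pending.append(page_key)

    def complete_merge(self) -> None:
        """Once the merge completes: insert the batched keys into the log."""
        self.refcount_log.extend(self.pending)
        self.pending.clear()

    def deferred_update(self) -> list:
        """Out-of-band: walk the log, decrement each key, free those at zero."""
        freed = []
        while self.refcount_log:
            key = self.refcount_log.popleft()
            self.refcounts[key] -= 1
            if self.refcounts[key] == 0:
                del self.refcounts[key]
                freed.append(key)
        return freed


vol = DeferredRefcount({"p1": 1, "p2": 2})
vol.capture_delete("p1")
vol.capture_delete("p2")
vol.complete_merge()
counts_after_merge = dict(vol.refcounts)   # unchanged by the merge itself
freed = vol.deferred_update()
```

    The merge finishes without modifying any reference count; only the later deferred pass frees p1 and lowers the count on p2.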

    Consistency checker for global de-duplication clustered file system

    Publication No.: US10049118B2

    Publication date: 2018-08-14

    Application No.: US14727005

    Filing date: 2015-06-01

    Applicant: NetApp, Inc.

    Abstract: A cluster-wide consistency checker ensures that two file systems of a storage input/output (I/O) stack executing on each node of a cluster are self-consistent as well as consistent with respect to each other. The file systems include a deduplication file system and a host-facing file system that cooperate to provide a layered file system of the storage I/O stack. The deduplication file system is a log-structured file system managed by an extent store layer of the storage I/O stack, whereas the host-facing file system is managed by a volume layer of the stack. Illustratively, each log-structured file system implements a key-value store and cooperates with other nodes of the cluster to provide a cluster-wide (global) key-value store. The consistency checker verifies and/or fixes on-disk structures of the layered file system to ensure its consistency. To that end, the consistency checker may determine whether there are inconsistencies in the key-value store and, if so, reconciles those inconsistencies from a client (volume layer) perspective.
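
    Illustrative sketch (not from the patent): reconciling the key-value store "from a client (volume layer) perspective" is read here, as an assumption, as recomputing reference counts from the volume layer's mappings and correcting the store to match. The function check_and_reconcile and both dictionary shapes are hypothetical.

```python
from collections import Counter


def check_and_reconcile(volume_mappings, store_refcounts):
    """Compare store refcounts with references held by the volume layer.

    volume_mappings: LBA -> extent key (the client's view).
    store_refcounts: extent key -> refcount (the key-value store's view),
                     modified in place to match the client's view.
    Returns (dangling, fixed): keys referenced but absent from the store,
    and a map of key -> (old refcount, corrected refcount).
    """
    expected = Counter(volume_mappings.values())
    dangling = sorted(key for key in expected if key not in store_refcounts)
    fixed = {}
    for key in list(store_refcounts):
        want = expected.get(key, 0)
        have = store_refcounts[key]
        if have != want:
            fixed[key] = (have, want)
            if want == 0:
                del store_refcounts[key]   # unreferenced extent
            else:
                store_refcounts[key] = want
    return dangling, fixed


volume = {0: "k1", 8: "k1", 16: "k2", 24: "k4"}
store = {"k1": 1, "k2": 1, "k3": 1}
dangling, fixed = check_and_reconcile(volume, store)
```

    In this example k1 is under-counted, k3 is unreferenced, and k4 is referenced by the volume layer but missing from the store; the sketch only reports the missing key, since it has no data from which to restore it.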

    Recovery from low space condition of an extent store

    Publication No.: US09846539B2

    Publication date: 2017-12-19

    Application No.: US15004101

    Filing date: 2016-01-22

    Applicant: NetApp, Inc.

    Abstract: A technique recovers from a low space condition associated with storage space reserved in an extent store to accommodate write requests received from a host and associated metadata managed by a layered file system of a storage input/output (I/O) stack executing on one or more nodes of a cluster. The write requests, including user data, are persistently recorded on non-volatile random access memory (NVRAM) prior to returning an acknowledgement to the host by a persistence layer of the storage I/O stack. Volume metadata managed by a volume layer of the layered file system is embodied as mappings from logical block addresses (LBAs) of a logical unit (LUN) accessible by the host to extent keys maintained by an extent store layer of the layered file system. Extent store metadata managed by the extent store layer is embodied as mappings from the extent keys to the storage locations of the extents on storage devices of storage arrays coupled to the nodes of the cluster. The space recovery technique accounts for storage space consumed in the extent store by user operations, i.e., write operations for the user data stored on the NVRAM at the persistence layer as well as the associated volume and extent store metadata, to ensure that the user data and associated metadata can be safely and reliably persisted in the extent store even during a low space condition.

    TRANSACTION LOG LAYOUT FOR EFFICIENT RECLAMATION AND RECOVERY

    Publication No.: US20170097873A1

    Publication date: 2017-04-06

    Application No.: US14876572

    Filing date: 2015-10-06

    Applicant: NetApp, Inc.

    Abstract: A layout of a transaction log enables efficient logging of metadata into entries of the log, as well as efficient reclamation and recovery of the log entries by a volume layer of a storage input/output (I/O) stack executing on one or more nodes of a cluster. The transaction log is illustratively a two-stage, append-only logging structure, wherein the first stage is non-volatile random access memory (NVRAM) embodied as an NVlog and the second stage is disk, e.g., a solid state drive (SSD). During crash recovery, the log entries are examined for consistency and scanned to identify those entries that have completed and those that are active, which require replay. The log entries are walked from oldest to newest (using sequence numbers) searching for the highest sequence number. Partially complete log entries (e.g., log entries in progress when a crash occurs) may be discarded for failing a checksum (e.g., a CRC error). Old value/new value logs may be used to implement roll-forward or roll-back semantics to replay the log entries and fix any on-disk data structures, first from NVRAM and then from on-disk logs.
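
    Illustrative sketch (not from the patent): the recovery scan in the abstract (verify checksums, walk by sequence number, collect active entries for replay, NVRAM before disk) might be approximated as below. The entry layout, make_entry, and recover are hypothetical, and CRC-32 from zlib is an assumed choice of checksum.

```python
import zlib


def make_entry(seq: int, payload: bytes, completed: bool) -> dict:
    """Build a log entry whose CRC covers the payload."""
    return {"seq": seq, "payload": payload, "completed": completed,
            "crc": zlib.crc32(payload)}


def recover(nvram_entries, disk_entries):
    """Return (highest valid sequence number, sequence numbers to replay).

    Entries failing the checksum are discarded as partially written. Each
    stage is walked from oldest to newest; the NVRAM stage is handled first,
    then the on-disk stage.
    """
    highest = -1
    replay = []
    for stage in (nvram_entries, disk_entries):
        valid = [e for e in stage if zlib.crc32(e["payload"]) == e["crc"]]
        for entry in sorted(valid, key=lambda e: e["seq"]):
            highest = max(highest, entry["seq"])
            if not entry["completed"]:
                replay.append(entry["seq"])   # active entry: needs replay
    return highest, replay


disk = [make_entry(1, b"a", True), make_entry(2, b"b", False)]
nvram = [make_entry(4, b"d", False), make_entry(3, b"c", True)]
torn = make_entry(5, b"e", False)
torn["payload"] = b"x"          # simulate a write interrupted by the crash
nvram.append(torn)
highest, replay = recover(nvram, disk)
```

    The torn entry 5 fails its checksum and is ignored, so the highest valid sequence number is 4 and the active entries 4 (NVRAM) and 2 (disk) are queued for replay in that order.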

    Transaction log layout for efficient reclamation and recovery

    Publication No.: US09952765B2

    Publication date: 2018-04-24

    Application No.: US14876572

    Filing date: 2015-10-06

    Applicant: NetApp, Inc.

    Abstract: A layout of a transaction log enables efficient logging of metadata into entries of the log, as well as efficient reclamation and recovery of the log entries by a volume layer of a storage input/output (I/O) stack executing on one or more nodes of a cluster. The transaction log is illustratively a two-stage, append-only logging structure, wherein the first stage is non-volatile random access memory (NVRAM) embodied as an NVlog and the second stage is disk, e.g., a solid state drive (SSD). During crash recovery, the log entries are examined for consistency and scanned to identify those entries that have completed and those that are active, which require replay. The log entries are walked from oldest to newest (using sequence numbers) searching for the highest sequence number. Partially complete log entries (e.g., log entries in progress when a crash occurs) may be discarded for failing a checksum (e.g., a CRC error). Old value/new value logs may be used to implement roll-forward or roll-back semantics to replay the log entries and fix any on-disk data structures, first from NVRAM and then from on-disk logs.
