TECHNIQUE FOR PACING AND BALANCING PROCESSING OF INTERNAL AND EXTERNAL I/O REQUESTS IN A STORAGE SYSTEM

    Publication number: US20170315740A1

    Publication date: 2017-11-02

    Application number: US15143324

    Application date: 2016-04-29

    Applicant: NetApp, Inc.

    Abstract: A technique paces and balances a flow of messages related to processing of input/output (I/O) requests between subsystems, such as layers of a storage I/O stack, of one or more nodes of a cluster. The I/O requests may be directed to externally-generated user data, e.g., write requests generated by a host coupled to the cluster, and internally-generated metadata, e.g., write and delete requests generated by a volume layer of the storage I/O stack. The user data (and metadata) may be organized as an arbitrary number of variable-length extents of one or more host-visible logical units (LUNs) served by the nodes. The metadata may include mappings from host-visible logical block address ranges (i.e., offset ranges) of a LUN to extent keys, which reference locations of the extents stored on storage devices, such as solid state drives (SSDs), of a storage array coupled to the nodes. The I/O requests are received at a pacer of the volume layer configured to control delivery of the requests to an extent store layer of the storage I/O stack in a policy-dictated manner to enable processing and sequential storage of the user data and metadata on the SSDs of the storage array.
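The pacing idea in this abstract can be illustrated with a minimal sketch. The abstract does not say which policy the pacer applies, so the weighted round-robin below, the class name `Pacer`, and the default weights are all assumptions made for illustration only.

```python
from collections import deque


class Pacer:
    """Hypothetical pacer: queues requests by origin and releases them
    to a downstream layer according to a simple weighted policy."""

    def __init__(self, external_weight=2, internal_weight=1):
        # The weights are an assumed policy, not taken from the abstract.
        self.queues = {"external": deque(), "internal": deque()}
        self.weights = {"external": external_weight, "internal": internal_weight}

    def submit(self, origin, request):
        """Queue a request; origin is "external" (host) or "internal" (metadata)."""
        self.queues[origin].append(request)

    def next_batch(self):
        """Release up to `weight` requests from each queue in one round."""
        batch = []
        for origin in ("external", "internal"):
            queue = self.queues[origin]
            for _ in range(min(self.weights[origin], len(queue))):
                batch.append((origin, queue.popleft()))
        return batch
```

Each call to `next_batch` stands in for one delivery round to the extent store layer; a different policy would only change how many requests each queue may release per round.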

    Snapshot creation workflow
    12.
    Granted patent

    Publication number: US09740566B2

    Publication date: 2017-08-22

    Application number: US14869340

    Application date: 2015-09-29

    Applicant: NetApp, Inc.

    Abstract: A technique efficiently creates a snapshot for a logical unit (LUN) served by a storage input/output (I/O) stack executing on a node of a cluster that organizes data as extents referenced by keys. In addition, the technique efficiently creates one or more snapshots for a group of LUNs organized as a consistency group (CG) and served by storage I/O stacks executing on a plurality of nodes of the cluster. To that end, the technique involves a plurality of indivisible operations (i.e., transactions) of a snapshot creation workflow administered by a Storage Area Network (SAN) administration layer (SAL) of the storage I/O stack in response to a snapshot create request issued by a host. The SAL administers the snapshot creation workflow by initiating a set of transactions that includes, inter alia, (i) installation of barriers for LUNs (volumes) across all nodes in the cluster that participate in snapshot creation, (ii) creation of point-in-time (PIT) markers to record those I/O requests that are included in the snapshot, and (iii) updating of records (entries) in snapshot and volume tables of a cluster database (CDB).
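The three transactions listed in this abstract can be sketched as follows. The data layout (plain dictionaries for nodes and the cluster database), the function name `create_snapshot`, and the final barrier-removal step are assumptions for illustration; the abstract only names steps (i) to (iii) and notes that the set of transactions includes others.

```python
def create_snapshot(nodes, luns, cdb, name):
    """Hypothetical workflow: barriers, PIT markers, then CDB updates."""
    # (i) Install a barrier for every LUN on every participating node.
    for node in nodes:
        node["barriers"].update(luns)

    # (ii) Record a point-in-time marker per node and LUN: I/O requests
    # up to this sequence number are included in the snapshot.
    markers = {
        (node["name"], lun): node["io_seq"][lun]
        for node in nodes
        for lun in luns
    }

    # (iii) Update the snapshot and volume tables of the cluster database.
    cdb["snapshots"][name] = markers
    for lun in luns:
        cdb["volumes"].setdefault(lun, []).append(name)

    # Assumed step, not stated in the abstract: lift the barriers again.
    for node in nodes:
        node["barriers"].difference_update(luns)
    return markers
```

Passing several LUNs in `luns` corresponds to snapshotting a consistency group in one workflow.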

    TECHNIQUE FOR RECOVERY OF TRAPPED STORAGE SPACE IN AN EXTENT STORE

    Publication number: US20170192710A1

    Publication date: 2017-07-06

    Application number: US14988435

    Application date: 2016-01-05

    Applicant: NetApp, Inc.

    CPC classification number: G06F3/0644 G06F3/0608 G06F3/067

    Abstract: A technique enables recovery of storage space trapped in an extent store due to overlapping write requests associated with metadata managed by a volume layer of a storage input/output stack executing on one or more nodes of a cluster. The metadata is organized as a multi-level dense tree metadata structure, wherein each level of the dense tree includes volume metadata entries for storing the metadata. When a level of the dense tree is full, the volume metadata entries of the level are merged with a next lower level of the dense tree in accordance with a dense tree merge operation. The technique may be invoked during the merge operation to process the volume metadata entries associated with the overlapping write requests at each level of the dense tree involved in the merge operation. Processing of the overlapping write requests during the merge operation may manifest as partial overwrites of one or more existing extents which, in turn, may result in logical storage space being trapped in the extent store. The technique may perform read-modify-write (RMW) operations on the partially overwritten extents to recapture that trapped space. The storage space trapped by the partially overwritten extents may be recovered by reading and re-writing one or more valid portions of each extent with storage space lockup through the use of “out-of-band”, i.e., independent of the merge, processing of the RMW operations.
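To make the notion of trapped space concrete, the sketch below computes which portions of an extent are still valid after overlapping writes and how much space is trapped. It assumes extents and writes are half-open `(start, end)` offset ranges; the function names are hypothetical, and the actual RMW processing described in the abstract is reduced here to identifying the portions that would be read and re-written.

```python
def valid_portions(extent, overwrites):
    """Return the (start, end) ranges of `extent` not covered by later writes."""
    start, end = extent
    pieces = []
    cursor = start
    for o_start, o_end in sorted(overwrites):
        # Skip writes that do not touch the still-uncovered part of the extent.
        if o_end <= cursor or o_start >= end:
            continue
        if o_start > cursor:
            pieces.append((cursor, o_start))
        cursor = max(cursor, o_end)
    if cursor < end:
        pieces.append((cursor, end))
    return pieces


def recover_trapped_space(extent, overwrites):
    """Return the valid pieces to re-write and the amount of trapped space."""
    pieces = valid_portions(extent, overwrites)
    valid = sum(piece_end - piece_start for piece_start, piece_end in pieces)
    trapped = (extent[1] - extent[0]) - valid
    return pieces, trapped
```

For an extent covering offsets 0 to 100 with overwrites at 20 to 40 and 30 to 60, the valid pieces are `(0, 20)` and `(60, 100)`. Re-writing them as new extents would let the old extent be released, recovering the 40 trapped units.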

    SNAPSHOT CREATION WORKFLOW
    14.
    Patent application
    Legal status: In force

    Publication number: US20170031769A1

    Publication date: 2017-02-02

    Application number: US14869340

    Application date: 2015-09-29

    Applicant: NetApp, Inc.

    Abstract: A technique efficiently creates a snapshot for a logical unit (LUN) served by a storage input/output (I/O) stack executing on a node of a cluster that organizes data as extents referenced by keys. In addition, the technique efficiently creates one or more snapshots for a group of LUNs organized as a consistency group (CG) and served by storage I/O stacks executing on a plurality of nodes of the cluster. To that end, the technique involves a plurality of indivisible operations (i.e., transactions) of a snapshot creation workflow administered by a Storage Area Network (SAN) administration layer (SAL) of the storage I/O stack in response to a snapshot create request issued by a host. The SAL administers the snapshot creation workflow by initiating a set of transactions that includes, inter alia, (i) installation of barriers for LUNs (volumes) across all nodes in the cluster that participate in snapshot creation, (ii) creation of point-in-time (PIT) markers to record those I/O requests that are included in the snapshot, and (iii) updating of records (entries) in snapshot and volume tables of a cluster database (CDB).

    Snapshot restore workflow
    15.
    Granted patent

    Publication number: US10394660B2

    Publication date: 2019-08-27

    Application number: US14815064

    Application date: 2015-07-31

    Applicant: NetApp, Inc.

    Abstract: A snap restore technique efficiently restores snapshots of storage containers served by a storage input/output (I/O) stack executing on one or more nodes of a cluster. A Small Computer Systems Interface administration layer interacts with a volume layer of the storage I/O stack to manage and implement a snap restore procedure to restore one or more snapshots of a storage container. The storage container may be a logical unit (LUN) embodied as a parent volume (active volume) and the snapshot may be represented as an independent volume embodied as a read-only copy of the active volume. The snap restore procedure may be configured to allow restoration to a single snapshot of a LUN or restoration of a plurality of LUNs organized as a consistency group from a group of snapshots. Restoration of the LUN from a snapshot involves (i) creation of another independent volume embodied as a read-write copy (clone) of the snapshot, (ii) replacement of the (old) active volume with the clone, (iii) deletion of the old active volume, and (iv) mapping of the LUN to the clone (i.e., a new active volume).
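The four restore steps can be sketched with in-memory dictionaries. The structures `volumes` and `lun_map`, the clone naming scheme, and the function name `snap_restore` are assumptions for illustration. In this toy model steps (ii) and (iv) collapse into a single reassignment of the LUN mapping.

```python
def snap_restore(lun_map, volumes, lun, snapshot_id):
    """Hypothetical restore of `lun` from the read-only volume `snapshot_id`."""
    # (i) Create a read-write copy (clone) of the snapshot.
    clone_id = f"{snapshot_id}-clone"
    volumes[clone_id] = {
        "data": dict(volumes[snapshot_id]["data"]),
        "read_only": False,
    }

    # (ii) and (iv) Replace the old active volume by mapping the LUN to the clone.
    old_active = lun_map[lun]
    lun_map[lun] = clone_id

    # (iii) Delete the old active volume; the snapshot itself is kept.
    del volumes[old_active]
    return clone_id
```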

    Transaction log layout for efficient reclamation and recovery

    Publication number: US09952765B2

    Publication date: 2018-04-24

    Application number: US14876572

    Application date: 2015-10-06

    Applicant: NetApp, Inc.

    Abstract: A layout of a transaction log enables efficient logging of metadata into entries of the log, as well as efficient reclamation and recovery of the log entries by a volume layer of a storage input/output (I/O) stack executing on one or more nodes of a cluster. The transaction log is illustratively a two-stage, append-only logging structure, wherein the first stage is non-volatile random access memory (NVRAM) embodied as an NVlog and the second stage is disk, e.g., solid state drive (SSD). During crash recovery, the log entries are examined for consistency and scanned to identify those entries that have completed and those that are active, which require replay. The log entries are walked from oldest to newest (using sequence numbers) searching for the highest sequence number. Partially complete log entries (e.g., log entries in-progress when a crash occurs) may be discarded for failing a checksum (e.g., a CRC error). Old value/new value logs may be used to implement roll-forward or roll-back semantics to replay the log entries and fix any on-disk data structures, first from NVRAM and then from on-disk logs.
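The recovery scan described in this abstract might look roughly like the sketch below. The entry layout (a dictionary with `seq`, `payload`, `done`, and `crc` fields) and the function names are assumptions; only the overall procedure (walk by sequence number, discard entries that fail the checksum, collect active entries for replay, NVRAM before disk) follows the text.

```python
import zlib


def make_entry(seq, payload, done=False):
    """Build a hypothetical log entry with a CRC over its payload."""
    return {"seq": seq, "payload": payload, "done": done, "crc": zlib.crc32(payload)}


def recover(nvlog, disk_log):
    """Return the highest valid sequence number and the entries to replay."""
    highest = -1
    replay = []
    # Process the NVRAM stage first, then the on-disk stage.
    for log in (nvlog, disk_log):
        # Walk entries from oldest to newest using sequence numbers.
        for entry in sorted(log, key=lambda e: e["seq"]):
            # A partially written entry fails its checksum and is discarded.
            if zlib.crc32(entry["payload"]) != entry["crc"]:
                continue
            highest = max(highest, entry["seq"])
            # Entries that had not completed are active and need replay.
            if not entry["done"]:
                replay.append(entry["seq"])
    return highest, replay
```

Whether an active entry is rolled forward or rolled back would depend on the old value/new value logs, which this sketch leaves out.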

    SNAPSHOT RESTORE WORKFLOW
    17.
    Patent application
    Legal status: Under examination (published)

    Publication number: US20170031774A1

    Publication date: 2017-02-02

    Application number: US14815064

    Application date: 2015-07-31

    Applicant: NetApp, Inc.

    Abstract: A snap restore technique efficiently restores snapshots of storage containers served by a storage input/output (I/O) stack executing on one or more nodes of a cluster. A Small Computer Systems Interface administration layer interacts with a volume layer of the storage I/O stack to manage and implement a snap restore procedure to restore one or more snapshots of a storage container. The storage container may be a logical unit (LUN) embodied as a parent volume (active volume) and the snapshot may be represented as an independent volume embodied as a read-only copy of the active volume. The snap restore procedure may be configured to allow restoration to a single snapshot of a LUN or restoration of a plurality of LUNs organized as a consistency group from a group of snapshots. Restoration of the LUN from a snapshot involves (i) creation of another independent volume embodied as a read-write copy (clone) of the snapshot, (ii) replacement of the (old) active volume with the clone, (iii) deletion of the old active volume, and (iv) mapping of the LUN to the clone (i.e., a new active volume).

    RECONSTRUCTION OF DENSE TREE VOLUME METADATA STATE ACROSS CRASH RECOVERY
    18.
    Patent application
    Legal status: Under examination (published)

    Publication number: US20170010939A1

    Publication date: 2017-01-12

    Application number: US15272971

    Application date: 2016-09-22

    Applicant: NetApp, Inc.

    Abstract: Embodiments herein are directed to efficient crash recovery of persistent metadata managed by a volume layer of a storage input/output (I/O) stack executing on one or more nodes of a cluster. Volume metadata managed by the volume layer is organized as a multi-level dense tree, wherein each level of the dense tree includes volume metadata entries for storing the volume metadata. When a level of the dense tree is full, the volume metadata entries of the level are merged with the next lower level of the dense tree. During a merge operation, two sets of generation IDs may be used in accordance with a double buffer arrangement: a first generation ID for the append buffer that is full (i.e., a merge staging buffer) and a second, incremented generation ID for the append buffer that accepts new volume metadata entries. Upon completion of the merge operation, the lower level (e.g., level 1) to which the merge is directed is assigned the generation ID of the merge staging buffer.
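The double-buffer arrangement with generation IDs can be sketched as below. The class name, the two-level layout, and the use of plain lists for entries are assumptions for illustration. The `merge_completed` check is one plausible way recovery could use the IDs and is not stated in the abstract.

```python
class DenseTreeSketch:
    """Hypothetical two-level dense tree with double-buffered level 0."""

    def __init__(self):
        self.active = {"gen": 0, "entries": []}   # accepts new entries
        self.staging = None                       # full buffer being merged
        self.level1 = {"gen": None, "entries": []}

    def insert(self, entry):
        self.active["entries"].append(entry)

    def start_merge(self):
        # The full buffer keeps its generation ID and becomes the merge
        # staging buffer; the new append buffer gets an incremented ID.
        self.staging = self.active
        self.active = {"gen": self.staging["gen"] + 1, "entries": []}

    def finish_merge(self):
        # The lower level is assigned the staging buffer's generation ID.
        merged = self.level1["entries"] + self.staging["entries"]
        self.level1 = {"gen": self.staging["gen"], "entries": sorted(merged)}
        self.staging = None

    def merge_completed(self, staging_gen):
        # Assumed recovery check: matching IDs mean the merge finished.
        return self.level1["gen"] == staging_gen
```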

    Reconstruction of dense tree volume metadata state across crash recovery
    19.
    Granted patent
    Legal status: In force

    Publication number: US09501359B2

    Publication date: 2016-11-22

    Application number: US14482618

    Application date: 2014-09-10

    Applicant: NetApp, Inc.

    Abstract: Embodiments herein are directed to efficient crash recovery of persistent metadata managed by a volume layer of a storage input/output (I/O) stack executing on one or more nodes of a cluster. Volume metadata managed by the volume layer is organized as a multi-level dense tree, wherein each level of the dense tree includes volume metadata entries for storing the volume metadata. When a level of the dense tree is full, the volume metadata entries of the level are merged with the next lower level of the dense tree. During a merge operation, two sets of generation IDs may be used in accordance with a double buffer arrangement: a first generation ID for the append buffer that is full (i.e., a merge staging buffer) and a second, incremented generation ID for the append buffer that accepts new volume metadata entries. Upon completion of the merge operation, the lower level (e.g., level 1) to which the merge is directed is assigned the generation ID of the merge staging buffer.

    OFFSET RANGE OPERATION STRIPING TO IMPROVE CONCURRENCY OF EXECUTION AND REDUCE CONTENTION AMONG RESOURCES
    20.
    Patent application
    Legal status: Under examination (published)

    Publication number: US20160070644A1

    Publication date: 2016-03-10

    Application number: US14482957

    Application date: 2014-09-10

    Applicant: NetApp, Inc.

    CPC classification number: G06F3/0688 G06F3/0611 G06F3/0644

    Abstract: An offset range striping technique increases concurrency of operation execution directed to metadata managed by a volume layer of a storage input/output (I/O) stack, while reducing contention among resources of one or more nodes of a cluster. A logical unit (LUN) may be apportioned into multiple volumes, each of which may be partitioned into multiple regions, wherein each region is represented by a dense tree. The technique increases concurrency of operation execution (e.g., modifications to the metadata at the offset ranges), while reducing contention among the resources (e.g., CPUs and NVLogs) by distributing the offset range operations among the regions and mapping the regions to services and NVLogs. Such increased concurrency and reduction of contention may be achieved by implementation of the technique to (i) apportion each region into disjoint chunks (i.e., stripes) of contiguous offset ranges; (ii) organize a plurality of regions into one or more zones and populate a first zone before allocating a second zone; and (iii) stagger the mapping of services to starting regions of the volumes.
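A small sketch may help picture the striping and staggering. It assumes that consecutive stripes of a volume's offset space are assigned to regions in round-robin order and that services are staggered by adding the volume index; neither formula is given in the abstract, and both function names are hypothetical.

```python
def region_for(offset, stripe_size, num_regions):
    """Map an offset to a region, assuming round-robin placement of stripes."""
    stripe = offset // stripe_size      # index of the contiguous offset range
    return stripe % num_regions         # each region receives disjoint stripes


def service_for(volume, region, num_services):
    """Stagger the service mapping so that volumes start on different services."""
    return (volume + region) % num_services
```

With a stripe size of 4 and 3 regions, offsets 0, 5, and 13 fall into stripes 0, 1, and 3 and therefore into regions 0, 1, and 0. Operations on neighbouring offset ranges then land on different regions, which is the source of the added concurrency in this model.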
