Standby copies withstand cascading fails

    Publication No.: US11194501B2

    Publication date: 2021-12-07

    Application No.: US16752001

    Filing date: 2020-01-24

    Applicant: NetApp, Inc.

    Abstract: A technique is configured to maintain multiple copies of data served by storage nodes of a cluster during upgrade of a storage node to ensure continuous protection of the data served by the nodes. The data is logically organized as one or more volumes on storage devices of the cluster and includes metadata that describe the data of each volume. A data protection system may be configured to maintain at least two copies of the data in the cluster during upgrade of a storage node that is assigned to host one of the copies of the data but that is taken offline during the upgrade. As a result, an original slice service of the node may be rendered unavailable during the upgrade. In response, the technique redirects replicated data targeted to the original slice service to a standby slice service selected from a standby pool of slice services in accordance with a degraded redundant metadata service of the cluster. In the event the standby slice service itself subsequently becomes unavailable, another standby slice service from the standby pool is activated to receive the subsequent data. In this manner, cascading failure of secondary slice services is handled.
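The redirect-and-reactivate behavior described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the class names `SliceService` and `ReplicaRouter`, the `available` flag, and the in-memory `received` list are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SliceService:
    """Hypothetical stand-in for a slice service that hosts a copy of the data."""
    name: str
    available: bool = True
    received: list = field(default_factory=list)


class ReplicaRouter:
    """Sends replicated updates to the original secondary, else to a standby."""

    def __init__(self, original, standby_pool):
        self.original = original
        self.standby_pool = list(standby_pool)
        self.active_standby = None

    def _pick_target(self):
        # Normal case: the original secondary slice service is online.
        if self.original.available:
            return self.original
        # Degraded case: keep using the standby that is already active.
        if self.active_standby is not None and self.active_standby.available:
            return self.active_standby
        # Cascading failure: activate another standby from the pool.
        for candidate in self.standby_pool:
            if candidate.available:
                self.active_standby = candidate
                return candidate
        raise RuntimeError("no standby slice service available")

    def replicate(self, update):
        target = self._pick_target()
        target.received.append(update)
        return target.name


original = SliceService("ss-original")
standby_1 = SliceService("ss-standby-1")
standby_2 = SliceService("ss-standby-2")
router = ReplicaRouter(original, [standby_1, standby_2])

first = router.replicate("w1")   # original is online
original.available = False       # node taken offline for upgrade
second = router.replicate("w2")  # redirected to the first standby
standby_1.available = False      # the standby fails as well
third = router.replicate("w3")   # another standby is activated
```

In this sketch the second copy of each update always lands somewhere, as long as at least one slice service in the pool remains available.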

    Pooling blocks for erasure coding write groups

    Publication No.: US11175989B1

    Publication date: 2021-11-16

    Application No.: US16858376

    Filing date: 2020-04-24

    Applicant: NetApp, Inc.

    IPC classification: G06F11/10 H03M13/15 H03M7/30

    Abstract: A technique provides efficient data protection, such as erasure coding, for data blocks of volumes served by storage nodes of a cluster. Data blocks associated with write requests of unpredictable client workload patterns may be compressed. A set of the compressed data blocks may be selected to form a write group and an erasure code may be applied to the group to algorithmically generate one or more encoded blocks in addition to the data blocks. Due to the unpredictability of the data workload patterns, the compressed data blocks may have varying sizes. A pool of the various-sized compressed data blocks may be established and maintained from which the data blocks of the write group are selected. Establishment and maintenance of the pool enables selection of compressed data blocks that are substantially close to the same size and, thus, that require minimal padding.
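To show why a pool helps, the sketch below picks a write group from a pool of compressed block sizes so that padding is minimal. It assumes every block in a group is padded to the size of the largest block in that group. The function name `select_write_group` and the sorted sliding-window strategy are illustrative assumptions; the abstract does not state how the selection is performed.

```python
def select_write_group(pool_sizes, group_size):
    """Return (chosen_sizes, total_padding) for the least-padded group.

    Assumes each block is padded up to the largest block in its group, so
    the best group is always a run of neighbors in sorted order.
    """
    if group_size <= 0 or group_size > len(pool_sizes):
        raise ValueError("group_size must be between 1 and the pool size")

    ordered = sorted(pool_sizes)
    best_start, best_padding = 0, None
    for start in range(len(ordered) - group_size + 1):
        window = ordered[start:start + group_size]
        # Padding needed to bring every block up to the window's largest.
        padding = sum(window[-1] - size for size in window)
        if best_padding is None or padding < best_padding:
            best_start, best_padding = start, padding
    return ordered[best_start:best_start + group_size], best_padding


# Hypothetical compressed block sizes in bytes.
pool = [4096, 1200, 3900, 1250, 4000, 2100]
group, padding = select_write_group(pool, 3)
```

With a larger pool there are more candidates of similar size, so the chosen group tends to need less padding before the erasure code is applied.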

    BIN SYNCING TECHNIQUE FOR MULTIPLE DATA PROTECTION SCHEMES

    Publication No.: US20210248254A1

    Publication date: 2021-08-12

    Application No.: US16788979

    Filing date: 2020-02-12

    Applicant: NetApp, Inc.

    IPC classification: G06F21/62 G06F16/27 H04L9/32

    Abstract: A bin syncing technique ensures continuous data protection, such as replication and erasure coding, for content driven distribution of data served by storage nodes of a cluster in the event of failure of one or more block services configured to process the data. The cluster maintains information about the block services assigned to host a bin with a copy of the data in a bin assignment table associated with a state. The copies of the data are named, e.g., replica 0 (R0), replica 1 (R1) or replica 2 (R2). In response to failure of one or more block services assigned to host a bin with a replica of the data, an alternate or replacement block service may access the assignments maintained in the bin assignment table, which specify names of the replicas associated with the state.
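A bin assignment table of the kind described here might be modeled roughly as follows. This is a sketch under several assumptions: the state names ("active", "failed", "syncing"), the class `BinAssignmentTable`, and the idea that a replacement block service syncs from the remaining active replicas are all hypothetical and are not taken from the abstract.

```python
class BinAssignmentTable:
    """Maps bin_id -> {replica_name: [block_service, state]}."""

    def __init__(self):
        self._table = {}

    def assign(self, bin_id, replica, block_service, state="active"):
        self._table.setdefault(bin_id, {})[replica] = [block_service, state]

    def mark_failed(self, block_service):
        # Flag every replica hosted by the failed block service.
        for replicas in self._table.values():
            for entry in replicas.values():
                if entry[0] == block_service:
                    entry[1] = "failed"

    def replace(self, bin_id, replica, new_service):
        # The replacement takes over the named replica and starts syncing.
        self._table[bin_id][replica] = [new_service, "syncing"]

    def sync_sources(self, bin_id, replica):
        # Other named replicas of the bin that are still active.
        return sorted(
            (name, service)
            for name, (service, state) in self._table[bin_id].items()
            if name != replica and state == "active"
        )


table = BinAssignmentTable()
table.assign(7, "R0", "bs-a")
table.assign(7, "R1", "bs-b")
table.assign(7, "R2", "bs-c")

table.mark_failed("bs-b")       # the block service hosting R1 fails
table.replace(7, "R1", "bs-d")  # a replacement block service takes over R1
sources = table.sync_sources(7, "R1")
```

Because the replicas are named, the replacement can tell which copy it now hosts and which peers hold the other copies of the same bin.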

    WRITE TYPE BASED CREDITING FOR BLOCK LEVEL WRITE THROTTLING TO CONTROL IMPACT TO READ INPUT/OUTPUT OPERATIONS

    Publication No.: US20220091739A1

    Publication date: 2022-03-24

    Application No.: US17031461

    Filing date: 2020-09-24

    Applicant: NetApp, Inc.

    IPC classification: G06F3/06

    Abstract: A technique manages bandwidth allocated among input/output operations, such as reads and writes, to storage devices coupled to storage nodes of a cluster. The technique balances the writes in a manner that reduces latency of reads, while allowing the writes to complete in a desired amount of time. The writes include write types, such as client writes, data migration writes, block transfer writes, and recycling writes, which are defined by differing characteristics and relative priorities. To ensure timely completion of the write types, the technique provides periodic time intervals over which the writes may be balanced and allocated sufficient bandwidth to access the storage devices. The time intervals may include shuffle intervals within a larger distribution interval. In addition, the technique throttles certain write types at the storage device level to maintain consistent read performance. Throttling is based on a credit system that allocates bandwidth as “credits” based on write type.
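The credit idea can be sketched as a per-interval budget for each write type. The class name `WriteThrottle`, the credit amounts, and the rule that unused credits are discarded when a new interval starts are assumptions made for illustration. The abstract does not specify how credits are sized or replenished, and the shuffle and distribution intervals are collapsed here into a single interval.

```python
class WriteThrottle:
    """Grants writes only while their write type has credits left."""

    def __init__(self, credits_per_interval):
        # Hypothetical budget: write type -> credits granted each interval.
        self.credits_per_interval = dict(credits_per_interval)
        self.remaining = dict(credits_per_interval)

    def start_interval(self):
        # Reset every write type to its full budget for the new interval.
        self.remaining = dict(self.credits_per_interval)

    def try_write(self, write_type, cost=1):
        # Admit the write only if enough credits remain for its type.
        if self.remaining.get(write_type, 0) >= cost:
            self.remaining[write_type] -= cost
            return True
        return False


throttle = WriteThrottle({"client": 3, "recycling": 1})

first_recycle = throttle.try_write("recycling")   # uses the only credit
second_recycle = throttle.try_write("recycling")  # throttled until next interval
client_write = throttle.try_write("client", cost=2)

throttle.start_interval()
recycle_after_reset = throttle.try_write("recycling")
```

Giving lower-priority write types a smaller budget bounds how much device bandwidth they can take from reads within any one interval.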

    Updating no sync technique for ensuring continuous storage service in event of degraded cluster state

    Publication No.: US11223681B2

    Publication date: 2022-01-11

    Application No.: US16845786

    Filing date: 2020-04-10

    Applicant: NetApp, Inc.

    Abstract: An Updating No Sync (UNS) technique ensures continuous data protection for content driven distribution of data served by storage nodes of a fault tolerant cluster in the event of a degraded cluster state. A storage service implemented in each node includes one or more slice services (SSs) configured to process and store metadata describing the data served by the storage nodes and one or more block services (BSs) configured to process and store the data on storage devices of the node. A bin assignment service may co-opt one or more healthy BSs to temporarily store updates of data and metadata received at the SS while the cluster is in the degraded state as an overflow data path (hence the term “Updating No Sync,” which denotes updating without synchronizing, i.e., not distributing the data within the cluster, but only accumulating an overflow of SS information). Once the cluster is no longer degraded, the accumulated overflow SS information at the BSs may be synchronized back to restored BSs, i.e., according to write path determination in the absence of node failure/unavailability.

    Bin syncing technique for multiple data protection schemes

    Publication No.: US11514181B2

    Publication date: 2022-11-29

    Application No.: US16788979

    Filing date: 2020-02-12

    Applicant: NetApp, Inc.

    Abstract: A bin syncing technique ensures continuous data protection, such as replication and erasure coding, for content driven distribution of data served by storage nodes of a cluster in the event of failure of one or more block services configured to process the data. The cluster maintains information about the block services assigned to host a bin with a copy of the data in a bin assignment table associated with a state. The copies of the data are named, e.g., replica 0 (R0), replica 1 (R1) or replica 2 (R2). In response to failure of one or more block services assigned to host a bin with a replica of the data, an alternate or replacement block service may access the assignments maintained in the bin assignment table, which specify names of the replicas associated with the state.

    Write type based crediting for block level write throttling to control impact to read input/output operations

    Publication No.: US11372544B2

    Publication date: 2022-06-28

    Application No.: US17031461

    Filing date: 2020-09-24

    Applicant: NetApp, Inc.

    IPC classification: G06F3/06

    Abstract: A technique manages bandwidth allocated among input/output operations, such as reads and writes, to storage devices coupled to storage nodes of a cluster. The technique balances the writes in a manner that reduces latency of reads, while allowing the writes to complete in a desired amount of time. The writes include write types, such as client writes, data migration writes, block transfer writes, and recycling writes, which are defined by differing characteristics and relative priorities. To ensure timely completion of the write types, the technique provides periodic time intervals over which the writes may be balanced and allocated sufficient bandwidth to access the storage devices. The time intervals may include shuffle intervals within a larger distribution interval. In addition, the technique throttles certain write types at the storage device level to maintain consistent read performance. Throttling is based on a credit system that allocates bandwidth as “credits” based on write type.

    STANDBY COPIES WITHSTAND CASCADING FAILS

    Publication No.: US20210232314A1

    Publication date: 2021-07-29

    Application No.: US16752001

    Filing date: 2020-01-24

    Applicant: NetApp, Inc.

    IPC classification: G06F3/06

    Abstract: A technique is configured to maintain multiple copies of data served by storage nodes of a cluster during upgrade of a storage node to ensure continuous protection of the data served by the nodes. The data is logically organized as one or more volumes on storage devices of the cluster and includes metadata that describe the data of each volume. A data protection system may be configured to maintain at least two copies of the data in the cluster during upgrade of a storage node that is assigned to host one of the copies of the data but that is taken offline during the upgrade. As a result, an original slice service of the node may be rendered unavailable during the upgrade. In response, the technique redirects replicated data targeted to the original slice service to a standby slice service selected from a standby pool of slice services in accordance with a degraded redundant metadata service of the cluster. In the event the standby slice service itself subsequently becomes unavailable, another standby slice service from the standby pool is activated to receive the subsequent data. In this manner, cascading failure of secondary slice services is handled.