Storage cluster failure detection

    Publication No.: US11163653B2

    Publication date: 2021-11-02

    Application No.: US16679823

    Filing date: 2019-11-11

    Applicant: NetApp Inc.

    Abstract: Direct monitoring of a plurality of storage nodes in a primary cluster is performed based on connectivity with the storage nodes. Indirect monitoring of a first storage node is performed, in response to direct monitoring of the first storage node indicating failure of the connectivity with the first storage node, wherein a second storage node of the plurality of nodes is a backup node for the first storage node. The indirect monitor of the first storage node indicates failure of the first storage node in response to performance of storage access operations by the second storage node that were previously performed by the first storage node. A cluster-switch operation is initiated to switch from the primary cluster to a backup cluster based on an occurrence of at least one cluster-failure condition that comprises the indirect monitor of the first storage node indicating failure of the first storage node.
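
    Illustrative sketch (not taken from the patent): the two-stage check described in the abstract above could be modeled roughly as follows. The names NodeState, ClusterMonitor, probe and failure_threshold are hypothetical, and the rule "switch once failure_threshold nodes are confirmed failed" is an assumed cluster-failure condition chosen only to make the example concrete.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


@dataclass
class NodeState:
    """Hypothetical per-node view kept by the monitor."""
    name: str
    backup: str               # name of the node's backup (partner) node
    taken_over: bool = False  # True once the backup performs this node's storage operations


@dataclass
class ClusterMonitor:
    nodes: Dict[str, NodeState]
    failure_threshold: int = 1  # assumed cluster-failure condition
    failed: Set[str] = field(default_factory=set)

    def indirect_check(self, name: str) -> bool:
        # Indirect monitoring: failure is indicated when the backup node is
        # performing storage access operations the node used to perform.
        return self.nodes[name].taken_over

    def evaluate(self, probe: Callable[[str], bool]) -> bool:
        """Return True when a cluster-switch operation should be initiated."""
        for name in self.nodes:
            if probe(name):               # direct monitoring: connectivity is fine
                self.failed.discard(name)
            elif self.indirect_check(name):
                # Indirect monitoring runs only after connectivity was lost.
                self.failed.add(name)
        return len(self.failed) >= self.failure_threshold
```

    Lost connectivity alone does not mark a node as failed in this sketch; the node is counted only once its backup is observed to have taken over, which mirrors the ordering in the abstract.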

    Monitoring storage cluster elements

    Publication No.: US10437510B2

    Publication date: 2019-10-08

    Application No.: US14613085

    Filing date: 2015-02-03

    Applicant: NetApp, Inc.

    Abstract: Monitoring health of associated, but separated storage clusters can be done at both a node scope and a cluster scope. Monitoring the storage clusters at the cluster scope includes monitoring the network elements that support the storage clusters and connect the storage clusters. Initially, a fabric monitor in each cluster discovers cluster topology. This cluster topology is communicated and maintained throughout the managing storage elements of the storage clusters. After the storage cluster topologies have been discovered, the fabric monitors of each cluster can periodically determine status of network elements of the storage clusters. This allows the storage clusters to maintain awareness of interconnect status, and react to changes in status. In addition, each managing storage element monitors its own health. This information is aggregated to determine when to trigger corrective actions, alerts, and/or storage features in accordance with rules defined at the managing storage elements.
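
    Illustrative sketch (not taken from the patent): the discover, poll and aggregate cycle described in the abstract above might look like this. FabricMonitor, Rule, and the discover and poll callables are hypothetical names, and the status strings "up" and "down" are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: report `action` when `predicate` holds."""
    predicate: Callable[[Dict[str, str], str], bool]  # (element status, node health)
    action: str


class FabricMonitor:
    def __init__(self,
                 discover: Callable[[], Iterable[str]],
                 poll: Callable[[str], str],
                 rules: List[Rule]) -> None:
        self.discover = discover   # returns the network elements of the cluster
        self.poll = poll           # returns the status of one network element
        self.rules = rules
        self.topology: List[str] = []
        self.status: Dict[str, str] = {}

    def discover_topology(self) -> List[str]:
        # Initial step: learn which network elements support the cluster.
        self.topology = list(self.discover())
        return self.topology

    def poll_once(self, node_health: str) -> List[str]:
        # Periodic step: refresh element status, combine it with the node's
        # own health, and return the actions whose rules now apply.
        self.status = {element: self.poll(element) for element in self.topology}
        return [r.action for r in self.rules if r.predicate(self.status, node_health)]
```

    A caller would run discover_topology() once and then call poll_once() on a timer, passing in the managing storage element's own health reading.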

    INTERCONNECT PATH FAILOVER
    3.
    Patent application

    Publication No.: US20150309892A1

    Publication date: 2015-10-29

    Application No.: US14261556

    Filing date: 2014-04-25

    Applicant: NetApp Inc.

    Abstract: One or more techniques and/or systems are provided for interconnect failover between a primary storage controller and a secondary storage controller. The secondary storage controller may be configured as a backup or failover storage controller for the primary storage controller in the event the primary storage controller fails. Data and/or metadata describing the data (e.g., data and/or metadata stored within a write cache) may be mirrored from the primary storage controller to the secondary storage controller over one or more interconnect paths. Responsive to identifying a failover trigger for a failed interconnect path, the secondary storage controller is instructed to fence (e.g., block) I/O operations from the failed interconnect path. Streams of data and/or metadata that were affected by the failure may be instructed to transmit such data and/or metadata over one or more non-failed interconnect paths to the secondary storage controller during failover of the failed interconnect path.
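
    Illustrative sketch (not taken from the patent): the fence-then-reroute sequence described in the abstract above could be modeled as follows. PrimaryController, SecondaryController, the path and stream names, and the round-robin reassignment of affected streams are all assumptions made for the example.

```python
from typing import Dict, List, Set, Tuple


class SecondaryController:
    """Hypothetical failover partner that receives mirrored data."""

    def __init__(self) -> None:
        self.fenced: Set[str] = set()
        self.received: List[Tuple[str, str, bytes]] = []

    def fence(self, path: str) -> None:
        # Block any further I/O arriving over the failed interconnect path.
        self.fenced.add(path)

    def receive(self, path: str, stream: str, payload: bytes) -> bool:
        if path in self.fenced:
            return False  # I/O from a fenced path is rejected
        self.received.append((path, stream, payload))
        return True


class PrimaryController:
    def __init__(self, secondary: SecondaryController,
                 paths: List[str], assignments: Dict[str, str]) -> None:
        self.secondary = secondary
        self.healthy = list(paths)
        self.assignments = dict(assignments)  # stream name -> interconnect path

    def mirror(self, stream: str, payload: bytes) -> bool:
        return self.secondary.receive(self.assignments[stream], stream, payload)

    def fail_over(self, failed_path: str) -> List[str]:
        """Fence the failed path, then move its streams to non-failed paths."""
        self.secondary.fence(failed_path)
        self.healthy.remove(failed_path)
        if not self.healthy:
            raise RuntimeError("no non-failed interconnect path remains")
        moved = [s for s, p in sorted(self.assignments.items()) if p == failed_path]
        for i, stream in enumerate(moved):
            # Assumed policy: spread affected streams round-robin.
            self.assignments[stream] = self.healthy[i % len(self.healthy)]
        return moved
```

    Fencing happens before any stream is reassigned, so late writes still in flight on the failed path cannot reach the secondary controller after the rerouted streams have resumed.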

    STORAGE CLUSTER FAILURE DETECTION
    4.
    Patent application

    Publication No.: US20180095849A1

    Publication date: 2018-04-05

    Application No.: US15820784

    Filing date: 2017-11-22

    Applicant: NetApp Inc.

    Abstract: Direct monitoring of a plurality of storage nodes in a primary cluster is performed based on connectivity with the storage nodes. Indirect monitoring of a first storage node is performed, in response to direct monitoring of the first storage node indicating failure of the connectivity with the first storage node, wherein a second storage node of the plurality of nodes is a backup node for the first storage node. The indirect monitor of the first storage node indicates failure of the first storage node in response to performance of storage access operations by the second storage node that were previously performed by the first storage node. A cluster-switch operation is initiated to switch from the primary cluster to a backup cluster based on an occurrence of at least one cluster-failure condition that comprises the indirect monitor of the first storage node indicating failure of the first storage node.

    Interconnect path failover
    6.
    Granted patent
    Legal status: in force

    Publication No.: US09354992B2

    Publication date: 2016-05-31

    Application No.: US14261556

    Filing date: 2014-04-25

    Applicant: NetApp Inc.

    Abstract: One or more techniques and/or systems are provided for interconnect failover between a primary storage controller and a secondary storage controller. The secondary storage controller may be configured as a backup or failover storage controller for the primary storage controller in the event the primary storage controller fails. Data and/or metadata describing the data (e.g., data and/or metadata stored within a write cache) may be mirrored from the primary storage controller to the secondary storage controller over one or more interconnect paths. Responsive to identifying a failover trigger for a failed interconnect path, the secondary storage controller is instructed to fence (e.g., block) I/O operations from the failed interconnect path. Streams of data and/or metadata that were affected by the failure may be instructed to transmit such data and/or metadata over one or more non-failed interconnect paths to the secondary storage controller during failover of the failed interconnect path.

    STORAGE CLUSTER FAILURE DETECTION
    7.
    Patent application
    Legal status: in force

    Publication No.: US20160132411A1

    Publication date: 2016-05-12

    Application No.: US14718346

    Filing date: 2015-05-21

    Applicant: NetApp, Inc.

    Abstract: Direct monitoring of a plurality of storage nodes in a primary cluster is performed based on connectivity with the storage nodes. Indirect monitoring of a first storage node is performed, in response to direct monitoring of the first storage node indicating failure of the connectivity with the first storage node, wherein a second storage node of the plurality of nodes is a backup node for the first storage node. The indirect monitor of the first storage node indicates failure of the first storage node in response to performance of storage access operations by the second storage node that were previously performed by the first storage node. A cluster-switch operation is initiated to switch from the primary cluster to a backup cluster based on an occurrence of at least one cluster-failure condition that comprises the indirect monitor of the first storage node indicating failure of the first storage node.

    Monitoring storage cluster elements

    Publication No.: US11106388B2

    Publication date: 2021-08-31

    Application No.: US16591714

    Filing date: 2019-10-03

    Applicant: NetApp Inc.

    Abstract: Monitoring health of associated, but separated storage clusters can be done at both a node scope and a cluster scope. Monitoring the storage clusters at the cluster scope includes monitoring the network elements that support the storage clusters and connect the storage clusters. Initially, a fabric monitor in each cluster discovers cluster topology. This cluster topology is communicated and maintained throughout the managing storage elements of the storage clusters. After the storage cluster topologies have been discovered, the fabric monitors of each cluster can periodically determine status of network elements of the storage clusters. This allows the storage clusters to maintain awareness of interconnect status, and react to changes in status. In addition, each managing storage element monitors its own health. This information is aggregated to determine when to trigger corrective actions, alerts, and/or storage features in accordance with rules defined at the managing storage elements.

    STORAGE CLUSTER FAILURE DETECTION
    9.
    Patent application

    Publication No.: US20200073768A1

    Publication date: 2020-03-05

    Application No.: US16679823

    Filing date: 2019-11-11

    Applicant: NetApp Inc.

    Abstract: Direct monitoring of a plurality of storage nodes in a primary cluster is performed based on connectivity with the storage nodes. Indirect monitoring of a first storage node is performed, in response to direct monitoring of the first storage node indicating failure of the connectivity with the first storage node, wherein a second storage node of the plurality of nodes is a backup node for the first storage node. The indirect monitor of the first storage node indicates failure of the first storage node in response to performance of storage access operations by the second storage node that were previously performed by the first storage node. A cluster-switch operation is initiated to switch from the primary cluster to a backup cluster based on an occurrence of at least one cluster-failure condition that comprises the indirect monitor of the first storage node indicating failure of the first storage node.

    MONITORING STORAGE CLUSTER ELEMENTS
    10.
    Patent application

    Publication No.: US20200034069A1

    Publication date: 2020-01-30

    Application No.: US16591714

    Filing date: 2019-10-03

    Applicant: NetApp Inc.

    Abstract: Monitoring health of associated, but separated storage clusters can be done at both a node scope and a cluster scope. Monitoring the storage clusters at the cluster scope includes monitoring the network elements that support the storage clusters and connect the storage clusters. Initially, a fabric monitor in each cluster discovers cluster topology. This cluster topology is communicated and maintained throughout the managing storage elements of the storage clusters. After the storage cluster topologies have been discovered, the fabric monitors of each cluster can periodically determine status of network elements of the storage clusters. This allows the storage clusters to maintain awareness of interconnect status, and react to changes in status. In addition, each managing storage element monitors its own health. This information is aggregated to determine when to trigger corrective actions, alerts, and/or storage features in accordance with rules defined at the managing storage elements.
