Abstract:
One or more techniques and/or systems are provided for interconnect failover between a primary storage controller and a secondary storage controller. The secondary storage controller may be configured as a backup or failover storage controller for the primary storage controller in the event the primary storage controller fails. Data and/or metadata describing the data (e.g., data and/or metadata stored within a write cache) may be mirrored from the primary storage controller to the secondary storage controller over one or more interconnect paths. Responsive to identifying a failover trigger for a failed interconnect path, the secondary storage controller is instructed to fence (e.g., block) I/O operations from the failed interconnect path. Streams of data and/or metadata that were affected by the failure may be instructed to transmit such data and/or metadata over one or more non-failed interconnect paths to the secondary storage controller during failover of the failed interconnect path.
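The fence-then-reroute sequence described above can be illustrated with a minimal sketch. The class names, method names, and path labels below are hypothetical and are not taken from the abstract; the sketch only assumes that the secondary controller rejects I/O arriving over a fenced path and that affected streams move to the first remaining healthy path.

```python
from dataclasses import dataclass, field


@dataclass
class SecondaryController:
    """Hypothetical stand-in for the secondary (failover) storage controller."""
    fenced_paths: set = field(default_factory=set)
    received: list = field(default_factory=list)

    def fence(self, path: str) -> None:
        # Block any further I/O arriving over the failed path.
        self.fenced_paths.add(path)

    def receive(self, path: str, payload: str) -> bool:
        if path in self.fenced_paths:
            return False  # fenced: the I/O is rejected
        self.received.append((path, payload))
        return True


@dataclass
class Mirror:
    """Hypothetical primary-side mirroring over several interconnect paths."""
    secondary: SecondaryController
    paths: list
    stream_path: dict = field(default_factory=dict)  # stream name -> path

    def fail_over(self, failed_path: str) -> None:
        # Step 1: instruct the secondary to fence I/O from the failed path.
        self.secondary.fence(failed_path)
        # Step 2: move every affected stream to a non-failed path.
        healthy = [p for p in self.paths if p not in self.secondary.fenced_paths]
        if not healthy:
            raise RuntimeError("no non-failed interconnect path available")
        for stream, path in self.stream_path.items():
            if path == failed_path:
                self.stream_path[stream] = healthy[0]

    def send(self, stream: str, payload: str) -> bool:
        # Mirror a payload over whichever path the stream currently uses.
        return self.secondary.receive(self.stream_path[stream], payload)
```

Fencing before rerouting is the design choice assumed here: stale I/O still in flight on the failed path is rejected by the secondary, so it cannot interleave with the same stream's data arriving over the new path.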
Abstract:
During a storage redundancy giveback from a first node to a second node following a storage redundancy takeover from the second node by the first node, the second node is initialized in part by receiving a node identification indicator from the second node. The node identification indicator is included in a node advertisement message sent by the second node during a giveback wait phase of the storage redundancy giveback. The node identification indicator includes an intra-cluster node connectivity identifier that is used by the first node to determine whether the second node is an intra-cluster takeover partner. In response to determining that the second node is an intra-cluster takeover partner, the first node completes the giveback of storage resources to the second node.
Abstract:
One or more techniques and/or computing devices are provided for automatic switchover implementation. For example, a first storage controller, of a first storage cluster, may have a disaster recovery relationship with a second storage controller of a second storage cluster. In the event the first storage controller fails, the second storage controller may automatically switch over operation from the first storage controller to the second storage controller to provide clients with failover access to data previously accessible to the clients through the first storage controller. The second storage controller may detect, cross-cluster, a failure of the first storage controller utilizing remote direct memory access (RDMA) read operations to access heartbeat information, heartbeat information stored within a disk mailbox, and/or service processor traps. In this way, the second storage controller may efficiently detect failure of the first storage controller to trigger automatic switchover for non-disruptive client access to data.
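One way the three detection signals could be combined is sketched below. The decision policy is an assumption, not something the abstract specifies: a service processor trap is treated as sufficient on its own, while the two heartbeat sources must both appear stalled before failure is declared. The `HeartbeatSample` type and `partner_failed` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeartbeatSample:
    """Hypothetical snapshot of what the second controller observed."""
    rdma_counter: Optional[int]     # heartbeat read via RDMA; None if the read failed
    mailbox_counter: Optional[int]  # heartbeat read from the disk mailbox; None if unreadable
    sp_trap_received: bool          # True if a service processor trap arrived


def _stalled(prev: Optional[int], curr: Optional[int]) -> bool:
    # A heartbeat source is stalled if it is unreadable or has not advanced.
    return curr is None or curr == prev


def partner_failed(prev: HeartbeatSample, curr: HeartbeatSample) -> bool:
    # Assumed policy: a trap is an explicit failure notification.
    if curr.sp_trap_received:
        return True
    # Assumed policy: require both heartbeat sources to stall, so that a
    # single broken channel does not trigger a switchover by itself.
    return (_stalled(prev.rdma_counter, curr.rdma_counter)
            and _stalled(prev.mailbox_counter, curr.mailbox_counter))
```

Requiring agreement between the RDMA and disk mailbox heartbeats trades detection speed for fewer false switchovers; a real system would also need timeouts and a minimum number of consecutive stalled samples.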
Abstract:
One or more techniques and/or computing devices are provided for communicating storage controller failures utilizing service processor traps. A first storage controller, of a first storage cluster, has a disaster recovery relationship with a second storage controller of a second storage cluster. The first storage controller comprises a first service processor configured to monitor health of the first storage controller. Responsive to identifying a failure of the first storage controller, the first service processor uses a stored communication configuration of a second service processor of the second storage controller to send a service processor trap to the second service processor. In this way, the second service processor initiates a switchover operation by the second storage controller to provide clients with failover access to data previously available through the first storage controller before the failure. Proactive notification of storage controller failures utilizing service processor traps reduces client data access disruptions.
Abstract:
One or more techniques and/or systems are provided for dynamic mirroring. A first storage node and a second storage node within a first storage cluster may locally mirror data between one another based upon a local failover partnership. The first storage node and a third storage node within a second storage cluster may remotely mirror data between one another based upon a primary disaster recovery partnership. If the third storage node fails, then the first storage node may remotely mirror data to a fourth storage node within the second storage cluster based upon an auxiliary disaster recovery partnership. In this way, data loss protection for the first storage node may be improved, such that the fourth storage node may provide clients with access to mirrored data from the first storage node in the event the second storage node and/or the third storage node are unavailable when the first storage node fails.
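The choice of remote mirroring target described above reduces to a short fallback rule, sketched here. The function name and node labels are hypothetical, and the sketch assumes that node availability is already known to the caller.

```python
from typing import Optional, Set


def select_remote_mirror_target(primary_dr: str,
                                auxiliary_dr: str,
                                available: Set[str]) -> Optional[str]:
    """Pick the node in the second cluster to mirror to (hypothetical helper)."""
    # Prefer the primary disaster recovery partner (the "third" node).
    if primary_dr in available:
        return primary_dr
    # Fall back to the auxiliary disaster recovery partner (the "fourth" node).
    if auxiliary_dr in available:
        return auxiliary_dr
    # Neither partner is reachable, so remote mirroring cannot proceed.
    return None
```

For example, `select_remote_mirror_target("node3", "node4", {"node4"})` selects `"node4"`, which corresponds to the case where the third storage node has failed.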
Abstract:
Systems and methods for managing multiple mirror resources in a storage distribution network are provided. In some embodiments, a system provides for both high availability and disaster recovery functionality at different mirroring locations. Other embodiments may provide for multiple high availability and/or multiple disaster recovery mirror resources. These mirror resources are operated in a heterogeneous manner in the sense that each has its own transport, protocol, and the like, but are configured to function cooperatively or as a single mirror with respect to mirroring a primary node. Embodiments may provide for the mirroring and resynchronization of mirrored resources in the event of a communication loss with a particular resource without ceasing the mirroring operations to other resources.
Abstract:
A system and method for handling multi-node failures in a disaster recovery cluster is provided. In the event of an error condition, a switchover operation occurs from the failed nodes to one or more surviving nodes. Data stored in non-volatile random access memory is recovered by the surviving nodes to bring storage objects (e.g., disks, aggregates, and/or volumes) into a consistent state.
Abstract:
Systems and methods herein are operable to simultaneously mirror data to a plurality of mirror partner nodes. In embodiments, a mirror client may be unaware of the number of mirror partner nodes and/or the location of the plurality of mirror partner nodes, and may issue a single mirror command requesting initiation of a mirror operation. An interconnect layer may receive the single mirror command and split the mirror command into a plurality of mirror instances, one for each mirror partner node, wherein the mirror instances may be simultaneously launched. After the plurality of mirror operations has begun, the interconnect layer may manage completion reports indicating the completion status of respective mirror operations, and send a single return to the mirror client indicating whether the mirror command succeeded.
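The split-and-aggregate behavior of the interconnect layer can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: `mirror_to_all` is a hypothetical name, each partner is modeled as a callable that returns `True` on success, and the command is assumed to succeed only if every mirror instance succeeds (the abstract does not state the aggregation rule).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def mirror_to_all(payload: bytes,
                  partners: Dict[str, Callable[[bytes], bool]]) -> bool:
    """Fan one mirror command out to every partner and return one result."""
    if not partners:
        # Assumption: with no partners configured, nothing was mirrored.
        return False
    # Split the single command into one mirror instance per partner and
    # launch them concurrently.
    with ThreadPoolExecutor(max_workers=len(partners)) as pool:
        futures = [pool.submit(send, payload) for send in partners.values()]
        # Collect the completion report of each mirror instance.
        reports = []
        for future in futures:
            try:
                reports.append(bool(future.result()))
            except Exception:
                # A partner that raised is reported as a failed mirror.
                reports.append(False)
    # Single return to the mirror client (assumed rule: all must succeed).
    return all(reports)
```

The mirror client calls `mirror_to_all` once and never sees how many partners exist or where they are, which matches the separation between the client and the interconnect layer described above.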