Abstract:
During a storage redundancy giveback from a first node to a second node following a storage redundancy takeover from the second node by the first node, the second node is initialized in part by the first node receiving a node identification indicator from the second node. The node identification indicator is included in a node advertisement message sent by the second node during a giveback wait phase of the storage redundancy giveback. The node identification indicator includes an intra-cluster node connectivity identifier that the first node uses to determine whether the second node is an intra-cluster takeover partner. In response to determining that the second node is an intra-cluster takeover partner, the first node completes the giveback of storage resources to the second node.
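The partner check in the giveback wait phase can be illustrated with a minimal sketch. It assumes, purely for illustration, that the intra-cluster node connectivity identifier can be modeled as a cluster ID plus the advertising node's ID; the names `NodeAdvertisement`, `TakeoverNode`, and `handle_advertisement` are hypothetical and do not come from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeAdvertisement:
    """Hypothetical advertisement sent by the returning (second) node."""
    node_id: str
    cluster_id: str  # assumed stand-in for the intra-cluster connectivity identifier


@dataclass
class TakeoverNode:
    """Hypothetical first node that currently holds the partner's storage."""
    cluster_id: str
    ha_partner_id: str
    borrowed_resources: list
    phase: str = "giveback_wait"

    def is_intra_cluster_partner(self, adv: NodeAdvertisement) -> bool:
        # The advertiser must be in the same cluster AND be the node we took over from.
        return adv.cluster_id == self.cluster_id and adv.node_id == self.ha_partner_id

    def handle_advertisement(self, adv: NodeAdvertisement) -> list:
        """Complete giveback only if the advertiser is the intra-cluster takeover partner."""
        if self.phase != "giveback_wait":
            return []
        if not self.is_intra_cluster_partner(adv):
            return []  # e.g. a node in another cluster: keep holding the resources
        returned, self.borrowed_resources = self.borrowed_resources, []
        self.phase = "giveback_complete"
        return returned
```

An advertisement that carries the right node ID but a different cluster ID is ignored, so only the true intra-cluster partner receives its storage back.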
Abstract:
Systems and methods for managing multiple mirror resources in a storage distribution network are provided. In some embodiments, a system provides both high availability and disaster recovery functionality at different mirroring locations. Other embodiments may provide multiple high availability and/or multiple disaster recovery mirror resources. These mirror resources are operated in a heterogeneous manner, in the sense that each has its own transport, protocol, and the like, but they are configured to function cooperatively, or as a single mirror, with respect to mirroring a primary node. Embodiments may provide for the mirroring and resynchronization of mirrored resources in the event of a communication loss with a particular resource, without ceasing mirroring operations to the other resources.
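A minimal sketch of several heterogeneous mirrors acting as one logical mirror is shown below. It assumes a simple append-only write log at the primary and treats each mirror's transport as a label only; `MirrorTarget`, `MirrorGroup`, and `reconnect_and_resync` are hypothetical names, not the patented design.

```python
from dataclasses import dataclass, field


@dataclass
class MirrorTarget:
    """Hypothetical mirror resource; 'transport' is only a label here."""
    name: str
    transport: str
    connected: bool = True
    log: list = field(default_factory=list)  # entries applied at this mirror


class MirrorGroup:
    """Treats several heterogeneous mirrors as one logical mirror of a primary."""

    def __init__(self, targets):
        self.targets = {t.name: t for t in targets}
        self.primary_log = []  # authoritative sequence of writes at the primary

    def write(self, entry):
        self.primary_log.append(entry)
        for t in self.targets.values():
            # A disconnected mirror is skipped; the others keep mirroring.
            if t.connected:
                t.log.append(entry)

    def disconnect(self, name):
        self.targets[name].connected = False

    def reconnect_and_resync(self, name):
        """Replay the entries the mirror missed, then resume live mirroring.

        Works because a mirror only appends while connected, so its log is
        always a prefix of the primary log.
        """
        t = self.targets[name]
        missing = self.primary_log[len(t.log):]
        t.log.extend(missing)
        t.connected = True
        return len(missing)
```

For example, with an HA mirror and a DR mirror, losing the DR link leaves HA mirroring untouched; on reconnect the DR mirror replays only the writes it missed.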
Abstract:
One or more techniques and/or systems are provided for interconnect failover between a primary storage controller and a secondary storage controller. The secondary storage controller may be configured as a backup or failover storage controller for the primary storage controller in the event the primary storage controller fails. Data and/or metadata describing the data (e.g., data and/or metadata stored within a write cache) may be mirrored from the primary storage controller to the secondary storage controller over one or more interconnect paths. Responsive to identifying a failover trigger for a failed interconnect path, the secondary storage controller is instructed to fence (e.g., block) I/O operations from the failed interconnect path. Streams of data and/or metadata affected by the failure may be instructed to transmit that data and/or metadata to the secondary storage controller over one or more non-failed interconnect paths during failover of the failed interconnect path.
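The fence-then-redirect sequence can be sketched as follows. The round-robin placement of streams on paths and the names `PrimaryController`, `SecondaryController`, and `on_failover_trigger` are assumptions made for illustration only.

```python
class SecondaryController:
    """Hypothetical secondary: accepts mirrored writes except from fenced paths."""

    def __init__(self):
        self.fenced = set()
        self.received = []

    def fence(self, path):
        self.fenced.add(path)

    def receive(self, path, stream, data):
        if path in self.fenced:
            return False  # I/O arriving on a fenced path is blocked
        self.received.append((stream, data))
        return True


class PrimaryController:
    """Hypothetical primary that mirrors streams over several interconnect paths."""

    def __init__(self, paths, secondary):
        self.healthy = list(paths)
        self.secondary = secondary
        self.stream_path = {}

    def assign(self, stream):
        # Round-robin placement of streams onto healthy paths (an assumption).
        path = self.healthy[len(self.stream_path) % len(self.healthy)]
        self.stream_path[stream] = path
        return path

    def mirror(self, stream, data):
        return self.secondary.receive(self.stream_path[stream], stream, data)

    def on_failover_trigger(self, failed_path):
        """Fence the failed path at the secondary, then move affected streams."""
        self.secondary.fence(failed_path)
        if failed_path in self.healthy:
            self.healthy.remove(failed_path)
        if not self.healthy:
            raise RuntimeError("no surviving interconnect path")
        affected = [s for s, p in self.stream_path.items() if p == failed_path]
        for i, stream in enumerate(affected):
            self.stream_path[stream] = self.healthy[i % len(self.healthy)]
        return affected
```

Fencing before redirecting means that any late I/O still in flight on the failed path cannot be applied at the secondary after the stream has moved to a new path.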
Abstract:
A technique efficiently configures a peered cluster storage environment. The configuration technique illustratively includes three phases: a discovery phase, a node setup phase and a cluster setup phase. The discovery phase may be employed to initiate discovery of nodes of a disaster recovery (DR) group through transmission of multicast advertisement packets by the nodes over interconnects, including a Fibre Channel (FC) fabric, to each other node of the group. In the node setup phase, each node of a cluster assigns its relationships to the nodes discovered and present in the FC fabric; illustratively, the assigned relationships include high availability (HA) partner, DR primary partner and DR auxiliary partner. In the cluster setup phase, the discovered nodes of the FC fabric are organized as the peered cluster storage environment (DR group) configured to service data in a highly reliable and available manner.
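The three phases can be sketched for the common case of a DR group made of two clusters with two nodes each. The pairing rule used here (HA partner is the other local node, DR primary partner is the remote node at the same position, DR auxiliary partner is the other remote node) is an assumption for illustration; `Advertisement`, `discover`, `assign_relationships`, and `setup_dr_group` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advertisement:
    """Hypothetical multicast advertisement received during discovery."""
    node: str
    cluster: str


def discover(adverts):
    """Discovery phase: group advertised nodes by cluster, sorted for determinism."""
    clusters = {}
    for a in adverts:
        clusters.setdefault(a.cluster, set()).add(a.node)
    return {c: sorted(nodes) for c, nodes in clusters.items()}


def assign_relationships(node, clusters):
    """Node setup phase for an assumed 2x2 DR group layout."""
    local = next(c for c, ns in clusters.items() if node in ns)
    remote = next(c for c in clusters if c != local)
    idx = clusters[local].index(node)
    return {
        "ha_partner": clusters[local][1 - idx],
        "dr_primary": clusters[remote][idx],
        "dr_auxiliary": clusters[remote][1 - idx],
    }


def setup_dr_group(adverts):
    """Cluster setup phase: build the relationship map for every discovered node."""
    clusters = discover(adverts)
    if len(clusters) != 2 or any(len(ns) != 2 for ns in clusters.values()):
        raise ValueError("this sketch only handles two clusters of two nodes")
    return {n: assign_relationships(n, clusters) for ns in clusters.values() for n in ns}
```

Sorting node names during discovery makes every node compute the same pairing independently, so no extra negotiation round is needed in this simplified model.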
Abstract:
One or more techniques and/or computing devices are provided for communicating storage controller failures utilizing service processor traps. A first storage controller, of a first storage cluster, has a disaster recovery relationship with a second storage controller of a second storage cluster. The first storage controller comprises a first service processor configured to monitor the health of the first storage controller. Responsive to identifying a failure of the first storage controller, the first service processor uses a stored communication configuration of a second service processor of the second storage controller to send a service processor trap to the second service processor. In this way, the second service processor initiates a switchover operation by the second storage controller to provide clients with failover access to data previously available through the first storage controller before the failure. Proactive notification of storage controller failures utilizing service processor traps reduces client data access disruptions.
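The trap flow between the two service processors can be modeled with a small in-process simulation. A dictionary stands in for the management network, and `StorageController`, `ServiceProcessor`, `store_peer_config`, `poll_health`, and `on_trap` are hypothetical names; a real service processor would use an out-of-band network protocol instead.

```python
class StorageController:
    """Hypothetical controller; 'serving' holds the data sets it currently serves."""

    def __init__(self, name):
        self.name = name
        self.healthy = True
        self.serving = {name}


class ServiceProcessor:
    """Hypothetical service processor that runs independently of its controller."""

    def __init__(self, address, controller, network):
        self.address = address
        self.controller = controller
        self.network = network          # address -> ServiceProcessor (simulated network)
        self.network[address] = self
        self.peer_address = None        # stored communication configuration of the DR peer

    def store_peer_config(self, peer_address):
        self.peer_address = peer_address

    def poll_health(self):
        """Monitor the local controller; send a trap to the DR peer's SP on failure."""
        if not self.controller.healthy and self.peer_address is not None:
            self.network[self.peer_address].on_trap(self.controller.name)
            return True
        return False

    def on_trap(self, failed_name):
        """The receiving SP triggers a switchover on its own controller."""
        self.controller.serving.add(failed_name)
```

Because the trap is sent as soon as the local SP notices the failure, the surviving site does not have to wait for its own timeout-based detection before switching over.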
Abstract:
A system and method for handling multi-node failures in a disaster recovery cluster is provided. In the event of an error condition, a switchover operation occurs from the failed nodes to one or more surviving nodes. Data stored in non-volatile random access memory (NVRAM) is recovered by the surviving nodes to bring storage objects (e.g., disks, aggregates, and/or volumes) into a consistent state.
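A minimal sketch of the recovery step is shown below. It assumes that each failed node's NVRAM log has been mirrored to the survivor and that entries carry a global sequence number; `Aggregate`, `LogEntry`, and `switchover` are hypothetical names, and the on-disk state is reduced to a block-to-value map.

```python
from dataclasses import dataclass, field


@dataclass
class Aggregate:
    """Hypothetical aggregate; 'blocks' models its on-disk contents."""
    name: str
    owner: str
    blocks: dict = field(default_factory=dict)


@dataclass
class LogEntry:
    """Hypothetical NVRAM log record of a write not yet committed to disk."""
    seq: int
    aggregate: str
    block: int
    value: str


def switchover(failed_nodes, survivor, aggregates, nvram):
    """Take ownership of the failed nodes' aggregates, then replay their NVRAM logs.

    nvram maps node name -> list of LogEntry (assumed mirrored to the survivor).
    """
    taken = [a for a in aggregates.values() if a.owner in failed_nodes]
    for a in taken:
        a.owner = survivor
    # Replay in sequence order so later writes to the same block win.
    pending = sorted((e for n in failed_nodes for e in nvram.get(n, [])), key=lambda e: e.seq)
    for e in pending:
        aggregates[e.aggregate].blocks[e.block] = e.value
    for n in failed_nodes:
        nvram[n] = []  # the log is consumed once its entries are on disk
    return [a.name for a in taken]
```

Replaying by sequence number rather than by node means that logs from several failed nodes can be merged in one pass, which is the situation a multi-node failure creates.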
Abstract:
One or more techniques and/or computing devices are provided for preserving coredump data. A first storage controller, of a first storage cluster, may have a disaster recovery relationship with a second storage controller of a second storage cluster. When the first storage controller fails, the first storage controller performs a coredump process to dump its memory contents into a storage device. During the coredump process, the first storage controller stores a storage device identifier of that storage device into a disk mailbox. Upon detecting the failure, the second storage controller reads the storage device identifier from the disk mailbox. The second storage controller then performs a switchover operation that changes ownership of the storage devices, excluding the storage device used by the coredump process, from the first storage controller to the second storage controller to provide clients with failover access to those storage devices.
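The mailbox handshake and the exclusion during switchover can be sketched as follows. The disk mailbox is modeled as an in-memory key/value store and device ownership as a dictionary; `DiskMailbox`, `start_coredump`, and `switchover` are hypothetical names, not the source's implementation.

```python
from typing import Optional


class DiskMailbox:
    """Hypothetical on-disk mailbox readable by both DR partners."""

    def __init__(self):
        self._slots = {}

    def write(self, controller: str, key: str, value: str) -> None:
        self._slots[(controller, key)] = value

    def read(self, controller: str, key: str) -> Optional[str]:
        return self._slots.get((controller, key))


def start_coredump(controller, dump_device, mailbox, memory):
    """Failing controller records its dump device, then writes its memory image."""
    mailbox.write(controller, "coredump_device", dump_device)
    return {dump_device: bytes(memory)}  # stand-in for the core image on the device


def switchover(failed, survivor, ownership, mailbox):
    """Move every device owned by `failed` to `survivor`, except the coredump device."""
    excluded = mailbox.read(failed, "coredump_device")
    moved = []
    for device, owner in ownership.items():
        # Only values change during iteration, so mutating the dict here is safe.
        if owner == failed and device != excluded:
            ownership[device] = survivor
            moved.append(device)
    return moved
```

Leaving the coredump device with the failed controller keeps the survivor from reusing or overwriting it, so the dump remains available for later analysis.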