Abstract:
A system and method for handling multi-node failures in a disaster recovery cluster are provided. In the event of an error condition, a switchover operation occurs from the failed nodes to one or more surviving nodes. Data stored in non-volatile random access memory is recovered by the surviving nodes to bring storage objects, e.g., disks, aggregates and/or volumes, into a consistent state.
Abstract:
A technique maintains consistent throughput of processing of input/output (I/O) requests by a storage system when changing configuration of one or more Redundant Array of Independent Disks (RAID) groups of storage devices, such as disks, within the storage system. The configuration of a RAID group (i.e., RAID configuration) may be represented by RAID objects (e.g., reference-counted data structures) stored in a memory of the storage system. Illustratively, the RAID objects may be organized as a RAID configuration hierarchy including a top-level RAID object (e.g., RAID group data structure) that is linked (e.g., via one or more pointers) to one or more intermediate-level RAID objects (e.g., disk and segment data structures) which, in turn, are linked to one or more low-level RAID objects (e.g., chunk data structures). According to the technique, a snapshot of a current RAID configuration (i.e., current configuration snapshot) may be created by incrementing a reference count of the current top-level object of the hierarchy and attaching (e.g., via a pointer) the current configuration snapshot to a current I/O request processed by the storage system.
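The reference-counting idea in this abstract can be illustrated with a minimal sketch. The class names (`RaidObject`, `IORequest`) and the configuration-change step at the end are hypothetical, introduced only for illustration; the abstract itself states only that a snapshot is taken by incrementing the reference count of the top-level object and attaching it to the I/O request.

```python
class RaidObject:
    """Hypothetical reference-counted node in a RAID configuration hierarchy."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)  # links to lower-level objects
        self.refcount = 1               # reference held by the creator

    def acquire(self):
        self.refcount += 1
        return self

    def release(self):
        self.refcount -= 1
        return self.refcount


class IORequest:
    """Hypothetical I/O request that pins the configuration it started with."""

    def __init__(self, op, config_root):
        self.op = op
        # Snapshot: bump the top-level refcount and attach it via a pointer.
        self.config = config_root.acquire()

    def complete(self):
        remaining = self.config.release()
        self.config = None
        return remaining


# Build the hierarchy: RAID group -> disk -> segment -> chunk.
chunk = RaidObject("chunk0")
segment = RaidObject("segment0", [chunk])
disk = RaidObject("disk0", [segment])
current = RaidObject("raid_group_v1", [disk])

req = IORequest("write", current)

# Assumed configuration change: a new top-level object is installed while
# the in-flight request keeps using the snapshot it was given.
old = current
current = RaidObject("raid_group_v2", [disk])
old.release()  # drop the system's own reference to the old configuration
```

In this sketch the old top-level object stays alive until the in-flight request completes, so the request never observes a half-changed configuration.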
Abstract:
A system and method for avoiding object identifier collisions in a cluster environment are provided. Upon creation of the cluster, volume location databases negotiate ranges for data set identifiers (DSIDs) between a first site and a second site of the cluster. Any pre-existing objects are remapped into an object identifier range associated with the particular site hosting the object.
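One way to picture the range negotiation and remapping is the sketch below. The equal split of the identifier space and the function names `negotiate_ranges` and `remap_existing` are assumptions made for illustration; the abstract does not say how the ranges are chosen or how remapped identifiers are allocated.

```python
def negotiate_ranges(total_ids, sites):
    """Split [0, total_ids) into equal contiguous ranges, one per site (assumed policy)."""
    size = total_ids // len(sites)
    return {site: range(i * size, (i + 1) * size) for i, site in enumerate(sites)}


def remap_existing(dsids, site_range):
    """Map pre-existing DSIDs into the hosting site's range.

    Identifiers already inside the range are kept; the rest receive the
    lowest unused identifier from the range.
    """
    used = {d for d in dsids if d in site_range}
    free = (i for i in site_range if i not in used)
    return {d: (d if d in site_range else next(free)) for d in dsids}


ranges = negotiate_ranges(1000, ["site_a", "site_b"])
mapping = remap_existing([3, 700, 4], ranges["site_b"])
```

Because the two ranges are disjoint, an identifier allocated at one site cannot collide with one allocated at the other.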
Abstract:
A technique efficiently configures a peered cluster storage environment. The configuration technique illustratively includes three phases: a discovery phase, a node setup phase and a cluster setup phase. The discovery phase may be employed to initiate discovery of nodes of a disaster recovery (DR) group through transmission of multicast advertisement packets by the nodes over interconnects, including a Fibre Channel (FC) fabric, to each other node of the group. In the node setup phase, each node of a cluster assigns its relationships to the nodes discovered and present in the FC fabric; illustratively, the assigned relationships include high availability (HA) partner, DR primary partner and DR auxiliary partner. In the cluster setup phase, the discovered nodes of the FC fabric are organized as the peered cluster storage environment (DR group) configured to service data in a highly reliable and available manner.
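The node setup phase can be sketched as follows, assuming a DR group of two clusters with two nodes each. The pairing rule used here (the DR primary partner is the same-position node in the remote cluster, and the DR auxiliary partner is that node's HA partner) is an assumption for illustration and is not stated in the abstract; `assign_relationships` is a hypothetical name.

```python
def assign_relationships(discovered):
    """Assign HA and DR partners from a list of (node_name, cluster_name) pairs.

    Assumes exactly two clusters with two nodes each.
    """
    by_cluster = {}
    for name, cluster in discovered:
        by_cluster.setdefault(cluster, []).append(name)

    (_, first), (_, second) = sorted(by_cluster.items())
    first, second = sorted(first), sorted(second)

    roles = {}
    for local, remote in ((first, second), (second, first)):
        for i, node in enumerate(local):
            roles[node] = {
                "ha_partner": local[1 - i],     # other node in the same cluster
                "dr_primary": remote[i],        # same position in the remote cluster
                "dr_auxiliary": remote[1 - i],  # HA partner of the DR primary
            }
    return roles


discovered = [("a1", "A"), ("a2", "A"), ("b1", "B"), ("b2", "B")]
roles = assign_relationships(discovered)
```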
Abstract:
One or more techniques and/or systems are provided for load balancing between storage controllers. For example, a first storage controller and a second storage controller may be configured at a first storage site according to a high availability configuration, and may be configured as disaster recovery partners for a third storage controller and a fourth storage controller at a second storage site. If the first storage controller fails, the second storage controller provides failover operation for a first storage device. If a disaster occurs at the second storage site, the second storage controller provides switchover operation for a third storage device and a fourth storage device. Responsive to the first storage controller being restored, the third storage device may be reassigned from the second storage controller to the first storage controller for load balancing at the first storage site during disaster recovery of the second storage site.
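The sequence of ownership changes described in this abstract can be traced with a small sketch. The device and controller names and the helper functions `take_over` and `reassign` are hypothetical; only the order of events (failover, switchover, then reassignment of the third storage device) comes from the abstract.

```python
def take_over(owners, from_ctrls, to_ctrl):
    """Return a new ownership map with every device of from_ctrls moved to to_ctrl."""
    return {dev: (to_ctrl if ctrl in from_ctrls else ctrl) for dev, ctrl in owners.items()}


def reassign(owners, device, to_ctrl):
    """Return a new ownership map with a single device moved to to_ctrl."""
    updated = dict(owners)
    updated[device] = to_ctrl
    return updated


# Site 1 hosts ctrl1 and ctrl2 (HA pair); site 2 hosts ctrl3 and ctrl4.
owners = {"dev1": "ctrl1", "dev2": "ctrl2", "dev3": "ctrl3", "dev4": "ctrl4"}

# ctrl1 fails: its HA partner ctrl2 provides failover for dev1.
owners = take_over(owners, {"ctrl1"}, "ctrl2")

# Disaster at site 2: ctrl2 provides switchover for dev3 and dev4.
owners = take_over(owners, {"ctrl3", "ctrl4"}, "ctrl2")

# ctrl1 is restored: dev3 moves from ctrl2 to ctrl1 to balance the load.
owners = reassign(owners, "dev3", "ctrl1")
```

The sketch moves only the third storage device after the restore, since that is the one reassignment the abstract describes.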