Abstract:
A two-phase FlashCopy process is provided that can be used to aid in the formation of consistency groups across multiple storage control units. In the first phase, preparations to create a new consistency group are made “revertible” by write-inhibiting the source volumes through “Establish-FlashCopy-revertible” commands. If the preparation of any volume within the consistency group fails, a “Withdraw-FlashCopy-revert” command may be executed, thereby retaining the prior FlashCopy point-in-time copy. In the second phase, executed if all preparations are successful, a “Withdraw-FlashCopy-commit” command may be executed to remove all write-inhibit indicators, complete the creation of the new FlashCopy point-in-time copy, and secure the new consistency group. Write requests to the FlashCopy source volumes may then be received and processed without risking corruption of the new consistency group on the FlashCopy target volumes.
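The two phases described above resemble a prepare/commit protocol, and a minimal sketch may make the control flow easier to follow. Everything below is hypothetical: the `Volume` class, its fields, and the method names only mirror the command names quoted in the abstract and are not an actual FlashCopy API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Volume:
    """Hypothetical stand-in for one FlashCopy source/target pair."""
    name: str
    healthy: bool = True
    write_inhibited: bool = False
    committed_copy: str = "old"        # point-in-time copy held on the target
    pending_copy: Optional[str] = None

    def establish_revertible(self, new_copy: str) -> bool:
        """Phase 1: write-inhibit the source and stage the new copy."""
        if not self.healthy:
            return False
        self.write_inhibited = True
        self.pending_copy = new_copy
        return True

    def withdraw_revert(self) -> None:
        """Abort: drop the staged copy and keep the prior one."""
        self.pending_copy = None
        self.write_inhibited = False

    def withdraw_commit(self) -> None:
        """Phase 2: make the staged copy the current one and allow writes."""
        self.committed_copy = self.pending_copy
        self.pending_copy = None
        self.write_inhibited = False


def form_consistency_group(volumes: List[Volume], new_copy: str) -> bool:
    """Return True if the new consistency group was committed on all volumes."""
    prepared = []
    for vol in volumes:
        if vol.establish_revertible(new_copy):
            prepared.append(vol)
        else:
            # Any failure reverts every volume prepared so far.
            for done in prepared:
                done.withdraw_revert()
            return False
    for vol in volumes:
        vol.withdraw_commit()
    return True
```

In this sketch a single failed preparation leaves every volume with its previous point-in-time copy, which is the property the abstract attributes to the revert command.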
Abstract:
An apparatus, system, and method are disclosed for performing an incremental resynchronization between two unrelated volumes when a third volume fails. The apparatus, system, and method include initiating registration of changed tracks; keeping track of bytes-in-flight activity between a local volume and an intermediate volume; recording the changed tracks in bitmaps at the local volume; stopping the recording of the changed tracks; and starting a resynchronization process by sending the changed tracks to a recovery volume.
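As a rough illustration of the change-recording steps listed above, the following sketch keeps a per-track bitmap at the local volume and copies only the marked tracks to the recovery volume. The class and function names are hypothetical, volumes are modelled as plain lists, and treating still-in-flight tracks as needing resynchronization is an assumption of this sketch rather than something the abstract states.

```python
class ChangeRecorder:
    """Hypothetical change-recording bitmap kept at the local volume."""

    def __init__(self, num_tracks):
        self.bitmap = [False] * num_tracks
        self.recording = False
        self.in_flight = set()   # tracks sent to the intermediate volume, not yet acknowledged

    def start(self):
        self.recording = True

    def stop(self):
        self.recording = False

    def write(self, track):
        """Note a host write: the track is in flight and, if recording, marked changed."""
        self.in_flight.add(track)
        if self.recording:
            self.bitmap[track] = True

    def acknowledge(self, track):
        """The intermediate volume confirmed receipt of the track."""
        self.in_flight.discard(track)

    def tracks_to_resync(self):
        changed = {i for i, bit in enumerate(self.bitmap) if bit}
        # Assumption: unacknowledged tracks are resent as well.
        return sorted(changed | self.in_flight)


def resynchronize(local, recovery, recorder):
    """Stop recording, then copy only the changed tracks to the recovery volume."""
    recorder.stop()
    sent = recorder.tracks_to_resync()
    for track in sent:
        recovery[track] = local[track]
    return sent
```

The point of the bitmap is that the cost of resynchronization scales with the number of changed tracks rather than with the size of the volume.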
Abstract:
A method of recovery from a data storage system failure in a data storage system having a host computer writing data to a first storage unit, with a first storage controller synchronously mirroring the data to a second storage unit and a second storage controller asynchronously mirroring the data to a third storage unit. Upon detection of an error or failure associated with the first storage unit, the synchronous data mirroring relationship between the first storage unit and the second storage unit is terminated and the host is directed to write data updates directly to the second storage unit. Upon correction of the failure associated with the first storage unit, the asynchronous mirroring of data updates from the second storage unit to the third storage unit is suspended and synchronous mirroring of the data updates in the reverse direction, from the second storage unit to the first storage unit, is commenced.
Abstract:
A method of recovery from a data storage system failure in a data storage system having a host computer writing data to a first storage unit with a first storage controller synchronously mirroring the data to a second storage unit, and with a second storage controller asynchronously mirroring the data to a third storage unit. The method begins with the detection of a failure associated with the first storage unit. Upon detection of the error or failure associated with the first storage unit, the synchronous data mirroring relationship between the first storage unit and the second storage unit is terminated and the host is directed to write data updates directly to the second storage unit. Upon correction of the failure associated with the first storage unit, the asynchronous mirroring of data updates from the second storage unit to the third storage unit is suspended and synchronous mirroring of the data updates in a reverse direction, from the second storage unit to the first storage unit, is commenced. When a full duplex state is reached between the first storage unit and the second storage unit, the synchronous PPRC relationship with the first storage volume mirroring data to the second storage volume may be reestablished and host I/O writes to the first storage unit may be resumed.
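The recovery sequence above can be read as a small state machine. The sketch below is one hedged way to model it; the `MirrorState` fields, the unit labels "A", "B", and "C" (first, second, and third storage units), and the function names are all hypothetical. The abstract does not say when asynchronous mirroring to the third unit resumes, so the sketch leaves it suspended.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MirrorState:
    """Hypothetical model: A = first, B = second, C = third storage unit."""
    host_target: str = "A"              # unit the host currently writes to
    sync_link: Optional[str] = "A->B"   # direction of synchronous mirroring, if any
    async_b_to_c: bool = True           # asynchronous mirroring from B to C


def on_first_unit_failure(state: MirrorState) -> None:
    """Terminate the A->B synchronous relationship; host writes go to B."""
    state.sync_link = None
    state.host_target = "B"


def on_first_unit_repaired(state: MirrorState) -> None:
    """Suspend B->C asynchronous mirroring; start reverse synchronous mirroring."""
    state.async_b_to_c = False
    state.sync_link = "B->A"


def on_full_duplex(state: MirrorState) -> None:
    """A and B are in full duplex: restore A->B mirroring and host writes to A."""
    state.sync_link = "A->B"
    state.host_target = "A"
    # Resuming B->C asynchronous mirroring is not covered by the abstract,
    # so this sketch leaves async_b_to_c unchanged.
```

Calling the three handlers in order walks the model from normal operation through failover and reverse mirroring back to the original synchronous direction.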
Abstract:
A method of recovery from a data storage system failure in a data storage system having a host computer writing data updates to a local storage controller at a local site. The local storage controller is associated with a local storage device. The local storage controller is also configured to synchronously copy the updates to a remote storage controller associated with a remote storage device at a remote site. In addition, the remote storage controller is configured to store a consistent point-in-time copy of the updates on a backup storage device. The consistent point-in-time copy is known as a consistency group. Upon detection of a failure associated with the local site, a determination is made whether a group of updates pending for storage on the backup storage device forms an intact consistency group. If an intact consistency group has not formed, corrective action may be taken to create an intact consistency group. The recovery method further consists of synchronizing the remote storage device, initiating recovery operations and, upon recovery of the local site, resynchronizing the local storage device and the backup storage device to the recovery consistency group without the need for full volume storage copies and while minimizing application downtime.
Abstract:
A method, and a system for implementing the method, for determining if a wire has been miswired in a network comprising service nodes and switch elements. The method includes the steps of: (1) transmitting a transmission stream in an outbound route (where this transmission stream includes one or more service node fields for one or more service nodes, one or more switch element fields for one or more switch elements connected to the one or more service nodes, and a port field for each switch element); (2) if the transmission stream is received at a switch element on a port different from the port field that the transmission stream indicates for that switch element, setting an error indicator in the transmission stream; and (3) transmitting the transmission stream back to the one or more service nodes in a return route, where the one or more service nodes determine from the error indicator a miswired condition between the receiving switch element and a previous switch element or service node along the outbound route. The one or more service nodes can record, store, and tabulate the miswired condition and one or more additional miswired conditions. The transmission stream can store the switch element fields for the one or more switch elements and the port fields for each switch element separately for a path comprising the outbound route and a path comprising the return route.
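The port comparison in step (2) can be sketched in a few lines. The example below simulates only the outbound route; the data layout (a route as a list of switch and expected-port pairs, and a dictionary of the ports on which the stream actually arrives) and the name `trace_route` are hypothetical and chosen purely for illustration.

```python
def trace_route(route, actual_arrival_ports, origin="service-node"):
    """Simulate the outbound pass and return the error indicators set on the stream.

    route: list of (switch_id, expected_port) pairs carried in the stream.
    actual_arrival_ports: switch_id -> port on which the stream really arrived.
    Each error is (previous_element, switch_id, expected_port, actual_port),
    which locates the miswire between the previous element and the receiving switch.
    """
    errors = []
    previous = origin
    for switch_id, expected_port in route:
        arrived_on = actual_arrival_ports[switch_id]
        if arrived_on != expected_port:
            # The switch sets an error indicator in the stream.
            errors.append((previous, switch_id, expected_port, arrived_on))
        previous = switch_id
    return errors


# Usage: the cable into sw2 is plugged into port 5 instead of port 3.
route = [("sw1", 0), ("sw2", 3), ("sw3", 1)]
arrivals = {"sw1": 0, "sw2": 5, "sw3": 1}
report = trace_route(route, arrivals)
```

Because each indicator names the previous element on the route, the service node that receives the returned stream can attribute the fault to a specific link rather than to the route as a whole.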
Abstract:
Adapters, which provide message communications capabilities in a multinode data processing network, are provided with a mechanism for indicating critical errors from which recovery may ultimately be possible. Error handling capabilities are incorporated which operate both globally and locally to ensure, to the greatest extent possible, that applications running on the network are not prematurely terminated and that the node with the error-affected adapter is not prematurely removed from its connectivity with the other nodes within its network group.
Abstract:
A system and method for preventing deadlock in a multiprocessor computer system executing instructions requiring multiple resources. The system detects potential deadlock situations where a multi-resource instruction is blocked from obtaining one of the resources. A multi-resource instruction global lock is provided that can be held by at most one processor. Upon conflict detection, the processor attempts to acquire the multi-resource instruction global lock and, if successful, resumes resource acquisition. The use of a global lock serializes multiple resource requests and assures that the processor holding the lock can eventually acquire all required resources without deadlock with another processor. The preferred embodiment acquires the global lock on an exception basis to minimize the overhead impact. However, an alternate embodiment which uses the global lock in each multiple resource instruction could also be implemented. Synonym detection logic is provided to detect the situation where a conflict is caused by address resolution to a synonymous lock by the processor.
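A software analogue of the exception-basis global lock might look like the sketch below, which uses Python threads in place of processors. It is an illustration under stated assumptions, not the patented mechanism: the name `run_with_resources` is hypothetical, and releasing any partially acquired resources before taking the global lock is a simplification introduced here to keep the sketch provably deadlock-free.

```python
import threading

# At most one thread can hold this lock at a time.
GLOBAL_LOCK = threading.Lock()


def run_with_resources(resources, action):
    """Run action() while holding every lock in resources, without deadlock."""
    held = []
    for lock in resources:
        if lock.acquire(blocking=False):
            held.append(lock)
        else:
            # Conflict detected: back off (a simplification of this sketch).
            for h in reversed(held):
                h.release()
            break
    else:
        # Fast path: every resource was obtained without a conflict.
        try:
            return action()
        finally:
            for h in reversed(held):
                h.release()

    # Exception path: serialize through the global lock, then block for each
    # resource. Only the global-lock holder ever waits while holding
    # resources, so no cycle of waiting threads can form.
    with GLOBAL_LOCK:
        for lock in resources:
            lock.acquire()
        try:
            return action()
        finally:
            for lock in reversed(resources):
                lock.release()
```

Taking the global lock only after a conflict keeps the common, uncontended case free of extra synchronization, which matches the overhead argument the abstract gives for the preferred embodiment.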
Abstract:
A method, associated apparatus, and program product for partitioning a plurality of interconnection elements among disjoint partitions of processors in a computer system so as to interconnect the processors within each of the disjoint partitions, and to isolate the processors in each interconnected partition from processors in the other partitions. The interconnection elements may be arranged into groups including node coupling elements and link coupling elements, and in larger systems may include intermediate groups having intermediate coupling elements. The partitioning of the interconnection elements begins with the interconnection of processors in the largest disjoint partition and proceeds by connecting the successive largest processor partitions whose interconnection elements share a group with the interconnection elements used for the previously interconnected processor partitions, until no such interconnection elements on shared groups remain. The process is then repeated until all processors in the disjoint partitions are interconnected.