Abstract:
To prevent data loss in storage systems, a detection is made that a storage device in a plurality of storage devices is experiencing a malfunction. The type of malfunction is determined. A SMART rebuilding technique, a normal building technique, a data migration technique, or a user data backup technique is selected to preserve the data in the storage device based on the determined type of the malfunction. The selected technique is performed on the storage device.
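As a rough illustration of the selection step, the following Python sketch maps a determined malfunction type to one of the four named techniques. The malfunction type names and the mapping itself are hypothetical; the abstract does not state which malfunction leads to which technique.

```python
from enum import Enum


class Technique(Enum):
    SMART_REBUILD = "SMART rebuilding"
    NORMAL_BUILD = "normal building"
    DATA_MIGRATION = "data migration"
    USER_DATA_BACKUP = "user data backup"


# Hypothetical malfunction types and mapping, chosen only for illustration.
TECHNIQUE_BY_MALFUNCTION = {
    "predicted_failure": Technique.SMART_REBUILD,
    "hard_failure": Technique.NORMAL_BUILD,
    "degraded_performance": Technique.DATA_MIGRATION,
    "unrecoverable_array": Technique.USER_DATA_BACKUP,
}


def select_technique(malfunction_type: str) -> Technique:
    """Return the preservation technique for a determined malfunction type."""
    try:
        return TECHNIQUE_BY_MALFUNCTION[malfunction_type]
    except KeyError:
        raise ValueError(f"unknown malfunction type: {malfunction_type!r}") from None
```

A table lookup keeps the "determine type, then select" order of the abstract explicit; an unknown type raises an error rather than silently picking a default.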
Abstract:
An apparatus, system, and method are disclosed for improving system reliability by managing switched drive networks. An off-network pool of storage devices is logically isolated from an array of storage devices. A detection module detects a failed storage device. A repositioning module logically repositions storage devices that are not performing operations. A rebuilding module may rebuild data from the failed storage device.
Abstract:
An apparatus for parity data management receives a write command and write data from a computing device. The apparatus builds a parity control structure corresponding to updating a redundant disk array with the write data and stores the parity control structure in a persistent memory buffer of the computing device. The apparatus then updates the redundant disk array with the write data in accordance with a RAID controller parity map. If the update is interrupted by a RAID controller failure that results in a loss of the RAID controller parity map, the apparatus restores the RAID controller parity map from the parity control structure as part of a data recovery operation. In certain embodiments, the parity control structure is a RAID controller parity map.
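The persist-then-restore flow described above might be sketched as follows. This is only an illustration under stated assumptions: the structure layout (a stripe number plus a list of blocks being updated), the function names, and the use of a plain dictionary to stand in for the persistent memory buffer are all hypothetical and are not taken from the abstract.

```python
import json


def build_parity_control_structure(stripe: int, dirty_blocks: list) -> dict:
    """Describe a pending update: which stripe and which blocks are being written."""
    return {"stripe": stripe, "dirty_blocks": sorted(dirty_blocks)}


def persist(buffer: dict, structure: dict) -> None:
    """Store a serialized copy in the (simulated) persistent memory buffer."""
    buffer[structure["stripe"]] = json.dumps(structure)


def restore_parity_map(buffer: dict) -> dict:
    """Rebuild a stripe -> dirty-blocks map from the persisted structures."""
    parity_map = {}
    for raw in buffer.values():
        structure = json.loads(raw)
        parity_map[structure["stripe"]] = structure["dirty_blocks"]
    return parity_map


# Usage: persist before updating the array; after a controller failure,
# the in-memory map is gone but can be rebuilt from the buffer.
buffer = {}
persist(buffer, build_parity_control_structure(7, [2, 0]))
recovered = restore_parity_map(buffer)
```

Serializing the structure before the array update is what allows recovery here: the buffer holds a copy that does not depend on controller memory.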
Abstract:
An apparatus, system, and method quickly back up data in an emergency situation and reduce battery backup dependence. The apparatus may include a backup module and a dedicated computer readable storage device. The backup module interfaces with system memory and selectively transmits modified data to the storage device in response to a detected system failure. The dedicated storage device stores the modified data around the outer edge of a hard disk in order to increase write performance. The system may include the backup module, the storage device, a plurality of client devices, and a plurality of storage devices. The method includes storing modified and unmodified data, detecting a system failure, and transmitting modified data stored in a memory module to a dedicated computer readable backup device. Upon rebooting the device, the method may include restoring the modified data to the system memory and destaging the modified data to the plurality of storage devices.
Abstract:
Provided are a method, system, and program for processing Input/Output (I/O) requests to a storage network including at least one storage device and at least two adaptors, wherein each adaptor is capable of communicating I/O requests to the at least one storage device. An error is detected in a system including a first adaptor, wherein the first adaptor is capable of communicating on the network after the error is detected. In response to detecting the error, a monitoring state is initiated to monitor I/O requests transmitted through a second adaptor. In response to receiving an I/O request, an I/O delay timer is started whose duration is less than a system timeout period, after which the error recovery process in the system including the first adaptor would complete. A reset request is sent to the first adaptor in response to detecting an expiration of one started I/O delay timer.
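The timer logic above can be sketched in Python as a small monitor object. The class and method names are hypothetical, time is passed in explicitly so the behavior is deterministic, and cancelling a timer when its I/O completes is an assumption that the abstract does not spell out.

```python
class AdaptorMonitor:
    """Monitoring state entered after an error is detected on the first adaptor."""

    def __init__(self, io_delay: float, system_timeout: float):
        if not io_delay < system_timeout:
            raise ValueError("I/O delay timer must be shorter than the system timeout")
        self.io_delay = io_delay
        self.deadlines = {}      # request id -> time at which its delay timer expires
        self.reset_sent = False

    def on_io_request(self, request_id, now: float) -> None:
        """Start an I/O delay timer for a request sent through the second adaptor."""
        self.deadlines[request_id] = now + self.io_delay

    def on_io_complete(self, request_id) -> None:
        """Assumed behavior: a completed request no longer needs its timer."""
        self.deadlines.pop(request_id, None)

    def poll(self, now: float) -> bool:
        """Return True the first time any started timer is found to have expired,
        meaning a reset request should be sent to the first adaptor."""
        if not self.reset_sent and any(now >= d for d in self.deadlines.values()):
            self.reset_sent = True
            return True
        return False
```

Because the delay is required to be shorter than the system timeout, the reset can be issued before the slower error recovery process would have finished.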
Abstract:
Provided are a method, system, and article of manufacture for using device status information to take over control of devices assigned to a node. A first processing unit communicates with a second processing unit. The first processing unit uses a first device accessible to both the first and second processing units and the second processing unit uses a second device accessible to both the first and second processing units. The first processing unit receives status on the second device from the first device indicating whether the second device is available or unavailable. The first processing unit detects a failure of the second processing unit and determines from the received status on the second device whether the second device is available in response to detecting the failure of the second processing unit. The first processing unit configures the second device for use by the first processing unit in response to determining that the received status on the second device indicates that the second device is available and in response to detecting the failure.
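The takeover decision reduces to two conditions: the peer has failed, and the last received status marks its device as available. A minimal Python sketch, with a hypothetical function name and a status dictionary standing in for the status received from the first device:

```python
def devices_to_take_over(peer_failed: bool, peer_device_status: dict) -> list:
    """Return the peer's devices that the surviving unit should configure for itself.

    peer_device_status maps a device name to True (available) or False
    (unavailable), as last reported before the failure was detected.
    """
    if not peer_failed:
        return []  # no takeover while the second processing unit is healthy
    return sorted(name for name, available in peer_device_status.items() if available)
```

Devices reported unavailable are skipped, so the surviving unit does not attempt to configure a device that the received status says cannot be used.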
Abstract:
Provided are a method, system, and article of manufacture for error checking addressable blocks in storage. Addressable blocks of data are stored in a storage in stripes, wherein each stripe includes a plurality of data blocks for one of the addressable blocks and at least one checksum block including checksum data derived from the data blocks for the addressable block. A write request is received to modify data in one of the addressable blocks. The write and the checksum update are performed in the stripe having the modified addressable block. An indication is made to perform an error checking operation on the stripe for the modified addressable block in response to the write request, wherein the error checking operation reads the data blocks and the checksum in the stripe to determine if the checksum data is accurate. An error handling operation is initiated in response to determining that the checksum data is not accurate.
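The write, flag, and verify sequence described above could look roughly like the following Python sketch. A byte-wise XOR over equal-length data blocks is assumed as the checksum purely for illustration; the abstract does not name a checksum algorithm, and the class and attribute names are hypothetical.

```python
from functools import reduce


def xor_checksum(blocks):
    """Byte-wise XOR across equal-length blocks (an assumed checksum scheme)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


class Stripe:
    def __init__(self, data_blocks):
        self.data_blocks = [bytes(b) for b in data_blocks]
        self.checksum = xor_checksum(self.data_blocks)
        self.needs_check = False

    def write(self, index, data):
        """Modify one data block, update the checksum, and flag the stripe."""
        self.data_blocks[index] = bytes(data)
        self.checksum = xor_checksum(self.data_blocks)
        self.needs_check = True  # indication that error checking should run

    def check(self):
        """Re-read the data blocks and checksum; return True if they agree."""
        self.needs_check = False
        return xor_checksum(self.data_blocks) == self.checksum
```

A caller would run `check()` on every stripe whose `needs_check` flag is set and start error handling whenever it returns `False`.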