摘要:
A method for enabling a Node Controller (NC), which claims a duplicate or invalid service processor Node Controller Identification (NCID) in a distributed service processor system, to be integrated into the system includes reading an NCID by the NC after the NC is booted, saving the NCID into a non-volatile storage and broadcasting an NC Present Message (NPM) to a System Controller (SC) repeatedly until the SC initiates communication, updating the NCID for the NC in the non-volatile storage when the NC receives an NCID change message from the SC and rating any future NPM as a new NCID, and checking a record of a new NC when the SC receives the NPM from the NC. If the SC has a record of a recorded NC with the same NCID as the new NC, then the SC checks its role as a primary SC. If the SC does not have the record of the recorded NC with the same NCID as the new NC, then the SC checks validity of the NCID.
摘要:
An architecture for error log processing is provided. Each error log is given a defined priority and mapped to an error recovery procedure (ERP) to be run if the log is seen. The system has a plurality of software layers to process the errors. Each software layer processes the error independently. Errors are reported to a higher software stack when error recovery fails from the lower stack ERPs and recovery is non-transparent. If the system host identified for error processing fails, the control of the ERP is transferred during the failover process. Non-obvious failed component isolating ERPs are grouped to be run together to assist in isolating the failed component. Prioritization of the error systems may be based on a plurality of criteria. ERPs are assigned to run within a particular software stack.
摘要:
Provided is a method, system, and program for processing Input/Output (I/O) requests to a storage network including at least one storage device and at least two adaptors, wherein each adaptor is capable of communicating I/O requests to the at least one storage device. An error is detected in a system including a first adaptor, wherein the first adaptor is capable of communicating on the network after the error is detected. In response to detecting the error, a master switch timer is started that is less than a system timeout period if the first adaptor is the master. An error recovery procedure in the system including the first adaptor would be initiated after the system timeout period has expired. An operation is initiated to designate another adaptor in the storage network as the master if the first adaptor is the master in response to detecting an expiration of the master switch timer.
摘要:
Provided is a method, system, and program for processing Input/Output (I/O) requests to a storage network including at least one storage device and at least two adaptors, wherein each adaptor is capable of communicating I/O requests to the at least one storage device. An error is detected in a system including a first adaptor, wherein the first adaptor is capable of communicating on the network after the error is detected. In response to detecting the error, a master switch timer is started that is less than a system timeout period if the first adaptor is the master. An error recovery procedure in the system including the first adaptor would be initiated after the system timeout period has expired. An operation is initiated to designate another adaptor in the storage network as the master if the first adaptor is the master in response to detecting an expiration of the master switch timer.
摘要:
A method for enabling a Node Controller (NC), which claims a duplicate or invalid service processor Node Controller Identification (NCID) in a distributed service processor system, to be integrated into the system includes reading an NCID by the NC after the NC is booted, saving the NCID into a non-volatile storage and broadcasting an NC Present Message (NPM) to a Service Processor (SC) repeatedly until the SC initiates communication, updating the NCID for the NC in the non-volatile storage when the NC receives an NCID change message from the SC and rating any future NPM as a new NCID, and checking a record of an new NC in the non-volatile storage when the SC receives the NPM from the NC. If the SC has a record of a recorded NC with the same NCID as the new NC, then the SC checks its role as a primary SC. If the SC does not have the record of the recorded NC with the same NCID as the new NC, then the SC checks validity of the NCID.
摘要:
An apparatus for parity data management receives a write command and write data from a computing device. The apparatus also builds a parity control structure corresponding to updating a redundant disk array with the write data and stores the parity control structure in a persistent memory buffer of the computing device. The apparatus also updates the redundant disk array with the write data in accordance with a parity control map and restores the RAID controller parity map from the parity control structure as part of a data recovery operation if updating the redundant disk array with the write data is interrupted by a RAID controller failure resulting in a loss of the RAID controller parity map. In certain embodiments, the parity control structure is a RAID controller parity map.
摘要:
Provided is a method, system, and program for processing Input/Output (I/O) requests to a storage network including at least one storage device and at least two adaptors, wherein each adaptor is capable of communicating I/O requests to the at least one storage device. An error is detected in a system including a first adaptor, wherein the first adaptor is capable of communicating on the network after the error is detected. In response to detecting the error, a master switch timer is started that is less than a system timeout period if the first adaptor is the master. An error recovery procedure in the system including the first adaptor would be initiated after the system timeout period has expired. An operation is initiated to designate another adaptor in the storage network as the master if the first adaptor is the master in response to detecting an expiration of the master switch timer.
摘要:
Provided is a method for processing Input/Output (I/O) requests to a storage network including at least one storage device and at least two adaptors, wherein each adaptor is capable of communicating I/O requests to the at least one storage device. An error is detected in a system including a first adaptor, wherein the first adaptor is capable of communicating on the network after the error is detected. In response to detecting the error, a master switch timer is started that is less than a system timeout period if the first adaptor is the master. An error recovery procedure in the system including the first adaptor would be initiated after the system timeout period has expired. An operation is initiated to designate another adaptor in the storage network as the master if the first adaptor is the master in response to detecting an expiration of the master switch timer.
摘要:
Provided is a method, system, and program for processing Input/Output (I/O) requests to a storage network including at least one storage device and at least two adaptors, wherein each adaptor is capable of communicating I/O requests to the at least one storage device. An error is detected in a system including a first adaptor, wherein the first adaptor is capable of communicating on the network after the error is detected. In response to detecting the error, a master switch timer is started that is less than a system timeout period if the first adaptor is the master. An error recovery procedure in the system including the first adaptor would be initiated after the system timeout period has expired. An operation is initiated to designate another adaptor in the storage network as the master if the first adaptor is the master in response to detecting an expiration of the master switch timer.
摘要:
Provided are a method, system, and article of manufacture for detecting data integrity. An indicator is written to invalidate a data block that is capable of being stored in a plurality of sectors of a storage device, wherein the indicator is written to the storage device in at least one sector that is not included in the plurality of sectors. A writing of entire contents of the data block to the plurality of sectors of the storage device is initiated, in response to the writing of the indicator. The indicator is updated to validate the data block, in response to a completion of the writing of the entire contents of the data block to the plurality of sectors of the storage device.