Abstract:
A method for automatically diagnosing faults in a data storage system. The system includes a plurality of enclosures, each having: a primary port; an expansion port; a plurality of disk drives; and a link control card coupled to the primary port, to the expansion port, and to the plurality of disk drives. The link control card includes a cut-through switch having: disk drive port error counters for counting errors at ports of the plurality of disk drives; a primary port error counter for counting cumulative errors at the primary port; and an expansion port error counter for counting cumulative errors at the expansion port. The primary ports and expansion ports are serially interconnected to a storage processor through a Fibre Channel loop. The method sequentially reads the counters in each one of the enclosures to determine whether the errors counted in any one of such counters exceed a predetermined threshold over a predetermined period of time.
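The counter scan described above can be illustrated with a minimal sketch. The `Enclosure` class, the `snapshot` method, and `find_suspects` are hypothetical names introduced here for illustration; the abstract does not specify how counters are stored or read. The sketch assumes two snapshots taken one period apart and flags any counter whose increase exceeds the threshold.

```python
from dataclasses import dataclass, field


@dataclass
class Enclosure:
    """Hypothetical model of the error counters in one enclosure."""
    name: str
    primary_port_errors: int = 0
    expansion_port_errors: int = 0
    drive_port_errors: dict = field(default_factory=dict)  # slot -> cumulative count

    def snapshot(self):
        """Return all cumulative counters of this enclosure as one flat dict."""
        counters = {
            "primary": self.primary_port_errors,
            "expansion": self.expansion_port_errors,
        }
        for slot, count in self.drive_port_errors.items():
            counters[f"drive{slot}"] = count
        return counters


def find_suspects(before, after, threshold):
    """Walk the enclosures in order and report every counter whose growth
    between the two snapshots exceeds the threshold."""
    suspects = []
    for enclosure_name, counters in after.items():
        for counter_name, count in counters.items():
            delta = count - before.get(enclosure_name, {}).get(counter_name, 0)
            if delta > threshold:
                suspects.append((enclosure_name, counter_name, delta))
    return suspects


# Usage: take a snapshot, let one period elapse, take another, compare.
enclosures = [Enclosure("enc0", drive_port_errors={0: 0, 1: 0}), Enclosure("enc1")]
before = {e.name: e.snapshot() for e in enclosures}
enclosures[0].drive_port_errors[1] += 12   # simulated errors during the period
enclosures[1].expansion_port_errors += 3
after = {e.name: e.snapshot() for e in enclosures}
suspects = find_suspects(before, after, threshold=10)
```

Comparing deltas rather than raw values matters because the port counters are described as cumulative.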
Abstract:
Hardware faults in data storage systems are diagnosed. User I/O errors are received. Disk drive port error counters, primary port error counters, and expansion port error counters are read. A user I/O error threshold is modified based on the error counter readings. Depending on the type of errors counted, the user I/O error threshold may be increased or decreased. Once a first quantity of user I/O errors exceeds the modified user I/O error threshold, a faulty component is identified.
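A minimal sketch of the threshold adjustment follows. The abstract does not say which error types raise the threshold and which lower it, so the policy below is purely an assumption: drive port errors lower the threshold, and primary or expansion port errors raise it. The function names and the `step` parameter are hypothetical.

```python
def adjust_threshold(base_threshold, drive_port_errors,
                     primary_port_errors, expansion_port_errors, step=2):
    """Return a modified user I/O error threshold (assumed policy, see lead-in)."""
    threshold = base_threshold
    if drive_port_errors > 0:
        # Assumption: errors at the drive port corroborate a drive fault,
        # so fewer user I/O errors are needed before acting.
        threshold -= step
    if primary_port_errors > 0 or expansion_port_errors > 0:
        # Assumption: link errors may explain the I/O errors,
        # so more user I/O errors are tolerated before acting.
        threshold += step
    return max(1, threshold)  # never drop below one error


def should_identify_fault(user_io_errors, threshold):
    """A faulty component is identified once the error count exceeds the threshold."""
    return user_io_errors > threshold
```

For example, with a base threshold of 10 and drive port errors present, the modified threshold under this assumed policy is 8, so nine user I/O errors would trigger identification.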
Abstract:
Storage stability is managed. It is detected that a disk drive is requesting to be taken offline. The disk drive is then treated as being in a probation state. If the disk drive requests to be put back online within an acceptable period of time, treatment of the disk drive as being in a probation state is stopped, and only those portions of the disk drive data that were the subject of write requests involving the disk drive during the probation state are rebuilt.
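The probation behavior can be sketched as a small tracker that remembers which regions were written while the drive was on probation. The class name, the notion of a numbered "region", and the `None` return value for an expired probation are assumptions made for illustration; the abstract does not state what happens when the acceptable period is exceeded.

```python
class ProbationTracker:
    """Hypothetical tracker for one drive's probation state."""

    def __init__(self, acceptable_seconds):
        self.acceptable_seconds = acceptable_seconds
        self.offline_at = None      # time probation started, or None
        self.dirty_regions = set()  # regions written during probation

    def drive_requests_offline(self, now):
        """Start treating the drive as being in a probation state."""
        self.offline_at = now
        self.dirty_regions = set()

    def record_write(self, region):
        """Remember a write only while the drive is on probation."""
        if self.offline_at is not None:
            self.dirty_regions.add(region)

    def drive_requests_online(self, now):
        """End probation. Return the regions to rebuild, or None when the
        acceptable period was exceeded (assumed to mean a full rebuild)."""
        if self.offline_at is None:
            return []
        elapsed = now - self.offline_at
        dirty = sorted(self.dirty_regions)
        self.offline_at = None
        self.dirty_regions = set()
        return dirty if elapsed <= self.acceptable_seconds else None
```

Tracking only the written regions is what allows the partial rebuild: data untouched during probation is still valid on the returning drive.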
Abstract:
System stability is managed. It is determined that a data storage system is responsive to an enclosure that is unstable. Based on the determination, the enclosure is temporarily prevented from being added to the data storage system.
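One way to picture the temporary exclusion is a gate that blocks an enclosure after too many state changes in a short window. The instability criterion (a count of "flaps" within a window), the hold-off duration, and all names below are assumptions; the abstract does not define how instability is determined or how long the enclosure is kept out.

```python
from collections import deque


class EnclosureGate:
    """Hypothetical gate deciding whether an enclosure may be added."""

    def __init__(self, max_flaps, window, hold_off):
        self.max_flaps = max_flaps  # flaps tolerated within one window
        self.window = window        # window length in seconds
        self.hold_off = hold_off    # how long an unstable enclosure is kept out
        self.events = deque()
        self.blocked_until = None

    def report_flap(self, now):
        """Record one appear/disappear event and update the block state."""
        self.events.append(now)
        while self.events and now - self.events[0] > self.window:
            self.events.popleft()
        if len(self.events) > self.max_flaps:
            self.blocked_until = now + self.hold_off

    def may_add(self, now):
        """True when the enclosure is currently allowed to join the system."""
        return self.blocked_until is None or now >= self.blocked_until
```

The block is temporary by construction: once the hold-off time has passed, `may_add` returns `True` again without further action.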
Abstract:
Loop interface failure is managed. A first device on a loop is identified as a potential cause of the loop interface failure. The loop is tested with the first device functionally removed from the loop. Depending on the results of the test, it is determined that the first device is not the cause of the loop interface failure and a second device on the loop is identified as the cause of the loop interface failure.
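The test-with-removal step can be sketched as a loop over suspects. The abstract does not say how the second device is chosen once the first is cleared; the sketch below assumes an ordered list of suspects that are bypassed one at a time. Both `isolate_loop_fault` and the `loop_passes_without` callback are hypothetical.

```python
def isolate_loop_fault(suspects, loop_passes_without):
    """Bypass each suspect in turn and test the loop.

    `loop_passes_without(device)` is a caller-supplied test that returns True
    when the loop works with `device` functionally removed. The first device
    whose removal makes the loop pass is reported as the cause; a device whose
    removal does not help is cleared and the next suspect is tried.
    """
    for device in suspects:
        if loop_passes_without(device):
            return device
    return None


# Usage with a simulated loop in which "drive3" is the actual fault.
actual_fault = "drive3"
cause = isolate_loop_fault(
    ["drive1", "drive3"],
    loop_passes_without=lambda device: device == actual_fault,
)
```

Here the first suspect, `drive1`, is tested and cleared because the loop still fails without it, and `drive3` is then identified.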
Abstract:
A method, a system, and a computer program product for upgrading firmware are disclosed. In one embodiment, data storage is managed in a data storage system comprising a first enclosure having a first storage processor and a first power supply. A firmware upgrade is saved in the first storage processor. The firmware upgrade in the first storage processor and the firmware in the first power supply are compared. The firmware upgrade is downloaded to the first power supply in response to the comparison determining a difference between the firmware upgrade in the first storage processor and the firmware in the first power supply. The firmware in the first power supply is upgraded with the firmware upgrade.
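The compare-then-download flow might look like the following sketch. The `FirmwareImage` and `PowerSupply` types and the `revision` field are hypothetical; the abstract says only that a difference between the two firmware copies triggers the download, not how the difference is detected.

```python
from dataclasses import dataclass


@dataclass
class FirmwareImage:
    """Hypothetical firmware image: a revision label plus its contents."""
    revision: str
    payload: bytes


@dataclass
class PowerSupply:
    firmware: FirmwareImage


def upgrade_power_supply(staged, supply):
    """Compare the upgrade staged on the storage processor with the firmware
    in the power supply; download and apply it only when they differ.
    Returns True when an upgrade was performed."""
    if staged == supply.firmware:  # dataclass equality compares all fields
        return False
    # Stand-in for the download and upgrade steps.
    supply.firmware = FirmwareImage(staged.revision, staged.payload)
    return True
```

Running the function a second time with the same staged image is a no-op, since the comparison then finds no difference.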
Abstract:
A method is used in managing loop interface instability. It is determined that a loop has excessive intermittent failures. It is determined, based on whether the intermittent failures are detectable on another loop, whether the cause of the excessive intermittent failures is within a specific category of components. A search procedure is executed that is directed to the specific category of components, to isolate the cause of the excessive intermittent failures.
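The category decision can be sketched as a comparison across two loops. The mapping below is an assumption, not something the abstract states: failures visible on both loops are taken to point at components shared by the loops, and failures on one loop only at components specific to that loop. The function name, the `limit` parameter, and the category labels are hypothetical.

```python
def classify_cause(failures_on_loop, failures_on_other_loop, limit):
    """Return the assumed category of components to search, or None when the
    loop does not have excessive intermittent failures."""
    if failures_on_loop <= limit:
        return None  # failures are not excessive; nothing to isolate
    if failures_on_other_loop > limit:
        # Assumption: the cause is visible from both loops, so it lies in
        # components attached to both.
        return "shared"
    # Assumption: the cause is confined to this loop's own components.
    return "loop_specific"
```

The returned category would then select which search procedure runs, which keeps the search narrower than testing every component on the loop.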
Abstract:
A system sets a disk access inhibitor flag whenever a disk drive is placed by the system in an inaccessible condition. The drive operates to set a bit therein when the drive has placed itself in a bypass condition. During each polling event, the system determines: (1) whether the bit has been set; and (2) whether the disk access inhibitor flag has been set. If the bit has been set and the disk access inhibitor flag has been set, the system maintains the drive in the inaccessible condition; otherwise, the drive is accessible to the system. If, during a polling event, the bit has been set but that drive has not had a bit set during a relatively long period of time, the system maintains the drive accessible to the system unless the drive sets the bit during a subsequent predetermined wait period, after which the system sets the disk access inhibitor flag and places the drive in the inaccessible condition.
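The polling decision can be sketched as a small state machine. This is one reading of the abstract under stated assumptions: `quiet_period` stands for the "relatively long period of time", `wait_period` for the "predetermined wait period", and a bit seen outside both cases falls under the "otherwise accessible" rule. The `DriveState` type and the `poll` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DriveState:
    inhibit_flag: bool = False             # disk access inhibitor flag (set by the system)
    last_bit_time: Optional[float] = None  # last poll at which the bypass bit was seen
    wait_until: Optional[float] = None     # end of the current wait period, if any


def poll(state, bypass_bit, now, quiet_period, wait_period):
    """Handle one polling event. Return True when the drive is accessible."""
    if not bypass_bit:
        return True  # bit not set: accessible
    if state.inhibit_flag:
        state.last_bit_time = now
        return False  # bit and flag both set: keep the drive inaccessible
    in_wait = state.wait_until is not None and now <= state.wait_until
    long_quiet = (state.last_bit_time is None
                  or now - state.last_bit_time > quiet_period)
    state.last_bit_time = now
    if in_wait:
        # The bit recurred during the wait period: set the flag and
        # place the drive in the inaccessible condition.
        state.inhibit_flag = True
        state.wait_until = None
        return False
    if long_quiet:
        # First bit after a long quiet period: tolerate it, start waiting.
        state.wait_until = now + wait_period
    return True
```

With `quiet_period=1000` and `wait_period=10`, a bit seen at time 0 leaves the drive accessible, and a second bit at time 8 sets the flag and makes the drive inaccessible.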