Abstract:
Administering a system dump on a redundant node controller including detecting a communications failure between a system controller and the redundant node controller; generating a unique identifier for the communications failure; instructing a primary node controller to provoke a system dump on the redundant node controller; provoking the system dump on the redundant node controller including suspending a processor of the redundant node controller and storing during the suspension of the processor the unique identifier for the communications failure and an instruction to execute the system dump on the redundant node controller; releasing the processor of the redundant node controller from suspension; in response to releasing the processor from suspension, identifying the unique identifier for the communications failure and the instruction to execute the system dump; and executing the system dump including associating the system dump with the unique identifier.
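The sequence in this abstract (suspend, store, release, then dump on resume) can be sketched as follows. This is a minimal illustration, not the patented implementation: the class names, the `mailbox` dictionary, the `"system_dump"` instruction string, and the use of a UUID as the unique identifier are all assumptions made for the example.

```python
import uuid


class RedundantController:
    """Hypothetical stand-in for a redundant node controller."""

    def __init__(self):
        self.suspended = False
        self.mailbox = {}  # assumed storage that is written while suspended
        self.dumps = []

    def suspend(self):
        self.suspended = True

    def store(self, failure_id, instruction):
        # The abstract stores the identifier and instruction during suspension.
        if not self.suspended:
            raise RuntimeError("processor must be suspended before storing")
        self.mailbox = {"failure_id": failure_id, "instruction": instruction}

    def release(self):
        self.suspended = False
        # On release, identify the stored identifier and instruction,
        # then execute the dump and associate it with the identifier.
        if self.mailbox.get("instruction") == "system_dump":
            self.dumps.append(
                {"failure_id": self.mailbox["failure_id"], "contents": "<dump data>"}
            )
            self.mailbox = {}


class PrimaryController:
    """Hypothetical primary node controller that provokes the dump."""

    def provoke_dump(self, redundant, failure_id):
        redundant.suspend()
        redundant.store(failure_id, "system_dump")
        redundant.release()


def handle_communications_failure(primary, redundant):
    # The system controller generates the identifier and instructs the primary.
    failure_id = uuid.uuid4().hex
    primary.provoke_dump(redundant, failure_id)
    return failure_id
```

Because the dump carries the same identifier that the system controller generated, it can later be matched to the failure that triggered it.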
Abstract:
Administering correlated error logs in a computer system having a system controller and one or more redundant node controllers including providing by the system controller to a redundant node controller a unique identifier for error logs; detecting by the system controller a communications failure between the system controller and the redundant node controller; in response to detecting the communications failure, generating by the system controller a system controller error log for the communications failure including the unique identifier; detecting by the redundant node controller the communications failure between the system controller and the redundant node controller; and in response to detecting the communications failure, generating by the redundant node controller a redundant node controller error log for the communications failure including the unique identifier.
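The idea of handing out the identifier before any failure occurs, so that both sides can log independently yet consistently, might look like the sketch below. All names (`ErrorLog`, `SystemController`, `NodeController`, `correlate`) and the UUID-based identifier are assumptions for illustration; the abstract does not specify a data format.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ErrorLog:
    source: str
    message: str
    correlation_id: Optional[str]


@dataclass
class NodeController:
    """Hypothetical redundant node controller."""
    name: str
    correlation_id: Optional[str] = None
    logs: list = field(default_factory=list)

    def receive_identifier(self, correlation_id):
        self.correlation_id = correlation_id

    def on_communications_failure(self):
        # Logged locally; no communication with the system controller is needed.
        log = ErrorLog(self.name, "lost contact with system controller",
                       self.correlation_id)
        self.logs.append(log)
        return log


class SystemController:
    """Hypothetical system controller that issues identifiers in advance."""

    def __init__(self):
        self.issued = {}  # node name -> identifier provided to that node
        self.logs = []

    def provide_identifier(self, node):
        correlation_id = uuid.uuid4().hex
        self.issued[node.name] = correlation_id
        node.receive_identifier(correlation_id)
        return correlation_id

    def on_communications_failure(self, node_name):
        log = ErrorLog("system-controller", f"lost contact with {node_name}",
                       self.issued[node_name])
        self.logs.append(log)
        return log


def correlate(logs):
    """Group error logs from any source by their shared identifier."""
    grouped = {}
    for log in logs:
        grouped.setdefault(log.correlation_id, []).append(log)
    return grouped
```

The identifier has to be distributed while the link still works, since after the failure the two controllers can no longer agree on one.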
Abstract:
Recovery of a redundant node controller in a computer system including determining a loss of a heartbeat for a predefined period of time between a system controller and the redundant node controller; in response to determining the loss of the heartbeat for the predefined period of time, checking network connectivity between the system controller and the redundant node controller; if there is network connectivity between the system controller and the redundant node controller, determining whether an application on the redundant node controller is running; and if an application on the redundant node controller is running, resetting the redundant node controller through a primary node controller.
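The decision chain in this abstract can be written as a small function. The sketch follows the conditions exactly as stated above; the `Action` enum, the function name, and the callable-based checks are assumptions, and the branches the abstract does not describe are returned as `UNSPECIFIED` rather than guessed.

```python
from enum import Enum


class Action(Enum):
    NO_ACTION = "no_action"                  # heartbeat loss below the threshold
    RESET_VIA_PRIMARY = "reset_via_primary"  # the outcome named in the abstract
    UNSPECIFIED = "unspecified"              # branch not covered by the abstract


def decide_recovery(seconds_since_heartbeat, timeout_seconds,
                    has_network, app_running):
    """Return the recovery action for a redundant node controller.

    has_network and app_running are zero-argument callables so that each
    check runs only when the preceding condition has been met.
    """
    if seconds_since_heartbeat < timeout_seconds:
        return Action.NO_ACTION
    # Connectivity is checked only after the heartbeat has been lost
    # for the predefined period.
    if not has_network():
        return Action.UNSPECIFIED
    # The application check happens only when the network is reachable.
    if app_running():
        return Action.RESET_VIA_PRIMARY
    return Action.UNSPECIFIED
```

For example, `decide_recovery(45, 30, lambda: True, lambda: True)` returns `Action.RESET_VIA_PRIMARY`, while a heartbeat gap of 5 seconds against a 30-second threshold returns `Action.NO_ACTION` without probing the network at all.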
Abstract:
A method for correcting a formatting error in a flash memory is disclosed. An error in a first formatting of a first flash memory is discovered, and a second formatting is extracted from a second flash memory storing second data. The erroneous first formatting is replaced with a modification of the second formatting, and first data is stored in the first flash memory with the modification of the second formatting. The first data is different from the second data.
Abstract:
A method for correcting a formatting error in a boot sector of a hard disk drive is disclosed. An error in a first formatting of a first hard disk drive is discovered, and a second formatting is extracted from a second hard disk drive storing second data. The erroneous first formatting is replaced with a modification of the second formatting, and first data is stored in the first hard disk drive with the modification of the second formatting. The first data is different from the second data.
Abstract:
A method, system, and computer usable program product for updating firmware without disrupting service are provided in the illustrative embodiments. An updated firmware code is sent to a first firmware component and a second firmware component. The first firmware component is a primary firmware component and the second firmware component is a backup firmware component in a redundant firmware configuration. The updated firmware code is installed in the second firmware component. The updated firmware code is activated in a third firmware component. The third firmware component is in communication with the first firmware component. A fail-over from the first firmware component to the second firmware component is performed such that a user communicating with the data processing system and receiving a service using the first firmware component continues to receive the service using the second firmware component without a disruption in the service.
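The ordering of steps in this abstract (send to both, install on the backup, activate on a third component, then fail over) is sketched below. This is only one reading of the abstract: the class names, the `activate` method on the third component, and the version strings are assumptions, and the abstract does not say how the third component obtains the code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FirmwareComponent:
    """Hypothetical firmware component with a running and a received image."""
    name: str
    installed: str
    received: Optional[str] = None

    def receive(self, code):
        self.received = code

    def install(self):
        if self.received is None:
            raise RuntimeError("no updated code has been received")
        self.installed = self.received

    def activate(self, code):
        # Assumption: activation makes the given code the running image.
        self.installed = code


class RedundantPair:
    """Primary and backup components; the active one serves requests."""

    def __init__(self, primary, backup):
        self.active = primary
        self.standby = backup

    def serve(self, request):
        return f"{self.active.name}@{self.active.installed}: {request}"

    def fail_over(self):
        self.active, self.standby = self.standby, self.active


def update_without_disruption(pair, third, code):
    pair.active.receive(code)   # sent to the primary
    pair.standby.receive(code)  # sent to the backup
    pair.standby.install()      # installed on the backup only
    third.activate(code)        # activated in the third component
    pair.fail_over()            # backup takes over the service
```

At every point in `update_without_disruption` one member of the pair is active, so `pair.serve(...)` keeps answering; after the fail-over the answers come from the backup running the updated code.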
Abstract:
A method, computer program product, and system for the staged integration of a remote entity and the simultaneous publishing of services is provided. The integration of the distributed remote entities is broken into five stages, with appropriate events published after each stage. Each of the five stages is initiated only if the previous stage completed successfully. The first stage is the initiate discovery phase. The first event is the discovery start event. The second stage is the discovery completed phase. The second event is the discovery completed event. The third stage is the basic software services verified phase. The third event is the basic software verification completed event. The fourth stage is the basic hardware services verified phase. The fourth event is the basic hardware verification completed event. The fifth stage is the extended hardware services verified phase. The fifth event is the full integration of distributed entity event.
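The five-stage gating described in this abstract maps naturally onto a loop that stops at the first failed stage. In the sketch below the stage and event labels are taken from the abstract (the last event is shortened to "full integration"), while the function name and the `run_stage` and `publish` callbacks are assumptions supplied by the caller.

```python
# (stage, event published after the stage completes), in order.
STAGES = [
    ("initiate discovery", "discovery start"),
    ("discovery completed", "discovery completed"),
    ("basic software services verified", "basic software verification completed"),
    ("basic hardware services verified", "basic hardware verification completed"),
    ("extended hardware services verified", "full integration"),
]


def integrate(run_stage, publish):
    """Run the stages in order and publish one event per completed stage.

    run_stage(stage) returns True on success; publish(event) delivers the
    event to subscribers. Returns the number of stages that completed.
    """
    completed = 0
    for stage, event in STAGES:
        # A stage is initiated only if every previous stage succeeded.
        if not run_stage(stage):
            break
        publish(event)
        completed += 1
    return completed
```

Because an event is published after each stage, subscribers can start using the services verified so far (for example, basic software services) without waiting for the remaining stages to finish.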