摘要:
A method, system, and computer usable program product for updating firmware without disrupting service are provided in the illustrative embodiments. An updated firmware code is sent to a first firmware component and a second firmware component. The first firmware component is a primary firmware component and the second firmware component is a backup firmware component in a redundant firmware configuration. The updated firmware code is installed in second firmware component. The updated firmware code is activated in a third firmware component. The third firmware component is in communication with the first firmware component. A fail-over from the first firmware component to the second firmware component is performed such that a user communicating with the data processing system and receiving a service using the first firmware component continues to receive the service using the second firmware component without a disruption in the service.
摘要:
Disclosed is a computer implemented method and apparatus for establishing a redundant channel from an application to a peer data processing system. The interrupt-driven hot standby program receives, through the operation of a data processing system, a communication channel status corresponding to an application. The application has a first channel using local access across a first physical conduit to a first switch. In addition the communication channel status is, in part, an interrupt. The interrupt-driven hot standby program determines whether the redundant channel is present. The redundant channel is configured to use a second physical conduit distinct from the first physical conduit for traffic of the application. Responding to a determination that the redundant channel is present, the interrupt-driven hot standby program determines whether the redundant channel is configured to use the second physical conduit as local access to a redundant switch, wherein the redundant switch is not the first switch. The interrupt-driven hot standby program responds to a determination that the redundant channel is configured to use the second physical conduit by updating a communication channel list to include at least one attribute of the redundant channel, wherein the communication channel list is resident in the data processing system.
摘要:
Recovery of a redundant node controller in a computer system including determining a loss of a heartbeat for a predefined period of time between a system controller and the redundant node controller; in response to determining the loss of the heartbeat for the predefined period of time, checking network connectivity between the system controller and the redundant node controller; if there is network connectivity between the system controller and the redundant node controller, determining whether an application on the redundant node controller is running; and if an application on the redundant node controller is running, resetting the redundant node controller through a primary node controller.
摘要:
Recovery of a redundant node controller in a computer system including determining a loss of a heartbeat for a predefined period of time between a system controller and the redundant node controller; in response to determining the loss of the heartbeat for the predefined period of time, checking network connectivity between the system controller and the redundant node controller; if there is network connectivity between the system controller and the redundant node controller, determining whether an application on the redundant node controller is running; and if an application on the redundant node controller is running, resetting the redundant node controller through a primary node controller.
摘要:
A method for correcting a formatting error in a flash memory is disclosed. An error in a first formatting of a first flash memory is discovered, and a second formatting is extracted from a second flash memory storing second data. The erroneous first formatting is replaced with a modification of the second formatting, and first data is stored in the first flash memory with the modification of the second formatting. The first data is different from the second data.
摘要:
A method, computer program product, and system for the staged integration of a remote entity and the simultaneous publishing of services is provided. The integration of the distributed remote entities is broken into five stages, with appropriate events published after each stage. Each of the five stages is initiated only if the previous stage completed successfully. The first stage is the initiate discovery phase. The first event is the discovery start event. The second stage is the discovery completed phase. The second event is the discovery completed event. The third stage is the basic software services verified phase. The third event is the basic software verification completed event. The fourth stage is the basic hardware services verified phase. The fourth event is the basic hardware verification completed event. The fifth stage is the extended hardware services verified phase. The fifth event is the full integration of disturbed entity event.
摘要:
A system, method, and product are disclosed in a data processing system for serializing hardware reset requests in a software communication request queue in a processor card. The processor card processes software communication requests utilizing the queue in a serial order. A hardware reset request is received by the processor card and put in the queue. The hardware reset request is processed from the queue in the serial order with all requests from the queue that are currently being serviced have completed being serviced.
摘要:
A method for enabling a Node Controller (NC), which claims a duplicate or invalid service processor Node Controller Identification (NCID) in a distributed service processor system, to be integrated into the system includes reading an NCID by the NC after the NC is booted, saving the NCID into a non-volatile storage and broadcasting an NC Present Message (NPM) to a System Controller (SC) repeatedly until the SC initiates communication, updating the NCID for the NC in the non-volatile storage when the NC receives an NCID change message from the SC and rating any future NPM as a new NCID, and checking a record of a new NC when the SC receives the NPM from the NC. If the SC has a record of a recorded NC with the same NCID as the new NC, then the SC checks its role as a primary SC. If the SC does not have the record of the recorded NC with the same NCID as the new NC, then the SC checks validity of the NCID.
摘要:
Administering a system dump on a redundant node controller including detecting a communications failure between a system controller and the redundant node controller; generating a unique identifier for the communications failure; instructing a primary node controller to provoke a system dump on the redundant node controller; provoking the system dump on the redundant node controller including suspending a processor of the redundant node controller and storing during the suspension of the processor the unique identifier for the communications failure and an instruction to execute the system dump on the redundant node controller; releasing the processor of the redundant node controller from suspension; in response to releasing the processor from suspension, identifying the unique identifier for the communications failure and the instruction to execute the system dump; and executing the system dump including associating the system dump with the unique identifier.
摘要:
Administering correlated error logs in a computer system having a system controller and one or more redundant node controllers including providing by the system controller to a redundant node controller a unique identifier for error logs; detecting by the system controller a communications failure between the system controller and the redundant node controller; in response to detecting the communications failure, generating by the system controller a system controller error log for the communications failure including the unique identifier; detecting by the redundant node controller the communications failure between the system controller and the redundant node controller; and in response to detecting the communications failure, generating by the redundant node controller a redundant node controller error log for the communications failure including the unique identifier.