Abstract:
A method and an apparatus are presented for updating flash memory that contains write-protected code, a first copy of rewritable recovery code, a second copy of rewritable recovery code, and rewritable composite code. Each block of rewritable code contains a checksum used to detect whether the block has been corrupted. If the first copy of the recovery code is found to be corrupted, the second copy is copied over the first. If the second copy is found to be corrupted, the first copy is copied over the second. The recovery code is responsible for checking and updating the composite code. If the composite code is found to be corrupted, a fresh copy of the composite code is obtained from a removable storage device or a network connection. The data processing system is booted by executing the write-protected code, the first copy of the recovery code, and the composite code. Because only the recovery code is stored in duplicate, redundant code is kept to a minimum while both the integrity and the updateability of the flash memory are guaranteed.
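The repair logic described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout, the names `recovery_a`, `recovery_b`, `composite`, and `fetch_composite`, and the use of CRC-32 as the checksum are all hypothetical, and the handling of the case where both recovery copies are corrupted is not covered by the abstract.

```python
import zlib


def checksum(payload: bytes) -> int:
    # CRC-32 stands in for whatever checksum the flash blocks really use (assumption).
    return zlib.crc32(payload)


def make_block(payload: bytes) -> dict:
    # A rewritable block stores its payload together with a checksum of it.
    return {"payload": payload, "checksum": checksum(payload)}


def is_intact(block: dict) -> bool:
    return checksum(block["payload"]) == block["checksum"]


def repair_flash(flash: dict, fetch_composite) -> dict:
    """Repair the recovery copies from each other, then the composite code."""
    first_ok = is_intact(flash["recovery_a"])
    second_ok = is_intact(flash["recovery_b"])
    if not first_ok and second_ok:
        flash["recovery_a"] = dict(flash["recovery_b"])
    elif first_ok and not second_ok:
        flash["recovery_b"] = dict(flash["recovery_a"])
    elif not first_ok and not second_ok:
        # Not described in the abstract; raising here is an assumption.
        raise RuntimeError("both recovery copies are corrupted")
    if not is_intact(flash["composite"]):
        # fetch_composite is a hypothetical callable that reads a fresh image
        # from removable storage or a network connection.
        flash["composite"] = make_block(fetch_composite())
    return flash
```

A caller would build the three blocks with `make_block` and pass a function that returns the fresh composite image as bytes.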
Abstract:
A method, system, and apparatus are provided for reestablishing communications between a host and a service processor after the service processor has ceased to function correctly. In one embodiment, the host exchanges heartbeat signals with the service processor. The heartbeat signals indicate that the service processor is active and functioning. In response to a failure to receive a heartbeat signal, or in response to some other indication that the service processor is not performing correctly, the host causes a hard reset of the service processor. In addition, the service processor can detect a failure within itself and initiate a hard reset of itself. After the hard reset, the service processor returns to a monitoring mode without performing initial tests of the data processing system. Furthermore, the data processing system remains active and is not shut down during the hard reset of the service processor.
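The heartbeat scheme above can be illustrated with a small simulation. This is a sketch rather than the described implementation: the class names, the `max_missed` threshold, and the `initial_tests_run` counter are assumptions introduced only to make the behavior observable.

```python
class ServiceProcessor:
    def __init__(self):
        self.healthy = True
        self.mode = "monitoring"
        self.initial_tests_run = 0  # counts runs of the initial system tests

    def heartbeat(self) -> bool:
        # A healthy service processor answers every heartbeat request.
        return self.healthy

    def hard_reset(self):
        # After a hard reset the service processor goes straight back to
        # monitoring; the initial tests are deliberately not rerun.
        self.healthy = True
        self.mode = "monitoring"


class Host:
    def __init__(self, service_processor, max_missed=3):
        self.sp = service_processor
        self.max_missed = max_missed  # hypothetical tolerance before resetting
        self.missed = 0
        self.resets = 0
        self.running = True  # the host itself is never shut down

    def tick(self):
        """One heartbeat interval as seen from the host."""
        if self.sp.heartbeat():
            self.missed = 0
            return
        self.missed += 1
        if self.missed >= self.max_missed:
            self.sp.hard_reset()
            self.resets += 1
            self.missed = 0
```

Calling `tick()` once per heartbeat interval is enough to drive the model; the host keeps running throughout, which mirrors the statement that the data processing system is not shut down during the reset.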
Abstract:
To emulate multi-threaded processing in an operating system supporting only single-threaded processes and single-level interrupts, the processor timer is started with a selected time-out period during execution of a master code thread. Processing of the master code thread proceeds until the timer interrupt, at which time the operating system timer interrupt service routine (ISR) transfers execution control to a slave code thread or slave code thread component. The slave code thread or component is executed in its entirety, at which time the timer is reset and execution control is returned to the master code thread, where processing resumes at the point at which the timer interrupt was asserted. To minimize disruption of the master code thread execution, a maximum latency should be enforced on the slave code thread, which may be accomplished by breaking the slave code thread into multiple components. The timer ISR maintains an index of the predetermined starting points within the slave code thread(s), with a pointer identifying the next slave code thread component to be selected when the timer interrupt is asserted. Processing thus alternates between the master code thread and the slave code thread or components, with different slave code thread components being selected in round-robin fashion. The duty cycle between the master code thread and the slave code thread or components may be varied by selection of the time-out period and the maximum latency allowed to slave code thread processing.
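The alternation between the master thread and the slave components can be modeled as follows. This is a simplified simulation, not real interrupt handling: the timer is approximated by counting master steps, and the names `run`, `timeout`, and `log` are hypothetical.

```python
def run(master_steps, slave_components, timeout, log):
    """Interleave master work with slave components selected round-robin."""
    next_component = 0  # pointer the timer ISR keeps into the component index
    ticks = 0           # master steps elapsed since the timer was last reset
    for step in range(master_steps):
        log.append(f"master:{step}")
        ticks += 1
        if ticks == timeout:
            # Timer interrupt: the ISR runs one slave component to completion.
            slave_components[next_component](log)
            next_component = (next_component + 1) % len(slave_components)
            ticks = 0   # timer reset; the master resumes where it left off
    return log


def slave_a(log):
    log.append("slave:a")


def slave_b(log):
    log.append("slave:b")
```

With `timeout=2`, `run(6, [slave_a, slave_b], 2, [])` records two master steps, then `slave:a`, two more master steps, then `slave:b`, and so on. Raising `timeout` shifts the duty cycle toward the master thread.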
Abstract:
An apparatus and method are provided for monitoring the state of a computer system that is shared among multiple operating systems by a partition manager. A dedicated service processor monitors the individual run state condition of a plurality of processors running a plurality of operating systems. The service processor executes a routine to poll a memory location in each processor in the system to determine whether the processor has entered an error loop with interrupts disabled. If any one of the plurality of processors is in an error loop, the service processor executes a routine to send a non-maskable interrupt to the looped processor so that the partition manager may regain control of the processor.
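One polling pass of the service processor might look like the sketch below. The abstract does not say how the polled memory location encodes an error loop, so the `ERROR_LOOP` sentinel, the `status_word` field, and the `owner` field are assumptions made purely for illustration.

```python
ERROR_LOOP = 0xDEAD  # hypothetical value marking "stuck in an error loop"


class Processor:
    def __init__(self, cpu_id):
        self.cpu_id = cpu_id
        self.status_word = 0          # memory location polled by the service processor
        self.owner = "operating system"

    def receive_nmi(self):
        # A non-maskable interrupt gets through even with interrupts disabled,
        # handing the processor back to the partition manager.
        self.status_word = 0
        self.owner = "partition manager"


def poll_once(processors):
    """Poll every processor and send an NMI to each one found looping."""
    recovered = []
    for cpu in processors:
        if cpu.status_word == ERROR_LOOP:
            cpu.receive_nmi()
            recovered.append(cpu.cpu_id)
    return recovered
```

The function returns the identifiers of the processors it interrupted, so a caller can log or count recoveries per pass.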