Abstract:
According to one embodiment, a method comprises detecting a loss of lockstep (LOL) for a processor module. The method further comprises determining a type of LOL that is detected, and, based at least in part on the determined type of LOL, determining a responsive action to take for the LOL. According to another embodiment, a method comprises detecting a loss of lockstep (LOL) for a processor module. The method further comprises using information identifying at least one of the type of the detected LOL and the source of the detected LOL to determine a responsive action to take for the LOL.
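The mapping from a determined LOL type to a responsive action can be pictured as a simple policy lookup. The sketch below is illustrative only: the abstract does not enumerate LOL types or actions, so `LolType`, `Action`, `POLICY`, and `choose_action` are hypothetical names chosen for this example.

```python
from enum import Enum, auto


class LolType(Enum):
    # Hypothetical categories; the abstract does not enumerate LOL types.
    RECOVERABLE = auto()
    UNRECOVERABLE = auto()


class Action(Enum):
    # Hypothetical responsive actions, chosen only for illustration.
    RESYNC_ONLINE = auto()
    TAKE_OFFLINE = auto()


# Policy table: determined LOL type -> responsive action.
POLICY = {
    LolType.RECOVERABLE: Action.RESYNC_ONLINE,
    LolType.UNRECOVERABLE: Action.TAKE_OFFLINE,
}


def choose_action(lol_type: LolType) -> Action:
    """Return the responsive action for the determined LOL type."""
    return POLICY[lol_type]
```

Keeping the policy in a table rather than in branching code makes it straightforward to add a further key, such as the source of the LOL, if an implementation needs it.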
Abstract:
According to at least one embodiment, a method comprises detecting loss of lockstep for a pair of processors. The method further comprises triggering, by firmware, an operating system to idle the processors, and recovering, by the firmware, lockstep between the pair of processors. After lockstep is recovered between the pair of processors, the method further comprises triggering, by the firmware, the operating system to recognize the processors as being available for receiving instructions.
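The ordering described above (firmware asks the operating system to idle the pair, firmware recovers lockstep, firmware asks the operating system to recognize the pair again) can be sketched with a toy model. Everything here is an assumption made for illustration: the `OperatingSystem` and `Firmware` classes, their method names, and the placeholder recovery step are not taken from the abstract.

```python
class OperatingSystem:
    """Toy OS model that tracks which processors it may schedule work on."""

    def __init__(self, cpus):
        self.available = set(cpus)

    def idle(self, cpus):
        # Stop dispatching instructions to the given processors.
        self.available -= set(cpus)

    def recognize(self, cpus):
        # Treat the given processors as available for instructions again.
        self.available |= set(cpus)


class Firmware:
    """Toy firmware model that drives the recovery sequence."""

    def __init__(self, os_):
        self.os = os_
        self.log = []

    def recover_lockstep(self, pair):
        # Placeholder for the hardware-specific resynchronization step.
        self.log.append(("recover", pair))

    def handle_lol(self, pair):
        # 1. Trigger the OS to idle the pair.
        self.log.append(("idle", pair))
        self.os.idle(pair)
        # 2. Recover lockstep while the OS is not using the pair.
        self.recover_lockstep(pair)
        # 3. Trigger the OS to recognize the pair as available again.
        self.log.append(("recognize", pair))
        self.os.recognize(pair)
```

A call such as `Firmware(OperatingSystem(["cpu0", "cpu1", "cpu2"])).handle_lol(("cpu0", "cpu1"))` walks through the three steps in order while the remaining processor stays available throughout.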
Abstract:
According to one embodiment, a method comprises, responsive to detection of loss of lockstep (LOL) for a processor module in a system, setting status information stored to the system for the processor module to indicate that the processor module has an error. The method further comprises reestablishing lockstep for the processor module without shutting down the system's operating system, and updating the status information for the processor module to indicate that the processor module no longer has the error. The method further comprises causing the system's operating system to recognize that the processor module having its lockstep reestablished is available for processing.
Abstract:
One embodiment of the invention is a method for resetting a partition of a multiple partition system, wherein the partition comprises a plurality of processors, the method comprising executing, by one processor of the plurality of processors, reset code from firmware, building a list of reset register addresses associated with the plurality of processors, sending an interrupt to the other processors of the plurality of processors, resetting the other processors by writing a reset code to their associated reset registers, and resetting the one processor by writing to its associated reset register.
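The reset ordering in this abstract (build the register list, interrupt the other processors, reset them, then reset the initiating processor last) can be sketched as follows. This is a minimal sketch under stated assumptions: `RESET_CODE`, the `cpu_table` layout, and the `write_reg` and `send_interrupt` callbacks are hypothetical stand-ins for platform-specific mechanisms.

```python
RESET_CODE = 0x1  # hypothetical value; the abstract does not specify it


def reset_partition(initiator, cpu_table, write_reg, send_interrupt):
    """Reset every processor in a partition, the initiating one last.

    cpu_table maps a processor id to a dict holding its 'reset_reg' address
    (a hypothetical layout). write_reg(addr, value) and send_interrupt(cpu)
    are callbacks supplied by the caller.
    """
    # Build the list of reset register addresses for the partition.
    reset_regs = {cpu: info["reset_reg"] for cpu, info in cpu_table.items()}
    others = [cpu for cpu in reset_regs if cpu != initiator]

    # Interrupt the other processors before touching their registers.
    for cpu in others:
        send_interrupt(cpu)

    # Reset the other processors by writing the reset code.
    for cpu in others:
        write_reg(reset_regs[cpu], RESET_CODE)

    # Finally reset the processor that has been running this code.
    write_reg(reset_regs[initiator], RESET_CODE)
```

The initiator goes last because it is the processor executing the reset code; resetting it earlier would leave the remaining processors untouched.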
Abstract:
A system comprises a plurality of processors, and data storage storing information that assigns a role of boot processor to one of the plurality of processors and assigns a role of spare processor to another of the plurality of processors. The system further comprises logic operable, responsive to detecting loss of lockstep for the boot processor, for transferring, during system runtime, the role of boot processor to the spare processor.
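A role transfer of this kind can be illustrated with a small role table. The sketch assumes a hypothetical encoding of roles as strings (`"boot"`, `"spare"`, `"normal"`) and assumes the former boot processor is marked `"offline"` afterwards; the abstract specifies neither detail.

```python
def transfer_boot_role(roles, lol_cpu):
    """Return an updated role table after LOL is detected on lol_cpu.

    roles maps a processor id to a role string (hypothetical encoding).
    The input table is left unmodified.
    """
    if roles.get(lol_cpu) != "boot":
        # LOL on a non-boot processor does not move the boot role.
        return dict(roles)

    spare = next((cpu for cpu, role in roles.items() if role == "spare"), None)
    if spare is None:
        raise RuntimeError("no spare processor available")

    updated = dict(roles)
    updated[spare] = "boot"       # the spare takes over the boot role
    updated[lol_cpu] = "offline"  # assumption: old boot processor is set aside
    return updated
```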
Abstract:
Embodiments include methods, apparatus, and systems for containing machine check events in a virtual partition. One embodiment is a method of software execution. The method divides a hard partition into first and second virtual partitions and attempts to correct an error in a firmware layer of the first virtual partition. If the error is not correctable, then the method reboots the first virtual partition without disrupting hardware resources in the second virtual partition.
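The containment idea (try to correct the error in firmware, otherwise reboot only the affected virtual partition) can be sketched as below. The state strings, the `try_correct` callback, and the function name are hypothetical and serve only to show that the second virtual partition is never touched.

```python
def handle_machine_check(vpars, failing, try_correct):
    """Contain a machine check event to one virtual partition.

    vpars maps a virtual partition name to a state string (hypothetical).
    try_correct(name) stands in for the firmware-layer correction attempt
    and returns True when the error was corrected.
    """
    if try_correct(failing):
        return "corrected"
    # Not correctable: reboot only the failing virtual partition.
    vpars[failing] = "rebooting"
    return "rebooted"
```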
Abstract:
According to one embodiment, a method comprises system firmware instructing a system's operating system to idle a processor, and responsive to the instructing, the operating system idling the processor and returning control over the processor to the system firmware. According to another embodiment, a method comprises detecting loss of lockstep (LOL) for a processor module in a system, and responsive to detecting the LOL for the processor module, system firmware instructing an operating system to idle the processor module.
Abstract:
A system comprises a processor module that supports lockstep mode of operation. The system further comprises non-volatile data storage having stored thereto configuration information specifying whether the processor module is desired to operate in lockstep mode. A method comprises storing configuration information to non-volatile data storage of a system, wherein the configuration information specifies whether lockstep mode of operation is desired to be enabled or disabled for a processor module of the system. The method further comprises causing, by the system, the processor module to have its lockstep mode enabled or disabled as specified by the configuration information.
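Applying stored configuration to processor modules can be sketched as follows. A JSON string stands in for the non-volatile data storage here, which is an assumption for illustration; the key names and the `apply_lockstep_config` function are likewise hypothetical.

```python
import json


def apply_lockstep_config(stored, modules):
    """Enable or disable lockstep per module as the stored config specifies.

    stored is a JSON object mapping module name -> desired lockstep setting
    (a stand-in for non-volatile storage). modules maps module name ->
    current lockstep state. Returns a new mapping with the settings applied.
    """
    config = json.loads(stored)
    result = dict(modules)
    for name, enabled in config.items():
        if name in result:
            # Modules absent from the system are ignored.
            result[name] = bool(enabled)
    return result
```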