Abstract:
In an implementation of latent error detection, memory regions that each correspond to a different processor element of a redundant processor system are scanned for latent processing errors maintained as erroneous data. The data maintained in the memory regions is compared to detect a latent processing error in a first memory region. The latent processing error is resolved by copying data from a second memory region into the first memory region where the data maintained in the second memory region is determined to be identical to data maintained in at least a third memory region.
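The compare-and-repair step described above can be sketched roughly as follows. This is only an illustration, not the claimed implementation: the function name `scrub_regions`, the byte-by-byte comparison granularity, and the strict-majority rule are assumptions that do not come from the abstract, which only requires that the source region match at least one other region.

```python
from collections import Counter


def scrub_regions(regions):
    """Compare equally sized memory regions and repair minority values.

    regions: list of bytearray, one per processor element (hypothetical layout).
    Returns the sorted indices of the regions that were repaired.
    """
    if len(regions) < 3:
        raise ValueError("need at least three regions to vote")
    size = len(regions[0])
    if any(len(region) != size for region in regions):
        raise ValueError("regions must all be the same size")

    repaired = set()
    for offset in range(size):
        values = [region[offset] for region in regions]
        good, votes = Counter(values).most_common(1)[0]
        if votes == len(regions):
            continue  # every region agrees at this offset
        if votes * 2 <= len(regions):
            # Assumption: without a strict majority we refuse to guess.
            raise RuntimeError(f"no majority value at offset {offset}")
        for index, region in enumerate(regions):
            if region[offset] != good:
                # Copy the value that the agreeing regions hold.
                region[offset] = good
                repaired.add(index)
    return sorted(repaired)


# Example: the second region holds one corrupted byte.
example = [
    bytearray(b"\x01\x02\x03"),
    bytearray(b"\x01\xff\x03"),
    bytearray(b"\x01\x02\x03"),
]
repaired_indices = scrub_regions(example)
```

With three regions, any two that agree form the majority, which matches the abstract's condition that the second region be identical to at least a third.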
Abstract:
Method and system of determining whether a user program has made a system level call and thus whether the user program is uncooperative with fault tolerant operation. Some exemplary embodiments may be a processor-based method comprising providing information from a first processor to a second processor (the information indicating that a user program executed on the first processor has not made a system level call in a predetermined amount of time), and determining by the first processor, using information from the second processor, whether a duplicate copy of the user program substantially simultaneously executed in the second processor has made a system level call in the predetermined amount of time.
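As a rough sketch of the decision the first processor makes once the reports have been exchanged: the `SyscallReport` type, the counter field, and the three outcome labels below are hypothetical names chosen for illustration, and the abstract does not say what action follows each outcome.

```python
from dataclasses import dataclass


@dataclass
class SyscallReport:
    """What one processor tells the other about its copy of the user program."""
    processor_id: int
    syscalls_in_window: int  # hypothetical counter for the predetermined period


def assess_user_program(local: SyscallReport, remote: SyscallReport) -> str:
    """Classify the user program from the local and remote reports."""
    if local.syscalls_in_window > 0:
        # The local copy made a system-level call, so nothing to report.
        return "cooperative"
    if remote.syscalls_in_window > 0:
        # Assumption: the duplicate copy made a call, so the local copy
        # may simply be lagging rather than being uncooperative.
        return "lagging"
    # Neither copy made a system-level call in the window.
    return "uncooperative"


verdict = assess_user_program(SyscallReport(0, 0), SyscallReport(1, 0))
```

The point of consulting the second processor is that a missing call on one processor alone does not distinguish a slow copy from a program that never makes system-level calls.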
Abstract:
A method and system of loosely lock-stepped non-deterministic processors. Some exemplary embodiments may be a processor-based method comprising executing fault tolerant copies of a user program, one copy of the user program executed in a first processor performing non-deterministic execution, and a duplicate copy of the user program executing in a second processor performing non-deterministic execution, with the executing in the first processor and second processor not in cycle-by-cycle lock-step.
Abstract:
A method and system of copying a memory area between processor elements for lock-step execution. At least some of the illustrative embodiments may be a method comprising executing duplicate copies of a first program in a first processor of a first multiprocessor computer system and in a first processor of a second multiprocessor computer system (the executing substantially in lock-step), executing a second program in a second processor element of the first multiprocessor computer system (the first and second processors of the first multiprocessor computer system sharing an input/output (I/O) bridge), copying a memory area of the second program executing in the second processor element of the first multiprocessor computer system to a memory of a second processor element in the second multiprocessor computer system while the duplicate copies of the first program are executing in the first processor elements, and then executing duplicate copies of the second program in the second processors in lock-step.
Abstract:
Performance data access is described. In an embodiment, events are processed with non-synchronized processor elements of a logical processor in a redundant processor system. Performance data associated with execution of the processor events is stored in one or more accumulators corresponding to a respective processor element. The performance data from each of the non-synchronized processor elements is exchanged via a logical synchronization unit such that each processor element includes the performance data from each of the processor elements. Each processor element then conforms the performance data to generate synchronized performance data which is then communicated to a performance monitoring application that requests the performance data from the logical processor.
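The exchange-then-conform flow might look like the sketch below. The abstract does not state how the performance data is conformed, so the rule used here (taking the lower median of each counter) is purely an assumption, as are the names `exchange` and `conform`; the `exchange` function merely stands in for the logical synchronization unit.

```python
from statistics import median_low


def exchange(per_element):
    """Stand-in for the logical synchronization unit.

    per_element maps a processor element id to its accumulator dict.
    Every element ends up with a copy of every element's data.
    """
    return {pe: {src: dict(acc) for src, acc in per_element.items()}
            for pe in per_element}


def conform(all_data):
    """Derive one agreed value per counter from all elements' data.

    Assumption: the lower median is used so that every element, given the
    same inputs, computes exactly the same result.
    """
    names = sorted(set().union(*(acc.keys() for acc in all_data.values())))
    return {name: median_low([acc.get(name, 0) for acc in all_data.values()])
            for name in names}


# Three non-synchronized elements with slightly different counts.
accumulators = {
    0: {"cycles": 100, "io": 7},
    1: {"cycles": 104, "io": 7},
    2: {"cycles": 98, "io": 8},
}
views = exchange(accumulators)
synchronized = {pe: conform(view) for pe, view in views.items()}
```

Because each element applies the same deterministic rule to the same exchanged data, all elements report identical values to the performance monitoring application even though their raw counters differ.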
Abstract:
In a redundant-processor computing device, an error handling method comprises detecting equivalent disparity among operating processor elements of the computing device and responding to the detected equivalent disparity by evaluating secondary considerations of processor fidelity.
Abstract:
A method and system of copying memory from a source processor to a target processor by duplicating memory writes. At least some of the exemplary embodiments may be a method comprising stopping execution of a user program on a target processor (the target processor coupled to a first memory), continuing to execute a duplicate copy of the user program on a source processor (the source processor coupled to a second memory and generating writes to the second memory), duplicating memory writes of the source processor and duplicating writes by input/output adapters to create a stream of duplicate memory writes, and applying the duplicated memory writes to the first memory.
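A minimal sketch of the write-duplication idea follows, with memory modelled as a `bytearray`. The `TappedMemory` class, the `origin` tag, and `apply_stream` are hypothetical names for illustration; in the described system the duplication would be done by hardware rather than by a Python list.

```python
class TappedMemory:
    """Source-side memory whose writes are also recorded in a stream."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.stream = []  # duplicated writes as (address, payload, origin)

    def write(self, addr, payload, origin="cpu"):
        """Apply a write from the processor or an I/O adapter and duplicate it."""
        if addr < 0 or addr + len(payload) > len(self.data):
            raise IndexError("write outside memory")
        self.data[addr:addr + len(payload)] = payload
        self.stream.append((addr, bytes(payload), origin))


def apply_stream(target, stream):
    """Replay the duplicated writes, in order, onto the target memory."""
    for addr, payload, _origin in stream:
        if addr < 0 or addr + len(payload) > len(target):
            raise IndexError("write outside target memory")
        target[addr:addr + len(payload)] = payload


# The source keeps running and writing while the target is stopped.
source = TappedMemory(8)
source.write(0, b"\x01\x02")               # write by the source processor
source.write(4, b"\xaa", origin="io")      # write by an I/O adapter
target_memory = bytearray(8)
apply_stream(target_memory, source.stream)
```

Note that this sketch only brings across locations that were actually written while the tap was active; it says nothing about how untouched locations reach the target.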
Abstract:
A plurality of redundant, loosely-coupled processor elements are operational as a logical processor. A logic detects a halt condition of the logical processor and, in response to the halt condition, reintegrates and commences operation in fewer than all of the processor elements, leaving at least one processor element nonoperational. The logic also buffers data from the nonoperational processor element in the reloaded operational processor elements and writes the buffered data to storage for analysis.