Abstract:
Hardware error processing is undertaken to analyze the source of the error and to preserve sufficient information to allow later software error processing. The hardware error processing also allows, for certain errors, complete recovery without interruption of the sequence of instruction execution.
Abstract:
A fault tolerant computer system has a central processing system which includes at least one set of data pathways, and executes a series of data processing instructions including the transfer of messages along the data pathways. At least one set of transaction data storage devices is coupled to the data pathways for storing a predetermined number of the messages most recently transferred on the data pathways. Error checking devices are included for detecting the presence of errors in the central processing system. Error storage devices are coupled to the transaction data storage devices and the error checking devices for causing the transaction data storage devices to cease storing additional messages in response to the detection of errors by the error checking devices.
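The capture-and-freeze behavior described in this abstract can be sketched in software. This is only an illustration under stated assumptions: the class name `TransactionTrace`, its methods, and the message strings are hypothetical, and the abstract itself describes hardware storage devices, not a Python object.

```python
from collections import deque


class TransactionTrace:
    """Hypothetical model: keeps the most recent `depth` messages and
    stops recording once an error has been latched."""

    def __init__(self, depth):
        # A bounded deque silently discards the oldest entry when full.
        self.messages = deque(maxlen=depth)
        self.frozen = False

    def record(self, message):
        # Messages arriving after an error are ignored, so the history
        # leading up to the error is preserved for later analysis.
        if not self.frozen:
            self.messages.append(message)

    def latch_error(self):
        # Stands in for the error storage devices reacting to a detected error.
        self.frozen = True


trace = TransactionTrace(depth=3)
for msg in ["m1", "m2", "m3", "m4"]:
    trace.record(msg)
trace.latch_error()
trace.record("m5")  # dropped: the trace is frozen
```

After the error is latched, the trace still holds the three messages that preceded it, which is the information later error processing would inspect.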
Abstract:
A fault tolerant computer system having a first processing system which includes a first data processor for executing a series of data processing instructions. A first data output terminal outputs data from the first processing system. A second processing system, substantially identical to the first processing system, operates independently from the first processing system. The second processing system includes a second data processor for executing the series of data processing instructions in the same sequence as the first data processor. It also includes a second data output terminal for outputting data from the second processing system. A synchronizing device is coupled to the first and second data processors for maintaining the execution of the series of data processing instructions by the first and second processing systems in synchronism. Fault detection devices are coupled to the first and second data output terminals for comparing the data output from the first processing system with the data output from the second processing system. The fault detection devices identify the presence of an error when the data output from the first processing system at the first output terminal is different from the data output from the second processing system at the second output terminal.
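The output comparison described in this abstract can be illustrated with a small simulation. It is a sketch under assumptions, not the patented mechanism: `run`, `compare_outputs`, the toy accumulator "program", and the `fault_at` parameter are all hypothetical names introduced here, and synchronization is modeled simply by comparing outputs step by step.

```python
def run(program, fault_at=None):
    """Hypothetical processing system: accumulates the program's values and
    emits the running total at each step. `fault_at` injects a single-bit
    error at that step to imitate a hardware fault."""
    outputs = []
    acc = 0
    for step, value in enumerate(program):
        acc += value
        if step == fault_at:
            acc ^= 1  # flip the lowest bit
        outputs.append(acc)
    return outputs


def compare_outputs(first_output, second_output):
    """Return the index of the first step at which the two systems'
    outputs differ, or None if they agree throughout."""
    for step, (a, b) in enumerate(zip(first_output, second_output)):
        if a != b:
            return step
    return None


program = [1, 2, 3, 4]
healthy = compare_outputs(run(program), run(program))
faulty = compare_outputs(run(program), run(program, fault_at=2))
```

Here `healthy` is `None` because both systems produce identical outputs, while `faulty` identifies step 2, where the injected error first makes the outputs diverge.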
Abstract:
A computer system configured to provide fault tolerance includes a first host system and a second host system. The first host system is programmed to monitor a number of portions of memory of the first host system that have been modified by a guest running on the first host system and, upon determining that the number of portions exceeds a threshold level, determine that a checkpoint needs to be created. Upon determining that the checkpoint needs to be created, operation of the guest is paused and checkpoint data is generated. After generating the checkpoint data, operation of the guest is resumed while the checkpoint data is transmitted to the second host system.
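The threshold rule in this abstract can be sketched as follows. The function name `run_guest`, the use of page numbers as integers, and the list-of-lists checkpoint format are assumptions made for illustration. The sketch models only the decision of when to checkpoint; the transmission to the second host, which the abstract says overlaps with resumed guest execution, is not modeled.

```python
def run_guest(writes, threshold):
    """Hypothetical model of the first host: track the distinct pages a guest
    modifies and create a checkpoint when their count exceeds `threshold`.

    `writes` is a sequence of page numbers the guest writes to.
    Returns the list of checkpoints, each a sorted list of modified pages.
    """
    dirty = set()
    checkpoints = []
    for page in writes:
        dirty.add(page)
        if len(dirty) > threshold:
            # Conceptually: pause the guest, generate checkpoint data,
            # then resume the guest while the data is sent elsewhere.
            checkpoints.append(sorted(dirty))
            dirty.clear()
    return checkpoints


checkpoints = run_guest([1, 2, 2, 3, 4, 5, 6], threshold=2)
```

With a threshold of 2, a checkpoint is taken as soon as a third distinct page is modified, so repeated writes to the same page do not bring the checkpoint forward.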
Abstract:
A symmetric multiprocessing fault-tolerant computer system controls memory access in a symmetric multiprocessing computer system. To do so, virtual paging structures are created, where the virtual paging structures reflect physical page access privileges to shared memory for processors in the symmetric multiprocessing computer system. Access to shared memory is controlled based on the physical page access privileges reflected in the virtual paging structures to coordinate deterministic shared memory access between processors in the symmetric multiprocessing computer system. A symmetric multiprocessing fault-tolerant computer system may use duplication or continuous replay.
Abstract:
A fault tolerant/fault resilient computer system includes at least two compute elements connected to at least one controller. Each compute element has clocks that operate asynchronously to clocks of the other compute elements. The compute elements operate in a first mode in which the compute elements each execute a first stream of instructions in emulated clock lockstep, and in a second mode in which the compute elements each execute a second stream of instructions in instruction lockstep. Each compute element may be a multi-processor compute element.
Abstract:
Data transfer to computing elements is synchronized in a computer system that includes the computing elements and controllers that provide data from data sources to the computing elements. A request for data made by a computing element is intercepted and transmitted to the controllers. At least a first controller responds by transmitting requested data to the computing element and by indicating how a second controller will respond to the intercepted request.
Abstract:
A dual processor computer system includes a first processing system having a central processing unit which executes a series of data processing instructions, a data bus system for transferring data to and from the central processing unit, a memory unit coupled to the central processing unit, and a cross-link communications element for transferring data into and out of the first processing system. A similarly configured second processing system, operating independently of the first processing system, is also provided. The cross-link communications element associated with the second processing system is coupled to the cross-link communications element of the first processing system for transferring data into the second processing system from the first processing system and for transferring data into the first processing system from the second processing system.
Abstract:
A memory for storing data in a computer system. Integrity of data transferred to or from a memory array is monitored by transferring two sets of EDC or ECC data corresponding to a longword of data between the memory array and two separate memory controllers. The probability of an undetected error is very low because the two sets of EDC or ECC data are compared to ensure that they match. The number of lines and pins used is minimized by multiplexing the EDC or ECC data with address signals and cycle type signals. The address and cycle type signals are placed on the time division multiplexed bidirectional lines at the beginning of a memory transfer cycle, and the EDC or ECC data is placed on these time division multiplexed lines at times when a longword of data is being transferred on a set of bidirectional data lines.
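The comparison of two independently produced check codes can be sketched as below. This is an illustration under assumptions: CRC-32 from Python's `zlib` stands in for the unspecified EDC or ECC code, a longword is taken to be 32 bits, and the function names are hypothetical. The time-division multiplexing of check bits with address and cycle type signals is not modeled.

```python
import zlib


def edc(longword):
    """Stand-in error detecting code (assumption: CRC-32 over a 32-bit longword)."""
    return zlib.crc32(longword.to_bytes(4, "big"))


def codes_match(longword_at_controller_a, longword_at_controller_b):
    """Each of the two memory controllers derives its own check code for the
    longword it handled; the transfer is accepted only if the codes match."""
    return edc(longword_at_controller_a) == edc(longword_at_controller_b)


good = codes_match(0xDEADBEEF, 0xDEADBEEF)
# A single flipped bit on one controller's path makes the codes disagree.
bad = codes_match(0xDEADBEEF, 0xDEADBEEF ^ 0x00000100)
```

Because the two codes are produced separately, an error has to corrupt both paths in a matching way to go unnoticed, which is the intuition behind the low probability of an undetected error stated in the abstract.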