Fault-tolerant computer system with online recovery and reintegration of redundant components
    Patent type: Granted invention patent
    Legal status: Lapsed

    Publication No.: US06263452B1

    Publication date: 2001-07-17

    Application No.: US09226960

    Filing date: 1999-01-08

    IPC classification: G06F1118

    Abstract: A computer system in a fault-tolerant configuration employs multiple identical CPUs executing the same instruction stream, with multiple, identical memory modules in the address space of the CPUs storing duplicates of the same data. The system detects faults in the CPUs and memory modules, and places a faulty unit offline while continuing to operate using the good units. The faulty unit can be replaced and reintegrated into the system without shutdown. The multiple CPUs are loosely synchronized, as by detecting events such as memory references and stalling any CPU ahead of others until all execute the function simultaneously; interrupts can be synchronized by ensuring that all CPUs implement the interrupt at the same point in their instruction stream. Memory references via the separate CPU-to-memory busses are voted at the three separate ports of each of the memory modules. I/O functions are implemented using two identical I/O busses, each of which is separately coupled to only one of the memory modules. A number of I/O processors are coupled to both I/O busses. I/O devices are accessed through a pair of identical (redundant) I/O processors, but only one is designated to actively control a given device; in case of failure of one I/O processor, however, an I/O device can be accessed by the other one without system shutdown.
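
    The abstract does not say how the three ports of a memory module vote. A minimal Python sketch, assuming a 2-of-3 majority over the (address, data) write each CPU presents at its own port, and assuming that a CPU outvoted by the others is marked offline; `VotingMemoryModule`, `write`, and `Request` are hypothetical names, not taken from the patent:

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

Request = Tuple[int, int]  # hypothetical (address, data) pair for one write


class VotingMemoryModule:
    """One memory module with a separate port per CPU; writes are voted."""

    def __init__(self, ports: int = 3) -> None:
        self.online: List[bool] = [True] * ports
        self.cells: Dict[int, int] = {}

    def write(self, requests: Sequence[Optional[Request]]) -> None:
        # Only ports whose CPU is still online take part in the vote.
        live = {i: r for i, r in enumerate(requests) if self.online[i]}
        if not live:
            raise RuntimeError("no CPUs online")
        winner, votes = Counter(live.values()).most_common(1)[0]
        # A strict majority of the online CPUs must agree.
        if votes * 2 <= len(live):
            raise RuntimeError("no majority among online CPUs")
        # Any online CPU that disagreed with the majority is taken offline.
        for i, r in live.items():
            if r != winner:
                self.online[i] = False
        address, data = winner
        self.cells[address] = data


mem = VotingMemoryModule()
mem.write([(0x100, 7), (0x100, 7), (0x100, 9)])  # CPU 2 is outvoted
print(mem.cells[0x100], mem.online)              # 7 [True, True, False]
mem.write([(0x104, 1), (0x104, 1), None])        # CPU 2 offline; CPUs 0 and 1 agree
```

    With one CPU offline, the two remaining ports can still detect a disagreement but can no longer outvote it, which is why the sketch raises an error in that case instead of guessing.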

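
    The loose synchronization in the abstract (stalling any CPU that is ahead until all reach the same event) behaves like a barrier at each memory reference. The following Python sketch is only a software analogy: it assumes one thread per CPU and uses `threading.Barrier` in place of the hardware stall logic; `run_lockstep` and the per-CPU delays are hypothetical:

```python
import threading
import time
from typing import List, Tuple


def run_lockstep(delays: List[float], events: int) -> List[Tuple[int, int]]:
    """Run one thread per CPU; each stalls at a barrier before every reference.

    delays[i] is a hypothetical amount of local work (seconds) CPU i does
    between references. Returns (event_index, cpu_id) in the order the
    references were performed.
    """
    barrier = threading.Barrier(len(delays))
    lock = threading.Lock()
    log: List[Tuple[int, int]] = []

    def cpu(cpu_id: int) -> None:
        for k in range(events):
            time.sleep(delays[cpu_id])  # local execution; CPUs drift apart here
            barrier.wait()              # a CPU that is ahead stalls until all arrive
            with lock:
                log.append((k, cpu_id))  # all CPUs now perform reference k

    threads = [threading.Thread(target=cpu, args=(i,)) for i in range(len(delays))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


log = run_lockstep([0.001, 0.005, 0.002], events=3)
print([k for k, _ in log])  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

    However unevenly the threads run between references, every reference for event k is logged before any reference for event k + 1, because no thread can pass the next barrier until all threads have finished the current event.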

    Fault-tolerant computer system with auto-restart after power-fall
    Patent type: Granted invention patent
    Legal status: Lapsed

    Publication No.: US5317752A

    Publication date: 1994-05-31

    Application No.: US977734

    Filing date: 1992-11-16

    Abstract: A fault-tolerant computer system employs a power supply system including a battery backup so that upon AC power failure the system can execute an orderly shutdown, saving state to disk. A restart procedure restores the state existing at the time of power failure if the AC power has been restored by the time the shutdown is completed. This powerfail/autorestart procedure may be implemented in a fault-tolerant multiprocessor configuration having multiple identical CPUs executing the same instruction stream, with multiple, identical memory modules in the address space of the CPUs storing duplicates of the same data. The system detects faults in the CPUs and memory modules, and places a faulty unit offline while continuing to operate using the good units. The multiple CPUs are loosely synchronized, as by detecting events such as memory references and stalling any CPU ahead of others until all execute the function simultaneously; interrupts can be synchronized by ensuring that all CPUs implement the interrupt at the same point in their instruction stream. Memory references via the separate CPU-to-memory busses are voted at the three separate ports of each of the memory modules. I/O functions are implemented using two identical I/O busses, each of which is separately coupled to only one of the memory modules. A number of I/O processors are coupled to both I/O busses. I/O devices are accessed through a pair of identical (redundant) I/O processors, but only one is designated to actively control a given device; in case of failure of one I/O processor, however, an I/O device can be accessed by the other one without system shutdown.
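
    The powerfail/autorestart decision described in this abstract can be sketched in Python as follows. This is a simplification under stated assumptions: the disk is a dictionary, AC status is read through a caller-supplied function, and the name `powerfail_autorestart` is hypothetical. The abstract does not say what happens when AC is still absent once the shutdown completes, so the sketch simply returns `None` for that case.

```python
from typing import Callable, Dict, Optional

State = Dict[str, int]  # hypothetical machine state (e.g. register name -> value)


def powerfail_autorestart(
    state: State, disk: Dict[str, State], ac_restored: Callable[[], bool]
) -> Optional[State]:
    """Model of one AC failure: orderly shutdown on battery, then maybe restart."""
    # On battery power: orderly shutdown that saves the running state to disk.
    disk["saved_state"] = dict(state)
    # Shutdown is complete; check whether AC came back in the meantime.
    if ac_restored():
        # Auto-restart: resume from the state that existed at the power failure.
        return dict(disk["saved_state"])
    # Not specified by the abstract; this sketch just leaves the system down.
    return None


disk: Dict[str, State] = {}
resumed = powerfail_autorestart({"pc": 0x2000}, disk, ac_restored=lambda: True)
print(resumed)  # {'pc': 8192}
```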

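
    Both abstracts describe redundant I/O processor pairs in which only one processor actively controls a given device, with the partner taking over on failure. A minimal Python sketch of that failover, assuming a single pair with the hypothetical names `IOP-A` and `IOP-B` and assuming every device starts under `IOP-A`; `RedundantIOPPair`, `fail`, and `route` are likewise hypothetical:

```python
from typing import Dict, Iterable


class RedundantIOPPair:
    """Two identical I/O processors; exactly one actively controls each device."""

    def __init__(self, devices: Iterable[str]) -> None:
        self.healthy: Dict[str, bool] = {"IOP-A": True, "IOP-B": True}
        # Designate IOP-A as the active controller of every device to start with.
        self.owner: Dict[str, str] = {d: "IOP-A" for d in devices}

    def fail(self, iop: str) -> None:
        """Mark one IOP failed and move its devices to the partner, without shutdown."""
        self.healthy[iop] = False
        partner = "IOP-B" if iop == "IOP-A" else "IOP-A"
        if not self.healthy[partner]:
            raise RuntimeError("both I/O processors have failed")
        for device, owner in self.owner.items():
            if owner == iop:
                self.owner[device] = partner  # reassign; dict size is unchanged

    def route(self, device: str) -> str:
        """Return the IOP currently used to access the device."""
        return self.owner[device]


pair = RedundantIOPPair(["disk0", "tape0"])
print(pair.route("disk0"))  # IOP-A
pair.fail("IOP-A")
print(pair.route("disk0"))  # IOP-B
```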