Fault-tolerant computer system with online recovery and reintegration of redundant components
    1.
    Granted invention patent
    Status: Expired

    Publication No.: US06263452B1

    Publication date: 2001-07-17

    Application No.: US09226960

    Filing date: 1999-01-08

    IPC classification: G06F1118

    Abstract: A computer system in a fault-tolerant configuration employs multiple identical CPUs executing the same instruction stream, with multiple, identical memory modules in the address space of the CPUs storing duplicates of the same data. The system detects faults in the CPUs and memory modules, and places a faulty unit offline while continuing to operate using the good units. The faulty unit can be replaced and reintegrated into the system without shutdown. The multiple CPUs are loosely synchronized, as by detecting events such as memory references and stalling any CPU ahead of others until all execute the function simultaneously; interrupts can be synchronized by ensuring that all CPUs implement the interrupt at the same point in their instruction stream. Memory references via the separate CPU-to-memory busses are voted at the three separate ports of each of the memory modules. I/O functions are implemented using two identical I/O busses, each of which is separately coupled to only one of the memory modules. A number of I/O processors are coupled to both I/O busses. I/O devices are accessed through a pair of identical (redundant) I/O processors, but only one is designated to actively control a given device; in case of failure of one I/O processor, however, an I/O device can be accessed by the other one without system shutdown.
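The voting step in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the function name `vote`, the CPU labels, and the `(operation, address, data)` request tuple are all hypothetical, and a real memory module would vote in hardware at its three ports.

```python
from collections import Counter


def vote(requests, online):
    """Majority-vote the memory references presented by the online CPUs.

    requests: dict mapping a CPU id to its request tuple (hypothetical format).
    online:   set of CPU ids currently considered good.
    Returns (winning_request, set_of_disagreeing_cpu_ids).
    """
    # Ignore CPUs that have already been placed offline.
    live = {cpu: req for cpu, req in requests.items() if cpu in online}
    if not live:
        raise RuntimeError("no online CPUs")
    winner, count = Counter(live.values()).most_common(1)[0]
    # A strict majority is required; a 1-1 split between two CPUs is unresolvable.
    if count * 2 <= len(live):
        raise RuntimeError("no majority among online CPUs")
    faulty = {cpu for cpu, req in live.items() if req != winner}
    return winner, faulty


# Three CPUs issue the same write; CPU "C" presents corrupted data.
online = {"A", "B", "C"}
requests = {
    "A": ("write", 0x1000, 42),
    "B": ("write", 0x1000, 42),
    "C": ("write", 0x1000, 7),
}
winner, faulty = vote(requests, online)
online -= faulty  # place the disagreeing CPU offline; the good units keep running
```

In this model, reintegrating a repaired unit would amount to adding its id back to `online` once its state matches the others; how the actual system restores that state is not covered by the abstract.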


    Refresh control for dynamic memory in multiple processor system
    4.
    Granted invention patent
    Status: Expired

    Publication No.: US5146589A

    Publication date: 1992-09-08

    Application No.: US629698

    Filing date: 1990-12-17

    Abstract: A computer system in a fault-tolerant configuration employs three identical CPUs executing the same instruction stream, with two identical, self-checking memory modules storing duplicates of the same data. Memory references by the three CPUs are made by three separate busses connected to three separate ports of each of the two memory modules. The three CPUs are loosely synchronized, as by detecting events such as memory references and stalling any CPU ahead of others until all execute the function simultaneously; interrupts can be synchronized by ensuring that all three CPUs implement the interrupt at the same point in their instruction stream. Memory references via the separate CPU-to-memory busses are voted at the three separate ports of each of the memory modules. Each CPU has a local memory, separate from the memory modules, and this local memory is of the dynamic type so it must be periodically refreshed. The refresh cycles are interposed at the same point in the instruction stream for each of the three CPUs by counting instruction execution cycles separately in each CPU, and interrupting to do a refresh cycle when a given count is reached. Stall cycles are also counted, and when long periods of stalls occur then more than one refresh cycle is interposed to catch up to the needed refresh schedule.
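The refresh scheme in the abstract can be sketched as a pair of counters per CPU. This is a simplified model under assumptions that are not in the source: the class name `RefreshCounter`, the helper `refresh_points`, the one-boolean-per-clock trace format, and the rule of one extra refresh per full interval of accumulated stall cycles are all hypothetical.

```python
class RefreshCounter:
    """Per-CPU refresh scheduler driven by executed cycles, not wall-clock time."""

    def __init__(self, interval):
        self.interval = interval  # executed cycles between refreshes (assumed value)
        self.run_cycles = 0       # cycles in which an instruction actually executed
        self.stall_cycles = 0     # cycles spent stalled since the last catch-up

    def tick(self, stalled):
        """Advance one clock cycle; return how many refresh cycles to insert now."""
        if stalled:
            # Never refresh mid-stall: only remember that time has passed.
            self.stall_cycles += 1
            return 0
        self.run_cycles += 1
        if self.run_cycles < self.interval:
            return 0
        self.run_cycles = 0
        # Catch up: one extra refresh for every full interval spent stalled.
        extra, self.stall_cycles = divmod(self.stall_cycles, self.interval)
        return 1 + extra


def refresh_points(trace, interval):
    """Return (executed_cycle_count, refreshes_inserted) for each refresh point."""
    counter = RefreshCounter(interval)
    executed = 0
    points = []
    for stalled in trace:
        if not stalled:
            executed += 1
        refreshes = counter.tick(stalled)
        if refreshes:
            points.append((executed, refreshes))
    return points


# CPU "a" never stalls; CPU "b" stalls for nine cycles after two instructions.
points_a = refresh_points([False] * 8, interval=4)
points_b = refresh_points([False] * 2 + [True] * 9 + [False] * 6, interval=4)
```

Because the trigger depends only on executed cycles, both traces reach their refresh points after the same number of executed cycles, while the stalled CPU inserts additional refreshes at its first point to make up for the elapsed time.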
