Fault-tolerant computer system with online recovery and reintegration of redundant components
    2.
    Granted invention patent
    Fault-tolerant computer system with online recovery and reintegration of redundant components (expired)

    Publication No.: US06263452B1

    Publication date: 2001-07-17

    Application No.: US09226960

    Filing date: 1999-01-08

    IPC classification: G06F11/18

    Abstract: A computer system in a fault-tolerant configuration employs multiple identical CPUs executing the same instruction stream, with multiple, identical memory modules in the address space of the CPUs storing duplicates of the same data. The system detects faults in the CPUs and memory modules, and places a faulty unit offline while continuing to operate using the good units. The faulty unit can be replaced and reintegrated into the system without shutdown. The multiple CPUs are loosely synchronized, as by detecting events such as memory references and stalling any CPU ahead of others until all execute the function simultaneously; interrupts can be synchronized by ensuring that all CPUs implement the interrupt at the same point in their instruction stream. Memory references via the separate CPU-to-memory busses are voted at the three separate ports of each of the memory modules. I/O functions are implemented using two identical I/O busses, each of which is separately coupled to only one of the memory modules. A number of I/O processors are coupled to both I/O busses. I/O devices are accessed through a pair of identical (redundant) I/O processors, but only one is designated to actively control a given device; in case of failure of one I/O processor, however, an I/O device can be accessed by the other one without system shutdown.
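The voting of memory references described in the abstract can be illustrated with a small sketch. This is not the patented circuit: it assumes a simple strict-majority rule, represents each memory reference as a tuple, and uses a hypothetical `vote` function in which `None` stands for a CPU that has been placed offline.

```python
from collections import Counter
from typing import Hashable, List, Optional, Sequence, Tuple


def vote(requests: Sequence[Optional[Hashable]]) -> Tuple[Optional[Hashable], List[int]]:
    """Vote on the references seen at the ports of one memory module.

    requests[i] is the reference presented by CPU i, or None if CPU i is
    offline (an assumption of this sketch). Returns (winner, suspects):
    winner is the reference agreed on by a strict majority of live CPUs,
    or None if there is no such majority; suspects lists the indices of
    live CPUs that disagreed with the winner (all live CPUs if no winner).
    """
    live = [(i, r) for i, r in enumerate(requests) if r is not None]
    if not live:
        return None, []
    winner, count = Counter(r for _, r in live).most_common(1)[0]
    if 2 * count <= len(live):
        # No strict majority: the vote cannot single out a faulty unit.
        return None, [i for i, _ in live]
    return winner, [i for i, r in live if r != winner]


good = ("write", 0x1000, 42)
bad = ("write", 0x1000, 43)
print(vote([good, bad, good]))   # CPU 1 is outvoted and reported as suspect
print(vote([good, None, good]))  # two remaining CPUs still agree
```

A caller could use the returned suspect list to take a disagreeing CPU offline while the remaining units continue, which mirrors the behavior the abstract describes.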


    Fault-tolerant computer system with auto-restart after power-fall
    9.
    Granted invention patent
    Fault-tolerant computer system with auto-restart after power-fall (expired)

    Publication No.: US5317752A

    Publication date: 1994-05-31

    Application No.: US977734

    Filing date: 1992-11-16

    Abstract: A fault-tolerant computer system employs a power supply system including a battery backup so that upon AC power failure the system can execute an orderly shutdown, saving state to disk. A restart procedure restores the state existing at the time of power failure if the AC power has been restored by the time the shutdown is completed. This powerfail/autorestart procedure may be implemented in a fault-tolerant multiprocessor configuration having multiple identical CPUs executing the same instruction stream, with multiple, identical memory modules in the address space of the CPUs storing duplicates of the same data. The system detects faults in the CPUs and memory modules, and places a faulty unit offline while continuing to operate using the good units. The multiple CPUs are loosely synchronized, as by detecting events such as memory references and stalling any CPU ahead of others until all execute the function simultaneously; interrupts can be synchronized by ensuring that all CPUs implement the interrupt at the same point in their instruction stream. Memory references via the separate CPU-to-memory busses are voted at the three separate ports of each of the memory modules. I/O functions are implemented using two identical I/O busses, each of which is separately coupled to only one of the memory modules. A number of I/O processors are coupled to both I/O busses. I/O devices are accessed through a pair of identical (redundant) I/O processors, but only one is designated to actively control a given device; in case of failure of one I/O processor, however, an I/O device can be accessed by the other one without system shutdown.
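The powerfail/autorestart sequence in the abstract amounts to a short decision: save state while on battery, then check whether AC power has returned. The sketch below is only an illustration of that flow. The function names, the dictionary standing in for the disk, and the JSON encoding are all assumptions, not details from the patent.

```python
import json
from typing import Callable, Dict


def on_ac_failure(state: Dict[str, int],
                  disk: Dict[str, str],
                  ac_is_back: Callable[[], bool]) -> str:
    """Run on battery power after AC loss (hypothetical helper).

    Performs the orderly shutdown by saving state to `disk`, then decides
    what happens next based on whether AC power has been restored by the
    time the shutdown has completed.
    """
    disk["saved_state"] = json.dumps(state)  # orderly shutdown: persist state
    return "autorestart" if ac_is_back() else "power_off"


def restart(disk: Dict[str, str]) -> Dict[str, int]:
    """Restore the state that existed at the time of the power failure."""
    return json.loads(disk["saved_state"])


disk: Dict[str, str] = {}
running_state = {"pc": 1234, "open_files": 3}
if on_ac_failure(running_state, disk, ac_is_back=lambda: True) == "autorestart":
    print(restart(disk))
```

The AC check is made only after the save completes, matching the abstract's condition that power be restored "by the time the shutdown is completed".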


    Fault-tolerant computer system with /CONFIG filesystem
    10.
    Granted invention patent
    Fault-tolerant computer system with /CONFIG filesystem (expired)

    Publication No.: US5327553A

    Publication date: 1994-07-05

    Application No.: US973202

    Filing date: 1992-11-06

    Abstract: A fault-tolerant computer system employs a pseudo-filesystem to dynamically manage the hardware components. A directory which appears as a standard, hierarchical directory in this filesystem contains a file for each component; each file maps to either a hardware component or a software module. The pseudo-filesystem hierarchy is determined during system initialization and is automatically updated whenever the software or hardware configuration changes. The pseudo-filesystem, called /config filesystem herein, is implemented as a Unix filesystem in the Unix filesystem switch. This pseudo-filesystem method may be implemented in a fault-tolerant, redundant computer system configuration having multiple identical CPUs executing the same instruction stream, with multiple, identical memory modules in the address space of the CPUs storing duplicates of the same data. The system detects faults in the CPUs and memory modules, and places a faulty unit offline while continuing to operate using the good units. The multiple CPUs are loosely synchronized, as by detecting events such as memory references and stalling any CPU ahead of others until all execute the function simultaneously; interrupts can be synchronized by ensuring that all CPUs implement the interrupt at the same point in their instruction stream. Memory references are voted at the three separate ports of each of the memory modules.
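The idea of a directory that holds one file per component, and that changes as the configuration changes, can be modeled in a few lines. The sketch below is an in-memory toy, not the Unix filesystem-switch implementation the abstract refers to. The `ConfigFS` class, its methods, and the component paths are hypothetical names chosen for illustration.

```python
from typing import Dict, List


class ConfigFS:
    """Toy model of a /config-style pseudo-filesystem (illustrative only)."""

    def __init__(self) -> None:
        # Maps a path such as "/config/cpu0" to "hardware" or "software".
        self._nodes: Dict[str, str] = {}

    def add(self, path: str, kind: str) -> None:
        """Register a component, as when it is found at initialization or added later."""
        if kind not in ("hardware", "software"):
            raise ValueError("kind must be 'hardware' or 'software'")
        self._nodes[path] = kind

    def remove(self, path: str) -> None:
        """Drop a component and anything beneath it, as when it is taken out of service."""
        prefix = path.rstrip("/") + "/"
        for p in [p for p in self._nodes if p == path or p.startswith(prefix)]:
            del self._nodes[p]

    def listdir(self, directory: str) -> List[str]:
        """List the entries directly under `directory`, like a normal directory listing."""
        prefix = directory.rstrip("/") + "/"
        names = {p[len(prefix):].split("/", 1)[0]
                 for p in self._nodes if p.startswith(prefix)}
        return sorted(names)


fs = ConfigFS()
fs.add("/config/cpu0", "hardware")
fs.add("/config/cpu1", "hardware")
fs.add("/config/mem0", "hardware")
fs.add("/config/driver_scsi", "software")
print(fs.listdir("/config"))
fs.remove("/config/cpu1")  # configuration change: the listing updates
print(fs.listdir("/config"))
```

Exposing components as directory entries means ordinary file tools can inspect the configuration, which is the apparent motivation for presenting it as a standard hierarchical directory.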
