摘要:
A fault-tolerant computer system employs a pseudo-filesystem to dynamically manage the hardware components. A directory which appears as a standard, hierarchical directory in this filesystem contains a file for each component; each file maps to either a hardware component or a software module. The pseudo-filesystem hierarchy is determined during system initialization and is automatically updated whenever the software or hardware configuration changes. The pseudo-filesystem, called /config filesystem herein, is implemented as a Unix filesystem in the Unix filesystem switch. This pseudo-filesystem method may be implemented in a fault-tolerant, redundant computer system configuration having multiple identical CPUs executing the same instruction stream, with multiple, identical memory modules in the address space of the CPUs storing duplicates of the same data. The system detects faults in the CPUs and memory modules, and places a faulty unit offline while continuing to operate using the good units. The multiple CPUs are loosely synchronized, as by detecting events such as memory references and stalling any CPU ahead of others until all execute the function simultaneously; interrupts can be synchronized by ensuring that all CPUs implement the interrupt at the same point in their instruction stream. Memory references are voted at the three separate ports of each of the memory modules.
摘要:
A computer system in a fault-tolerant configuration employs multiple identical CPUs executing the same instruction stream, with multiple, identical memory modules in the address space of the CPUs storing duplicates of the same data. The system detects faults in the CPUs and memory modules, and places a faulty unit offline while continuing to operate using the good units. The faulty unit can be replaced and reintegrated into the system without shutdown. The multiple CPUs are loosely synchronized, as by detecting events such as memory references and stalling any CPU ahead of others until all execute the function simultaneously; interrupts can be synchronized by ensuring that all CPUs implement the interrupt at the same point in their instruction stream. Memory references via the separate CPU-to-memory busses are voted at the three separate ports of each of the memory modules. I/O functions are implemented using two identical I/O busses, each of which is separately coupled to only one of the memory modules. A number of I/O processors are coupled to both I/O busses. I/O devices are accessed through a pair of identical (redundant) I/O processors, but only one is designated to actively control a given device; in case of failure of one I/O processor, however, an I/O device can be accessed by the other one without system shutdown.
摘要:
A computer system in a fault-tolerant configuration employees multiple identical CPUs executing the same instruction stream, with multiple, identical memory modules in the address space of the CPUs storing duplicates of the same data. The system detects faults in the CPUs and memory modules, and places a faulty unit offline while continuing to operate using the good units. The faulty unit can be replaced and reintegrated into the system without shutdown. The multiple CPUs are loosely synchronized, as by detecting events such as memory references and stalling any CPU ahead of others until all execute the function simultaneously; interrupts can be synchronized by ensuring that all CPUs implement the interrupt at the same point in their instruction stream. Memory references via the separate CPU-to-memory busses are voted at the three separate ports of each of the memory modules. I/O functions are implemented using two identical I/O busses, each of which is separately coupled to only one of the memory modules. A number of I/O processors are coupled to both I/O busses. I/O devices are accessed through a pair of identical (redundant) I/O processors, but only one is designated to actively control a given device; in case of failure of one I/O processor, however, an I/O device can be accessed by the other one without system shutdown.
摘要:
A computer system in a fault-tolerant configuration employs multiple identical CPUs executing the same instruction stream, with multiple, identical memory modules in the address space of the CPUs storing duplicates of the same data. The system detects faults in the CPUs and memory modules, and places a faulty unit offline while continuing to operate using the good units. The faulty unit can be replaced and reintegrated into the system without shutdown. The multiple CPUs are loosely synchronized, as by detecting events such as memory references and stalling any CPU ahead of others until all execute the function simultaneously; interrupts can be synchronized by ensuring that all CPUs implement the interrupt at the same point in their instruction stream. Memory references via the separate CPU-to-memory busses are voted at the three separate ports of each of the memory modules. I/O functions are implemented using two identical I/O busses, each of which is separately coupled to only one of the memory modules. A number of I/O processors are coupled to both I/O busses. I/O devices are accessed through a pair of identical (redundant) processors, but only one is designated to actively control a given device; in case of failure of one I/O processor, however, an I/O device can be accessed by the other one without system shutdown.
摘要:
A network-based storage system comprises one or more block-level storage servers that connect to, and provide disk storage for, one or more host computers. In one embodiment, the system is capable of subdividing the storage space of an array of disk drives into multiple storage partitions, and allocating the partitions to host computers on a network. A storage partition allocated to a particular host computer may appear as local disk drive storage to user-level processes running on the host computer.
摘要:
A network-based storage system comprises one or more block-level storage servers that connect to, and provide storage for, one or more host computers over logical network connections, such as TCP/IP connections. In one embodiment, the block-level storage servers implement a protocol through which a storage server authenticates a host before permitting the host to access storage resources. Upon successful authentication, the storage server may also provide access information to the host.
摘要:
A computer system employs multiple CPUs, all executing the same instruction stream, with multiple, identical memory modules storing duplicates of the same data and accessable by all the CPUs, providing global memory. The multiple CPUs are loosely synchronized, as by detecting events such as memory references and stalling any CPU ahead of others until all execute the function simultaneously. Each CPU has its own fast cache and also a local memory not accessable by the other CPUs. A hierarchical virtual memory management arrangement for this system employs demand paging to keep the most-used data in the local memory, page-swapping with the global memory. Page swapping with disk memory is through the global memory; the global memory is used as a disk buffer and also to hold pages likely to be needed for loading to local memory. The operating system kernal is kept in local memory. This arrangement is particularly useful in fault-tolerant computer systems.
摘要:
A computer system in a fault-tolerant configuration employs multiple identical CPUs executing the same instruction stream, with multiple, identical memory modules in the address space of the CPUs storing duplicates of the same data. Memory references. The multiple CPUs are loosely synchronized, as by detecting events such as memory references and stalling any CPU ahead of others until all execute the function simultaneously; interrupts can be synchronized by ensuring that all CPUs implement the interrupt at the same point in their instruction stream. Memory references by the multiple CPUs are voted by each of the memory modules. A private-write area is included in the shared memory space in the memory modules to allow functions such as software voting of state information unique to CPUs. All CPUs write state information to their private-write area, then all CPUs read all the private-write areas for functions such as detecting differences in interrupt cause or the like.
摘要:
A network-based storage system comprises one or more block-level storage servers that connect to, and provide disk storage for, one or more host computers. In one embodiment, the system is capable of subdividing the storage space of an array of disk drives into multiple storage partitions, and allocating the partitions to host computers on a network. A storage partition allocated to a particular host computer may appear as local disk drive storage to user-level processes running on the host computer.
摘要:
A network-based storage system comprises one or more block-level storage servers that connect to, and provide disk storage for, one or more host computers (“hosts”) over logical network connections (preferably TCP/IP sockets). In one embodiment, each host can maintain one or more socket connections to each storage server, over which multiple I/O operations may be performed concurrently in a non-blocking manner. The physical storage of a storage server may optionally be divided into multiple partitions, each of which may be independently assigned to a particular host or to a group of hosts. When a host initially connects to a storage server in one embodiment, the storage server initially authenticates the host, and then notifies the host of the ports that may be used to establish data connections and of the partitions assigned to that host.