摘要:
A method for expressing the algorithms for the manipulation of hardware includes providing program instructions that describe a sequence of one or more transactions for manipulating hardware components of a system. The program instructions may call one or more code segments that include specific information associated with particular hardware components of the system. In addition, the program instructions are independent of the specific information. The method may also include translating the program instructions into an executable form and executing the executable form of the program instructions to manipulate the hardware components of the system from one consistent state to a next consistent state.
摘要:
A multiprocessor system is disclosed that employs an apparatus and method for caging a redundant component to allow testing of the redundant component without interfering with normal system operation. In one embodiment the multiprocessor system includes at least two system controllers and a set of processing nodes interconnected by a network. The system controllers allocate and configure system resources, and the processing nodes each include a node interface that couple the nodes to the system controllers. The node interfaces can be individually and separately configured in a caged mode and an uncaged mode. In the uncaged mode, the node interface communicates information from either of the system controllers to other components in the processing node. In the caged mode, the node interface censors information from at least one of the system controllers.
摘要:
A method for use in a computer system provides a dynamic, “self tuning” soft-error-rate-discrimination (SERD) method and apparatus. Specially designed SRAMs or other circuits are “tuned” in a manner that gives them extreme susceptibility to cosmic neutron events (soft errors), higher than that of the “regular” SRAM components, memory modules or other components in the computer system. One such specially designed SRAM is deployed per server. An interface algorithm continuously sends read/write traffic to the special SRAM to infer the soft error rate (SER), which is directly proportional to cosmic neutron flux. The inferred cosmic neutron flux rate is employed in a Poisson SPRT algorithmic approach that dynamically compensates the soft error discrimination sensitivity in accordance with the instantaneous neutron flux for all of the regular SRAM components in the server.
摘要:
A multiprocessor system is disclosed that employs an apparatus and method for caging a redundant component to allow testing of the redundant component without interfering with normal system operation. In one embodiment the multiprocessor system includes at least two system controllers and a set of processing nodes interconnected by a network. The system controllers allocate and configure system resources, and the processing nodes each include a node interface that couple the nodes to the system controllers. The node interfaces can be individually and separately configured in a caged mode and an uncaged mode. In the uncaged mode, the node interface communicates information from either of the system controllers to other components in the processing node. In the caged mode, the node interface censors information from at least one of the system controllers. When all node interfaces censor information from a common system controller, this system controller is effectively “caged” and communications from this system controller are thereby prevented from reaching other node components. This allows the caged system controller along with all its associated interconnections to be tested without interfering with normal operation of the system. Normal system configuration tasks are handled by the uncaged system controller. The uncaged system controller can instruct the node interfaces to uncage the caged system controller if the tests are successfully completed.
摘要:
A multiprocessing computer system provides the hardware support to properly test an I/O board while the system is running user application programs and while preventing a faulty board from causing a system crash. The system includes a centerplane that mounts multiple expander boards. Each expander board in turn connects a microprocessor board and an I/O board to the centerplane. Prior to testing, the replacement I/O board becomes a part of a dynamic system domain software partition after it has been inserted into an expander board of the multiprocessing computer system. Testing an I/O board involves executing a process using a microprocessor and memory on a microprocessor board to perform hardware tests on the I/O board. An error cage, address transaction cage, and interrupt transaction cage isolate any errors generated while the I/O board is being tested. The error cage isolates correction code errors, parity errors, protocol errors, timeout errors, and other similar errors generated by the I/O board under test. The address transaction cage isolates out of range memory addresses from the I/O board under test. The interrupt transaction cage isolates interrupt requests to an incorrect target port generated by the I/O board under test. The errors generated by the I/O board are logged in a status register and suppressed.