摘要:
A number of intelligent nodes (bus interface units-BIUs and memory control units-MCUs) are provided in a matrix composed of processor buses (105) with corresponding error-reporting and control lines (106); and memory buses (107) with corresponding error-reporting and control lines (108). Error-detection mechanisms deal with information flow occuring across area boundaries. Each node (100, 101, 102, 103) has means for logging errors and reporting errors on the error report lines (106, 108). If an error recurs the node at which the error exists initiates an error message which is received and repropagated on the error report lines by all nodes. The error message identifies the type of error and the node ID at which the error was detected. Confinement area isolation logic in a node isolates a faulty confinement area of which the node is a part, upon the condition that the node ID in an error report message identifies the node as a node which is a part of a faulty confinement area. Logic in the node reconfigures at least part of the system upon the condition that the node ID in the error report message identifies the node as a node which is part of a confinement area which should be recofigured to recover from the error reported in the error report message.
摘要:
A number of intelligent nodes (bus-interface units-BIUs and memory-control units-MCUs) are provided in a matrix composed of processor buses (105) with corresponding error-reporting and control lines (106); and memory buses (107) with corresponding error-reporting and control lines (108). Each node (100, 101, 102, 103) has means for logging errors and reporting errors on the error-report lines (106, 108). Processor modules (110) and memory modules (112) are each connected to a node which controls access to a common memory bus (107). Each node includes means (a married bit-170 and a shadow bit-172) for marrying modules in pairs such that each module in the pair tracks the operations directed to the module pair, and each module in the pair alternates with the other module in the handling of requests or replies. Each node registers the ID of the other node in a spouse ID register. Comparison logic (162, 164) in each node resets the married bit upon the condition that the node ID (identifying the node at which the error occurred) in an error-report message is equal to the ID stored in the spouse ID register, thus identifying the spouse node (the partner of the node in which the comparison logic is located) as the source of the error. Resetting the married bit splits apart the primary/shadow pair, so that the error-free module takes over and ceases to alternate with its partner.
摘要:
A number of intelligent crossbar switches (100) are provided in a matrix of orthogonal lines interconnecting processor (110) and memory control unit (MCU) modules (112). The matrix is composed of processor buses (105) and corresponding error-reporting lines (106); and memory buses (107) with corresponding error-reporting lines (108). At the intersection of these lines is a crossbar switch node (100). The crossbar switches function to pass memory requests from a processor to a memory module attached to an MCU node and to pass any data associated with the requests. The system is organized into confinement areas at the boundaries of which are positioned error-detection mechanisms to deal with information flow occurring across area boundaries. Each crossbar switch and MCU node has means for the logging and signaling of errors to other nodes. Means are provided to reconfigure the system to reroute traffic around the confinement area at fault and for restarting system operation in a possibly degraded mode.