Abstract:
A computer subsystem and a computer system are disclosed. The computer subsystem includes L composite nodes (CNs); each CN includes M basic nodes, and each basic node includes N central processing units (CPUs) and one node controller (NC). Any two CPUs in a basic node are interconnected, and each CPU in a basic node is connected to the NC of that basic node. The NC in each basic node has a routing function. Any two NCs in the M basic nodes of a CN are interconnected. The L CNs are connected through connections between NCs such that communication between any two NCs takes no more than three hops. The computer subsystem and the computer system can therefore reduce the kinds and the number of interconnection chips and simplify the interconnection structure of the system, thereby improving system reliability.
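The three-hop bound can be checked on a small model of the NC topology. The sketch below is illustrative only: the abstract does not say how NCs in different CNs are wired, so the single link per CN pair used here, and the names `build_nc_graph` and `max_hops`, are assumptions rather than the patented scheme.

```python
from collections import deque
from itertools import combinations


def build_nc_graph(L, M):
    """Return adjacency sets over NCs labeled (cn_index, nc_index)."""
    adj = {(c, i): set() for c in range(L) for i in range(M)}
    # From the abstract: any two NCs inside one CN are interconnected.
    for c in range(L):
        for i, j in combinations(range(M), 2):
            adj[(c, i)].add((c, j))
            adj[(c, j)].add((c, i))
    # ASSUMPTION (not in the source): each pair of CNs (a, b) shares one
    # link, from NC (b mod M) of CN a to NC (a mod M) of CN b.
    for a, b in combinations(range(L), 2):
        u, v = (a, b % M), (b, a % M)
        adj[u].add(v)
        adj[v].add(u)
    return adj


def max_hops(adj):
    """Worst-case shortest-path length between any two NCs (BFS from each)."""
    worst = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) != len(adj):
            raise ValueError("NC graph is not connected")
        worst = max(worst, max(dist.values()))
    return worst


if __name__ == "__main__":
    print(max_hops(build_nc_graph(4, 4)))
```

Under this assumed wiring, a route needs at most one hop to reach the NC holding the inter-CN link, one hop across it, and one hop inside the destination CN, which is where the limit of three comes from.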
Abstract:
A network distance prediction method and apparatus, wherein the method includes: communicating with at least two reference nodes to determine values of at least some elements in a local distance matrix; constructing the local distance matrix based on the values of the at least some elements; performing low-rank sparse factorization on the local distance matrix to obtain a low-rank matrix; obtaining values of elements in a first element set of the low-rank matrix and using those values as target values of network distances between a to-be-positioned node and the at least two reference nodes; communicating with the reference nodes to obtain coordinates of the reference nodes in a network coordinate system; and determining coordinates of the to-be-positioned node. The embodiments of the present invention can improve the accuracy of network distance prediction.
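The final step, determining coordinates from target distances and reference coordinates, can be illustrated with a small least-squares fit. This is a sketch under stated assumptions: the abstract does not specify how the coordinates are computed, so the 2-D Euclidean coordinate system, the gradient-descent fit, and the function name `locate` are all hypothetical. The low-rank sparse factorization step is not shown.

```python
import math


def locate(refs, dists, steps=500, lr=0.1):
    """Fit 2-D coordinates whose distances to `refs` best match `dists`.

    Minimizes sum_i (||p - refs[i]|| - dists[i])**2 by plain gradient
    descent. ASSUMPTION: 2-D Euclidean coordinates; the source does not
    name the coordinate system or the fitting method.
    """
    # Start from the centroid of the reference nodes.
    x = sum(r[0] for r in refs) / len(refs)
    y = sum(r[1] for r in refs) / len(refs)
    for _ in range(steps):
        gx = gy = 0.0
        for (rx, ry), d in zip(refs, dists):
            cur = math.hypot(x - rx, y - ry)
            if cur == 0.0:
                continue  # gradient is undefined exactly at a reference node
            err = cur - d
            gx += 2.0 * err * (x - rx) / cur
            gy += 2.0 * err * (y - ry) / cur
        x -= lr * gx
        y -= lr * gy
    return x, y


if __name__ == "__main__":
    refs = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    target = (3.0, 4.0)
    dists = [math.hypot(target[0] - rx, target[1] - ry) for rx, ry in refs]
    print(locate(refs, dists))
```

Here `dists` stands in for the target distance values read from the low-rank matrix. With noise-free distances, the fit should recover the node's position to within numerical tolerance.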
Abstract:
A system for implementing interconnection fault tolerance between CPUs, in which a first CPU and a second CPU are interconnected through a first CPU interconnect device and a second CPU interconnect device. The system adds a data channel between a first SerDes interface of the first CPU interconnect device and a second SerDes interface of the second CPU interconnect device, and transmits link connection state information and a link control signal through the added data channel. The system monitors the link state of any link in the CPU interconnection system, transmits the link state through the added data channel, and recovers the faulty link upon determining that any one of the first connection link, the second connection link, and the third connection link is faulty.