摘要:
A mechanism is provided for collective acceleration unit tree flow control forms a logical tree (sub-network) among those processors and transfers “collective” packets on this tree. The system supports many collective trees, and each collective acceleration unit (CAU) includes resources to support a subset of the trees. Each CAU has limited buffer space, and the connection between two CAUs is not completely reliable. Therefore, to address the challenge of collective packets traversing on the tree without colliding with each other for buffer space and guaranteeing the end-to-end packet delivery, each CAU in the system effectively flow controls the packets, detects packet loss, and retransmits lost packets.
摘要:
A mechanism is provided for collective acceleration unit tree flow control forms a logical tree (sub-network) among those processors and transfers “collective” packets on this tree. The system supports many collective trees, and each collective acceleration unit (CAU) includes resources to support a subset of the trees. Each CAU has limited buffer space, and the connection between two CAUs is not completely reliable. Therefore, to address the challenge of collective packets traversing on the tree without colliding with each other for buffer space and guaranteeing the end-to-end packet delivery, each CAU in the system effectively flow controls the packets, detects packet loss, and retransmits lost packets.
摘要:
A mechanism is provided in a collective acceleration unit for performing a collective operation to distribute or collect data among a plurality of participant nodes. The mechanism receives an input collective packet for a collective operation from a neighbor node within a collective tree. The input collective packet comprises a tree identifier and an input data field and wherein the collective tree comprises a plurality of sub trees. The mechanism maps the tree identifier to an index within the collective acceleration unit. The index identifies a portion of resources within the collective acceleration unit and is associated with a set of neighbor nodes in a given sub tree within the collective tree. For each neighbor node the collective acceleration unit stores destination information. The collective acceleration unit performs an operation on the input data field using the portion of resources to effect the collective operation.
摘要:
A mechanism is provided in a collective acceleration unit for performing a collective operation to distribute or collect data among a plurality of participant nodes. The mechanism receives an input collective packet for a collective operation from a neighbor node within a collective tree. The input collective packet comprises a tree identifier and an input data field and wherein the collective tree comprises a plurality of sub trees. The mechanism maps the tree identifier to an index within the collective acceleration unit. The index identifies a portion of resources within the collective acceleration unit and is associated with a set of neighbor nodes in a given sub tree within the collective tree. For each neighbor node the collective acceleration unit stores destination information. The collective acceleration unit performs an operation on the input data field using the portion of resources to effect the collective operation.
摘要:
A mechanism is provided in a collective acceleration unit for performing a collective operation to distribute or collect data among a plurality of participant nodes. The mechanism receives an input collective packet for a collective operation from a neighbor node within a collective tree. The input collective packet comprises a tree identifier and an input data field and wherein the collective tree comprises a plurality of sub trees. The mechanism maps the tree identifier to an index within the collective acceleration unit. The index identifies a portion of resources within the collective acceleration unit and is associated with a set of neighbor nodes in a given sub tree within the collective tree. For each neighbor node the collective acceleration unit stores destination information. The collective acceleration unit performs an operation on the input data field using the portion of resources to effect the collective operation.
摘要:
A mechanism is provided in a collective acceleration unit for performing a collective operation to distribute or collect data among a plurality of participant nodes. The mechanism receives an input collective packet for a collective operation from a neighbor node within a collective tree. The input collective packet comprises a tree identifier and an input data field and wherein the collective tree comprises a plurality of sub trees. The mechanism maps the tree identifier to an index within the collective acceleration unit. The index identifies a portion of resources within the collective acceleration unit and is associated with a set of neighbor nodes in a given sub tree within the collective tree. For each neighbor node the collective acceleration unit stores destination information. The collective acceleration unit performs an operation on the input data field using the portion of resources to effect the collective operation.
摘要:
Mechanisms for providing a cluster-wide system clock in a multi-tiered full graph (MTFG) interconnect architecture are provided. Heartbeat signals transmitted by each of the processor chips in the computing cluster are synchronized. Internal system clock signals are generated in each of the processor chips based on the synchronized heartbeat signals. As a result, the internal system clock signals of each of the processor chips are synchronized since the heartbeat signals, that are the basis for the internal system clock signals, are synchronized. Mechanisms are provided for performing such synchronization using direct couplings of processor chips within the same processor book, different processor books in the same supernode, and different processor books in different supernodes of the MTFG interconnect architecture.
摘要:
A system and method for performing dynamic request routing based on broadcast source request information are provided. Each processor chip in the system may use a synchronized heartbeat signal it generates to provide source request information to each of the other processor chips in the system. The source request information identifies the number of active source requests sent by the processor chip that originated the heartbeat signal. The source request information from each of the processor chips in the system may be used by the processor chips in determining optimal routing paths for data from a source processor chip to a destination processor chip. As a result, the congestion of data for processing at each of the processor chips along each possible routing path may be taken into account when selecting to which processor chip to forward data.
摘要:
A system for providing a cluster-wide system clock in a multi-tiered full graph (MTFG) interconnect architecture are provided. Heartbeat signals transmitted by each of the processor chips in the computing cluster are synchronized. Internal system clock signals are generated in each of the processor chips based on the synchronized heartbeat signals. As a result, the internal system clock signals of each of the processor chips are synchronized since the heartbeat signals, that are the basis for the internal system clock signals, are synchronized. Mechanisms are provided for performing such synchronization using direct couplings of processor chips within the same processor book, different processor books in the same supernode, and different processor books in different supernodes of the MTFG interconnect architecture.
摘要:
A method for providing a cluster-wide system clock in a multi-tiered full graph (MTFG) interconnect architecture are provided. Heartbeat signals transmitted by each of the processor chips in the computing cluster are synchronized. Internal system clock signals are generated in each of the processor chips based on the synchronized heartbeat signals. As a result, the internal system clock signals of each of the processor chips are synchronized since the heartbeat signals, that are the basis for the internal system clock signals, are synchronized. Mechanisms are provided for performing such synchronization using direct couplings of processor chips within the same processor book, different processor books in the same supernode, and different processor books in different supernodes of the MTFG interconnect architecture.