Abstract:
Method, apparatus, and systems for Link Transfer, bit-error detection, and link retry using flit bundles that are asynchronous to link Fabric Packets. A first type of packet, the Fabric Packet, is generated, and its data content is divided into multiple data units called “flits.” The flits are then bundled into a second type of packet, the Link Transfer Packet (LTP). The LTPs are sent over individual link segments in a fabric comprising many point-to-point links. Each LTP includes a CRC that is used to verify that the data transmitted over each link segment is error-free, and each LTP is the unit of retransmission. Fabric Packets may vary in size and may be larger or smaller than an LTP. The transfer scheme enables flits from multiple Fabric Packets to be bundled into a single LTP. Upon receipt at a fabric endpoint, the flits are extracted from the LTPs and reassembled to regenerate the Fabric Packets.
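The bundling and reassembly flow can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the described implementation: 8-byte (64-bit) flits, 16 flits per LTP, a hypothetical 4-byte length header used to strip padding, a hypothetical `packet_id` tag on each flit, and zlib's CRC-32 standing in for the real LTP CRC. The example shows flits from three Fabric Packets sharing one LTP and one packet spanning several LTPs.

```python
import zlib
from dataclasses import dataclass

FLIT_BYTES = 8       # assumption: 64-bit flit payloads
FLITS_PER_LTP = 16   # assumption: 16 flits per LTP


@dataclass
class Flit:
    packet_id: int   # hypothetical tag naming the owning Fabric Packet
    data: bytes
    is_tail: bool    # True on the last flit of a Fabric Packet


@dataclass
class LTP:
    flits: list
    crc: int


def ltp_crc(flits) -> int:
    # Stand-in: zlib CRC-32 over the flit payloads, not the real LTP CRC.
    return zlib.crc32(b"".join(f.data for f in flits))


def packet_to_flits(packet_id: int, payload: bytes) -> list:
    # Hypothetical 4-byte length header lets the receiver strip the padding.
    framed = len(payload).to_bytes(4, "big") + payload
    padded_len = -(-len(framed) // FLIT_BYTES) * FLIT_BYTES
    padded = framed.ljust(padded_len, b"\0")
    chunks = [padded[i:i + FLIT_BYTES] for i in range(0, padded_len, FLIT_BYTES)]
    return [Flit(packet_id, c, i == len(chunks) - 1) for i, c in enumerate(chunks)]


def bundle(flit_stream: list) -> list:
    # Group consecutive flits into LTPs regardless of packet boundaries.
    # A real link would pad a partial final LTP with idle flits; this sketch does not.
    ltps = []
    for i in range(0, len(flit_stream), FLITS_PER_LTP):
        group = flit_stream[i:i + FLITS_PER_LTP]
        ltps.append(LTP(group, ltp_crc(group)))
    return ltps


def reassemble(ltps: list) -> dict:
    partial, done = {}, {}
    for ltp in ltps:
        if ltp_crc(ltp.flits) != ltp.crc:
            raise ValueError("LTP CRC mismatch: the LTP would be retried")
        for f in ltp.flits:
            partial.setdefault(f.packet_id, bytearray()).extend(f.data)
            if f.is_tail:
                raw = bytes(partial.pop(f.packet_id))
                length = int.from_bytes(raw[:4], "big")
                done[f.packet_id] = raw[4:4 + length]
    return done


packets = {1: b"A" * 50, 2: b"short", 3: b"B" * 200}
stream = [f for pid, p in packets.items() for f in packet_to_flits(pid, p)]
ltps = bundle(stream)
assert reassemble(ltps) == packets
```

Here the 35 flits form three LTPs, and the first LTP carries flits from all three packets; packet 3 continues into the following LTPs.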
Abstract:
A system-on-a-chip, such as a logical PHY, may be divided into hard IP blocks with fixed routing and soft IP blocks with flexible routing. Each hard IP block may provide a fixed number of lanes. Using p hard IP blocks, each providing n data lanes, yields h = n*p total hard IP data lanes. Where the system design calls for k total data lanes, it is possible that k ≠ h; in that case p = ⌈k/n⌉ hard IP blocks provide h = n*p available hard IP data lanes, and the h − k surplus lanes may be disabled. Where lane reversals occur, such as between hard IP and soft IP, bowtie routing may be avoided by using a multiplexer-like programmable switch within the soft IP.
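The lane arithmetic above can be checked with a short Python sketch. The function names are hypothetical, and the lane-reversal switch is modeled, as an assumption, as a single configuration bit that selects either straight or reversed lane order inside the soft IP.

```python
def hard_ip_lanes(k: int, n: int) -> tuple:
    """For k required lanes and n lanes per hard IP block, return (p, h, h - k)."""
    p = -(-k // n)        # p = ceil(k / n) hard IP blocks
    h = n * p             # total hard IP data lanes provided
    return p, h, h - k    # h - k surplus lanes may be disabled


def soft_ip_lane_select(k: int, reversed_: bool) -> list:
    # Hypothetical model of the programmable switch: entry i is the hard IP
    # lane that feeds soft IP lane i, chosen by one configuration bit, so a
    # reversal is absorbed in soft IP instead of by bowtie routing.
    return [k - 1 - i if reversed_ else i for i in range(k)]


print(hard_ip_lanes(20, 8))  # (3, 24, 4)
```

For example, a 20-lane design built from 8-lane hard IP blocks needs three blocks, which provide 24 lanes, leaving 4 lanes to disable.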
Abstract:
Method, apparatus, and systems for detecting lane errors and removing errant lanes in multi-lane links. Data comprising link packets is split into a plurality of bitstreams that are transmitted in parallel over respective lanes of a multi-lane link. The bitstream data is received at multiple receive lanes of a receiver port and processed to reassemble the link packets and to calculate a CRC over the data received on each lane. Each link packet includes a transmitted CRC that is compared with a CRC calculated over the received data to detect link packet errors. Upon detection of a link packet error, the per-lane or per-transfer-group CRC values are stored, and a retry request is issued to retransmit the bad packet. When the retransmitted packet is received, per-lane or per-transfer-group CRC values are recalculated over the received data and compared with the stored values to identify the lane that caused the link packet error.
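The errant-lane detection can be sketched in Python as follows. Assumptions, all hypothetical: a 4-lane link with byte striping, zlib's CRC-32 used for both the packet CRC and the per-lane CRCs, and a single bit error injected on lane 2 during the first transmission only.

```python
import zlib

LANES = 4  # assumption: 4-lane link, byte-striped


def stripe(data: bytes, lanes: int = LANES) -> list:
    # Lane i carries bytes i, i + lanes, i + 2*lanes, ...
    return [data[i::lanes] for i in range(lanes)]


def unstripe(lane_data: list) -> bytes:
    # Interleave the lanes back into the original byte order.
    out = bytearray()
    for j in range(max(len(d) for d in lane_data)):
        for d in lane_data:
            if j < len(d):
                out.append(d[j])
    return bytes(out)


def receive(lane_data: list, tx_crc: int):
    """Return (packet_ok, per-lane CRCs) for one received link packet."""
    reassembled = unstripe(lane_data)
    per_lane = [zlib.crc32(d) for d in lane_data]
    return zlib.crc32(reassembled) == tx_crc, per_lane


packet = bytes(range(64))
tx_crc = zlib.crc32(packet)  # CRC carried in the link packet (CRC-32 as a stand-in)

# First transmission: simulate a single bit flip on lane 2.
rx_lanes = stripe(packet)
rx_lanes[2] = bytes([rx_lanes[2][0] ^ 0x01]) + rx_lanes[2][1:]
ok, stored = receive(rx_lanes, tx_crc)  # error: store per-lane CRCs, request retry

# The retransmission arrives clean; recompute per-lane CRCs and compare.
ok_retry, recalculated = receive(stripe(packet), tx_crc)
errant = [i for i, (a, b) in enumerate(zip(stored, recalculated)) if a != b]
print(ok, ok_retry, errant)  # False True [2]
```

Only the lane whose data differed between the two receptions shows a CRC mismatch, which is what lets the receiver single out that lane.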
Abstract:
Method, apparatus, and systems for reliably transferring Ethernet packet data over a link layer and facilitating fabric-to-Ethernet and Ethernet-to-fabric gateway operations at matching wire speed and packet data rate. Ethernet header and payload data is extracted from Ethernet frames received at the gateway and encapsulated in fabric packets to be forwarded to a fabric endpoint hosting an entity to which the Ethernet packet is addressed. The fabric packets are divided into flits, which are bundled in groups to form link packets that are transferred over the fabric at the Link layer using a reliable transmission scheme employing implicit ACKnowledgements. At the endpoint, the fabric packet is regenerated, and the Ethernet packet data is de-encapsulated. The Ethernet frames received from and transmitted to an Ethernet network are encoded using 64b/66b encoding, which has an overhead-to-data bit ratio of 1:32. The link packets have the same 1:32 ratio: their overhead comprises one bit per flit plus a 14-bit CRC and a 2-bit credit return field or sideband used for credit-based flow control.
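The matching 1:32 ratios can be verified with a few lines of arithmetic. The abstract does not state the link packet size, so this sketch assumes, hypothetically, a link packet of 16 flits carrying 64 data bits each; under that assumption the overhead is 16 + 14 + 2 = 32 bits per 1024 data bits.

```python
from fractions import Fraction

# 64b/66b: each 66-bit block carries 64 data bits and a 2-bit sync header.
ethernet_ratio = Fraction(66 - 64, 64)

# Assumption: a link packet carries 16 flits of 64 data bits each.
FLITS_PER_LTP, DATA_BITS_PER_FLIT = 16, 64
ltp_overhead_bits = FLITS_PER_LTP * 1 + 14 + 2  # one bit per flit + CRC + credit return
ltp_ratio = Fraction(ltp_overhead_bits, FLITS_PER_LTP * DATA_BITS_PER_FLIT)

print(ethernet_ratio, ltp_ratio)  # 1/32 1/32
```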
Abstract:
Methods and apparatus for implementing time synchronization across exascale fabrics. A master clock node is coupled to a plurality of slave nodes via a fabric comprising a plurality of fabric switches and a plurality of fabric links, wherein each slave node is connected to the master clock node via a respective clock tree path that traverses at least one fabric switch. The fabric switches are configured to selectively forward master clock time data internally along fixed-latency paths that bypass the switches' buffers and switching circuitry, which enables each entire clock tree path to also have a fixed latency. The fixed latency of the clock tree path is determined for each slave node. The local clock of each slave node is then synchronized with the master clock using the master clock time data received by that slave node and the determined fixed latency of the clock tree path from the master clock node to that slave node. Techniques for determining a clock rate mismatch between the master clock and a local clock are also provided.
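A minimal Python sketch of the synchronization arithmetic follows. The latency values are hypothetical, and the rate-mismatch estimate is one plausible approach rather than necessarily the technique referred to above: because the path latency is fixed, the interval between two arrivals measured on the local clock should equal the interval between the corresponding master timestamps, and any difference reflects a rate mismatch.

```python
def path_latency_ns(link_latencies_ns, switch_bypass_ns):
    # Fixed clock-tree latency: every link plus every switch's fixed bypass path.
    return sum(link_latencies_ns) + sum(switch_bypass_ns)


def synchronized_time_ns(master_timestamp_ns, latency_ns):
    # On receipt, the master clock has advanced by exactly the fixed path latency.
    return master_timestamp_ns + latency_ns


def rate_mismatch_ppm(master_t0, master_t1, local_rx0, local_rx1):
    # With a constant path latency, the constant offset cancels when taking
    # intervals; compare the local and master intervals in parts per million.
    return ((local_rx1 - local_rx0) / (master_t1 - master_t0) - 1) * 1e6


latency = path_latency_ns([10, 12, 8], [5, 5])  # hypothetical values, 40 ns total
print(synchronized_time_ns(1_000_000, latency))  # 1000040
```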
Abstract:
Methods, apparatus, and systems for implementing hierarchical and lossless packet preemption and interleaving to reduce latency jitter in flow-controlled packet-based networks. Fabric packets are divided into a plurality of data units, with data units for different fabric packets buffered in separate buffers. Data units are pulled from the buffers and added to a transmit stream in which groups of data units are interleaved. Upon receipt, the receiver separates out the groups of data units and buffers them in separate buffers so that data units belonging to the same fabric packet are grouped together. In one aspect, each buffer is associated with a respective virtual lane (VL), and the fabric packets are effectively transferred over fabric links using virtual lanes. VLs may have different priority levels, under which data units for fabric packets in higher-priority VLs may preempt fabric packets in lower-priority VLs. Because data units rather than entire packets are transferred, transmission of a packet can be temporarily paused in favor of a higher-priority packet. Multiple levels of preemption and interleaving in a nested manner are supported.
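The nested preemption can be illustrated with a small cycle-by-cycle Python model. Assumptions, all hypothetical: one data unit (flit) is sent per cycle, the transmitter always picks the highest-priority VL that has data, and packets are described as `(arrival_cycle, vl, packet_id, n_flits)` tuples. In the example, a "high" packet preempts a "low" packet, and an "urgent" packet then preempts the "high" packet.

```python
from collections import deque


def schedule(packets, priority):
    """packets: list of (arrival_cycle, vl, packet_id, n_flits);
    priority: dict vl -> priority (higher wins).
    Returns the transmit stream as a list of (vl, packet_id, flit_index)."""
    queues = {vl: deque() for vl in priority}  # one buffer per VL
    pending = sorted(packets)
    stream, cycle = [], 0
    while pending or any(queues.values()):
        # Enqueue the flits of packets that have arrived by this cycle.
        while pending and pending[0][0] <= cycle:
            _, vl, pid, n = pending.pop(0)
            queues[vl].extend((pid, i) for i in range(n))
        ready = [vl for vl in queues if queues[vl]]
        if ready:
            # Highest-priority VL wins; lower VLs resume when it drains.
            vl = max(ready, key=lambda v: priority[v])
            pid, i = queues[vl].popleft()
            stream.append((vl, pid, i))
        cycle += 1
    return stream


def demux_by_vl(stream):
    # Receiver side: separate flits into per-VL buffers, restoring packet order.
    buffers = {}
    for vl, pid, i in stream:
        buffers.setdefault(vl, []).append((pid, i))
    return buffers


stream = schedule([(0, 0, "low", 6), (2, 1, "high", 2), (3, 2, "urgent", 1)],
                  {0: 0, 1: 1, 2: 2})
print([pid for _, pid, _ in stream])
# ['low', 'low', 'high', 'urgent', 'high', 'low', 'low', 'low', 'low']
```

No flit is dropped: the paused "low" packet simply resumes once the higher-priority VLs are empty, which is the lossless property the abstract describes.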
Abstract:
Methods, apparatus, and systems for implementing a link layer retry protocol utilizing implicit ACKnowledgements (ACKs). Peer link interfaces are configured to facilitate confirmed error-free delivery of link-layer packets through the use of implicit ACKs, while also providing retransmission of packets for which errors are detected and guaranteeing that link control data is either successfully received or data transfer over the link is prevented. In conjunction with transmitting packets, reliable packets are copied into sequential slots in a replay buffer. Each link interface tracks the slot at which each reliable packet is buffered, and in response to detection of an error, a retry request is sent to the transmit side to retransmit the errant packet. The previously buffered copy of the errant packet is retrieved from the replay buffer and retransmitted. Through use of a link round-trip detection mechanism, the absence of a retry request by the time the replay buffer has returned to the slot of a reliable packet (plus a predetermined number of additional transfer cycles, if applicable) provides an implicit ACK that the packet was received without error.
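The replay-buffer behavior can be sketched as a circular buffer in Python. This is a simplified model under stated assumptions: the class and method names are hypothetical, the buffer depth is assumed to be at least one link round trip (so any retry request for a slot arrives before the write pointer returns to it), the "additional transfer cycles" allowance is omitted, and a retry retransmits only the errant packet from its original slot.

```python
class ReplayBuffer:
    """Circular replay buffer; assumes depth >= link round trip."""

    def __init__(self, depth: int):
        self.slots = [None] * depth
        self.wr = 0                  # next slot to write
        self.implicitly_acked = []

    def send(self, packet):
        old = self.slots[self.wr]
        if old is not None:
            # Pointer wrapped back to this slot with no retry request for it:
            # treat the previous occupant as implicitly ACKed and overwrite it.
            self.implicitly_acked.append(old)
        slot = self.wr
        self.slots[slot] = packet    # keep a copy for possible retransmission
        self.wr = (slot + 1) % len(self.slots)
        return slot                  # the receiver tracks this slot number

    def on_retry_request(self, slot: int):
        # Retrieve the buffered copy of the errant packet for retransmission.
        return self.slots[slot]


buf = ReplayBuffer(depth=4)
sent = {p: buf.send(p) for p in ["p0", "p1", "p2"]}
retransmitted = buf.on_retry_request(sent["p1"])  # receiver flagged p1 as errant
for p in ["p3", "p4", "p5"]:
    buf.send(p)
print(retransmitted, buf.implicitly_acked)  # p1 ['p0', 'p1']
```

When "p4" and "p5" reuse slots 0 and 1, no further retry request has arrived for those slots, so "p0" and the retransmitted "p1" are treated as delivered without an explicit ACK.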