-
Publication number: US11863390B1
Publication date: 2024-01-02
Application number: US17888999
Application date: 2022-08-16
Applicant: Nvidia Corporation
Inventor: Miriam Menes , Eitan Zahavi , Gil Bloch , Ahmad Atamli , Meni Orenbach , Mark Hummel , Glenn Dearth
IPC: G06F15/177 , H04L41/0873 , H04L45/488
CPC classification number: H04L41/0873 , H04L45/488
Abstract: Apparatuses, systems, and techniques are presented to configure computing resources to perform various tasks. In at least one embodiment, an approach presented herein can be used to verify whether a network of computing nodes is properly configured based, at least in part, on one or more expected data strings generated by the network of computing nodes.
-
Publication number: US20230229524A1
Publication date: 2023-07-20
Application number: US17578255
Application date: 2022-01-18
Applicant: NVIDIA Corporation
Inventor: Glenn Alan Dearth , Mark Hummel , Daniel Joseph Lustig
CPC classification number: G06F9/522 , G06F9/4881 , G06F9/3004
Abstract: In various examples, a single notification (e.g., a request for a memory access operation) that a processing element (PE) has reached a synchronization barrier may be propagated to multiple physical addresses (PAs) and/or devices associated with multiple processing elements. Thus, the notification may allow an indication that the processing element has reached the synchronization barrier to be recorded at multiple targets. Each notification may access the PAs of each PE and/or device of a barrier group to update a corresponding counter. The PEs and/or devices may poll or otherwise use the counter to determine when each PE of the group has reached the synchronization barrier. When a corresponding counter indicates synchronization at the synchronization barrier, a PE may proceed with performing a compute task asynchronously with one or more other PEs until a subsequent synchronization barrier may be reached.
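Illustrative note (not part of the patent record): the counter-based barrier described in this abstract can be sketched roughly as follows. This is a minimal single-threaded model under stated assumptions; the class `BarrierGroup`, its methods, and the use of a list slot to stand in for each PE's physical address are hypothetical and do not come from the patent.

```python
class BarrierGroup:
    """Toy model: one counter per PE, each standing in for a physical address."""

    def __init__(self, num_pes: int):
        self.num_pes = num_pes
        self.counters = [0] * num_pes  # hypothetical per-PE barrier counters

    def notify(self, pe_id: int) -> None:
        # A single notification from pe_id is fanned out to the counter of
        # every member of the barrier group (including the sender).
        for target in range(self.num_pes):
            self.counters[target] += 1

    def reached(self, pe_id: int, generation: int = 1) -> bool:
        # A PE polls only its own counter; the barrier for this generation
        # is complete once every PE in the group has notified.
        return self.counters[pe_id] >= generation * self.num_pes


group = BarrierGroup(3)
group.notify(0)
group.notify(1)
waiting = group.reached(2)   # PE 2 has not arrived yet
group.notify(2)
done = all(group.reached(pe) for pe in range(3))
```

Because every PE polls a local counter, no PE needs to read remote state to learn that the group has synchronized; that locality is the point of the sketch, not a claim about the patented hardware.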
-
Publication number: US20230224239A1
Publication date: 2023-07-13
Application number: US17575354
Application date: 2022-01-13
Applicant: NVIDIA Corporation
Inventor: Glenn Dearth , Nan Jiang , Mark Hummel , Richard Reeves
IPC: H04L45/16 , H04L12/18 , H04L45/745
CPC classification number: H04L45/16 , H04L12/18 , H04L45/745
Abstract: Apparatuses, systems, and techniques to multicast a transaction to a group of targets. In at least one embodiment, a set is selected from alternate sets of directives associated with the group of targets, and the transaction is transmitted to the group of targets in accordance with the selected set.
-
Publication number: US11038800B2
Publication date: 2021-06-15
Application number: US16553511
Application date: 2019-08-28
Applicant: Nvidia Corporation
Inventor: Glenn Dearth , Mark Hummel , Jonathan Owen , Mike Osborn , John Wortman , Rich Reeves
IPC: H04L12/801 , H04L7/00 , H04L12/743 , H04L12/947 , G06F13/16 , G06F13/30 , G06F13/42
Abstract: An endpoint in a network may make posted or non-posted write requests to another endpoint in the network. For a non-posted write request, the target endpoint provides a response to the requesting endpoint indicating that the write request has been serviced. For a posted write request, the target endpoint does not provide such an acknowledgment. Hence, posted write requests have lower overhead, but they suffer from potential synchronization and resiliency issues. While non-posted write requests do not have those issues, they cause increased load on the network because such requests require the target endpoint to acknowledge each write request. Introduced herein is a network operation technique that uses non-posted transactions while maintaining a load overhead of the network at a manageable level. The introduced technique reduces the load overhead of the non-posted write requests by collapsing and reducing a number of the responses.
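Illustrative note (not part of the patent record): one way to picture "collapsing" responses is to let a single response acknowledge several serviced writes. The sketch below assumes a fixed collapse threshold; the abstract does not say how responses are collapsed, so the class `ResponseCollapser` and its `threshold` parameter are hypothetical.

```python
from typing import Optional


class ResponseCollapser:
    """Toy model: emit one response per `threshold` serviced non-posted writes."""

    def __init__(self, threshold: int):
        self.threshold = threshold  # assumed policy, not taken from the patent
        self.pending = 0            # serviced writes not yet acknowledged

    def on_write_serviced(self) -> Optional[int]:
        # Return the number of writes a collapsed response covers,
        # or None when no response is sent yet.
        self.pending += 1
        if self.pending >= self.threshold:
            covered, self.pending = self.pending, 0
            return covered
        return None


collapser = ResponseCollapser(threshold=4)
responses = [collapser.on_write_serviced() for _ in range(8)]
sent = [r for r in responses if r is not None]
```

In this model eight writes produce two responses instead of eight, while the requester still learns that every write was serviced.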
-
Publication number: US20190297018A1
Publication date: 2019-09-26
Application number: US16277349
Application date: 2019-02-15
Applicant: Nvidia Corporation
Inventor: Glenn Dearth , Nan Jiang , John Wortman , Alex Ishii , Mark Hummel , Rich Reeves
IPC: H04L12/801 , H04L12/825 , H04L12/26
Abstract: Multiple processors are often used in computing systems to solve very large, complex problems, such as those encountered in artificial intelligence. Such processors typically exchange data among each other via an interconnect fabric (such as, e.g., a group of network connections and switches) in solving such complex problems. The amount of data injected into the interconnect fabric by the processors can at times overwhelm the interconnect fabric preventing some of the processors from communicating with each other. To address this problem, techniques are disclosed to enable, for example, processors that are connected to an interconnect fabric to coordinate and control the amount of data injected so that the interconnect fabric does not get overwhelmed.
-
Publication number: US10097203B2
Publication date: 2018-10-09
Application number: US14939813
Application date: 2015-11-12
Applicant: Nvidia Corporation
Inventor: Eric Tyson , Stephen D. Glaser , Mike Osborn , Mark Hummel
Abstract: A CRC generator, a method for computing a CRC of a data packet, and an electronic system, such as a circuit board, are disclosed herein. In one embodiment the method is for computing the CRC of a data packet to be transmitted on a serial communications link having multiple lanes. In one embodiment, the CRC generator includes: (1) a CRC calculator configured to define a CRC calculation of a data packet in sequential order and perform parallelized computations, according to the sequential order and the multiple lanes, to generate sub-CRC values and (2) combination circuitry configured to combine the sub-CRC values to provide the CRC value for the packet.
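Illustrative note (not part of the patent record): the idea of computing sub-CRC values in parallel and combining them can be demonstrated with the linearity of a "pure" CRC (zero initial value, no final XOR). The sketch below is an assumption-laden illustration, not the patented circuit: it stripes packet bytes across lanes, computes one sub-CRC per lane over a zero-masked copy of the packet, and XORs the results. The function names and the choice of the CRC-32 polynomial are hypothetical.

```python
POLY = 0x04C11DB7  # CRC-32 generator polynomial, chosen only for illustration


def crc32_pure(data: bytes) -> int:
    """MSB-first CRC with zero init and no final XOR, so it is linear over GF(2)."""
    crc = 0
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ POLY) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc


def lane_sub_crcs(packet: bytes, lanes: int) -> list:
    """Sub-CRC per lane: keep only the bytes striped onto that lane, zero the rest."""
    subs = []
    for lane in range(lanes):
        masked = bytes(b if i % lanes == lane else 0 for i, b in enumerate(packet))
        subs.append(crc32_pure(masked))
    return subs


def combine(sub_crcs: list) -> int:
    """Combine sub-CRCs by XOR, which is valid because the pure CRC is linear."""
    result = 0
    for sub in sub_crcs:
        result ^= sub
    return result


packet = bytes(range(1, 20))
combined = combine(lane_sub_crcs(packet, lanes=4))
reference = crc32_pure(packet)
```

Each per-lane computation is independent of the others, so the lanes can be processed in parallel; real hardware would avoid clocking through the zero bytes, which this software model does not attempt.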
-
Publication number: US09858221B2
Publication date: 2018-01-02
Application number: US15043671
Application date: 2016-02-15
Applicant: Nvidia Corporation
Inventor: Mike Osborn , Mark Hummel , Jonathan Owen , Samuel Hammond Duncan
CPC classification number: G06F13/28 , G06F13/32 , G06F13/4027
Abstract: Remotely synchronizing data communicated in an electronic computing system. Ordered writing of a data set of discrete data packets (data) and a following associated semaphore packet (semaphore) from a source electronic device (source) to a bridge interface device (bridge). Relaxed writing of the data set from the bridge to discrete target memory addresses (targets) of a data-consuming electronic device (consumer), wherein the order of the data and the semaphore written to the targets is different than the order of the data and semaphore written with the ordered writing. Monitoring, by the consumer, the relaxed writing of the semaphore to one of the targets. Issuing a synchronization command to the bridge upon detection of the semaphore having been written to the one target. Sending a synchronization confirmation reply from the bridge after all of the data has been written to the targets.
-
Publication number: US20170141794A1
Publication date: 2017-05-18
Application number: US14939813
Application date: 2015-11-12
Applicant: Nvidia Corporation
Inventor: Eric Tyson , Stephen D. Glaser , Mike Osborn , Mark Hummel
CPC classification number: H03M13/091 , G06F11/10 , G06F11/1004 , G06F11/1076 , H03M13/09 , H04L1/0061 , H04L1/0066
Abstract: A CRC generator, a method for computing a CRC of a data packet, and an electronic system, such as a circuit board, are disclosed herein. In one embodiment the method is for computing the CRC of a data packet to be transmitted on a serial communications link having multiple lanes. In one embodiment, the CRC generator includes: (1) a CRC calculator configured to define a CRC calculation of a data packet in sequential order and perform parallelized computations, according to the sequential order and the multiple lanes, to generate sub-CRC values and (2) combination circuitry configured to combine the sub-CRC values to provide the CRC value for the packet.
-
Publication number: US20170111144A1
Publication date: 2017-04-20
Application number: US14883322
Application date: 2015-10-14
Applicant: Nvidia Corporation
Inventor: Dennis Ma , Michael Osborn , Eric Tyson , Stephen D. Glaser , Marvin Denman , Jonathan Owen , Mark Hummel
CPC classification number: H04L69/324 , H04L1/1809 , H04L69/323
Abstract: A receiver, transmitter and method for enabling a replay using a packetized link protocol are provided. In one embodiment, the method includes: (1) transmitting a stream of packets including an untagged packet and (2) using synchronized counters to determine a sequence ID of the untagged packet, which is a corrupt/lost packet that needs to be retransmitted.
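Illustrative note (not part of the patent record): the use of synchronized counters to recover the sequence ID of an untagged packet can be sketched as below. This toy model assumes an in-order link on which a corrupt packet is still detected as an arrival; the classes `Transmitter` and `Receiver` and the `corrupt` flag are hypothetical.

```python
class Transmitter:
    """Keeps a counter and a replay buffer; packets on the wire carry no sequence ID."""

    def __init__(self):
        self.next_seq = 0
        self.replay_buffer = {}

    def send(self, payload: str) -> str:
        self.replay_buffer[self.next_seq] = payload
        self.next_seq += 1
        return payload  # untagged: only the payload goes on the link

    def replay(self, seq: int) -> str:
        return self.replay_buffer[seq]


class Receiver:
    """Counts arrivals in step with the transmitter to infer each packet's ID."""

    def __init__(self):
        self.expected_seq = 0

    def receive(self, payload: str, corrupt: bool = False):
        seq = self.expected_seq       # ID implied by the synchronized counter
        self.expected_seq += 1
        return (seq, None) if corrupt else (seq, payload)


tx, rx = Transmitter(), Receiver()
wire = [tx.send(p) for p in ("a", "b", "c")]
results = [rx.receive(wire[0]), rx.receive(wire[1], corrupt=True), rx.receive(wire[2])]
bad_seq = next(seq for seq, data in results if data is None)
retransmitted = tx.replay(bad_seq)
```

Because both ends count packets identically, the receiver can name the damaged packet by its counter value and request exactly that packet, without spending link bandwidth on per-packet tags.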
-