-
Publication No.: US11171798B2
Publication date: 2021-11-09
Application No.: US16938097
Filing date: 2020-07-24
Applicant: NVIDIA Corporation
Inventor: Benjamin Klenk, Nan Jiang, Larry Robert Dennison, Gregory M. Thorson
IPC: H04L29/08, H04L12/18, G06F9/50, H04L12/801, H04L12/813, H04L12/927, H04L12/741
Abstract: A network device configured to perform scalable, in-network computations is described. The network device is configured to process pull requests and/or push requests from a plurality of endpoints connected to the network. A collective communication primitive from a particular endpoint can be received at a network device. The collective communication primitive is associated with a multicast region of a shared global address space and is mapped to a plurality of participating endpoints. The network device is configured to perform an in-network computation based on information received from the participating endpoints before forwarding a response to the collective communication primitive back to one or more of the participating endpoints. The endpoints can inject pull requests (e.g., load commands) and/or push requests (e.g., store commands) into the network. A multicast capability enables tasks, such as a reduction operation, to be offloaded to hardware in the network device.
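The behavior described in this abstract can be illustrated with a minimal sketch. It is a toy software model, not the patented hardware design: the class name `ToySwitch`, the endpoint labels, the address value, and the choice of summation as the reduction are all assumptions made for illustration. The switch collects one push request per participating endpoint for a multicast region and answers only once every participant has contributed.

```python
from dataclasses import dataclass, field


@dataclass
class ToySwitch:
    """Hypothetical model of a switch that reduces push requests in-network."""

    # Multicast region address -> set of participating endpoint ids.
    regions: dict
    # Partial state per region: address -> {endpoint id: contributed value}.
    pending: dict = field(default_factory=dict)

    def push(self, endpoint, address, value):
        """Record one contribution; return per-endpoint responses when complete."""
        participants = self.regions[address]
        if endpoint not in participants:
            raise ValueError(f"{endpoint} is not mapped to region {address:#x}")
        contributions = self.pending.setdefault(address, {})
        contributions[endpoint] = value
        if set(contributions) != participants:
            return None  # still waiting for other participants
        total = sum(contributions.values())  # the in-network computation
        del self.pending[address]
        # Forward the reduced result back to every participating endpoint.
        return {ep: total for ep in participants}


switch = ToySwitch(regions={0x1000: {"gpu0", "gpu1", "gpu2"}})
switch.push("gpu0", 0x1000, 1)
switch.push("gpu1", 0x1000, 2)
result = switch.push("gpu2", 0x1000, 3)
```

In this sketch the last call returns the sum for all three endpoints, which mirrors the idea of offloading a reduction to the network device instead of having each endpoint exchange data pairwise.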
-
Publication No.: US11082347B2
Publication date: 2021-08-03
Application No.: US16277349
Filing date: 2019-02-15
Applicant: Nvidia Corporation
Inventor: Glenn Dearth, Nan Jiang, John Wortman, Alex Ishii, Mark Hummel, Rich Reeves
IPC: H04L12/801, H04L12/26, H04L12/825
Abstract: Multiple processors are often used in computing systems to solve very large, complex problems, such as those encountered in artificial intelligence. Such processors typically exchange data among each other via an interconnect fabric (such as, e.g., a group of network connections and switches) in solving such complex problems. The amount of data injected into the interconnect fabric by the processors can at times overwhelm the interconnect fabric preventing some of the processors from communicating with each other. To address this problem, techniques are disclosed to enable, for example, processors that are connected to an interconnect fabric to coordinate and control the amount of data injected so that the interconnect fabric does not get overwhelmed.
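One common way to control injected traffic is an additive-increase, multiplicative-decrease limiter, sketched below. The abstract does not say which mechanism the patent uses, so this is a stand-in technique: the class `InjectionLimiter`, its parameters, and the boolean congestion signal are all hypothetical.

```python
class InjectionLimiter:
    """Hypothetical AIMD limiter for the rate at which a processor injects data."""

    def __init__(self, max_rate, min_rate=1.0, step=1.0, backoff=0.5):
        self.max_rate = max_rate  # upper bound on the injection rate
        self.min_rate = min_rate  # never throttle below this rate
        self.step = step          # additive increase when the fabric is clear
        self.backoff = backoff    # multiplicative decrease under congestion
        self.rate = max_rate

    def on_feedback(self, congested):
        """Adjust the allowed rate from one congestion signal and return it."""
        if congested:
            self.rate = max(self.min_rate, self.rate * self.backoff)
        else:
            self.rate = min(self.max_rate, self.rate + self.step)
        return self.rate


limiter = InjectionLimiter(max_rate=16.0)
rates = [limiter.on_feedback(c) for c in (True, True, False)]
```

Each processor running such a limiter backs off quickly when the fabric reports congestion and recovers gradually afterward, which is one way a set of endpoints could keep a shared fabric from being overwhelmed.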
-
Publication No.: US20210036877A1
Publication date: 2021-02-04
Application No.: US16938156
Filing date: 2020-07-24
Applicant: NVIDIA Corporation
Inventor: Benjamin Klenk, Nan Jiang, Larry Robert Dennison, Gregory M. Thorson
IPC: H04L12/18, H04L12/741
Abstract: A network device configured to perform scalable, in-network computations is described. The network device is configured to process pull requests and/or push requests from a plurality of endpoints connected to the network. A collective communication primitive from a particular endpoint can be received at a network device. The collective communication primitive is associated with a multicast region of a shared global address space and is mapped to a plurality of participating endpoints. The network device is configured to perform an in-network computation based on information received from the participating endpoints before forwarding a response to the collective communication primitive back to one or more of the participating endpoints. The endpoints can inject pull requests (e.g., load commands) and/or push requests (e.g., store commands) into the network. A multicast capability enables tasks, such as a reduction operation, to be offloaded to hardware in the network device.
-
Publication No.: US20240137410A1
Publication date: 2024-04-25
Application No.: US18545339
Filing date: 2023-12-19
Applicant: NVIDIA Corporation
Inventor: Glenn Dearth, Mark Hummel, Nan Jiang, Gregory Thorson
IPC: H04L67/1008, H04L47/70, H04L47/80, H04L67/1014
CPC classification number: H04L67/1008, H04L47/806, H04L47/827, H04L67/1014
Abstract: Systems and techniques for performing multicast-reduction operations. In at least one embodiment, a network device receives first network data associated with a multicast operation to be collectively performed by at least a plurality of endpoints. The network device reserves resources to process second network data to be received from the endpoints, and sends the first network data to a plurality of additional network devices. The network device receives the second network data, and processes the second network data using the reserved resources.
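The reserve-then-reduce sequence in this abstract can be sketched as a small state machine. This is an illustrative model under stated assumptions, not the claimed implementation: the class `ReducingSwitch`, the notion of a fixed pool of reduction slots, the operation id, and summation as the processing step are all hypothetical.

```python
class ReducingSwitch:
    """Hypothetical switch that reserves a slot before forwarding a multicast."""

    def __init__(self, slots, children):
        self.free_slots = slots   # assumed pool of reduction resources
        self.children = children  # downstream network devices
        self.reserved = {}        # op id -> accumulation state

    def on_multicast(self, op_id, expected):
        """Reserve resources for the responses, then forward the request."""
        if self.free_slots == 0:
            raise RuntimeError("no reduction resources available")
        self.free_slots -= 1
        self.reserved[op_id] = {"expected": expected, "seen": 0, "acc": 0}
        # The first network data is sent on to each additional device.
        return [(child, op_id) for child in self.children]

    def on_response(self, op_id, value):
        """Fold one response into the reserved slot; return the result when done."""
        slot = self.reserved[op_id]
        slot["acc"] += value
        slot["seen"] += 1
        if slot["seen"] < slot["expected"]:
            return None
        del self.reserved[op_id]
        self.free_slots += 1  # release the reservation
        return slot["acc"]
```

Reserving the slot before forwarding means the responses always find space waiting for them, so the device never has to drop or stall the second network data for lack of resources.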
-
Publication No.: US20220417176A1
Publication date: 2022-12-29
Application No.: US17848088
Filing date: 2022-06-23
Applicant: NVIDIA Corporation
Inventor: Glenn Alan Dearth, Nan Jiang, Mark D. Hummel, Gregory Michael Thorson, Karan Gupta, Dane Thomas Mrazek, Eric Anderson, Larry Robert Dennison
IPC: H04L49/101, H04L49/201, H04L45/00, H04L45/24, H04L45/30, H04L45/80
Abstract: A method is provided for operating a network switch comprising a plurality of input ports and a plurality of output ports. The method comprises receiving a first data packet received via a first input port and a second data packet received via a second input port to be delivered to an egress endpoint connected to a first output port, configuring a plurality of crossbar switch units arranged in a tiled architecture to pass the first data packet to the first output port via a primary path and pass the second data packet to the first output port via a secondary path, and transmitting the first data packet and the second data packet to the egress endpoint. The first data packet and the second data packet pass through the plurality of crossbar switch units simultaneously.
-
Publication No.: US20210037107A1
Publication date: 2021-02-04
Application No.: US16938097
Filing date: 2020-07-24
Applicant: NVIDIA Corporation
Inventor: Benjamin Klenk, Nan Jiang, Larry Robert Dennison, Gregory M. Thorson
IPC: H04L29/08, H04L12/18, H04L12/741
Abstract: A network device configured to perform scalable, in-network computations is described. The network device is configured to process pull requests and/or push requests from a plurality of endpoints connected to the network. A collective communication primitive from a particular endpoint can be received at a network device. The collective communication primitive is associated with a multicast region of a shared global address space and is mapped to a plurality of participating endpoints. The network device is configured to perform an in-network computation based on information received from the participating endpoints before forwarding a response to the collective communication primitive back to one or more of the participating endpoints. The endpoints can inject pull requests (e.g., load commands) and/or push requests (e.g., store commands) into the network. A multicast capability enables tasks, such as a reduction operation, to be offloaded to hardware in the network device.