-
Publication No.: US20240143198A1
Publication date: 2024-05-02
Application No.: US17976728
Filing date: 2022-10-28
Applicant: Hewlett Packard Enterprise Development LP
Inventor: Duncan Roweth, Robert L. Alverson, Nathan L. Wichmann, Eric P. Lundberg
IPC: G06F3/06
CPC classification number: G06F3/0625, G06F3/0659, G06F3/067
Abstract: A network interface card (NIC) receives a stream of commands, a respective command comprising memory-operation requests, each request associated with a destination NIC. The NIC buffers asynchronously the requests into queues based on the destination NIC, each queue specific to a corresponding destination NIC. When first queue requests reach a threshold, the NIC aggregates the first queue requests into a first packet and sends the first packet to the destination NIC. The NIC receives a plurality of packets, a second packet comprising memory-operation requests, each request associated with a same destination NIC and a destination core. The NIC buffers asynchronously the requests of the second packet into queues based on the destination core, each queue specific to a corresponding destination core. When second queue requests reach the threshold, the NIC aggregates the second queue requests into a third packet and sends the third packet to the destination core.
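The two-stage buffering described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the class name `ThresholdAggregator`, the request fields `nic`, `core`, and `op`, and the threshold of 2 are all hypothetical, and the buffering here is synchronous rather than asynchronous. It only shows the idea of one queue per destination, flushed as a single packet once the queue reaches a threshold, applied first per destination NIC and then per destination core.

```python
from collections import defaultdict


class ThresholdAggregator:
    """Buffers requests in per-key queues; emits a whole queue once it is full."""

    def __init__(self, threshold, key, emit):
        self.threshold = threshold      # hypothetical flush threshold
        self.key = key                  # maps a request to its queue key
        self.emit = emit                # called with (key, batch) on flush
        self.queues = defaultdict(list)

    def submit(self, request):
        k = self.key(request)
        queue = self.queues[k]
        queue.append(request)
        if len(queue) >= self.threshold:
            # Aggregate the queued requests into one packet and send it.
            self.emit(k, self.queues.pop(k))


delivered = []  # (core, batch) pairs handed to destination cores

# Stage 2, modelling a single destination NIC: one queue per destination core.
dest_side = ThresholdAggregator(
    threshold=2,
    key=lambda r: r["core"],
    emit=lambda core, batch: delivered.append((core, batch)),
)


def receive_packet(nic, packet):
    # The destination NIC unpacks the packet and re-buffers by core.
    for request in packet:
        dest_side.submit(request)


# Stage 1, on the source NIC: one queue per destination NIC.
source_side = ThresholdAggregator(
    threshold=2, key=lambda r: r["nic"], emit=receive_packet
)

requests = [
    {"nic": "nic1", "core": 0, "op": "put A"},
    {"nic": "nic2", "core": 0, "op": "put B"},
    {"nic": "nic1", "core": 1, "op": "put C"},
    {"nic": "nic1", "core": 0, "op": "put D"},
    {"nic": "nic1", "core": 0, "op": "put E"},
]
for r in requests:
    source_side.submit(r)
```

In this run only core 0 on `nic1` accumulates two requests, so a single batch is delivered; the remaining requests stay buffered until their queues fill. A real design would presumably also flush on a timer so that sparse traffic is not held indefinitely, but the abstract does not say so and that is an assumption.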
-
Publication No.: US20230359574A1
Publication date: 2023-11-09
Application No.: US18353277
Filing date: 2023-07-17
Applicant: Hewlett Packard Enterprise Development LP
Inventor: Keith D. Underwood, Robert L. Alverson, Duncan Roweth, Nathan L. Wichmann
CPC classification number: G06F13/20, G06F12/10, G06F2212/1024
Abstract: An apparatus is provided that includes a network interface to transmit and receive data packets over a network; a memory including one or more buffers; an arithmetic logic unit to perform arithmetic operations for organizing and combining the data packets; and a circuitry to receive, via the network interface, data packets from the network; aggregate, via the arithmetic logic unit, the received data packets in the one or more buffers at a network rate; and transmit, via the network interface, the aggregated data packets to one or more compute nodes in the network, thereby optimizing latency incurred in combining the received data packets and transmitting the aggregated data packets, and hence accelerating a bulk data allreduce operation. One embodiment provides a system and method for performing the allreduce operation. During operation, the system performs the allreduce operation by pacing network operations for enhancing performance of the allreduce operation.
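For readers unfamiliar with the term, an allreduce combines one contribution from every node and returns the combined result to all of them. The sketch below shows only that semantics, using element-wise summation as an assumed reduction operator. The function name `allreduce_sum` is hypothetical, and the sketch does not model the buffers, network-rate aggregation, or pacing that the abstract claims.

```python
def allreduce_sum(contributions):
    """Element-wise sum of equal-length vectors, returned once per contributor."""
    buffer = [0] * len(contributions[0])
    for packet in contributions:
        # Combine each arriving packet into the shared buffer.
        for i, value in enumerate(packet):
            buffer[i] += value
    # Every node receives its own copy of the aggregated result.
    return [list(buffer) for _ in contributions]


results = allreduce_sum([[1, 2], [3, 4], [5, 6]])
```

Here each of the three nodes ends up with the same vector, the element-wise sum of all three inputs.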
-
Publication No.: US11714765B2
Publication date: 2023-08-01
Application No.: US17383606
Filing date: 2021-07-23
Applicant: Hewlett Packard Enterprise Development LP
Inventor: Keith D. Underwood, Robert L. Alverson, Duncan Roweth, Nathan L. Wichmann
CPC classification number: G06F13/20, G06F12/10, G06F2212/1024
Abstract: An apparatus is provided that includes a network interface to transmit and receive data packets over a network; a memory including one or more buffers; an arithmetic logic unit to perform arithmetic operations for organizing and combining the data packets; and a circuitry to receive, via the network interface, data packets from the network; aggregate, via the arithmetic logic unit, the received data packets in the one or more buffers at a network rate; and transmit, via the network interface, the aggregated data packets to one or more compute nodes in the network, thereby optimizing latency incurred in combining the received data packets and transmitting the aggregated data packets, and hence accelerating a bulk data allreduce operation. One embodiment provides a system and method for performing the allreduce operation. During operation, the system performs the allreduce operation by pacing network operations for enhancing performance of the allreduce operation.
-
Publication No.: US20230035657A1
Publication date: 2023-02-02
Application No.: US17383606
Filing date: 2021-07-23
Applicant: Hewlett Packard Enterprise Development LP
Inventor: Keith D. Underwood, Robert L. Alverson, Duncan Roweth, Nathan L. Wichmann
Abstract: An apparatus is provided that includes a network interface to transmit and receive data packets over a network; a memory including one or more buffers; an arithmetic logic unit to perform arithmetic operations for organizing and combining the data packets; and a circuitry to receive, via the network interface, data packets from the network; aggregate, via the arithmetic logic unit, the received data packets in the one or more buffers at a network rate; and transmit, via the network interface, the aggregated data packets to one or more compute nodes in the network, thereby optimizing latency incurred in combining the received data packets and transmitting the aggregated data packets, and hence accelerating a bulk data allreduce operation. One embodiment provides a system and method for performing the allreduce operation. During operation, the system performs the allreduce operation by pacing network operations for enhancing performance of the allreduce operation.