AGGREGATING SMALL REMOTE MEMORY ACCESS REQUESTS

    Publication No.: US20240143198A1

    Publication Date: 2024-05-02

    Application No.: US17976728

    Application Date: 2022-10-28

    CPC classification number: G06F3/0625 G06F3/0659 G06F3/067

    Abstract: A network interface card (NIC) receives a stream of commands, a respective command comprising memory-operation requests, each request associated with a destination NIC. The NIC buffers asynchronously the requests into queues based on the destination NIC, each queue specific to a corresponding destination NIC. When first queue requests reach a threshold, the NIC aggregates the first queue requests into a first packet and sends the first packet to the destination NIC. The NIC receives a plurality of packets, a second packet comprising memory-operation requests, each request associated with a same destination NIC and a destination core. The NIC buffers asynchronously the requests of the second packet into queues based on the destination core, each queue specific to a corresponding destination core. When second queue requests reach the threshold, the NIC aggregates the second queue requests into a third packet and sends the third packet to the destination core.
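As a rough illustration of the per-destination buffering described in this abstract, the Python sketch below models one aggregator that keeps a queue per key and emits a single packet once a queue reaches a threshold. The same class is used for both stages: keyed by destination NIC at the sending NIC, and keyed by destination core at the receiving NIC. The names Request, ThresholdAggregator, enqueue, send and to_network, the threshold of 2, and the direct hand-off between the two stages are illustrative assumptions, not taken from the patent. The sketch is also synchronous, whereas the abstract describes asynchronous buffering inside the NIC.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, Hashable, List


@dataclass
class Request:
    """Hypothetical small remote memory-operation request (e.g. a put or get)."""
    dest_nic: int
    dest_core: int
    op: str
    addr: int


class ThresholdAggregator:
    """Keeps one queue per key; flushes a queue as one packet at `threshold`."""

    def __init__(self, key: Callable[[Request], Hashable], threshold: int,
                 send: Callable[[Hashable, List[Request]], None]) -> None:
        self.key = key              # chooses the queue (destination NIC or core)
        self.threshold = threshold  # number of queued requests that triggers a flush
        self.send = send            # transmits one aggregated packet
        self.queues: DefaultDict[Hashable, List[Request]] = defaultdict(list)

    def enqueue(self, req: Request) -> None:
        k = self.key(req)
        queue = self.queues[k]
        queue.append(req)
        if len(queue) >= self.threshold:
            packet = list(queue)    # aggregate the queued requests into one packet
            queue.clear()
            self.send(k, packet)


# Receiving NIC: regroup the requests of incoming packets by destination core.
delivered: List[tuple] = []
rx = ThresholdAggregator(key=lambda r: r.dest_core, threshold=2,
                         send=lambda core, pkt: delivered.append((core, pkt)))


def to_network(nic: Hashable, packet: List[Request]) -> None:
    # Hypothetical stand-in for the fabric: the packet reaches `rx` immediately.
    for req in packet:
        rx.enqueue(req)


# Sending NIC: group outgoing requests by destination NIC.
tx = ThresholdAggregator(key=lambda r: r.dest_nic, threshold=2, send=to_network)

for req in [Request(7, 0, "put", 0x100), Request(8, 1, "get", 0x200),
            Request(7, 0, "put", 0x108)]:
    tx.enqueue(req)
```

In this run the two requests for NIC 7 leave as one packet and are then delivered to core 0 as one packet, while the single request for NIC 8 stays buffered. A real design would presumably also need a way to flush partially filled queues (for example on a timeout), which the abstract does not describe.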

    SYSTEM AND METHOD FOR IMPLEMENTING A NETWORK-INTERFACE-BASED ALLREDUCE OPERATION

    Publication No.: US20230359574A1

    Publication Date: 2023-11-09

    Application No.: US18353277

    Application Date: 2023-07-17

    CPC classification number: G06F13/20 G06F12/10 G06F2212/1024

    Abstract: An apparatus is provided that includes a network interface to transmit and receive data packets over a network; a memory including one or more buffers; an arithmetic logic unit to perform arithmetic operations for organizing and combining the data packets; and a circuitry to receive, via the network interface, data packets from the network; aggregate, via the arithmetic logic unit, the received data packets in the one or more buffers at a network rate; and transmit, via the network interface, the aggregated data packets to one or more compute nodes in the network, thereby optimizing latency incurred in combining the received data packets and transmitting the aggregated data packets, and hence accelerating a bulk data allreduce operation. One embodiment provides a system and method for performing the allreduce operation. During operation, the system performs the allreduce operation by pacing network operations for enhancing performance of the allreduce operation.
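The bulk allreduce aggregation in this abstract can be pictured with a small Python sketch of a single in-NIC reduction buffer: each arriving packet's values are combined element-wise into the buffer as the packet arrives, rather than after all data has been collected, and a chunk is transmitted once every contributor has been folded in. Summation as the combining operation, the packet fields (source, element offset, values), the duplicate check, and the names ReductionBuffer, on_packet and send are hypothetical choices for illustration; the pacing of network operations mentioned in the abstract is not modeled.

```python
from typing import Callable, Dict, List, Sequence, Set


class ReductionBuffer:
    """Hypothetical in-NIC buffer that sum-reduces the data of one allreduce."""

    def __init__(self, num_contributors: int, length: int,
                 send: Callable[[int, List[float]], None]) -> None:
        self.num_contributors = num_contributors
        self.acc = [0.0] * length            # running element-wise sums
        self.seen: Dict[int, Set[int]] = {}  # chunk offset -> sources combined so far
        self.send = send                     # transmits a reduced chunk to the nodes

    def on_packet(self, src: int, offset: int, values: Sequence[float]) -> None:
        contributors = self.seen.setdefault(offset, set())
        if src in contributors:
            return  # drop a duplicate, e.g. a retransmitted packet
        contributors.add(src)
        # Combine on arrival (the role the abstract assigns to the ALU).
        for i, v in enumerate(values):
            self.acc[offset + i] += v
        if len(contributors) == self.num_contributors:
            # Chunk fully reduced: send the result to the compute nodes.
            self.send(offset, self.acc[offset:offset + len(values)])


results: List[tuple] = []
buf = ReductionBuffer(num_contributors=2, length=4,
                      send=lambda off, chunk: results.append((off, chunk)))
buf.on_packet(0, 0, [1.0, 2.0])
buf.on_packet(1, 0, [10.0, 20.0])
buf.on_packet(0, 2, [3.0, 4.0])
buf.on_packet(0, 2, [3.0, 4.0])   # duplicate, ignored
buf.on_packet(1, 2, [30.0, 40.0])
```

Combining each packet on arrival keeps the buffer at one vector's length regardless of how many nodes contribute, which is one plausible reading of aggregating "at a network rate".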

    System and method for implementing a network-interface-based allreduce operation

    Publication No.: US11714765B2

    Publication Date: 2023-08-01

    Application No.: US17383606

    Application Date: 2021-07-23

    CPC classification number: G06F13/20 G06F12/10 G06F2212/1024

    Abstract: An apparatus is provided that includes a network interface to transmit and receive data packets over a network; a memory including one or more buffers; an arithmetic logic unit to perform arithmetic operations for organizing and combining the data packets; and a circuitry to receive, via the network interface, data packets from the network; aggregate, via the arithmetic logic unit, the received data packets in the one or more buffers at a network rate; and transmit, via the network interface, the aggregated data packets to one or more compute nodes in the network, thereby optimizing latency incurred in combining the received data packets and transmitting the aggregated data packets, and hence accelerating a bulk data allreduce operation. One embodiment provides a system and method for performing the allreduce operation. During operation, the system performs the allreduce operation by pacing network operations for enhancing performance of the allreduce operation.

    SYSTEM AND METHOD FOR IMPLEMENTING A NETWORK-INTERFACE-BASED ALLREDUCE OPERATION

    Publication No.: US20230035657A1

    Publication Date: 2023-02-02

    Application No.: US17383606

    Application Date: 2021-07-23

    Abstract: An apparatus is provided that includes a network interface to transmit and receive data packets over a network; a memory including one or more buffers; an arithmetic logic unit to perform arithmetic operations for organizing and combining the data packets; and a circuitry to receive, via the network interface, data packets from the network; aggregate, via the arithmetic logic unit, the received data packets in the one or more buffers at a network rate; and transmit, via the network interface, the aggregated data packets to one or more compute nodes in the network, thereby optimizing latency incurred in combining the received data packets and transmitting the aggregated data packets, and hence accelerating a bulk data allreduce operation. One embodiment provides a system and method for performing the allreduce operation. During operation, the system performs the allreduce operation by pacing network operations for enhancing performance of the allreduce operation.
