RESOURCE EXHAUSTION RECOVERY IN ORDERED NETWORKS

    Publication Number: US20250165307A1

    Publication Date: 2025-05-22

    Application Number: US18644452

    Application Date: 2024-04-24

    Abstract: Techniques for managing resource exhaustion in message-passing communication within a computing system are disclosed. A method, executed by a destination compute node, involves provisioning additional resources for a processing table entry in a message processing table when the entry's resources are exhausted. The method includes incrementing a generation number for the processing table entry from a previous value to a current value. A first message from a source compute node that includes the previous value of the generation number is rejected, while the same message is accepted when it includes the current value of the generation number, thereby facilitating ordered message-passing and recovery from resource exhaustion.
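
    The generation-number check at the heart of this abstract can be illustrated with a small sketch in C. The names (proc_entry, msg, handle_message) and the single free-slot counter are illustrative assumptions, not details from the patent; the point is only that a message stamped with a stale generation is rejected, while the same message succeeds once it carries the current value.

    /*
     * Minimal sketch of the generation-number check described in the abstract.
     * Names (proc_entry, msg, handle_message) are illustrative, not from the patent.
     */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef struct {
        uint32_t generation;   /* current generation for this processing table entry */
        int      free_slots;   /* resources remaining for matching incoming messages */
    } proc_entry;

    typedef struct {
        uint32_t generation;   /* generation the source believes is current */
        uint32_t seq;          /* sequence number used for ordering */
    } msg;

    /* Called when the entry runs out of resources: provision more and advance
     * the generation so that in-flight messages stamped with the old generation
     * are rejected and must be replayed by the source. */
    static void recover_entry(proc_entry *e, int new_slots)
    {
        e->free_slots = new_slots;
        e->generation++;
    }

    /* Returns true if the message is accepted, false if it is rejected
     * (stale generation or no resources left). */
    static bool handle_message(proc_entry *e, const msg *m)
    {
        if (m->generation != e->generation)
            return false;              /* previous generation value: reject */
        if (e->free_slots == 0)
            return false;              /* exhausted: caller triggers recovery */
        e->free_slots--;
        return true;                   /* current generation value: accept */
    }

    int main(void)
    {
        proc_entry e = { .generation = 1, .free_slots = 1 };
        msg m = { .generation = 1, .seq = 0 };

        printf("msg 0 accepted: %d\n", handle_message(&e, &m)); /* 1: accepted */
        printf("msg 1 accepted: %d\n", handle_message(&e, &m)); /* 0: exhausted */

        recover_entry(&e, 4);          /* provision resources, bump generation */
        printf("stale msg accepted: %d\n", handle_message(&e, &m)); /* 0: old generation */
        m.generation = e.generation;   /* source learns the new generation */
        printf("retried msg accepted: %d\n", handle_message(&e, &m)); /* 1: accepted */
        return 0;
    }

    Advancing a per-entry generation rather than tracking individual lost messages keeps the recovery state bounded by the size of the processing table, which is what lets ordered message passing resume after exhaustion.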

    DATATYPE ENGINE TO SUPPORT HIGH PERFORMANCE COMPUTING

    Publication Number: US20240143180A1

    Publication Date: 2024-05-02

    Application Number: US17976721

    Application Date: 2022-10-28

    CPC classification number: G06F3/0613 G06F3/0659 G06F3/067

    Abstract: A method and apparatus are provided for facilitating a datatype engine (DTE) to support high performance computing. A network interface card (NIC) receives, via a message passing interface, a command to read data from a host memory. The NIC determines that the command indicates a first datatype descriptor stored in the NIC. The NIC forms, based on the command, a packet which indicates a base address and a length associated with the data to be read from the host memory and passes the packet to the DTE. The DTE generates a plurality of read requests comprising offsets from the base address and corresponding lengths based on the first datatype descriptor. The DTE passes the plurality of read requests to a direct memory access module, thereby allowing the NIC to access the host memory while eliminating copies of the data on the host during transfer of the command across a network.
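
    As a rough illustration of the expansion step described above, the sketch below unrolls a simple strided datatype descriptor into (address, length) read requests relative to the base address carried in the command. The descriptor fields (count, block_len, stride) and the print in place of a DMA hand-off are assumptions made for brevity, not the patented format.

    /*
     * Illustrative sketch (not the patented implementation) of how a datatype
     * engine might expand a strided datatype descriptor into DMA read requests
     * of (address, length) pairs relative to the base address in the command.
     */
    #include <stdint.h>
    #include <stdio.h>

    typedef struct {
        uint32_t count;        /* number of blocks in the strided type */
        uint32_t block_len;    /* bytes per contiguous block */
        uint32_t stride;       /* bytes between the starts of successive blocks */
    } dtype_desc;

    typedef struct {
        uint64_t addr;         /* absolute host address to read */
        uint32_t len;          /* bytes to read */
    } read_req;

    /* Expand the descriptor into read requests; returns the number generated. */
    static uint32_t dte_expand(const dtype_desc *d, uint64_t base,
                               read_req *out, uint32_t max_out)
    {
        uint32_t n = 0;
        for (uint32_t i = 0; i < d->count && n < max_out; i++, n++) {
            out[n].addr = base + (uint64_t)i * d->stride;  /* offset from base */
            out[n].len  = d->block_len;
        }
        return n;
    }

    int main(void)
    {
        /* e.g. a column of an 8-byte-element matrix with a 64-byte row pitch */
        dtype_desc d = { .count = 4, .block_len = 8, .stride = 64 };
        read_req reqs[16];
        uint32_t n = dte_expand(&d, 0x1000, reqs, 16);

        for (uint32_t i = 0; i < n; i++)   /* would be passed to the DMA module */
            printf("read addr=0x%llx len=%u\n",
                   (unsigned long long)reqs[i].addr, reqs[i].len);
        return 0;
    }

    Generating the gather pattern on the NIC is what removes the host-side packing copy that the abstract refers to.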

    RENDEZVOUS TO ENABLE CONGESTION MANAGEMENT

    Publication Number: US20240121294A1

    Publication Date: 2024-04-11

    Application Number: US18478531

    Application Date: 2023-09-29

    CPC classification number: H04L67/1004 G06F15/17331

    Abstract: A network interface controller (NIC) facilitating incast management at a computing system is provided. During operation, the NIC can receive, via a network, a request to send data from a remote computing system. The NIC can determine that the request is among a plurality of requests from a plurality of remote computing systems accessible via the network. Based on a descriptor in the request, the NIC can determine a storage location of the data at the remote computing system. The NIC can then determine a level of congestion associated with the plurality of requests at the computing system. The NIC can schedule a data retrieval in response to the request based on the level of congestion and with respect to the plurality of requests. The NIC can then retrieve the data from the storage location based on remote access.
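
    A minimal sketch of the scheduling idea follows, under the assumption of a simple admission policy: at most MAX_INFLIGHT data retrievals are in flight at a time and the rest are deferred. The structure names, the threshold, and the LIFO backlog are illustrative only and not taken from the patent.

    /*
     * Sketch of receiver-driven scheduling under incast, assuming a simple
     * policy: the NIC admits at most MAX_INFLIGHT retrievals and queues the rest.
     */
    #include <stdint.h>
    #include <stdio.h>

    #define MAX_INFLIGHT 2          /* congestion threshold (assumed policy) */
    #define MAX_PENDING  16

    typedef struct {
        uint32_t source_id;         /* remote computing system issuing the request */
        uint64_t remote_addr;       /* storage location from the request descriptor */
        uint32_t len;
    } rts;

    static rts      pending[MAX_PENDING];
    static uint32_t n_pending, inflight;

    /* Start a remote read for one request; stubbed here as a print. */
    static void issue_pull(const rts *r)
    {
        inflight++;
        printf("pull from node %u addr=0x%llx len=%u (inflight=%u)\n",
               r->source_id, (unsigned long long)r->remote_addr, r->len, inflight);
    }

    /* Called when a request to send data arrives from the network. */
    static void on_rts(const rts *r)
    {
        if (inflight < MAX_INFLIGHT)
            issue_pull(r);                   /* low congestion: retrieve now */
        else if (n_pending < MAX_PENDING)
            pending[n_pending++] = *r;       /* congested: defer the retrieval */
    }

    /* Called when one retrieval completes; admit the next deferred request. */
    static void on_pull_done(void)
    {
        inflight--;
        if (n_pending > 0)
            issue_pull(&pending[--n_pending]);
    }

    int main(void)
    {
        for (uint32_t i = 0; i < 4; i++) {
            rts r = { .source_id = i, .remote_addr = 0x2000 + i * 0x100, .len = 4096 };
            on_rts(&r);                      /* four senders converge on this node */
        }
        on_pull_done();                      /* completions drain the backlog */
        on_pull_done();
        return 0;
    }

    Because the receiver decides when each pull is issued, the fan-in of simultaneous senders never translates into uncontrolled arrival at the congested link.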

    DECOUPLING CONGESTION MANAGEMENT STATE AND CONNECTION MANAGEMENT STATE IN HIGH PERFORMANCE COMPUTING

    Publication Number: US20250106161A1

    Publication Date: 2025-03-27

    Application Number: US18408288

    Application Date: 2024-01-09

    Abstract: A first network endpoint establishes a connection with a second network endpoint by transmitting a control packet including a first identifier associated with the connection and the first network endpoint. The first network endpoint stores, in a first data structure based on the first identifier, a first connection state associated with the connection and stores, in a second data structure based on the first connection state, a first congestion state associated with the connection. The first network endpoint identifies, for a data flow associated with the first identifier, a congestion state corresponding to the data flow, by: obtaining the first connection state by searching the first data structure based on the first identifier; and identifying the first congestion state by searching the second data structure based on the obtained first connection state.
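
    The two-level lookup can be sketched directly: the first data structure is keyed by the connection identifier carried in packets, and the second is keyed by a field of the connection state, so congestion state can be shared or replaced without touching connection setup. The array-indexed tables and the cong_idx field below are simplifying assumptions, not the patented layout.

    /*
     * Sketch of the two-level lookup in the abstract: connection table keyed by
     * the identifier in the control packet, congestion table keyed by a field
     * of the connection state.
     */
    #include <stdint.h>
    #include <stdio.h>

    #define MAX_CONN 64
    #define MAX_CONG 16

    typedef struct {
        uint32_t cong_idx;      /* which congestion-state entry this flow uses */
        uint32_t next_psn;      /* example per-connection ordering state */
    } conn_state;

    typedef struct {
        uint32_t cwnd_bytes;    /* example congestion-control state */
        uint32_t rtt_us;
    } cong_state;

    static conn_state conn_table[MAX_CONN];   /* first data structure  */
    static cong_state cong_table[MAX_CONG];   /* second data structure */

    /* Identify the congestion state for a data flow given its connection id. */
    static cong_state *lookup_congestion(uint32_t conn_id)
    {
        conn_state *cs = &conn_table[conn_id % MAX_CONN]; /* search structure 1 */
        return &cong_table[cs->cong_idx % MAX_CONG];      /* search structure 2 */
    }

    int main(void)
    {
        /* Connection established: the control packet carried identifier 7. */
        conn_table[7] = (conn_state){ .cong_idx = 3, .next_psn = 0 };
        cong_table[3] = (cong_state){ .cwnd_bytes = 65536, .rtt_us = 12 };

        cong_state *cg = lookup_congestion(7);
        printf("conn 7 -> cwnd=%u rtt=%u us\n", cg->cwnd_bytes, cg->rtt_us);
        return 0;
    }

    Keeping the congestion table separate means its entries can be resized, aggregated, or re-pointed without renegotiating connections, which is the decoupling the title describes.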

    Mechanism to enable out-of-order packet processing in a datatype engine

    Publication Number: US12229048B2

    Publication Date: 2025-02-18

    Application Number: US18085092

    Application Date: 2022-12-20

    Abstract: A network interface card (NIC) receives packets corresponding to a read or write request, the packets associated with a datatype descriptor stored in a datatype engine of the NIC, and each packet associated with a precomputed context which indicates a value for each dimension of a multi-dimensional array and a start location of the respective packet within a host memory block. The NIC generates, for a respective packet, a datatype handle corresponding to the datatype descriptor and an offset indicating a position of the respective packet within the packets. The NIC determines, based on the datatype handle and the offset, a cached context for the respective packet and initializes the datatype engine based on the cached context. The datatype engine generates, based on the cached context, read or write requests comprising addresses and lengths, thereby allowing the NIC to process out-of-order packets based on the precomputed and cached context.
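
    A small sketch of the per-packet context lookup follows, assuming a fixed context table keyed by (datatype handle, packet offset) and a 2-D array; the structure layout and the print in place of the actual DMA write are illustrative assumptions.

    /*
     * Sketch of looking up a precomputed per-packet context by (datatype handle,
     * packet offset) so packets can be handled in any arrival order.
     */
    #include <stdint.h>
    #include <stdio.h>

    typedef struct {
        uint32_t handle;       /* datatype handle for the descriptor */
        uint32_t pkt_offset;   /* position of the packet within the message */
        uint32_t dim0, dim1;   /* per-dimension values for a 2-D array */
        uint64_t start_addr;   /* where this packet's data starts in host memory */
    } pkt_ctx;

    static pkt_ctx ctx_cache[4];    /* contexts precomputed before data arrives */

    static const pkt_ctx *find_ctx(uint32_t handle, uint32_t pkt_offset)
    {
        for (unsigned i = 0; i < 4; i++)
            if (ctx_cache[i].handle == handle && ctx_cache[i].pkt_offset == pkt_offset)
                return &ctx_cache[i];
        return 0;
    }

    /* Initialize the datatype engine from the cached context, then emit the
     * write for this packet; the DMA step is stubbed as a print. */
    static void process_packet(uint32_t handle, uint32_t pkt_offset, uint32_t len)
    {
        const pkt_ctx *c = find_ctx(handle, pkt_offset);
        if (!c) return;
        printf("pkt offset=%u -> dims=(%u,%u) write addr=0x%llx len=%u\n",
               c->pkt_offset, c->dim0, c->dim1,
               (unsigned long long)c->start_addr, len);
    }

    int main(void)
    {
        for (uint32_t i = 0; i < 4; i++)    /* precompute one context per packet */
            ctx_cache[i] = (pkt_ctx){ .handle = 9, .pkt_offset = i,
                                      .dim0 = i / 2, .dim1 = i % 2,
                                      .start_addr = 0x4000 + i * 0x800 };

        /* Packets arrive out of order; each is processed independently. */
        process_packet(9, 2, 2048);
        process_packet(9, 0, 2048);
        process_packet(9, 3, 2048);
        process_packet(9, 1, 2048);
        return 0;
    }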

    MECHANISM TO ENABLE OUT-OF-ORDER PACKET PROCESSING IN A DATATYPE ENGINE

    Publication Number: US20250028640A1

    Publication Date: 2025-01-23

    Application Number: US18905555

    Application Date: 2024-10-03

    Abstract: A network interface card (NIC) receives packets corresponding to a read or write request, the packets associated with a datatype descriptor stored in a datatype engine of the NIC, and each packet associated with a precomputed context which indicates a value for each dimension of a multi-dimensional array and a start location of the respective packet within a host memory block. The NIC generates, for a respective packet, a datatype handle corresponding to the datatype descriptor and an offset indicating a position of the respective packet within the packets. The NIC determines, based on the datatype handle and the offset, a cached context for the respective packet and initializes the datatype engine based on the cached context. The datatype engine generates, based on the cached context, read or write requests comprising addresses and lengths, thereby allowing the NIC to process out-of-order packets based on the precomputed and cached context.
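
    Since this continuation shares the abstract above, the sketch here takes a complementary angle: how the per-packet contexts for a 2-D strided layout might be precomputed from the datatype descriptor before any data packets arrive. The one-element-per-packet assumption and the layout arithmetic are illustrative only.

    /*
     * Complementary sketch: precomputing per-packet contexts for a 2-D strided
     * layout from the datatype descriptor, before any data packets arrive.
     */
    #include <stdint.h>
    #include <stdio.h>

    typedef struct {
        uint32_t rows, cols;       /* 2-D array shape covered by the message */
        uint32_t elem_size;        /* bytes per element */
        uint32_t row_pitch;        /* bytes between rows in host memory */
        uint64_t base;             /* base address in host memory */
    } dtype_desc;

    typedef struct {
        uint32_t pkt_offset;       /* packet position within the message */
        uint32_t row, col;         /* per-dimension values for this packet */
        uint64_t start_addr;       /* packet start location in the host block */
    } pkt_ctx;

    /* Precompute one context per packet, assuming one element per packet. */
    static void precompute(const dtype_desc *d, pkt_ctx *out)
    {
        for (uint32_t p = 0; p < d->rows * d->cols; p++) {
            uint32_t row = p / d->cols, col = p % d->cols;
            out[p] = (pkt_ctx){
                .pkt_offset = p, .row = row, .col = col,
                .start_addr = d->base + (uint64_t)row * d->row_pitch
                                      + (uint64_t)col * d->elem_size,
            };
        }
    }

    int main(void)
    {
        dtype_desc d = { .rows = 2, .cols = 3, .elem_size = 8,
                         .row_pitch = 64, .base = 0x8000 };
        pkt_ctx ctx[6];
        precompute(&d, ctx);
        for (uint32_t p = 0; p < 6; p++)
            printf("pkt %u -> (%u,%u) addr=0x%llx\n", ctx[p].pkt_offset,
                   ctx[p].row, ctx[p].col, (unsigned long long)ctx[p].start_addr);
        return 0;
    }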

    SYSTEM AND METHOD FOR IMPLEMENTING A NETWORK-INTERFACE-BASED ALLREDUCE OPERATION

    Publication Number: US20230359574A1

    Publication Date: 2023-11-09

    Application Number: US18353277

    Application Date: 2023-07-17

    CPC classification number: G06F13/20 G06F12/10 G06F2212/1024

    Abstract: An apparatus is provided that includes a network interface to transmit and receive data packets over a network; a memory including one or more buffers; an arithmetic logic unit to perform arithmetic operations for organizing and combining the data packets; and a circuitry to receive, via the network interface, data packets from the network; aggregate, via the arithmetic logic unit, the received data packets in the one or more buffers at a network rate; and transmit, via the network interface, the aggregated data packets to one or more compute nodes in the network, thereby optimizing latency incurred in combining the received data packets and transmitting the aggregated data packets, and hence accelerating a bulk data allreduce operation. One embodiment provides a system and method for performing the allreduce operation. During operation, the system performs the allreduce operation by pacing network operations for enhancing performance of the allreduce operation.
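
    The aggregation step can be illustrated with a short sketch: contributions are combined elementwise into a buffer as they arrive, and the combined result is forwarded once the expected number of contributions has been seen. The buffer size, the sum operation, and the counters below are assumptions for illustration, not the claimed apparatus.

    /*
     * Minimal sketch of the aggregation step of a NIC-based allreduce:
     * incoming contributions are combined elementwise as they arrive,
     * then the combined result is forwarded.
     */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define ELEMS 4

    typedef struct {
        int64_t  buf[ELEMS];    /* aggregation buffer in NIC memory */
        uint32_t arrived;       /* how many contributions have been combined */
        uint32_t expected;      /* contributions needed before forwarding */
    } allreduce_buf;

    /* "ALU" step: combine one received packet's payload into the buffer. */
    static void aggregate(allreduce_buf *a, const int64_t *payload)
    {
        for (int i = 0; i < ELEMS; i++)
            a->buf[i] += payload[i];
        a->arrived++;
    }

    static int ready_to_forward(const allreduce_buf *a)
    {
        return a->arrived == a->expected;
    }

    int main(void)
    {
        allreduce_buf a = { .expected = 3 };
        memset(a.buf, 0, sizeof a.buf);

        int64_t from_node[3][ELEMS] = {
            { 1, 2, 3, 4 }, { 10, 20, 30, 40 }, { 100, 200, 300, 400 },
        };
        for (int n = 0; n < 3; n++)
            aggregate(&a, from_node[n]);        /* combine at arrival rate */

        if (ready_to_forward(&a)) {             /* then transmit the result */
            for (int i = 0; i < ELEMS; i++)
                printf("%lld ", (long long)a.buf[i]);   /* 111 222 333 444 */
            printf("\n");
        }
        return 0;
    }

    Combining in NIC buffers at line rate is what avoids the round trip through host memory that would otherwise add latency to each reduction step.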

    SYSTEM AND METHOD FOR IMPLEMENTING A NETWORK-INTERFACE-BASED ALLREDUCE OPERATION

    Publication Number: US20230035657A1

    Publication Date: 2023-02-02

    Application Number: US17383606

    Application Date: 2021-07-23

    Abstract: An apparatus is provided that includes a network interface to transmit and receive data packets over a network; a memory including one or more buffers; an arithmetic logic unit to perform arithmetic operations for organizing and combining the data packets; and a circuitry to receive, via the network interface, data packets from the network; aggregate, via the arithmetic logic unit, the received data packets in the one or more buffers at a network rate; and transmit, via the network interface, the aggregated data packets to one or more compute nodes in the network, thereby optimizing latency incurred in combining the received data packets and transmitting the aggregated data packets, and hence accelerating a bulk data allreduce operation. One embodiment provides a system and method for performing the allreduce operation. During operation, the system performs the allreduce operation by pacing network operations for enhancing performance of the allreduce operation.
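
    This earlier filing shares the abstract above, so the sketch here focuses on the pacing sentence instead: sends are spaced so that the injection rate matches a target network rate. The target rate, chunk size, and timing model are assumptions for illustration.

    /*
     * Sketch of pacing: compute the earliest time the next chunk may be sent
     * so that the injection rate matches a target network rate.
     */
    #include <stdint.h>
    #include <stdio.h>

    /* Return the earliest time (in nanoseconds) the next chunk may be sent,
     * given the previous send time, the chunk size, and a target rate. */
    static uint64_t next_send_time_ns(uint64_t prev_send_ns,
                                      uint32_t chunk_bytes,
                                      double   target_gbps)
    {
        double gap_ns = (chunk_bytes * 8.0) / target_gbps;  /* bits / (bits/ns) = ns */
        return prev_send_ns + (uint64_t)gap_ns;
    }

    int main(void)
    {
        uint64_t t = 0;
        const uint32_t chunk = 4096;        /* bytes per allreduce chunk */
        const double   rate  = 100.0;       /* pace to ~100 Gb/s */

        for (int i = 0; i < 4; i++) {
            printf("send chunk %d at t=%llu ns\n", i, (unsigned long long)t);
            t = next_send_time_ns(t, chunk, rate);
        }
        return 0;
    }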

    MECHANISM TO ENABLE OUT-OF-ORDER PACKET PROCESSING IN A DATATYPE ENGINE

    Publication Number: US20240202118A1

    Publication Date: 2024-06-20

    Application Number: US18085092

    Application Date: 2022-12-20

    CPC classification number: G06F12/0802 G06F2212/1016

    Abstract: A network interface card (NIC) receives packets corresponding to a read or write request, the packets associated with a datatype descriptor stored in a datatype engine of the NIC, and each packet associated with a precomputed context which indicates a value for each dimension of a multi-dimensional array and a start location of the respective packet within a host memory block. The NIC generates, for a respective packet, a datatype handle corresponding to the datatype descriptor and an offset indicating a position of the respective packet within the packets. The NIC determines, based on the datatype handle and the offset, a cached context for the respective packet and initializes the datatype engine based on the cached context. The datatype engine generates, based on the cached context, read or write requests comprising addresses and lengths, thereby allowing the NIC to process out-of-order packets based on the precomputed and cached context.
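
    For this further publication of the same abstract, a third angle: given a packet's cached context for a 2-D strided layout, emit the (address, length) write requests for a payload that may cross row boundaries. The layout constants and names are assumptions for illustration.

    /*
     * Sketch: emit the (address, length) write requests for one packet whose
     * payload may span several rows of a 2-D strided layout, starting from the
     * cached context for that packet.
     */
    #include <stdint.h>
    #include <stdio.h>

    #define ROW_BYTES  32   /* contiguous bytes per row in the array */
    #define ROW_PITCH  64   /* bytes between row starts in host memory */

    typedef struct {
        uint64_t start_addr;   /* from the cached context for this packet */
        uint32_t col_byte;     /* byte offset within the starting row */
    } pkt_ctx;

    /* Emit write requests for len payload bytes starting at the cached context. */
    static void emit_writes(const pkt_ctx *c, uint32_t len)
    {
        uint64_t addr = c->start_addr;
        uint32_t col  = c->col_byte;
        while (len > 0) {
            uint32_t chunk = ROW_BYTES - col;     /* bytes left in this row */
            if (chunk > len)
                chunk = len;
            printf("write addr=0x%llx len=%u\n", (unsigned long long)addr, chunk);
            len -= chunk;
            addr += ROW_PITCH - col;              /* jump to the next row start */
            col = 0;
        }
    }

    int main(void)
    {
        /* Packet lands 8 bytes into a row and carries 72 bytes of payload. */
        pkt_ctx c = { .start_addr = 0xA008, .col_byte = 8 };
        emit_writes(&c, 72);   /* -> 24 bytes, then 32, then 16 across three rows */
        return 0;
    }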
