Network packet templating for GPU-initiated communication

    公开(公告)号:US10740163B2

    公开(公告)日:2020-08-11

    申请号:US16022498

    申请日:2018-06-28

    Abstract: Systems, apparatuses, and methods for performing network packet templating for graphics processing unit (GPU)-initiated communication are disclosed. A central processing unit (CPU) creates a network packet according to a template and populates a first subset of fields of the network packet with static data. Next, the CPU stores the network packet in a memory. A GPU initiates execution of a kernel and detects a network communication request within the kernel and prior to the kernel completing execution. Responsive to this determination, the GPU populates a second subset of fields of the network packet with runtime data. Then, the GPU generates a notification that the network packet is ready to be processed. A network interface controller (NIC) processes the network packet using data retrieved from the first subset of fields and from the second subset of fields responsive to detecting the notification.

    NETWORK PACKET TEMPLATING FOR GPU-INITIATED COMMUNICATION

    公开(公告)号:US20200004610A1

    公开(公告)日:2020-01-02

    申请号:US16022498

    申请日:2018-06-28

    Abstract: Systems, apparatuses, and methods for performing network packet templating for graphics processing unit (GPU)-initiated communication are disclosed. A central processing unit (CPU) creates a network packet according to a template and populates a first subset of fields of the network packet with static data. Next, the CPU stores the network packet in a memory. A GPU initiates execution of a kernel and detects a network communication request within the kernel and prior to the kernel completing execution. Responsive to this determination, the GPU populates a second subset of fields of the network packet with runtime data. Then, the GPU generates a notification that the network packet is ready to be processed. A network interface controller (NIC) processes the network packet using data retrieved from the first subset of fields and from the second subset of fields responsive to detecting the notification.

    Multi-Tree Reduction with Execution Skew
    15.
    发明公开

    公开(公告)号:US20240311182A1

    公开(公告)日:2024-09-19

    申请号:US18185641

    申请日:2023-03-17

    CPC classification number: G06F9/4881

    Abstract: A device includes a communication scheduler to generate schedule trees for scheduling data communication among a plurality of nodes configured to perform a collective operation using data contributed from the plurality of nodes. The device includes data reduction logic to: identify one or more skewed nodes among the plurality of nodes, perform, according to a first set of schedule trees, a first operation to generate partial results based on data contributed from non-skewed nodes, and perform, according to a second set of schedule trees, a second operation to generate final results based on the partial results and data contributed from the one or more skewed nodes.

    DISTRIBUTED CACHING POLICY FOR LARGE-SCALE DEEP LEARNING TRAINING DATA PRE-PROCESSING

    公开(公告)号:US20240211399A1

    公开(公告)日:2024-06-27

    申请号:US18089480

    申请日:2022-12-27

    CPC classification number: G06F12/0813 G06N20/00

    Abstract: A distributed cache network used for machine learning is provided which comprises a network fabric having file systems which store data and a plurality of processing devices, each comprising cache memory and a processor configured to execute a training of a machine learning model and selectively cache portions of the data based on a frequency with which the data is accessed by the processor. Each processing device stores metadata identifying portions of data which are cached in the cache memory and other portions of the data which are cached in other processing devices of the network. When requested data is not cached in another processing device, the portion of requested data is accessed from a network file system via a client to server channel and is accessed from another processing device via a client to client channel when the requested data is cached in the other processing device.

    GPU NETWORKING USING AN INTEGRATED COMMAND PROCESSOR

    公开(公告)号:US20230120934A1

    公开(公告)日:2023-04-20

    申请号:US18068836

    申请日:2022-12-20

    Abstract: Systems, apparatuses, and methods for generating network messages on a parallel processor are disclosed. A system includes at least a parallel processor, a general purpose processor, and a network interface unit. The parallel processor includes at least a plurality of compute units, a command processor, and a cache. A thread within a kernel executing on a compute unit of the parallel processor generates a network message and stores the network message and a corresponding indication in the cache. In response to detecting the indication of the network message in the cache, the command processor processes and conveys the network message to the network interface unit without involving the general purpose processor.

    GPU networking using an integrated command processor

    公开(公告)号:US11544121B2

    公开(公告)日:2023-01-03

    申请号:US15815043

    申请日:2017-11-16

    Abstract: Systems, apparatuses, and methods for generating network messages on a parallel processor are disclosed. A system includes at least a parallel processor, a general purpose processor, and a network interface unit. The parallel processor includes at least a plurality of compute units, a command processor, and a cache. A thread within a kernel executing on a compute unit of the parallel processor generates a network message and stores the network message and a corresponding indication in the cache. In response to detecting the indication of the network message in the cache, the command processor processes and conveys the network message to the network interface unit without involving the general purpose processor.

Patent Agency Ranking