Triggered operations for collective communication

    Publication number: US11409673B2

    Publication date: 2022-08-09

    Application number: US16275625

    Filing date: 2019-02-14

    Abstract: Examples include a method of managing storage for triggered operations. The method includes receiving a request to allocate a triggered operation; if there is a free triggered operation, allocating the free triggered operation; if there is no free triggered operation, recovering one or more fired triggered operations, freeing one or more of the recovered triggered operations, and allocating one of the freed triggered operations; configuring the allocated triggered operation; and storing the configured triggered operation in a cache on an input/output (I/O) device for subsequent asynchronous execution of the configured triggered operation.
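
The allocation flow in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: `TriggeredOp`, `TriggeredOpPool`, the counter threshold, and the list standing in for the I/O device cache are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class TriggeredOp:
    """One hypothetical triggered-operation slot."""
    action: Optional[Callable[[], None]] = None
    threshold: int = 0
    fired: bool = False


class TriggeredOpPool:
    """Fixed-size pool of slots plus a list standing in for the device cache."""

    def __init__(self, size: int) -> None:
        self.free: List[TriggeredOp] = [TriggeredOp() for _ in range(size)]
        self.device_cache: List[TriggeredOp] = []

    def _recover_fired(self) -> None:
        # Recover operations that have already fired and free their slots.
        fired = [op for op in self.device_cache if op.fired]
        self.device_cache = [op for op in self.device_cache if not op.fired]
        for op in fired:
            op.action, op.threshold, op.fired = None, 0, False
            self.free.append(op)

    def allocate(self, action: Callable[[], None], threshold: int) -> TriggeredOp:
        # Use a free slot if one exists; otherwise try to recover fired ones.
        if not self.free:
            self._recover_fired()
        if not self.free:
            raise RuntimeError("no triggered operation available")
        op = self.free.pop()
        op.action, op.threshold = action, threshold  # configure the slot
        self.device_cache.append(op)  # store for later asynchronous execution
        return op

    def on_counter(self, value: int) -> None:
        # Fire every pending operation whose threshold has been reached.
        for op in self.device_cache:
            if not op.fired and value >= op.threshold:
                op.action()
                op.fired = True
```

In this sketch a request that finds the pool exhausted succeeds only if at least one previously stored operation has fired; otherwise it raises, which is one of several possible policies.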

    Overlapped rendezvous memory registration

    Publication number: US11150967B2

    Publication date: 2021-10-19

    Application number: US15721854

    Filing date: 2017-09-30

    Abstract: Methods, software, and systems for improved data transfer operations using overlapped rendezvous memory registration. Techniques are disclosed for transferring data between a first process operating as a sender and a second process operating as a receiver. The sender sends a PUT request message to the receiver including payload data stored in a send buffer and first and second match indicia. The first match indicia is used to determine whether the PUT request message is expected or unexpected. If the PUT request message is unexpected, an RMA GET operation is performed using the second match indicia to pull data from the send buffer and write the data to a memory region in the user space of the process associated with the receiver. If the PUT request message is expected, the data payload with the PUT request message is written to a receive buffer on the receiver determined using the first match indicia.
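
The expected/unexpected branching described here can be sketched in a few lines of Python. This is a simplified model under stated assumptions: `Sender`, `Receiver`, `match_tag`, `rma_tag`, and the dictionaries standing in for buffers and memory regions are hypothetical, and the RMA GET is modeled as a plain method call rather than a network operation.

```python
from typing import Dict


class Sender:
    """Holds send buffers keyed by the second match indicia (hypothetical)."""

    def __init__(self) -> None:
        self.send_buffers: Dict[int, bytes] = {}

    def rma_get(self, rma_tag: int) -> bytes:
        # Stand-in for an RMA GET that pulls data from the send buffer.
        return self.send_buffers[rma_tag]


class Receiver:
    def __init__(self) -> None:
        self.posted: Dict[int, bytearray] = {}     # posted receives, by first match indicia
        self.completed: Dict[int, bytearray] = {}  # receive buffers that were filled
        self.user_region: Dict[int, bytes] = {}    # data pulled for unexpected messages

    def post_receive(self, match_tag: int, size: int) -> None:
        self.posted[match_tag] = bytearray(size)

    def on_put(self, sender: Sender, match_tag: int, rma_tag: int, payload: bytes) -> str:
        buf = self.posted.pop(match_tag, None)
        if buf is not None:
            # Expected: write the payload into the matched receive buffer.
            buf[: len(payload)] = payload
            self.completed[match_tag] = buf
            return "expected"
        # Unexpected: pull the data from the sender using the second indicia.
        self.user_region[match_tag] = sender.rma_get(rma_tag)
        return "unexpected"
```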

    COLLECTIVE COMMUNICATION OPERATION
    Patent application

    Publication number: US20180183857A1

    Publication date: 2018-06-28

    Application number: US15390234

    Filing date: 2016-12-23

    CPC classification number: H04L67/10 H04L67/42

    Abstract: Particular embodiments described herein provide for an electronic device that can be configured to consolidate data from one or more processes on a node, where the node is part of a first collection of nodes, communicate the consolidated data to a second node, where the second node is in the first collection of nodes, where the first collection of nodes is part of a first group of a collection of nodes, and communicate the consolidated data to a third node, wherein the third node is in a second collection of nodes, where the second collection of nodes is part of the first group of the collection of nodes. In an example, the node is part of a multi-tiered dragonfly topology network and the data is part of a gather or scatter process.

    ALGORITHMS FOR OPTIMIZING SMALL MESSAGE COLLECTIVES WITH HARDWARE SUPPORTED TRIGGERED OPERATIONS

    Publication number: US20210271536A1

    Publication date: 2021-09-02

    Application number: US17133559

    Filing date: 2020-12-23

    Abstract: Algorithms for optimizing small message collectives with hardware supported triggered operations and associated methods, apparatus, and systems. The algorithms are implemented in a distributed compute environment comprising a plurality of ranks including a root, a plurality of intermediate nodes, and a plurality of leaf nodes, where each of the plurality of ranks comprises a compute platform having a communication interface including embedded logic for implementing the algorithms. Collectives are employed to transfer data between parent ranks and child ranks. In connection with the collectives, control messages are sent from children of a collective to the parent of the collective informing the parent that the children of the collective have free buffers ready to receive data. The parent employs a counter to determine that a control message has been received from each of its children indicating each child has a free buffer prior to sending data to the children in the collective.
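
The counter-based flow control between a parent and its children can be sketched as follows. This is an illustrative software model, not the hardware-triggered mechanism itself: `ParentRank` and its methods are hypothetical, and the sketch assumes each child sends exactly one control message per free buffer, so that a counter value equal to the number of children means every child is ready.

```python
from typing import Dict, List


class ParentRank:
    """Parent that sends data only after all children report a free buffer."""

    def __init__(self, children: List[str]) -> None:
        self.children = children
        self.ready_count = 0           # counter of received control messages
        self.pending: List[bytes] = []  # data waiting for free child buffers
        self.delivered: Dict[str, List[bytes]] = {c: [] for c in children}

    def submit(self, data: bytes) -> None:
        self.pending.append(data)
        self._try_send()

    def on_control_message(self, child: str) -> None:
        # Assumption: one control message per child per free buffer.
        self.ready_count += 1
        self._try_send()

    def _try_send(self) -> None:
        # Send while data is queued and the counter shows all children ready.
        while self.pending and self.ready_count >= len(self.children):
            data = self.pending.pop(0)
            self.ready_count -= len(self.children)
            for child in self.children:
                self.delivered[child].append(data)
```

With two children, data submitted early stays queued until both control messages arrive, at which point it is delivered to both children and the counter returns to zero.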

    Technologies for fine-grained completion tracking of memory buffer accesses

    Publication number: US10963183B2

    Publication date: 2021-03-30

    Application number: US15463005

    Filing date: 2017-03-20

    Abstract: Technologies for fine-grained completion tracking of memory buffer accesses include a compute device. The compute device is to establish multiple counter pairs for a memory buffer. Each counter pair includes a locally managed offset and a completion counter. The compute device is also to receive a request from a remote compute device to access the memory buffer, assign one of the counter pairs to the request, advance the locally managed offset of the assigned counter pair by the amount of data to be read or written, and advance the completion counter of the assigned counter pair as the data is read from or written to the memory buffer. Other embodiments are also described and claimed.
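
The counter-pair idea can be illustrated with a small Python sketch. Several details are assumptions that the abstract does not specify: the names `CounterPair` and `TrackedBuffer` are hypothetical, each counter pair is given its own fixed-size region of the buffer, and an access is treated as complete when the completion counter catches up with the locally managed offset.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CounterPair:
    offset: int = 0     # locally managed offset: bytes reserved so far
    completed: int = 0  # completion counter: bytes actually transferred


class TrackedBuffer:
    """Memory buffer split into regions, one counter pair per region (assumed)."""

    def __init__(self, region_size: int, num_pairs: int) -> None:
        self.region_size = region_size
        self.data = bytearray(region_size * num_pairs)
        self.pairs: List[CounterPair] = [CounterPair() for _ in range(num_pairs)]

    def begin_access(self, pair_index: int, length: int) -> int:
        """Assign a pair to a request, reserve space, return the start position."""
        pair = self.pairs[pair_index]
        if pair.offset + length > self.region_size:
            raise ValueError("request does not fit in the region")
        start = pair_index * self.region_size + pair.offset
        pair.offset += length  # advance by the amount of data to be written
        return start

    def write_chunk(self, pair_index: int, start: int, chunk: bytes) -> None:
        # Advance the completion counter as data lands in the buffer.
        self.data[start : start + len(chunk)] = chunk
        self.pairs[pair_index].completed += len(chunk)

    def is_complete(self, pair_index: int) -> bool:
        pair = self.pairs[pair_index]
        return pair.completed == pair.offset
```

Because the offset advances when a request is accepted and the completion counter advances only as data arrives, comparing the two shows whether every reserved byte in a region has been transferred.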
