-
Publication No.: US12190405B2
Publication date: 2025-01-07
Application No.: US17853711
Application date: 2022-06-29
Applicant: Intel Corporation
Inventor: Todd Rimmer, Mark Debbage, Bruce G. Warren, Sayantan Sur, Nayan Amrutlal Suthar, Ajaya Durg
Abstract: Examples described herein relate to a first graphics processing unit (GPU) with at least one integrated communications system, wherein the at least one integrated communications system is to apply a reliability protocol to communicate with a second at least one integrated communications system associated with a second GPU to copy data from a first memory region to a second memory region and wherein the first memory region is associated with the first GPU and the second memory region is associated with the second GPU.
-
Publication No.: US11409673B2
Publication date: 2022-08-09
Application No.: US16275625
Application date: 2019-02-14
Applicant: Intel Corporation
Inventor: Andrew Friedley, Sayantan Sur, Ravindra Babu Ganapathi, Travis Hamilton, Keith D. Underwood
IPC: G06F13/38, G06F13/16, G06F9/48, G06F12/0802
Abstract: Examples include a method of managing storage for triggered operations. The method includes receiving a request to allocate a triggered operation; if there is a free triggered operation, allocating the free triggered operation; if there is no free triggered operation, recovering one or more fired triggered operations, freeing one or more of the recovered triggered operations, and allocating one of the freed triggered operations; configuring the allocated triggered operation; and storing the configured triggered operation in a cache on an input/output (I/O) device for subsequent asynchronous execution of the configured triggered operation.
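The allocation flow in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the names `TriggeredOp`, `TriggeredOpPool`, `allocate`, and `fire_ready` are hypothetical, the "cache on an I/O device" is modeled as a plain Python list, and firing is assumed to be driven by a counter reaching a threshold.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TriggeredOp:
    """One pool slot (hypothetical); `fired` is set once the operation has run."""
    action: Optional[Callable[[], None]] = None
    threshold: int = 0
    fired: bool = False


class TriggeredOpPool:
    def __init__(self, capacity):
        self.free = [TriggeredOp() for _ in range(capacity)]
        self.in_use = []
        self.device_cache = []  # stand-in for the cache on the I/O device

    def allocate(self, action, threshold):
        if not self.free:
            # No free slot: recover operations that already fired and free them.
            recovered = [op for op in self.in_use if op.fired]
            self.in_use = [op for op in self.in_use if not op.fired]
            self.device_cache = [op for op in self.device_cache if not op.fired]
            self.free.extend(recovered)
        if not self.free:
            raise RuntimeError("no triggered operation available")
        op = self.free.pop()
        # Configure the slot, then store it for later asynchronous execution.
        op.action, op.threshold, op.fired = action, threshold, False
        self.in_use.append(op)
        self.device_cache.append(op)
        return op

    def fire_ready(self, counter):
        """Run every cached op whose threshold is reached; return how many fired."""
        fired = 0
        for op in self.device_cache:
            if not op.fired and counter >= op.threshold:
                op.action()
                op.fired = True
                fired += 1
        return fired
```

In this sketch, recovery happens lazily, only when an allocation request finds the free list empty, which mirrors the order of steps in the abstract.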
-
Publication No.: US10693787B2
Publication date: 2020-06-23
Application No.: US15686264
Application date: 2017-08-25
Applicant: Intel Corporation
Inventor: Timo Schneider, Keith D. Underwood, Mario Flajslik, Sayantan Sur, James Dinan
IPC: H04L12/801, G06F9/48, H04L12/935, H04L12/803, H04L29/08
Abstract: Techniques are disclosed to throttle bandwidth imbalanced data transfers. In some examples, an example computer-implemented method may include splitting a payload of a data transfer operation over a network fabric into multiple chunk get operations, starting the execution of a threshold number of the chunk get operations, and scheduling the remaining chunk get operations for subsequent execution. The method may also include executing a scheduled chunk get operation in response to determining a completion of an executing chunk get operation. In some embodiments, the chunk get operations may be implemented as triggered operations.
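The throttling scheme described here can be sketched roughly as follows. The class `ThrottledGet` and the helper `split_into_chunks` are hypothetical names, and the network get is simulated: the caller reports completions explicitly instead of a fabric or triggered operation doing so.

```python
from collections import deque


def split_into_chunks(total_bytes, chunk_bytes):
    """Split a payload into (offset, length) chunk descriptors."""
    return [(off, min(chunk_bytes, total_bytes - off))
            for off in range(0, total_bytes, chunk_bytes)]


class ThrottledGet:
    def __init__(self, total_bytes, chunk_bytes, threshold):
        chunks = split_into_chunks(total_bytes, chunk_bytes)
        # Start only `threshold` chunk gets; schedule the rest for later.
        self.in_flight = chunks[:threshold]
        self.pending = deque(chunks[threshold:])
        self.done = []

    def complete(self, chunk):
        """Record one finished chunk get and launch the next scheduled one."""
        self.in_flight.remove(chunk)
        self.done.append(chunk)
        if self.pending:
            self.in_flight.append(self.pending.popleft())
```

The number of outstanding chunk gets never exceeds `threshold`, which is the property that limits how much bandwidth the transfer can demand at once.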
-
Publication No.: US09811403B1
Publication date: 2017-11-07
Application No.: US15189103
Application date: 2016-06-22
Applicant: Intel Corporation
Inventor: Sayantan Sur
CPC classification number: G06F9/546, G06F9/544, G06F2209/548
Abstract: In one embodiment, an apparatus includes: a plurality of queues having a plurality of first entries to store receive information for a process; a master queue having a plurality of second entries to store wild card receive information, where redundant information of the plurality of second entries is to be included in a plurality of redundant entries of the plurality of queues; and a control circuit to match an incoming receive operation within one of the plurality of queues. Other embodiments are described and claimed.
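One possible reading of this queue arrangement is sketched below. It assumes one queue per source, a wildcard receive (here `ANY_SOURCE`) that is stored in the master queue and replicated into every per-source queue, and matching on a single integer tag. All names are hypothetical, and the abstract does not confirm these details.

```python
ANY_SOURCE = -1  # hypothetical marker for a wildcard receive


class ReceiveQueues:
    def __init__(self, num_sources):
        self.queues = [[] for _ in range(num_sources)]  # one queue per source
        self.master = []                                # wildcard receives
        self._next_id = 0

    def post_receive(self, source, tag):
        entry = (self._next_id, tag)
        self._next_id += 1
        if source == ANY_SOURCE:
            # Keep the wildcard in the master queue and place a redundant
            # copy in every per-source queue to preserve posting order.
            self.master.append(entry)
            for queue in self.queues:
                queue.append(entry)
        else:
            self.queues[source].append(entry)
        return entry[0]

    def match(self, source, tag):
        """Match an incoming message by searching only its source's queue."""
        for entry in self.queues[source]:
            if entry[1] == tag:
                if entry in self.master:
                    # A wildcard matched: retire it and all redundant copies.
                    self.master.remove(entry)
                    for queue in self.queues:
                        queue.remove(entry)
                else:
                    self.queues[source].remove(entry)
                return entry[0]
        return None
```

The point of the redundant copies in this sketch is that an incoming message only needs to search one queue, yet wildcard and source-specific receives are still matched in the order they were posted.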
-
Publication No.: US11246027B2
Publication date: 2022-02-08
Application No.: US15280439
Application date: 2016-09-29
Applicant: Intel Corporation
Inventor: William R. Magro, Todd M. Rimmer, Robert J. Woodruff, Mark S. Hefty, Sayantan Sur
IPC: H04L12/931, H04W12/06, G06F9/448, H04L29/06, G06F9/50, H04L12/933, H04L29/12
Abstract: In an embodiment, at least one interface mechanism may be provided. The mechanism may permit, at least in part, at least one process to allocate, at least in part, and/or configure, at least in part, at least one network-associated object. Such allocation and/or configuration, at least in part, may be in accordance with at least one parameter set that may correspond, at least in part, to at least one query issued by the at least one process via the mechanism. Many modifications are possible without departing from this embodiment.
-
Publication No.: US11150967B2
Publication date: 2021-10-19
Application No.: US15721854
Application date: 2017-09-30
Applicant: Intel Corporation
Inventor: Sayantan Sur, Keith Underwood, Ravindra Babu Ganapathi, Andrew Friedley
Abstract: Methods, software, and systems for improved data transfer operations using overlapped rendezvous memory registration. Techniques are disclosed for transferring data between a first process operating as a sender and a second process operating as a receiver. The sender sends a PUT request message to the receiver including payload data stored in a send buffer and first and second match indicia. The first match indicia is used to determine whether the PUT request is expected or unexpected. If the PUT request is unexpected, an RMA GET operation is performed using the second match indicia to pull data from the send buffer and write the data to a memory region in the user space of the process associated with the receiver. If the PUT request message is expected, the data payload with the PUT request is written to a receive buffer on the receiver determined using the first match indicia.
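The expected and unexpected paths can be illustrated with a small in-process sketch. The classes `Sender` and `Receiver` and their methods are hypothetical, the match indicia are modeled as dictionary keys, and the RMA GET is simulated by a direct method call instead of a fabric operation.

```python
class Sender:
    def __init__(self):
        self.exposed = {}  # second match indicia -> send buffer (for RMA GET)

    def put_request(self, receiver, first_match, second_match, data):
        self.exposed[second_match] = data
        return receiver.on_put(self, first_match, second_match, data)

    def rma_get(self, second_match):
        return self.exposed[second_match]


class Receiver:
    def __init__(self):
        self.posted = {}     # first match indicia -> posted receive buffer
        self.completed = {}  # expected messages, written to receive buffers
        self.region = {}     # unexpected messages, pulled with RMA GET

    def post_receive(self, first_match, size):
        self.posted[first_match] = bytearray(size)

    def on_put(self, sender, first_match, second_match, payload):
        if first_match in self.posted:
            # Expected: write the payload carried by the PUT request.
            buf = self.posted.pop(first_match)
            buf[:len(payload)] = payload
            self.completed[first_match] = buf
            return "expected"
        # Unexpected: pull from the send buffer using the second indicia.
        self.region[first_match] = bytearray(sender.rma_get(second_match))
        return "unexpected"
```

For example, a PUT whose first match indicia has a posted receive returns `"expected"`, while the same PUT sent before any receive is posted takes the GET path.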
-
Publication No.: US20180183857A1
Publication date: 2018-06-28
Application No.: US15390234
Application date: 2016-12-23
Applicant: Intel Corporation
Inventor: Akhil Langer, Sayantan Sur
Abstract: Particular embodiments described herein provide for an electronic device that can be configured to consolidate data from one or more processes on a node, where the node is part of a first collection of nodes, communicate the consolidated data to a second node, where the second node is in the first collection of nodes, where the first collection of nodes is part of a first group of a collection of nodes, and communicate the consolidated data to a third node, wherein the third node is in a second collection of nodes, where the second collection of nodes is part of the first group of the collection of nodes. In an example, the node is part of a multi-tiered dragonfly topology network and the data is part of a gather or scatter process.
-
Publication No.: US20220210639A1
Publication date: 2022-06-30
Application No.: US17548237
Application date: 2021-12-10
Applicant: Intel Corporation
Inventor: William R. Magro, Todd M. Rimmer, Robert J. Woodruff, Mark S. Hefty, Sayantan Sur
Abstract: In an embodiment, at least one interface mechanism may be provided. The mechanism may permit, at least in part, at least one process to allocate, at least in part, and/or configure, at least in part, at least one network-associated object. Such allocation and/or configuration, at least in part, may be in accordance with at least one parameter set that may correspond, at least in part, to at least one query issued by the at least one process via the mechanism. Many modifications are possible without departing from this embodiment.
-
Publication No.: US20210271536A1
Publication date: 2021-09-02
Application No.: US17133559
Application date: 2020-12-23
Applicant: Intel Corporation
Inventor: Maria Garzaran, Nusrat Islam, Gengbin Zheng, Sayantan Sur
Abstract: Algorithms for optimizing small message collectives with hardware supported triggered operations and associated methods, apparatus, and systems. The algorithms are implemented in a distributed compute environment comprising a plurality of ranks including a root, a plurality of intermediate nodes, and a plurality of leaf nodes, where each of the plurality of ranks comprises a compute platform having a communication interface including embedded logic for implementing the algorithms. Collectives are employed to transfer data between parent ranks and child ranks. In connection with the collectives, control messages are sent from children of a collective to the parent of the collective informing the parent that the children of the collective have free buffers ready to receive data. The parent employs a counter to determine that a control message has been received from each of its children indicating each child has a free buffer prior to sending data to the children in the collective.
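The counter-gated send described in this abstract can be sketched in software as below. This is only an illustration: `ParentRank`, `ChildRank`, and their methods are hypothetical names, and the hardware triggered operations are replaced by ordinary method calls.

```python
class ChildRank:
    def __init__(self, name):
        self.name = name
        self.buffer = None

    def notify_ready(self, parent):
        # Control message: this child has a free buffer ready to receive data.
        parent.on_ready()

    def receive(self, data):
        self.buffer = data


class ParentRank:
    def __init__(self, children):
        self.children = children
        self.ready_counter = 0

    def on_ready(self):
        self.ready_counter += 1

    def try_send(self, data):
        """Send only once a ready message has arrived from every child."""
        if self.ready_counter < len(self.children):
            return False
        # Consume one ready message per child, then deliver the data.
        self.ready_counter -= len(self.children)
        for child in self.children:
            child.receive(data)
        return True
```

In a hardware-triggered version, the comparison in `try_send` would presumably correspond to a triggered operation whose threshold equals the number of children, so the send fires without host involvement.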
-
Publication No.: US10963183B2
Publication date: 2021-03-30
Application No.: US15463005
Application date: 2017-03-20
Applicant: Intel Corporation
Inventor: James Dinan, Keith D. Underwood, Sayantan Sur, Charles A. Giefer, Mario Flajslik
Abstract: Technologies for fine-grained completion tracking of memory buffer accesses include a compute device. The compute device is to establish multiple counter pairs for a memory buffer. Each counter pair includes a locally managed offset and a completion counter. The compute device is also to receive a request from a remote compute device to access the memory buffer, assign one of the counter pairs to the request, advance the locally managed offset of the assigned counter pair by the amount of data to be read or written, and advance the completion counter of the assigned counter pair as the data is read from or written to the memory buffer. Other embodiments are also described and claimed.
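A rough sketch of the counter-pair idea follows. It assumes that each counter pair manages its own equal-sized region of the buffer and that requests are assigned to pairs round-robin; neither detail is stated in the abstract, and the names `CounterPair`, `TrackedBuffer`, `begin_access`, and `write` are hypothetical.

```python
class CounterPair:
    def __init__(self, base):
        self.base = base     # start of this pair's region (assumption)
        self.offset = 0      # locally managed offset: bytes reserved so far
        self.completed = 0   # completion counter: bytes actually written


class TrackedBuffer:
    def __init__(self, size, num_pairs):
        self.data = bytearray(size)
        self.region = size // num_pairs
        self.pairs = [CounterPair(i * self.region) for i in range(num_pairs)]
        self._next = 0

    def begin_access(self, nbytes):
        """Assign a counter pair to a request and reserve space for it."""
        index = self._next
        pair = self.pairs[index]
        if pair.offset + nbytes > self.region:
            raise ValueError("region full")
        self._next = (self._next + 1) % len(self.pairs)
        start = pair.base + pair.offset
        pair.offset += nbytes  # advance by the full amount up front
        return index, start

    def write(self, index, start, chunk):
        """Write part of the data and advance the completion counter."""
        self.data[start:start + len(chunk)] = chunk
        self.pairs[index].completed += len(chunk)

    def is_complete(self, index):
        pair = self.pairs[index]
        return pair.completed == pair.offset
```

Because the offset advances when the request is accepted and the completion counter advances as data lands, comparing the two shows whether every access assigned to that pair has finished.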