-
Publication No.: US20240069736A1
Publication Date: 2024-02-29
Application No.: US17900808
Filing Date: 2022-08-31
Applicant: NVIDIA CORPORATION
Inventors: Srinivas Santosh Kumar MADUGULA, Olivier GIROUX, Wishwesh Anil GANDHI, Michael Allen PARKER, Raghuram L, Ivan TANASIC, Manan PATEL, Mark HUMMEL, Alexander L. MINKIN
IPC: G06F3/06
CPC classification number: G06F3/0611, G06F3/0659, G06F3/0673
Abstract: Various embodiments include techniques for performing self-synchronizing remote memory operations in a multiprocessor computing system. During a remote memory operation in the multiprocessor computing system, a source processing unit transmits multiple segments of data to a destination processing unit. For each segment of data, the source processing unit transmits a remote memory operation to the destination processing unit that includes associated metadata that identifies the memory location of a corresponding synchronization object. The remote memory operation along with the metadata is transmitted as a single unit to the destination processing unit. The destination processing unit splits the operation into the remote memory operation and the memory synchronization operation. As a result, the source processing unit avoids the need to perform a separate memory synchronization operation, thereby reducing inter-processor communications and increasing performance of remote memory operations.
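The flow in this abstract can be sketched as a small single-threaded Python simulation. This is a minimal illustration under stated assumptions, not the patented implementation: the names `CombinedWrite`, `DestinationUnit`, and `send_segments` are hypothetical, destination memory is modeled as a dict keyed by address, and the synchronization object is assumed to be a counter that the destination increments after it stores each segment.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CombinedWrite:
    """Hypothetical packet: a remote write plus metadata naming its sync object."""
    address: int       # destination address for this data segment
    data: bytes        # segment payload
    sync_address: int  # metadata: location of the synchronization object


class DestinationUnit:
    """Hypothetical destination processing unit; memory is a dict keyed by address."""

    def __init__(self) -> None:
        self.memory: dict[int, object] = {}

    def receive(self, op: CombinedWrite) -> None:
        # Split the single incoming unit into its two parts.
        # 1) Remote memory operation: store the data segment.
        self.memory[op.address] = op.data
        # 2) Memory synchronization operation, applied after the store:
        #    assumed here to be an increment of a counter.
        self.memory[op.sync_address] = self.memory.get(op.sync_address, 0) + 1

    def segments_arrived(self, sync_address: int) -> int:
        return self.memory.get(sync_address, 0)


def send_segments(dest: DestinationUnit, base: int, segments: list[bytes],
                  sync_address: int) -> None:
    """Source side: one combined operation per segment, no separate sync message."""
    for i, seg in enumerate(segments):
        dest.receive(CombinedWrite(base + i, seg, sync_address))


dest = DestinationUnit()
SYNC_ADDR = 0x1000  # hypothetical address of the synchronization object
send_segments(dest, 0x0, [b"a", b"b", b"c"], SYNC_ADDR)
print(dest.segments_arrived(SYNC_ADDR))  # 3
```

Because the destination in this model applies each store before the matching counter update, a consumer that sees the counter reach the expected segment count can treat every segment as present, and the source never sends a second, separate synchronization message.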
-
Publication No.: US20240354106A1
Publication Date: 2024-10-24
Application No.: US18755097
Filing Date: 2024-06-26
Applicant: NVIDIA Corporation
Inventors: Srinivas Santosh Kumar MADUGULA, Olivier GIROUX, Wishwesh Anil GANDHI, Michael Allen PARKER, Raghuram L, Ivan TANASIC, Manan PATEL, Mark HUMMEL, Alexander L. MINKIN, Gregory Michael THORSON
IPC: G06F9/30
CPC classification number: G06F9/30043, G06F9/30087
Abstract: Various embodiments include techniques for performing self-synchronizing remote memory operations in a data center or multiprocessor computing system. During a remote memory operation, a source processor transmits multiple data segments to a destination processor. For each data segment, the source processor transmits a remote memory operation to the destination processor that includes associated metadata that identifies the memory location of a corresponding synchronization object representing a count of data segments to be stored or a flag for each data segment to be stored. The remote memory operation along with the metadata is transmitted as a single unit to the destination processor. The destination processor splits the operation into the remote memory operation and the memory synchronization operation. As a result, the source processor avoids the need to perform a separate memory synchronization operation, thereby reducing inter-processor communications and increasing performance of remote memory operations.
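The per-segment flag variant named in this abstract can be sketched as a small Python model. It is an illustration under assumptions, not NVIDIA's design: `FlagSyncDestination` and `deliver` are hypothetical names, each combined operation is assumed to carry the index of the flag to set as its metadata, and segments may arrive in any order.

```python
from __future__ import annotations


class FlagSyncDestination:
    """Hypothetical destination keeping one ready flag per expected segment."""

    def __init__(self, num_segments: int) -> None:
        self.buffer: list[bytes | None] = [None] * num_segments
        self.flags: list[bool] = [False] * num_segments

    def receive(self, index: int, data: bytes, flag_index: int) -> None:
        # The combined operation carries the segment plus, as metadata, the
        # flag to set. Store the data first, then set that segment's flag.
        self.buffer[index] = data
        self.flags[flag_index] = True

    def ready(self, index: int) -> bool:
        return self.flags[index]

    def all_ready(self) -> bool:
        return all(self.flags)


def deliver(dest: FlagSyncDestination, segments: list[bytes],
            order: list[int]) -> None:
    """Source side: send the chosen segments in the given (arbitrary) order."""
    for i in order:
        dest.receive(i, segments[i], flag_index=i)


dest = FlagSyncDestination(4)
segs = [b"w", b"x", b"y", b"z"]
deliver(dest, segs, order=[2, 0])         # two segments have landed so far
print([dest.ready(i) for i in range(4)])  # [True, False, True, False]
deliver(dest, segs, order=[3, 1])
print(dest.all_ready())                   # True
```

Compared with a single counter, a flag per segment lets a consumer start on any individual segment as soon as its flag is set, at the cost of one flag location per segment; a counter only reports how many segments have arrived.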
-
Publication No.: US20240393951A1
Publication Date: 2024-11-28
Application No.: US18768983
Filing Date: 2024-07-10
Applicant: NVIDIA Corporation
Inventors: Srinivas Santosh Kumar MADUGULA, Olivier GIROUX, Wishwesh Anil GANDHI, Michael Allen PARKER, Raghuram L, Ivan TANASIC, Manan PATEL, Mark HUMMEL, Alexander L. MINKIN
IPC: G06F3/06
Abstract: Various embodiments include techniques for performing self-synchronizing remote memory operations in a multiprocessor computing system. During a remote memory operation in the multiprocessor computing system, a source processing unit transmits multiple segments of data to a destination processing unit. For each segment of data, the source processing unit transmits a remote memory operation to the destination processing unit that includes associated metadata that identifies the memory location of a corresponding synchronization object. The remote memory operation along with the metadata is transmitted as a single unit to the destination processing unit. The destination processing unit splits the operation into the remote memory operation and the memory synchronization operation. As a result, the source processing unit avoids the need to perform a separate memory synchronization operation, thereby reducing inter-processor communications and increasing performance of remote memory operations.
-
Publication No.: US20190294575A1
Publication Date: 2019-09-26
Application No.: US16364565
Filing Date: 2019-03-26
Applicant: NVIDIA Corporation
Inventors: Larry R. DENNISON, Mark HUMMEL, Glenn DEARTH
IPC: G06F13/40, G06F12/0891, G06F9/38
Abstract: Systems and techniques for synchronizing transactions between processing devices on an interconnection network are provided. Upon receiving a stream of posted transactions followed by a flush transaction from a source processing device connected to the interconnection network, the flush transaction is trapped before it enters the interconnection network. Subsequently, based on monitoring for responses received from a destination processing device for transactions corresponding to the posted transactions, a flush response is generated and returned to the source processing device. The described techniques enable efficient synchronization of posted writes, posted atomics, and the like over complex interconnection fabrics, such that a first GPU can write data to a second GPU so that a third GPU can safely consume the data written to the second GPU.
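The flush-trapping behavior described above can be modeled with a short Python sketch. It is an illustration under assumptions, not the patented circuit: `FlushTrap` and its methods are hypothetical, each posted transaction is assumed to produce exactly one response from the destination, and responses are counted rather than matched to individual transactions.

```python
from __future__ import annotations

from collections import deque


class FlushTrap:
    """Hypothetical ingress logic between a source GPU and the fabric.

    It counts posted transactions forwarded into the network, traps flush
    transactions instead of forwarding them, and answers each trapped flush
    once every posted transaction issued before it has been acknowledged.
    """

    def __init__(self) -> None:
        self.issued = 0  # posted transactions forwarded so far
        self.acked = 0   # destination responses received so far
        self.pending: deque[tuple[int, str]] = deque()  # (issued watermark, flush id)
        self.flush_responses: list[str] = []  # flush responses returned to the source
        self.forwarded: list[str] = []        # what actually entered the fabric

    def on_posted(self, txn: str) -> None:
        self.issued += 1
        self.forwarded.append(txn)

    def on_flush(self, flush_id: str) -> None:
        # Trap: the flush is recorded but never enters the interconnection network.
        self.pending.append((self.issued, flush_id))
        self._retire()

    def on_response(self) -> None:
        # One response from the destination for an earlier posted transaction.
        self.acked += 1
        self._retire()

    def _retire(self) -> None:
        # Answer flushes in order once all earlier posted transactions are acknowledged.
        while self.pending and self.acked >= self.pending[0][0]:
            _, flush_id = self.pending.popleft()
            self.flush_responses.append(flush_id)


trap = FlushTrap()
trap.on_posted("write A")
trap.on_posted("write B")
trap.on_flush("flush 1")
print(trap.flush_responses)  # [] (two writes still outstanding)
trap.on_response()
trap.on_response()
print(trap.flush_responses)  # ['flush 1']
print(trap.forwarded)        # ['write A', 'write B'] (flush was not forwarded)
```

In this model the source receives the flush response only after the destination has acknowledged every posted write issued before the flush; at that point the source could, for example, signal a third GPU that the data is safe to read.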