-
Publication No.: US20240069736A1
Publication Date: 2024-02-29
Application No.: US17900808
Filing Date: 2022-08-31
Applicant: NVIDIA CORPORATION
Inventors: Srinivas Santosh Kumar MADUGULA, Olivier GIROUX, Wishwesh Anil GANDHI, Michael Allen PARKER, Raghuram L, Ivan TANASIC, Manan PATEL, Mark HUMMEL, Alexander L. MINKIN
IPC Classification: G06F3/06
CPC Classification: G06F3/0611, G06F3/0659, G06F3/0673
Abstract: Various embodiments include techniques for performing self-synchronizing remote memory operations in a multiprocessor computing system. During a remote memory operation in the multiprocessor computing system, a source processing unit transmits multiple segments of data to a destination processing unit. For each segment of data, the source processing unit transmits a remote memory operation to the destination processing unit that includes associated metadata identifying the memory location of a corresponding synchronization object. The remote memory operation along with the metadata is transmitted as a single unit to the destination processing unit. The destination processing unit splits the operation into the remote memory operation and the memory synchronization operation. As a result, the source processing unit avoids the need to perform a separate memory synchronization operation, thereby reducing inter-processor communications and increasing performance of remote memory operations.
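
The split performed at the destination can be illustrated with a small software simulation. This is a minimal sketch under stated assumptions, not the mechanism claimed in the application: the names `RemoteWrite`, `DestinationUnit`, `sync_addr`, and `send_segments` are hypothetical, memory is modeled as a Python dict, and the synchronization object is assumed to be a simple arrival counter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RemoteWrite:
    """One combined packet: a data segment plus sync metadata (hypothetical layout)."""
    data_addr: int   # where the segment is stored at the destination
    payload: bytes   # the data segment itself
    sync_addr: int   # location of the synchronization object


class DestinationUnit:
    """Toy destination processing unit; a dict stands in for its memory."""

    def __init__(self):
        self.memory = {}

    def receive(self, op: RemoteWrite) -> None:
        # Split the single incoming unit into two local operations.
        # 1) The memory write itself.
        self.memory[op.data_addr] = op.payload
        # 2) The synchronization update (assumed here: increment a counter).
        self.memory[op.sync_addr] = self.memory.get(op.sync_addr, 0) + 1

    def all_arrived(self, sync_addr: int, expected: int) -> bool:
        """True once the counter at sync_addr equals the expected segment count."""
        return self.memory.get(sync_addr, 0) == expected


def send_segments(dest, base_addr, sync_addr, segments):
    """Source side: one packet per segment, with no separate sync message."""
    addr = base_addr
    for seg in segments:
        dest.receive(RemoteWrite(addr, seg, sync_addr))
        addr += len(seg)
```

In this sketch the source never issues a standalone synchronization operation; a consumer at the destination would only poll `all_arrived` against the number of segments it expects.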
-
Publication No.: US20240354106A1
Publication Date: 2024-10-24
Application No.: US18755097
Filing Date: 2024-06-26
Applicant: NVIDIA Corporation
Inventors: Srinivas Santosh Kumar MADUGULA, Olivier GIROUX, Wishwesh Anil GANDHI, Michael Allen PARKER, Raghuram L, Ivan TANASIC, Manan PATEL, Mark HUMMEL, Alexander L. MINKIN, Gregory Michael THORSON
IPC Classification: G06F9/30
CPC Classification: G06F9/30043, G06F9/30087
Abstract: Various embodiments include techniques for performing self-synchronizing remote memory operations in a data center or multiprocessor computing system. During a remote memory operation, a source processor transmits multiple data segments to a destination processor. For each data segment, the source processor transmits a remote memory operation to the destination processor that includes associated metadata identifying the memory location of a corresponding synchronization object, which represents either a count of data segments to be stored or a flag for each data segment to be stored. The remote memory operation along with the metadata is transmitted as a single unit to the destination processor. The destination processor splits the operation into the remote memory operation and the memory synchronization operation. As a result, the source processor avoids the need to perform a separate memory synchronization operation, thereby reducing inter-processor communications and increasing performance of remote memory operations.
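
The per-segment flag form of the synchronization object can be sketched as a bitmask. This is only an illustration under assumptions: `FlagSyncObject`, `deliver`, and the one-bit-per-segment encoding are hypothetical and are not taken from the application, which does not specify a layout in this abstract.

```python
class FlagSyncObject:
    """Hypothetical sync object holding one flag (bit) per expected data segment."""

    def __init__(self, num_segments: int):
        self.num_segments = num_segments
        self.flags = 0  # bit i is set once segment i has been stored

    def mark_stored(self, index: int) -> None:
        if not 0 <= index < self.num_segments:
            raise IndexError(f"segment index {index} out of range")
        self.flags |= 1 << index

    def missing(self) -> list:
        """Indices of segments whose flag is still clear."""
        return [i for i in range(self.num_segments)
                if not ((self.flags >> i) & 1)]

    def complete(self) -> bool:
        """True when every segment's flag is set."""
        return self.flags == (1 << self.num_segments) - 1


def deliver(buffer: dict, sync: FlagSyncObject, index: int, payload: bytes) -> None:
    """Destination side: store the segment, then set its flag."""
    buffer[index] = payload
    sync.mark_stored(index)
```

In this toy model, flags let the destination tell which segments are still outstanding, and a repeated delivery of the same segment does not change the completion state; a plain count would be more compact but would not carry that per-segment information.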
-
Publication No.: US20200327010A1
Publication Date: 2020-10-15
Application No.: US16384614
Filing Date: 2019-04-15
Applicant: NVIDIA CORPORATION
Inventors: Ashutosh PANDEY, Jay GUPTA, Kaushal AGARWAL, Justin BENNETT, Srinivas Santosh Kumar MADUGULA
Abstract: Techniques are disclosed for reducing the time required to read and write data to memory. Data reads and/or writes can be delayed when error correction code (ECC) bits, which are used to detect and/or correct data corruption, are written to memory. Writing ECC bits can take longer in some instances than writing data bits because an ECC write may involve a read/modify/write operation, as opposed to simply writing the bits to memory. Some latencies associated with writing ECC bits can be hidden by interleaving ECC writes with data writes. However, if insufficient data writes are available for interleaving, hiding such latencies becomes difficult. Thus, various techniques are disclosed, for example, where ECC writes are deferred until a sufficient number of data writes become available for interleaving. By interleaving ECC writes, the disclosed techniques decrease the overall time required to read and write data to memory.
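
The deferral idea can be sketched as a toy scheduling function. This is a simplified illustration under assumptions, not the disclosed controller design: the function name `schedule`, the `min_data_per_ecc` threshold, the `("ecc", addr)` / `("data", addr)` tuples, and the choice to issue each ECC write ahead of the data writes that cover it are all hypothetical, and real memory controllers account for latency in cycles rather than list order.

```python
from collections import deque


def schedule(data_writes, ecc_writes, min_data_per_ecc=2):
    """Interleave ECC writes with data writes; defer ECC writes that lack cover.

    Returns (issued, deferred_ecc):
      issued        list of ("ecc", addr) / ("data", addr) in issue order
      deferred_ecc  ECC addresses held back until more data writes arrive
    """
    data = deque(data_writes)
    ecc = deque(ecc_writes)
    issued = []

    # Issue an ECC write only while enough data writes are queued to
    # interleave with it (assumed threshold: min_data_per_ecc).
    while ecc and len(data) >= min_data_per_ecc:
        issued.append(("ecc", ecc.popleft()))
        for _ in range(min_data_per_ecc):
            issued.append(("data", data.popleft()))

    # Leftover data writes go out normally; leftover ECC writes are deferred.
    issued.extend(("data", addr) for addr in data)
    return issued, list(ecc)
```

With five data writes, three ECC writes, and a threshold of two, this sketch issues two ECC writes interleaved with data and defers the third until a later call supplies more data writes.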
-
-