-
Publication No.: US20230021678A1
Publication date: 2023-01-26
Application No.: US17380424
Filing date: 2021-07-20
Applicant: NVIDIA CORPORATION
Inventors: Michael Allen PARKER, Debajit BHATTACHARYA, David FONTAINE, Shirish GADRE, Wishwesh Anil GANDHI, Olivier GIROUX, Hemayet HOSSAIN, Ronny M. KRASHINSKY, Ze LONG, Raymond Hoi Man WONG
Abstract: Various embodiments include a parallel processing computer system that provides multiple memory synchronization domains in a single parallel processor to reduce unneeded synchronization operations. During execution, one execution kernel may synchronize with one or more other execution kernels by processing outstanding memory references. The parallel processor tracks memory references for each domain to each portion of local and remote memory. During synchronization, the processor synchronizes the memory references for a specific domain while refraining from synchronizing memory references for other domains. As a result, synchronization operations between kernels complete in a reduced amount of time relative to prior approaches.
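The per-domain tracking described in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware mechanism: the class `DomainTracker`, its methods, and the destination labels are hypothetical names introduced here, and a "memory reference" is reduced to a counter entry.

```python
from collections import defaultdict


class DomainTracker:
    """Toy model: counts outstanding memory references per (domain, destination).

    All names here are hypothetical; the abstract does not describe an API.
    """

    def __init__(self):
        # outstanding[domain][destination] -> number of in-flight references
        self.outstanding = defaultdict(lambda: defaultdict(int))

    def issue(self, domain, destination):
        """Record one in-flight memory reference for the given domain."""
        self.outstanding[domain][destination] += 1

    def pending(self, domain):
        """Number of references still in flight for the given domain."""
        return sum(self.outstanding[domain].values())

    def synchronize(self, domain):
        """Drain only the given domain; other domains are left untouched.

        Returns how many references had to be waited on.
        """
        drained = self.pending(domain)
        self.outstanding[domain].clear()
        return drained


tracker = DomainTracker()
tracker.issue(domain=0, destination="local")
tracker.issue(domain=0, destination="remote")
for _ in range(3):
    tracker.issue(domain=1, destination="local")

# Synchronizing domain 0 waits on its own 2 references only;
# the 3 references in domain 1 stay in flight.
waited = tracker.synchronize(domain=0)
```

In this model the cost of a synchronization is proportional to the traffic of one domain rather than of the whole processor, which is the saving the abstract claims.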
-
Publication No.: US20230297426A1
Publication date: 2023-09-21
Application No.: US17698664
Filing date: 2022-03-18
Applicant: NVIDIA CORPORATION
Inventors: Rajballav DASH, Stephen JONES, Jack Hilaire CHOQUETTE, Manan PATEL, Ronny M. KRASHINSKY, Shirish GADRE, Lixia QIN
CPC classification: G06F9/5022, G06F9/30098, G06F9/3005, G06F2209/5011
Abstract: Various embodiments include techniques for utilizing resources on a processing unit. Thread groups executing on a processor begin execution with specified resources, such as a number of registers and an amount of shared memory. During execution, one or more thread groups may determine that the thread groups have more resources than needed to execute the current functions. Such thread groups can deallocate the excess resources to a free pool. Similarly, during execution, one or more thread groups may determine that the thread groups have fewer resources than needed to execute the current functions. Such thread groups can allocate the needed resources from the free pool. Further, producer thread groups that generate data for consumer thread groups can deallocate excess resources prior to completion. The consumer thread groups can allocate the excess resources and initiate execution while the producer thread groups complete execution, thereby decreasing latency between producer and consumer thread groups.
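The free-pool exchange between producer and consumer thread groups can be sketched as a simple accounting model. This is an illustration only, assuming registers are the sole resource; `FreePool`, `ThreadGroup`, `shrink_to`, and `grow_to` are hypothetical names, and the register counts are made up for the example.

```python
class FreePool:
    """Hypothetical pool of registers not currently owned by any thread group."""

    def __init__(self, free_registers):
        self.free_registers = free_registers


class ThreadGroup:
    """Hypothetical thread group that can resize its register allocation."""

    def __init__(self, name, registers, pool):
        self.name = name
        self.registers = registers
        self.pool = pool

    def shrink_to(self, needed):
        """Return registers beyond `needed` to the free pool."""
        excess = max(self.registers - needed, 0)
        self.registers -= excess
        self.pool.free_registers += excess
        return excess

    def grow_to(self, needed):
        """Take the shortfall from the free pool; False if the pool cannot cover it."""
        shortfall = max(needed - self.registers, 0)
        if shortfall > self.pool.free_registers:
            return False
        self.pool.free_registers -= shortfall
        self.registers += shortfall
        return True


pool = FreePool(free_registers=0)
producer = ThreadGroup("producer", registers=128, pool=pool)
consumer = ThreadGroup("consumer", registers=0, pool=pool)

# The consumer cannot start while the producer still holds everything.
started_early = consumer.grow_to(64)

# The producer enters its final phase, needs only 32 registers, and
# releases the rest before it has completed.
released = producer.shrink_to(32)

# The consumer can now allocate and begin while the producer finishes.
started_after_release = consumer.grow_to(64)
```

The total register count is conserved across the exchange; only ownership moves, which is why the consumer can start before the producer exits.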
-
Publication No.: US20230236878A1
Publication date: 2023-07-27
Application No.: US17583957
Filing date: 2022-01-25
Applicant: NVIDIA CORPORATION
Inventors: Jack Hilaire CHOQUETTE, Rajballav DASH, Shayani DEB, Gentaro HIROTA, Ronny M. KRASHINSKY, Ze LONG, Chen MEI, Manan PATEL, Ming Y. SIU
IPC classification: G06F9/48
CPC classification: G06F9/4881
Abstract: In various embodiments, scheduling dependencies associated with tasks executed on a processor are decoupled from data dependencies associated with the tasks. Before the completion of a first task that is executing in the processor, a scheduling dependency specifying that a second task is dependent on the first task is resolved based on a pre-exit trigger. In response to the resolution of the scheduling dependency, the second task is launched on the processor.
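The ordering implied by a pre-exit trigger can be shown with a minimal event log. This is a sketch, not the scheduler described in the application: `launch`, `first_task`, and `on_pre_exit` are hypothetical names, execution is sequential rather than parallel, and data dependencies (which the abstract says are handled separately) are not modeled.

```python
log = []


def launch(name):
    """Hypothetical scheduler hook: record that a task has been launched."""
    log.append(f"launch {name}")


def first_task(on_pre_exit):
    """First task: fires the pre-exit trigger before it has completed."""
    log.append("first: main work")
    on_pre_exit()                   # resolves the scheduling dependency early
    log.append("first: epilogue")   # remaining work happens after the launch
    log.append("first: complete")


# The second task's scheduling dependency on the first task is resolved by
# the trigger, so it is launched before the first task completes.
first_task(on_pre_exit=lambda: launch("second"))
```

After this runs, "launch second" appears in `log` ahead of "first: complete", which is the latency saving the abstract describes: the launch of the second task overlaps the tail of the first.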
-