-
Publication No.: US20240036944A1
Publication Date: 2024-02-01
Application No.: US17955143
Filing Date: 2022-09-28
Applicant: NVIDIA Corporation
Inventor: Ze Long , Kyrylo Perelygin , Harold Carter Edwards , Gokul Ramaswamy Hirisave Chandra Shekhara , Jaydeep Marathe , Ronny Meir Krashinsky , Girish Bhaskarrao Bharambe
Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to indicate whether one or more threads within two or more blocks of threads have performed a barrier instruction.
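The abstract describes an API that reports whether threads spanning two or more thread blocks have performed a barrier instruction. As a rough illustration only (not the patented implementation, and with hypothetical names), a host-side C++ analogue can model two "blocks" of threads arriving at a shared barrier, with a query function playing the role of the reporting API:

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical sketch: threads from two "blocks" arrive at a shared barrier;
// a query function reports whether every thread in both blocks has arrived.
std::atomic<int> arrived{0};

bool all_arrived(int total) { return arrived.load() == total; }  // the query "API"

int run_two_blocks(int threads_per_block) {
    const int total = 2 * threads_per_block;  // two blocks of threads
    std::vector<std::thread> pool;
    for (int t = 0; t < total; ++t)
        pool.emplace_back([total] {
            arrived.fetch_add(1);        // "perform the barrier instruction"
            while (!all_arrived(total))  // wait until both blocks have arrived
                std::this_thread::yield();
        });
    for (auto& th : pool) th.join();
    return all_arrived(total) ? 1 : 0;
}
```

On a GPU the analogous mechanism would be a device-side barrier visible across blocks; here a spinning counter stands in for it.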
-
Publication No.: US20220413945A1
Publication Date: 2022-12-29
Application No.: US17366770
Filing Date: 2021-07-02
Applicant: NVIDIA Corporation
Inventor: Piotr Ciolkosz , Kyrylo Perelygin , Harold Carter Edwards , Wesley Maxey
Abstract: Apparatuses, systems, and techniques to implement a barrier operation. In at least one embodiment, a memory barrier operation causes accesses to memory by a plurality of groups of threads to occur in an order indicated by the memory barrier operation.
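The technique here is a memory barrier that forces accesses by groups of threads to occur in a specified order. A minimal host-side sketch (names hypothetical; CUDA's device-side fences are mirrored with C++ atomic thread fences) shows the release/acquire ordering the abstract alludes to:

```cpp
#include <atomic>
#include <thread>

// Illustrative sketch: a release fence in the producer group and an acquire
// fence in the consumer group order the data write before the flag,
// and the flag before the data read.
int payload = 0;
std::atomic<bool> ready{false};

void producer() {
    payload = 42;  // write before the barrier
    std::atomic_thread_fence(std::memory_order_release);  // order: write, then flag
    ready.store(true, std::memory_order_relaxed);
}

int consumer() {
    while (!ready.load(std::memory_order_relaxed)) {}     // wait for the flag
    std::atomic_thread_fence(std::memory_order_acquire);  // order: flag, then read
    return payload;
}

int run_fence_demo() {
    std::thread p(producer);
    int seen = consumer();  // runs on the calling thread, the "consumer group"
    p.join();
    return seen;  // the paired fences guarantee the write to payload is visible
}
```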
-
Publication No.: US20220365882A1
Publication Date: 2022-11-17
Application No.: US17395255
Filing Date: 2021-08-05
Applicant: NVIDIA Corporation
Inventor: Harold Carter Edwards , Luke David Durant , Stephen Jones , Jack H. Choquette , Ronny Krashinsky , Dmitri Vainbrand , Olivier Giroux , Olivier Francois Joseph Harel , Shirish Gadre , Ze Long , Matthieu Tardy , David Dastous St Hilaire , Gokul Ramaswamy Hirisave Chandra Shekhara , Jaydeep Marathe , Jaewook Shin , Jayashree Venkatesh , Girish Bhaskar Bharambe
IPC: G06F12/0895
Abstract: Apparatuses, systems, and techniques to control operation of a memory cache. In at least one embodiment, cache guidance is specified within application source code by associating guidance with declaration of a memory block, and then applying specified guidance to source code statements that access said memory block.
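The abstract's idea is that cache guidance is attached where a memory block is declared and then applied to every statement that accesses that block. A hypothetical C++ sketch (the type, enum values, and methods below are illustrative, not the patented mechanism) binds a hint at declaration and carries it through each access:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of declaration-site cache guidance: the hint is bound
// where the buffer is declared, and every access through the wrapper carries
// that hint, mirroring "applying specified guidance to source code statements
// that access said memory block".
enum class CacheHint { Default, EvictFirst, Persisting };

template <typename T>
class GuidedBuffer {
public:
    GuidedBuffer(std::vector<T> data, CacheHint hint)
        : data_(std::move(data)), hint_(hint) {}
    // In a real compiler/runtime, hint_ would select a cache policy
    // (e.g., evict-first vs. persisting) for this access.
    T load(std::size_t i) const { return data_[i]; }
    CacheHint hint() const { return hint_; }
private:
    std::vector<T> data_;
    CacheHint hint_;
};

int demo() {
    GuidedBuffer<int> buf({10, 20, 30}, CacheHint::Persisting);
    return buf.hint() == CacheHint::Persisting ? buf.load(1) : -1;
}
```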
-
Publication No.: US20220244986A1
Publication Date: 2022-08-04
Application No.: US17671490
Filing Date: 2022-02-14
Applicant: NVIDIA Corporation
Inventor: Harold Carter Edwards
Abstract: Apparatuses, systems, and techniques to parallelize operations in one or more programs with data copies from global memory to shared memory in each of the one or more programs. In at least one embodiment, a program performs operations on shared data and then asynchronously copies shared data to shared memory, and continues performing additional operations in parallel while the shared data is copied to shared memory until an indicator provided by an application programming interface to facilitate parallel computing, such as CUDA, informs said program that shared data has been copied to shared memory.
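The pattern described, starting an asynchronous copy from global to shared memory, doing independent work in parallel, and waiting on an indicator that the copy has completed, can be sketched on the host with a future standing in for the indicator (names and workloads below are hypothetical, not the patented implementation):

```cpp
#include <algorithm>
#include <future>
#include <numeric>
#include <vector>

// Host-side analogue: launch the "global -> shared" copy asynchronously,
// keep computing while it is in flight, then wait on the completion indicator.
int overlap_copy_and_compute() {
    std::vector<int> global_mem(1000);
    std::iota(global_mem.begin(), global_mem.end(), 0);  // 0, 1, ..., 999
    std::vector<int> shared_mem(global_mem.size());

    auto copy_done = std::async(std::launch::async, [&] {
        std::copy(global_mem.begin(), global_mem.end(), shared_mem.begin());
    });

    // Independent work proceeds in parallel with the copy.
    int acc = 0;
    for (int i = 0; i < 100; ++i) acc += i;  // 4950

    copy_done.wait();                 // the "indicator" that the copy finished
    return shared_mem.back() + acc;   // 999 + 4950 = 5949
}
```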