-
Publication number: US20240168765A1
Publication date: 2024-05-23
Application number: US18086478
Filing date: 2022-12-21
Applicant: NVIDIA Corporation
Inventor: Harold Carter Edwards , Stephen Anthony Bernard Jones , Alexander Lev Minkin , Olivier Giroux , Gokul Ramaswamy Hirisave Chandra Shekhara , Aditya Avinash Atluri , Apoorv Parle , Ronny Meir Krashinsky , Alan Kaatz , Andrew Robert Kerr , Jack H. Choquette
CPC classification number: G06F9/3802 , G06F9/30047
Abstract: Apparatuses, systems, and techniques to perform a tensor prefetch instruction to cause one or more tensors to be stored into one or more caches. In at least one embodiment, one or more circuits of a GPU are to perform a tensor prefetch instruction to cause one or more tensors to be stored into one or more GPU caches.
-
Publication number: US20240036956A1
Publication date: 2024-02-01
Application number: US17955163
Filing date: 2022-09-28
Applicant: NVIDIA Corporation
Inventor: Ze Long , Kyrylo Perelygin , Harold Carter Edwards , Gokul Ramaswamy Hirisave Chandra Shekhara , Jaydeep Marathe , Ronny Meir Krashinsky , Girish Bhaskarrao Bharambe
CPC classification number: G06F9/544 , G06F9/4881 , G06F9/30072
Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to indicate whether one or more threads within a group of blocks of threads have performed a barrier instruction and to cause performance of one or more threads within the group of blocks of threads to stop at least until all threads within the group of blocks have performed the barrier instruction.
-
Publication number: US20240036953A1
Publication date: 2024-02-01
Application number: US17955085
Filing date: 2022-09-28
Applicant: NVIDIA Corporation
Inventor: Ze Long , Kyrylo Perelygin , Harold Carter Edwards , Gokul Ramaswamy Hirisave Chandra Shekhara , Jaydeep Marathe , Ronny Meir Krashinsky , Girish Bhaskarrao Bharambe
CPC classification number: G06F9/544 , G06F9/4881
Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to indicate a scheduling policy of one or more blocks of one or more threads.
-
Publication number: US20240036918A1
Publication date: 2024-02-01
Application number: US17955123
Filing date: 2022-09-28
Applicant: NVIDIA Corporation
Inventor: Ze Long , Kyrylo Perelygin , Harold Carter Edwards , Gokul Ramaswamy Hirisave Chandra Shekhara , Jaydeep Marathe , Ronny Meir Krashinsky , Girish Bhaskarrao Bharambe
CPC classification number: G06F9/4881 , G06F9/545 , G06F8/456
Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to cause a kernel to be generated to cause two or more blocks of two or more threads to be scheduled in parallel.
-
Publication number: US10067768B2
Publication date: 2018-09-04
Application number: US14798265
Filing date: 2015-07-13
Applicant: NVIDIA Corporation
Inventor: Gregory Frederick Diamos , Richard Craig Johnson , Vinod Grover , Olivier Giroux , Jack H. Choquette , Michael Alan Fetterman , Ajay S. Tirumala , Peter Nelson , Ronny Meir Krashinsky
Abstract: A method, system, and computer program product for executing divergent threads using a convergence barrier are disclosed. A first instruction in a program is executed by a plurality of threads, where the first instruction, when executed by a particular thread, indicates to a scheduler unit that the thread participates in a convergence barrier. A first path through the program is executed by a first divergent portion of the participating threads and a second path through the program is executed by a second divergent portion of the participating threads. The first divergent portion of the participating threads executes a second instruction in the program and transitions to a blocked state at the convergence barrier. The scheduler unit determines that all of the participating threads are synchronized at the convergence barrier and the convergence barrier is cleared.
-
Publication number: US20160019066A1
Publication date: 2016-01-21
Application number: US14798265
Filing date: 2015-07-13
Applicant: NVIDIA CORPORATION
Inventor: Gregory Frederick Diamos , Richard Craig Johnson , Vinod Grover , Olivier Giroux , Jack H. Choquette , Michael Alan Fetterman , Ajay S. Tirumala , Peter Nelson , Ronny Meir Krashinsky
CPC classification number: G06F9/522 , G06F9/30087 , G06F9/3009 , G06F9/3851 , G06F9/3887
Abstract: A method, system, and computer program product for executing divergent threads using a convergence barrier are disclosed. A first instruction in a program is executed by a plurality of threads, where the first instruction, when executed by a particular thread, indicates to a scheduler unit that the thread participates in a convergence barrier. A first path through the program is executed by a first divergent portion of the participating threads and a second path through the program is executed by a second divergent portion of the participating threads. The first divergent portion of the participating threads executes a second instruction in the program and transitions to a blocked state at the convergence barrier. The scheduler unit determines that all of the participating threads are synchronized at the convergence barrier and the convergence barrier is cleared.
-
Publication number: US20240176516A1
Publication date: 2024-05-30
Application number: US18081550
Filing date: 2022-12-14
Applicant: NVIDIA Corporation
Inventor: Harold Carter Edwards , Olivier Giroux , Jack H. Choquette , Gokul Ramaswamy Hirisave Chandra Shekhara , Rui Guo , Chao Li , Vishalkumar Ketankumar Mehta , David Dastous St. Hilaire , Aditya Avinash Atluri , Apoorv Parle , Ronny Meir Krashinsky , Subhasmita Chakraborty , Vikram Dhar
IPC: G06F3/06
CPC classification number: G06F3/0625 , G06F3/0659 , G06F3/0673
Abstract: Apparatuses, systems, and techniques to check memory transaction information. In at least one embodiment, one or more circuits are to perform an application programming interface (API) to check for information provided in a token by one or more users about one or more memory transactions after a first amount of time indicated by one or more users.
-
Publication number: US20240169470A1
Publication date: 2024-05-23
Application number: US18086442
Filing date: 2022-12-21
Applicant: NVIDIA Corporation
Inventor: Harold Carter Edwards , Stephen Anthony Bernard Jones , Alexander Lev Minkin , Olivier Giroux , Gokul Ramaswamy Hirisave Chandra Shekhara , Vishalkumar Ketankumar Mehta , Aditya Avinash Atluri , Apoorv Parle , Chao Li , Ronny Meir Krashinsky , Alan Kaatz , Andrew Robert Kerr , Jack H. Choquette
Abstract: Apparatuses, systems, and techniques to store information in a plurality of storage locations allocated to a graphics processing unit (GPU). In at least one embodiment, one or more circuits are to perform an application programming interface (API) to cause information to be stored in a plurality of storage locations allocated to a first GPU.
-
Publication number: US20240169023A1
Publication date: 2024-05-23
Application number: US18072060
Filing date: 2022-11-30
Applicant: NVIDIA Corporation
Inventor: Harold Carter Edwards , Kyrylo Perelygin , Maciej Tyrlik , Gokul Ramaswamy Hirisave Chandra Shekhara , Balaji Krishna Yugandhar Atukuri , Rishkul Kulkarni , Konstantinos Kyriakopoulos , Edward H. Gornish , David Allan Berson , Bageshri Sathe , James Player , Aman Arora , Alan Kaatz , Andrew Kerr , Haicheng Wu , Cris Cecka , Vijay Thakkar , Sean Treichler , Jack H. Choquette , Aditya Avinash Atluri , Apoorv Parle , Ronny Meir Krashinsky , Cody Addison , Girish Bhaskarrao Bharambe
IPC: G06F17/16
CPC classification number: G06F17/16
Abstract: Apparatuses, systems, and techniques to perform computational operations in response to one or more compute uniform device architecture (CUDA) programs. In at least one embodiment, one or more computational operations are to indicate whether matrix multiply-accumulate (MMA) memory operations are complete.
-
Publication number: US20240168830A1
Publication date: 2024-05-23
Application number: US18086461
Filing date: 2022-12-21
Applicant: NVIDIA Corporation
Inventor: Harold Carter Edwards , Stephen Anthony Bernard Jones , Alexander Lev Minkin , Olivier Giroux , Gokul Ramaswamy Hirisave Chandra Shekhara , Aditya Avinash Atluri , Apoorv Parle , Ronny Meir Krashinsky , Alan Kaatz , Andrew Robert Kerr , Jack H. Choquette
IPC: G06F9/54
CPC classification number: G06F9/544
Abstract: Apparatuses, systems, and techniques to indicate storage locations of information to be mapped from a first tensor to a second tensor. In at least one embodiment, one or more circuits are to perform an application programming interface (API) to indicate one or more storage locations of information to be mapped from a first tensor to a second tensor.