-
Publication number: US12204897B2
Publication date: 2025-01-21
Application number: US18072081
Filing date: 2022-11-30
Applicant: NVIDIA Corporation
Inventor: Harold Carter Edwards , Kyrylo Perelygin , Maciej Tyrlik , Gokul Ramaswamy Hirisave Chandra Shekhara , Balaji Krishna Yugandhar Atukuri , Rishkul Kulkarni , Konstantinos Kyriakopoulos , Edward H. Gornish , David Allan Berson , Bageshri Sathe , James Player , Aman Arora , Alan Kaatz , Andrew Kerr , Haicheng Wu , Cris Cecka , Vijay Thakkar , Sean Treichler , Jack H. Choquette , Aditya Avinash Atluri , Apoorv Parle , Ronny Meir Krashinsky , Cody Addison , Girish Bhaskarrao Bharambe
Abstract: Apparatuses, systems, and techniques to perform computational operations in response to one or more compute uniform device architecture (CUDA) programs. In at least one embodiment, one or more computational operations are to cause one or more other computational operations to wait until a portion of matrix multiply-accumulate (MMA) operations have been performed.
-
Publication number: US20240169471A1
Publication date: 2024-05-23
Application number: US18086476
Filing date: 2022-12-21
Applicant: NVIDIA Corporation
Inventor: Harold Carter Edwards , Stephen Anthony Bernard Jones , Alexander Lev Minkin , Olivier Giroux , Gokul Ramaswamy Hirisave Chandra Shekhara , Aditya Avinash Atluri , Apoorv Parle , Ronny Meir Krashinsky , Alan Kaatz , Andrew Robert Kerr , Jack H. Choquette
IPC: G06T1/60 , G06F12/0811 , G06F12/0862 , G06T1/20
CPC classification number: G06T1/60 , G06F12/0811 , G06F12/0862 , G06T1/20 , G06F2212/62
Abstract: Apparatuses, systems, and techniques to perform a graphics processing unit (GPU) prefetch instruction to cause a variable amount of information to be stored into one or more GPU caches. In at least one embodiment, one or more circuits of a GPU are to perform a GPU prefetch instruction to cause a variable amount of information to be stored into one or more GPU caches.
-
Publication number: US20240168795A1
Publication date: 2024-05-23
Application number: US18081552
Filing date: 2022-12-14
Applicant: NVIDIA Corporation
Inventor: Harold Carter Edwards , Olivier Giroux , Jack H. Choquette , Gokul Ramaswamy Hirisave Chandra Shekhara , Rui Guo , Chao Li , Vishalkumar Ketankumar Mehta , David Dastous St. Hilaire , Aditya Avinash Atluri , Apoorv Parle , Ronny Meir Krashinsky , Subhasmita Chakraborty , Vikram Dhar
CPC classification number: G06F9/467 , G06F9/3004 , G06F9/3877 , G06F9/541
Abstract: Apparatuses, systems, and techniques to perform delayed memory transaction information check. In at least one embodiment, one or more circuits are to perform an application programming interface (API) to check for information provided by one or more users about one or more memory transactions after a timeout event indicated by one or more users.
-
Publication number: US20240168763A1
Publication date: 2024-05-23
Application number: US18072300
Filing date: 2022-11-30
Applicant: NVIDIA Corporation
Inventor: Harold Carter Edwards , Kyrylo Perelygin , Maciej Tyrlik , Gokul Ramaswamy Hirisave Chandra Shekhara , Balaji Krishna Yugandhar Atukuri , Rishkul Kulkarni , Konstantinos Kyriakopoulos , Edward H. Gornish , David Allan Berson , Bageshri Sathe , James Player , Aman Arora , Alan Kaatz , Andrew Kerr , Haicheng Wu , Cris Cecka , Vijay Thakkar , Sean Treichler , Jack H. Choquette , Aditya Avinash Atluri , Apoorv Parle , Ronny Meir Krashinsky , Cody Addison , Girish Bhaskarrao Bharambe
CPC classification number: G06F9/3001 , G06F17/16
Abstract: Apparatuses, systems, and techniques to perform computational operations in response to one or more compute uniform device architecture (CUDA) programs. In at least one embodiment, one or more computational operations are to cause two or more other computational operations to be performed by two or more streaming multiprocessors (SMs).
-
Publication number: US20240168762A1
Publication date: 2024-05-23
Application number: US18072081
Filing date: 2022-11-30
Applicant: NVIDIA Corporation
Inventor: Harold Carter Edwards , Kyrylo Perelygin , Maciej Tyrlik , Gokul Ramaswamy Hirisave Chandra Shekhara , Balaji Krishna Yugandhar Atukuri , Rishkul Kulkarni , Konstantinos Kyriakopoulos , Edward H. Gornish , David Allan Berson , Bageshri Sathe , James Player , Aman Arora , Alan Kaatz , Andrew Kerr , Haicheng Wu , Cris Cecka , Vijay Thakkar , Sean Treichler , Jack H. Choquette , Aditya Avinash Atluri , Apoorv Parle , Ronny Meir Krashinsky , Cody Addison , Girish Bhaskarrao Bharambe
CPC classification number: G06F9/3001 , G06F9/3009 , G06F17/16
Abstract: Apparatuses, systems, and techniques to perform computational operations in response to one or more compute uniform device architecture (CUDA) programs. In at least one embodiment, one or more computational operations are to cause one or more other computational operations to wait until a portion of matrix multiply-accumulate (MMA) operations have been performed.
-
Publication number: US20240168659A1
Publication date: 2024-05-23
Application number: US18086429
Filing date: 2022-12-21
Applicant: NVIDIA Corporation
Inventor: Harold Carter Edwards , Stephen Anthony Bernard Jones , Alexander Lev Minkin , Olivier Giroux , Gokul Ramaswamy Hirisave Chandra Shekhara , Aditya Avinash Atluri , Apoorv Parle , Chao Li , Ronny Meir Krashinsky , Alan Kaatz , Andrew Robert Kerr , Jack H. Choquette
IPC: G06F3/06 , G06F12/0862
CPC classification number: G06F3/0625 , G06F3/0646 , G06F3/0659 , G06F3/0673 , G06F12/0862 , G06F2212/608
Abstract: Apparatuses, systems, and techniques to transform and store information corresponding to one or more memory transactions. In at least one embodiment, one or more circuits are to perform an application programming interface (API) to cause information corresponding to one or more memory transactions resulting from performance of the API to be transformed and stored.
-
Publication number: US20240036952A1
Publication date: 2024-02-01
Application number: US17955052
Filing date: 2022-09-28
Applicant: NVIDIA Corporation
Inventor: Ze Long , Kyrylo Perelygin , Harold Carter Edwards , Gokul Ramaswamy Hirisave Chandra Shekhara , Jaydeep Marathe , Ronny Meir Krashinsky , Girish Bhaskarrao Bharambe
CPC classification number: G06F9/544 , G06F9/4881
Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to determine which of two or more blocks of threads are to be scheduled in parallel.
-
Publication number: US20240036951A1
Publication date: 2024-02-01
Application number: US17955023
Filing date: 2022-09-28
Applicant: NVIDIA Corporation
Inventor: Ze Long , Kyrylo Perelygin , Harold Carter Edwards , Gokul Ramaswamy Hirisave Chandra Shekhara , Jaydeep Marathe , Ronny Meir Krashinsky , Girish Bhaskarrao Bharambe
CPC classification number: G06F9/544 , G06F9/4881
Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to indicate two or more blocks of threads to be scheduled in parallel.
-
Publication number: US20230305845A1
Publication date: 2023-09-28
Application number: US17710699
Filing date: 2022-03-31
Applicant: NVIDIA Corporation
Inventor: Harold Carter Edwards , Stephen Anthony Bernard Jones , David Anthony Fontaine , Sebastian Piotr Jodlowski , Aditya Avinash Atluri , Andrew Robert Kerr , Michael Andrew Clark , Gonzalo Brito Gadeschi , Olivier Giroux , Jaydeep Marathe , Thibaut Lutz , Hariharan Sandanagobalane , Gokul Ramaswamy Hirisave Chandra Shekhara , Girish Bhaskarrao Bharambe , Rishkul Kulkarni , Konstantinos Kyriakopoulos
CPC classification number: G06F9/3009 , G06F9/30043 , G06F9/544 , G06F9/5016
Abstract: Apparatuses, systems, and techniques to cause data to be selectively stored in one or more memory locations. In at least one embodiment, a processor is to cause data to be selectively stored in one or more memory locations based, at least in part, on one or more threads to use the data.
-
Publication number: US20210294673A1
Publication date: 2021-09-23
Application number: US16824457
Filing date: 2020-03-19
Applicant: NVIDIA Corporation
Inventor: Harold Carter Edwards
Abstract: Apparatuses, systems, and techniques to execute data-dependent parallel operations in one or more programs utilizing an application programming interface to perform parallel computing, such as CUDA, without relying on a synchronization operation between said one or more programs. For example, at least one embodiment pertains to processors or computing systems used to determine which thread in a group of threads finishes modifying shared data last, and that thread is selected to perform additional data-dependent computations from said group of threads.
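The abstract above describes letting whichever thread finishes modifying shared data last carry out the follow-up, data-dependent computation, instead of synchronizing all threads. A minimal CPU-side analogue of that idea is sketched here, assuming Python threads in place of GPU threads and a `threading.Lock` standing in for a hardware atomic fetch-and-add. The function name `last_finisher_reduce` is hypothetical, and this is an illustration of the general "last arriver does the extra work" pattern, not the patented implementation.

```python
import threading


def last_finisher_reduce(chunks):
    """Each worker writes a partial sum into shared storage; the worker that
    increments the shared counter last computes the final total.

    Returns (total, index_of_last_worker). Hypothetical illustration only.
    """
    n = len(chunks)
    if n == 0:
        return 0, None

    partials = [None] * n      # shared data modified by every worker
    counter = [0]              # shared arrival counter
    lock = threading.Lock()    # stands in for an atomic fetch-and-add
    result = {}

    def worker(i):
        partials[i] = sum(chunks[i])         # modify shared data first
        with lock:                           # emulated atomic fetch-and-increment
            prev = counter[0]
            counter[0] += 1
        if prev == n - 1:                    # this worker arrived last, so every
            result["total"] = sum(partials)  # partial is already written; it does
            result["last"] = i               # the data-dependent follow-up work

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["total"], result["last"]
```

For example, `last_finisher_reduce([[1, 2], [3], [4, 5, 6]])` returns a total of 21; which worker index is reported as last depends on thread scheduling. Only one worker ever sees the final counter value, so exactly one performs the follow-up step.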