Patent search ap:("NVIDIA Corporation") AND inv:"Rishkul Kulkarni" Page 1

1.

Published invention application
APPLICATION PROGRAMMING INTERFACE TO SYNCHRONIZE MATRIX MULTIPLY-ACCUMULATE MEMORY TRANSACTIONS (status: under examination, published)

Publication (announcement) number: US20240169022A1

Publication (announcement) date: 2024-05-23

Application number: US18072053

Application date: 2022-11-30

Applicant: NVIDIA Corporation

Inventor: Harold Carter Edwards, Kyrylo Perelygin, Maciej Tyrlik, Gokul Ramaswamy Hirisave Chandra Shekhara, Balaji Krishna Yugandhar Atukuri, Rishkul Kulkarni, Konstantinos Kyriakopoulos, Edward H. Gornish, David Allan Berson, Bageshri Sathe, James Player, Aman Arora, Alan Kaatz, Andrew Kerr, Haicheng Wu, Cris Cecka, Vijay Thakkar, Sean Treichler, Jack H. Choquette, Aditya Avinash Atluri, Apoorv Parle, Ronny Meir Krashinsky, Cody Addison, Girish Bhaskarrao Bharambe

IPC: G06F17/16, G06F9/30

CPC classification number: G06F17/16, G06F9/3001, G06F9/3009

Abstract: Apparatuses, systems, and techniques to perform computational operations in response to one or more compute uniform device architecture (CUDA) programs. In at least one embodiment, one or more computational operations are to cause one or more other computational operations to wait until matrix multiply-accumulate (MMA) memory transactions are performed.

2.

Granted invention patent
Techniques for efficiently synchronizing multiple program threads (status: in force)

Publication (announcement) number: US12271765B2

Publication (announcement) date: 2025-04-08

Application number: US17338377

Application date: 2021-06-03

Applicant: NVIDIA CORPORATION

Inventor: Ajay Sudarshan Tirumala, Olivier Giroux, Peter Nelson, Gary M. Tarolli, Ankita Upreti, Konstantinos Kyriakopoulos, Divya Shanmughan, Rishkul Kulkarni

IPC: G06F9/52, G06F9/30, G06F9/38, G06F9/48, G06F9/50

Abstract: Various embodiments include a parallel processing computer system that enables parallel instances of a program to synchronize at disparate addresses in memory. When the parallel program instances need to exchange data, the program instances synchronize based on a mask that identifies the program instances that are synchronizing. As each program instance reaches the point of synchronization, the program instance blocks and waits for all other program instances to reach the point of synchronization. When all program instances have reached the point of synchronization, at least one program instance executes a synchronous operation to exchange data. The program instances then continue execution at respective and disparate return addresses.
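
The flow described in this abstract (a mask names the participating instances, each one blocks until all have arrived, one of them performs the exchange, and each then resumes at its own point) can be loosely illustrated with ordinary CPU threads. The sketch below is only an analogy under stated assumptions, not the patented mechanism: `run_masked_sync`, the use of a sum as the exchanged data, and the per-instance continuation callables standing in for "disparate return addresses" are all hypothetical names and choices made for illustration.

```python
import threading


def run_masked_sync(mask, values):
    """Analogy only: synchronize the instances whose bit is set in `mask`.

    `values[i]` is the datum held by instance i. Returns a dict mapping each
    participating instance index to the result of its own continuation.
    """
    members = [i for i in range(len(values)) if (mask >> i) & 1]
    if not members:
        return {}

    shared = {}
    results = {}

    def exchange():
        # Runs exactly once, in one of the waiting threads, after every
        # participating instance has reached the barrier.
        shared["total"] = sum(values[i] for i in members)

    barrier = threading.Barrier(len(members), action=exchange)

    # Hypothetical stand-in for distinct return addresses: each instance
    # resumes in its own callable after the synchronization point.
    continuations = {
        i: (lambda total, i=i: f"instance {i} resumes with total {total}")
        for i in members
    }

    def instance(idx):
        barrier.wait()  # block until all instances named in the mask arrive
        results[idx] = continuations[idx](shared["total"])

    threads = [threading.Thread(target=instance, args=(i,)) for i in members]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Bits 0, 1, and 3 are set, so instance 2 does not take part.
    for idx, message in sorted(run_masked_sync(0b1011, [10, 20, 30, 40]).items()):
        print(idx, message)
```

Instances whose mask bit is clear never touch the barrier, which mirrors the idea that only the identified instances take part in the synchronization.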

3.

Published invention application
APPLICATION PROGRAMMING INTERFACE TO INDICATE MATRIX MULTIPLY-ACCUMULATE (status: under examination, published)

Publication (announcement) number: US20240169023A1

Publication (announcement) date: 2024-05-23

Application number: US18072060

Application date: 2022-11-30

Applicant: NVIDIA Corporation

Inventor: Harold Carter Edwards, Kyrylo Perelygin, Maciej Tyrlik, Gokul Ramaswamy Hirisave Chandra Shekhara, Balaji Krishna Yugandhar Atukuri, Rishkul Kulkarni, Konstantinos Kyriakopoulos, Edward H. Gornish, David Allan Berson, Bageshri Sathe, James Player, Aman Arora, Alan Kaatz, Andrew Kerr, Haicheng Wu, Cris Cecka, Vijay Thakkar, Sean Treichler, Jack H. Choquette, Aditya Avinash Atluri, Apoorv Parle, Ronny Meir Krashinsky, Cody Addison, Girish Bhaskarrao Bharambe

IPC: G06F17/16

CPC classification number: G06F17/16

Abstract: Apparatuses, systems, and techniques to perform computational operations in response to one or more compute uniform device architecture (CUDA) programs. In at least one embodiment, one or more computational operations are to indicate whether matrix multiply-accumulate (MMA) memory operations are complete.

4.

Granted invention patent
Application programming interface to wait on matrix multiply-accumulate (status: in force)

Publication (announcement) number: US12204897B2

Publication (announcement) date: 2025-01-21

Application number: US18072081

Application date: 2022-11-30

Applicant: NVIDIA Corporation

Inventor: Harold Carter Edwards, Kyrylo Perelygin, Maciej Tyrlik, Gokul Ramaswamy Hirisave Chandra Shekhara, Balaji Krishna Yugandhar Atukuri, Rishkul Kulkarni, Konstantinos Kyriakopoulos, Edward H. Gornish, David Allan Berson, Bageshri Sathe, James Player, Aman Arora, Alan Kaatz, Andrew Kerr, Haicheng Wu, Cris Cecka, Vijay Thakkar, Sean Treichler, Jack H. Choquette, Aditya Avinash Atluri, Apoorv Parle, Ronny Meir Krashinsky, Cody Addison, Girish Bhaskarrao Bharambe

IPC: G06F9/30, G06F9/38, G06F17/16

Abstract: Apparatuses, systems, and techniques to perform computational operations in response to one or more compute uniform device architecture (CUDA) programs. In at least one embodiment, one or more computational operations are to cause one or more other computational operations to wait until a portion of matrix multiply-accumulate (MMA) operations have been performed.

5.

Published invention application
APPLICATION PROGRAMMING INTERFACE TO INDICATE OPERATIONS TO BE PERFORMED BY CORRESPONDING STREAMING MULTIPROCESSORS (status: under examination, published)

Publication (announcement) number: US20240168763A1

Publication (announcement) date: 2024-05-23

Application number: US18072300

Application date: 2022-11-30

Applicant: NVIDIA Corporation

Inventor: Harold Carter Edwards, Kyrylo Perelygin, Maciej Tyrlik, Gokul Ramaswamy Hirisave Chandra Shekhara, Balaji Krishna Yugandhar Atukuri, Rishkul Kulkarni, Konstantinos Kyriakopoulos, Edward H. Gornish, David Allan Berson, Bageshri Sathe, James Player, Aman Arora, Alan Kaatz, Andrew Kerr, Haicheng Wu, Cris Cecka, Vijay Thakkar, Sean Treichler, Jack H. Choquette, Aditya Avinash Atluri, Apoorv Parle, Ronny Meir Krashinsky, Cody Addison, Girish Bhaskarrao Bharambe

IPC: G06F9/30, G06F17/16

CPC classification number: G06F9/3001, G06F17/16

Abstract: Apparatuses, systems, and techniques to perform computational operations in response to one or more compute uniform device architecture (CUDA) programs. In at least one embodiment, one or more computational operations are to cause two or more other computational operations to be performed by two or more streaming multiprocessors (SMs).

6.

Published invention application
APPLICATION PROGRAMMING INTERFACE TO WAIT ON MATRIX MULTIPLY-ACCUMULATE (status: under examination, published)

Publication (announcement) number: US20240168762A1

Publication (announcement) date: 2024-05-23

Application number: US18072081

Application date: 2022-11-30

Applicant: NVIDIA Corporation

Inventor: Harold Carter Edwards, Kyrylo Perelygin, Maciej Tyrlik, Gokul Ramaswamy Hirisave Chandra Shekhara, Balaji Krishna Yugandhar Atukuri, Rishkul Kulkarni, Konstantinos Kyriakopoulos, Edward H. Gornish, David Allan Berson, Bageshri Sathe, James Player, Aman Arora, Alan Kaatz, Andrew Kerr, Haicheng Wu, Cris Cecka, Vijay Thakkar, Sean Treichler, Jack H. Choquette, Aditya Avinash Atluri, Apoorv Parle, Ronny Meir Krashinsky, Cody Addison, Girish Bhaskarrao Bharambe

IPC: G06F9/30, G06F17/16

CPC classification number: G06F9/3001, G06F9/3009, G06F17/16

Abstract: Apparatuses, systems, and techniques to perform computational operations in response to one or more compute uniform device architecture (CUDA) programs. In at least one embodiment, one or more computational operations are to cause one or more other computational operations to wait until a portion of matrix multiply-accumulate (MMA) operations have been performed.

7.

Published invention application
SCALARIZATION OF INSTRUCTIONS FOR SIMT ARCHITECTURES (status: under examination, published)

Publication (announcement) number: US20240118899A1

Publication (announcement) date: 2024-04-11

Application number: US18105679

Application date: 2023-02-03

Applicant: NVIDIA Corporation

Inventor: Aditya Avinash Atluri, Jack Choquette, Carter Edwards, Olivier Giroux, Praveen Kumar Kaushik, Ronny Krashinsky, Rishkul Kulkarni, Konstantinos Kyriakopoulos

IPC: G06F9/38

CPC classification number: G06F9/3851

Abstract: Apparatuses, systems, and techniques to adapt instructions in a SIMT architecture for execution on serial execution units. In at least one embodiment, a set of one or more threads is selected from a group of active threads associated with an instruction and the instruction is executed for the set of one or more threads on a serial execution unit.

8.

Published invention application
TECHNIQUES TO SELECTIVELY STORE DATA (status: under examination, published)

Publication (announcement) number: US20230305845A1

Publication (announcement) date: 2023-09-28

Application number: US17710699

Application date: 2022-03-31

Applicant: NVIDIA Corporation

Inventor: Harold Carter Edwards, Stephen Anthony Bernard Jones, David Anthony Fontaine, Sebastian Piotr Jodlowski, Aditya Avinash Atluri, Andrew Robert Kerr, Michael Andrew Clark, Gonzalo Brito Gadeschi, Olivier Giroux, Jaydeep Marathe, Thibaut Lutz, Hariharan Sandanagobalane, Gokul Ramaswamy Hirisave Chandra Shekhara, Girish Bhaskarrao Bharambe, Rishkul Kulkarni, Konstantinos Kyriakopoulos

IPC: G06F9/30, G06F9/54, G06F9/50

CPC classification number: G06F9/3009, G06F9/30043, G06F9/544, G06F9/5016

Abstract: Apparatuses, systems, and techniques to cause data to be selectively stored in one or more memory locations. In at least one embodiment, a processor is to cause data to be selectively stored in one or more memory locations based, at least in part, on one or more threads to use the data.
