Patent search ap:("NVIDIA Corporation") AND inv:"Haicheng Wu" Page 1

1.

发明公开
APPLICATION PROGRAMMING INTERFACE TO INDICATE MATRIX MULTIPLY-ACCUMULATE 审中-公开

公开(公告)号：US20240169023A1

公开(公告)日：2024-05-23

申请号：US18072060

申请日：2022-11-30

Applicant: NVIDIA Corporation

Inventor： Harold Carter Edwards , Kyrylo Perelygin , Maciej Tyrlik , Gokul Ramaswamy Hirisave Chandra Shekhara , Balaji Krishna Yugandhar Atukuri , Rishkul Kulkarni , Konstantinos Kyriakopoulos , Edward H. Gornish , David Allan Berson , Bageshri Sathe , James Player , Aman Arora , Alan Kaatz , Andrew Kerr , Haicheng Wu , Cris Cecka , Vijay Thakkar , Sean Treichler , Jack H. Choquette , Aditya Avinash Atluri , Apoorv Parle , Ronny Meir Krashinsky , Cody Addison , Girish Bhaskarrao Bharambe

IPC: G06F17/16

CPC classification number: G06F17/16

Abstract: Apparatuses, systems, and techniques to perform computational operations in response to one or more compute uniform device architecture (CUDA) programs. In at least one embodiment, one or more computational operations are to indicate whether matrix multiply-accumulate (MMA) memory operations are complete.

2.

发明公开
APPLICATION PROGRAMMING INTERFACE TO SYNCHRONIZE MATRIX MULTIPLY-ACCUMULATE MEMORY TRANSACTIONS 审中-公开

公开(公告)号：US20240169022A1

公开(公告)日：2024-05-23

申请号：US18072053

申请日：2022-11-30

Applicant: NVIDIA Corporation

Inventor： Harold Carter Edwards , Kyrylo Perelygin , Maciej Tyrlik , Gokul Ramaswamy Hirisave Chandra Shekhara , Balaji Krishna Yugandhar Atukuri , Rishkul Kulkarni , Konstantinos Kyriakopoulos , Edward H. Gornish , David Allan Berson , Bageshri Sathe , James Player , Aman Arora , Alan Kaatz , Andrew Kerr , Haicheng Wu , Cris Cecka , Vijay Thakkar , Sean Treichler , Jack H. Choquette , Aditya Avinash Atluri , Apoorv Parle , Ronny Meir Krashinsky , Cody Addison , Girish Bhaskarrao Bharambe

IPC: G06F17/16 , G06F9/30

CPC classification number: G06F17/16 , G06F9/3001 , G06F9/3009

Abstract: Apparatuses, systems, and techniques to perform computational operations in response to one or more compute uniform device architecture (CUDA) programs. In at least one embodiment, one or more computational operations are to cause one or more other computational operations to wait until matrix multiply-accumulate (MMA) memory transactions are performed.

3.

发明授权
Application programming interface to wait on matrix multiply-accumulate 有权

公开(公告)号：US12204897B2

公开(公告)日：2025-01-21

申请号：US18072081

申请日：2022-11-30

Applicant: NVIDIA Corporation

Inventor： Harold Carter Edwards , Kyrylo Perelygin , Maciej Tyrlik , Gokul Ramaswamy Hirisave Chandra Shekhara , Balaji Krishna Yugandhar Atukuri , Rishkul Kulkarni , Konstantinos Kyriakopoulos , Edward H. Gornish , David Allan Berson , Bageshri Sathe , James Player , Aman Arora , Alan Kaatz , Andrew Kerr , Haicheng Wu , Cris Cecka , Vijay Thakkar , Sean Treichler , Jack H. Choquette , Aditya Avinash Atluri , Apoorv Parle , Ronny Meir Krashinsky , Cody Addison , Girish Bhaskarrao Bharambe

IPC: G06F9/30 , G06F9/38 , G06F17/16

Abstract: Apparatuses, systems, and techniques to perform computational operations in response to one or more compute uniform device architecture (CUDA) programs. In at least one embodiment, one or more computational operations are to cause one or more other computational operations to wait until a portion of matrix multiply-accumulate (MMA) operations have been performed.

4.

发明公开
APPLICATION PROGRAMMING INTERFACE TO INDICATE OPERATIONS TO BE PERFORMED BY CORRESPONDING STREAMING MULTIPROCESSORS 审中-公开

公开(公告)号：US20240168763A1

公开(公告)日：2024-05-23

申请号：US18072300

申请日：2022-11-30

Applicant: NVIDIA Corporation

Inventor： Harold Carter Edwards , Kyrylo Perelygin , Maciej Tyrlik , Gokul Ramaswamy Hirisave Chandra Shekhara , Balaji Krishna Yugandhar Atukuri , Rishkul Kulkarni , Konstantinos Kyriakopoulos , Edward H. Gornish , David Allan Berson , Bageshri Sathe , James Player , Aman Arora , Alan Kaatz , Andrew Kerr , Haicheng Wu , Cris Cecka , Vijay Thakkar , Sean Treichler , Jack H. Choquette , Aditya Avinash Atluri , Apoorv Parle , Ronny Meir Krashinsky , Cody Addison , Girish Bhaskarrao Bharambe

IPC: G06F9/30 , G06F17/16

CPC classification number: G06F9/3001 , G06F17/16

Abstract: Apparatuses, systems, and techniques to perform computational operations in response to one or more compute uniform device architecture (CUDA) programs. In at least one embodiment, one or more computational operations are to cause two or more other computational operations to be performed by two or more streaming multiprocessors (SMs).

5.

发明公开
APPLICATION PROGRAMMING INTERFACE TO WAIT ON MATRIX MULTIPLY-ACCUMULATE 审中-公开

公开(公告)号：US20240168762A1

公开(公告)日：2024-05-23

申请号：US18072081

申请日：2022-11-30

Applicant: NVIDIA Corporation

Inventor： Harold Carter Edwards , Kyrylo Perelygin , Maciej Tyrlik , Gokul Ramaswamy Hirisave Chandra Shekhara , Balaji Krishna Yugandhar Atukuri , Rishkul Kulkarni , Konstantinos Kyriakopoulos , Edward H. Gornish , David Allan Berson , Bageshri Sathe , James Player , Aman Arora , Alan Kaatz , Andrew Kerr , Haicheng Wu , Cris Cecka , Vijay Thakkar , Sean Treichler , Jack H. Choquette , Aditya Avinash Atluri , Apoorv Parle , Ronny Meir Krashinsky , Cody Addison , Girish Bhaskarrao Bharambe

IPC: G06F9/30 , G06F17/16

CPC classification number: G06F9/3001 , G06F9/3009 , G06F17/16

Abstract: Apparatuses, systems, and techniques to perform computational operations in response to one or more compute uniform device architecture (CUDA) programs. In at least one embodiment, one or more computational operations are to cause one or more other computational operations to wait until a portion of matrix multiply-accumulate (MMA) operations have been performed.

6.

发明公开
NON-RECTANGULAR MATRIX COMPUTATIONS AND DATA PATTERN PROCESSING USING TENSOR CORES 审中-公开

公开(公告)号：US20230297643A1

公开(公告)日：2023-09-21

申请号：US17700239

申请日：2022-03-21

Applicant: NVIDIA Corporation

Inventor： Aniket Shivam , Andrew Kerr , Haicheng Wu , Manish Gupta , Nikita Shustrov , Qing Yang , Alan Kaatz , Aditya Avinash Atluri

IPC: G06F17/16 , G06F7/483

CPC classification number: G06F17/16 , G06F7/483

Abstract: Matrix multiplication operations can be implemented, at least in part, on one or more tensor cores of a parallel processing unit. An efficiency of the matrix multiplication operations can be improved in cases where one of the input operands or the output operand of the matrix multiplication operation is a square matrix having a triangular data pattern. In such cases, the number of computations performed by the tensor cores of the parallel processing unit can be reduced by dropping computations and/or masking out elements of the square matrix input operand on one side of the main diagonal of the square matrix. In other cases where the output operand exhibits the triangular data pattern, computations can be dropped or masked out for the invalid side of the main diagonal of the square matrix. In an embodiment, a library implementing the matrix multiplication operations is provided.

Patent Agency Ranking