Patent search ap:("NVIDIA Corporation") AND inv:"Alan Kaatz" Page 1

1.

发明公开
APPLICATION PROGRAMMING INTERFACE TO STORE INFORMATION IN A PLURALITY OF STORAGE LOCATIONS 审中-公开

公开(公告)号：US20240169470A1

公开(公告)日：2024-05-23

申请号：US18086442

申请日：2022-12-21

Applicant: NVIDIA Corporation

Inventor： Harold Carter Edwards , Stephen Anthony Bernard Jones , Alexander Lev Minkin , Olivier Giroux , Gokul Ramaswamy Hirisave Chandra Shekhara , Vishalkumar Ketankumar Mehta , Aditya Avinash Atluri , Apoorv Parle , Chao Li , Ronny Meir Krashinsky , Alan Kaatz , Andrew Robert Kerr , Jack H. Choquette

IPC: G06T1/60 , G06F9/54 , G06T1/20

CPC classification number: G06T1/60 , G06F9/541 , G06T1/20

Abstract: Apparatuses, systems, and techniques to store information in a plurality of storage locations allocated to a graphics processing unit (GPU). In at least one embodiment, one or more circuits are to perform an application programming interface (API) to cause information to be stored in a plurality of storage locations allocated to a first GPU.

2.

发明公开
APPLICATION PROGRAMMING INTERFACE TO INDICATE MATRIX MULTIPLY-ACCUMULATE 审中-公开

公开(公告)号：US20240169023A1

公开(公告)日：2024-05-23

申请号：US18072060

申请日：2022-11-30

Applicant: NVIDIA Corporation

Inventor： Harold Carter Edwards , Kyrylo Perelygin , Maciej Tyrlik , Gokul Ramaswamy Hirisave Chandra Shekhara , Balaji Krishna Yugandhar Atukuri , Rishkul Kulkarni , Konstantinos Kyriakopoulos , Edward H. Gornish , David Allan Berson , Bageshri Sathe , James Player , Aman Arora , Alan Kaatz , Andrew Kerr , Haicheng Wu , Cris Cecka , Vijay Thakkar , Sean Treichler , Jack H. Choquette , Aditya Avinash Atluri , Apoorv Parle , Ronny Meir Krashinsky , Cody Addison , Girish Bhaskarrao Bharambe

IPC: G06F17/16

CPC classification number: G06F17/16

Abstract: Apparatuses, systems, and techniques to perform computational operations in response to one or more compute uniform device architecture (CUDA) programs. In at least one embodiment, one or more computational operations are to indicate whether matrix multiply-accumulate (MMA) memory operations are complete.

3.

发明公开
APPLICATION PROGRAMMING INTERFACE TO INDICATE STORAGE LOCATIONS 审中-公开

公开(公告)号：US20240168830A1

公开(公告)日：2024-05-23

申请号：US18086461

申请日：2022-12-21

Applicant: NVIDIA Corporation

Inventor： Harold Carter Edwards , Stephen Anthony Bernard Jones , Alexander Lev Minkin , Olivier Giroux , Gokul Ramaswamy Hirisave Chandra Shekhara , Aditya Avinash Atluri , Apoorv Parle , Ronny Meir Krashinsky , Alan Kaatz , Andrew Robert Kerr , Jack H. Choquette

IPC: G06F9/54

CPC classification number: G06F9/544

Abstract: Apparatuses, systems, and techniques to indicate storage locations of information to be mapped from a first tensor to a second tensor. In at least one embodiment, one or more circuits are to perform an application programming interface (API) to indicate one or more storage locations of information to be mapped from a first tensor to a second tensor.

4.

发明公开
TECHNIQUES FOR EFFICIENTLY TRANSFERRING DATA TO A PROCESSOR 审中-公开

公开(公告)号：US20230185570A1

公开(公告)日：2023-06-15

申请号：US18107374

申请日：2023-02-08

Applicant: NVIDIA Corporation

Inventor： Andrew KERR , Jack Choquette , Xiaogang Qiu , Omkar Paranjape , Poornachandra Rao , Shirish Gadre , Steven J. Heinrich , Manan Patel , Olivier Giroux , Alan Kaatz

IPC: G06F9/30 , G06F12/0808 , G06F12/0888 , G06F9/32 , G06F9/38 , G06F9/52 , G06F9/54

CPC classification number: G06F9/30043 , G06F12/0808 , G06F12/0888 , G06F9/3009 , G06F9/321 , G06F9/3871 , G06F9/522 , G06F9/542 , G06F9/544 , G06F9/546 , G06F9/3838 , G06F2212/621 , G06F9/3004

Abstract: A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor needing data stored in global memory can request and store the needed data in on-chip shared memory, which can be accessed by the threads multiple times. The data can be loaded from global memory and stored in shared memory using an instruction which directs the data into the shared memory without storing the data in registers and/or cache memory of the multiprocessor during the data transfer.

5.

发明公开
STORAGE OF TENSOR IN A CACHE 审中-公开

公开(公告)号：US20240168765A1

公开(公告)日：2024-05-23

申请号：US18086478

申请日：2022-12-21

Applicant: NVIDIA Corporation

Inventor： Harold Carter Edwards , Stephen Anthony Bernard Jones , Alexander Lev Minkin , Olivier Giroux , Gokul Ramaswamy Hirisave Chandra Shekhara , Aditya Avinash Atluri , Apoorv Parle , Ronny Meir Krashinsky , Alan Kaatz , Andrew Robert Kerr , Jack H. Choquette

IPC: G06F9/38 , G06F9/30

CPC classification number: G06F9/3802 , G06F9/30047

Abstract: Apparatuses, systems, and techniques to perform a tensor prefetch instruction to cause one or more tensors to be stored into one or more caches. In at least one embodiment, one or more circuits of a GPU are to perform a tensor prefetch instruction to cause one or more tensors to be stored into one or more GPU caches.

6.

发明授权
Techniques for efficiently transferring data to a processor 有权

公开(公告)号：US11907717B2

公开(公告)日：2024-02-20

申请号：US18107374

申请日：2023-02-08

Applicant: NVIDIA Corporation

Inventor： Andrew Kerr , Jack Choquette , Xiaogang Qiu , Omkar Paranjape , Poornachandra Rao , Shirish Gadre , Steven J. Heinrich , Manan Patel , Olivier Giroux , Alan Kaatz

IPC: G06F9/30 , G06F9/52 , G06F12/0808 , G06F12/0888

CPC classification number: G06F9/30043 , G06F9/3009 , G06F9/522 , G06F12/0808 , G06F12/0888 , G06F9/3004

Abstract: A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor needing data stored in global memory can request and store the needed data in on-chip shared memory, which can be accessed by the threads multiple times. The data can be loaded from global memory and stored in shared memory using an instruction which directs the data into the shared memory without storing the data in registers and/or cache memory of the multiprocessor during the data transfer.

7.

发明公开
METHOD AND APPARATUS FOR EFFICIENT ACCESS TO MULTIDIMENSIONAL DATA STRUCTURES AND/OR OTHER LARGE DATA BLOCKS 审中-公开

公开(公告)号：US20230289304A1

公开(公告)日：2023-09-14

申请号：US17691276

申请日：2022-03-10

Applicant: NVIDIA Corporation

Inventor： Alexander L. Minkin , Alan Kaatz , Oliver Giroux , Jack Choquette , Shirish Gadre , Manan Patel , John Tran , Ronny Krashinsky , Jeff Schottmiller

IPC: G06F13/16

CPC classification number: G06F13/1663

Abstract: A parallel processing unit comprises a plurality of processors each being coupled to a memory access hardware circuitry. Each memory access hardware circuitry is configured to receive, from the coupled processor, a memory access request specifying a coordinate of a multidimensional data structure, wherein the memory access hardware circuit is one of a plurality of memory access circuitry each coupled to a respective one of the processors; and, in response to the memory access request, translate the coordinate of the multidimensional data structure into plural memory addresses for the multidimensional data structure and using the plural memory addresses, asynchronously transfer at least a portion of the multidimensional data structure for processing by at least the coupled processor. The memory locations may be in the shared memory of the coupled processor and/or an external memory.

8.

发明授权
Techniques for efficiently transferring data to a processor 有权

公开(公告)号：US11604649B2

公开(公告)日：2023-03-14

申请号：US17363561

申请日：2021-06-30

Applicant: NVIDIA Corporation

Inventor： Andrew Kerr , Jack Choquette , Xiaogang Qiu , Omkar Paranjape , Poornachandra Rao , Shirish Gadre , Steven J. Heinrich , Manan Patel , Olivier Giroux , Alan Kaatz

IPC: G06F9/30 , G06F12/0808 , G06F12/0888 , G06F9/32 , G06F9/38 , G06F9/52 , G06F9/54

Abstract: A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor needing data stored in global memory can request and store the needed data in on-chip shared memory, which can be accessed by the threads multiple times. The data can be loaded from global memory and stored in shared memory using an instruction which directs the data into the shared memory without storing the data in registers and/or cache memory of the multiprocessor during the data transfer.

9.

发明授权
Application programming interface to wait on matrix multiply-accumulate 有权

公开(公告)号：US12204897B2

公开(公告)日：2025-01-21

申请号：US18072081

申请日：2022-11-30

Applicant: NVIDIA Corporation

Inventor： Harold Carter Edwards , Kyrylo Perelygin , Maciej Tyrlik , Gokul Ramaswamy Hirisave Chandra Shekhara , Balaji Krishna Yugandhar Atukuri , Rishkul Kulkarni , Konstantinos Kyriakopoulos , Edward H. Gornish , David Allan Berson , Bageshri Sathe , James Player , Aman Arora , Alan Kaatz , Andrew Kerr , Haicheng Wu , Cris Cecka , Vijay Thakkar , Sean Treichler , Jack H. Choquette , Aditya Avinash Atluri , Apoorv Parle , Ronny Meir Krashinsky , Cody Addison , Girish Bhaskarrao Bharambe

IPC: G06F9/30 , G06F9/38 , G06F17/16

Abstract: Apparatuses, systems, and techniques to perform computational operations in response to one or more compute uniform device architecture (CUDA) programs. In at least one embodiment, one or more computational operations are to cause one or more other computational operations to wait until a portion of matrix multiply-accumulate (MMA) operations have been performed.

10.

发明公开
STORAGE OF INFORMATION IN A GRAPHICS PROCESSING UNIT CACHE 审中-公开

公开(公告)号：US20240169471A1

公开(公告)日：2024-05-23

申请号：US18086476

申请日：2022-12-21

Applicant: NVIDIA Corporation

Inventor： Harold Carter Edwards , Stephen Anthony Bernard Jones , Alexander Lev Minkin , Olivier Giroux , Gokul Ramaswamy Hirisave Chandra Shekhara , Aditya Avinash Atluri , Apoorv Parle , Ronny Meir Krashinsky , Alan Kaatz , Andrew Robert Kerr , Jack H. Choquette

IPC: G06T1/60 , G06F12/0811 , G06F12/0862 , G06T1/20

CPC classification number: G06T1/60 , G06F12/0811 , G06F12/0862 , G06T1/20 , G06F2212/62

Abstract: Apparatuses, systems, and techniques to perform a graphics processing unit (GPU) prefetch instruction to cause a variable amount of information to be stored into one or more GPU caches. In at least one embodiment, one or more circuits of a GPU are to perform a GPU prefetch instruction to cause a variable amount of information to be stored into one or more GPU caches.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification