Patent search: ap:("NVIDIA CORPORATION") AND inv:"Jack Choquette" (results page 1)

1.

Granted invention patent
Method and apparatus for efficient access to multidimensional data structures and/or other large data blocks (status: in force)

Publication (announcement) number: US12141082B2

Publication (announcement) date: 2024-11-12

Application number: US17691276

Application date: 2022-03-10

Applicant: NVIDIA Corporation

Inventors: Alexander L. Minkin, Alan Kaatz, Oliver Giroux, Jack Choquette, Shirish Gadre, Manan Patel, John Tran, Ronny Krashinsky, Jeff Schottmiller

IPC: G06F13/16

Abstract: A parallel processing unit comprises a plurality of processors each being coupled to a memory access hardware circuitry. Each memory access hardware circuitry is configured to receive, from the coupled processor, a memory access request specifying a coordinate of a multidimensional data structure, wherein the memory access hardware circuit is one of a plurality of memory access circuitry each coupled to a respective one of the processors; and, in response to the memory access request, translate the coordinate of the multidimensional data structure into plural memory addresses for the multidimensional data structure and using the plural memory addresses, asynchronously transfer at least a portion of the multidimensional data structure for processing by at least the coupled processor. The memory locations may be in the shared memory of the coupled processor and/or an external memory.
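The abstract's central step, turning one coordinate of a multidimensional data structure into the plural memory addresses of a block, can be illustrated with a small Python model. It is a sketch under stated assumptions, not the claimed hardware: it assumes a dense tensor whose per-dimension strides are given in elements, a box (tile) anchored at the requested coordinate, and no out-of-bounds handling. The names `tile_addresses` and `contiguous_runs` are hypothetical, and the asynchronous transfer itself is not modeled.

```python
from itertools import product


def tile_addresses(base, strides, elem_size, coord, box):
    """Byte address of every element in a box whose first corner is `coord`.

    `strides` is the distance between neighbours along each dimension, in
    elements (assumed layout); `box` is the extent of the tile per dimension.
    """
    addresses = []
    # product() yields offsets with the last dimension varying fastest.
    for offset in product(*(range(extent) for extent in box)):
        linear = sum((c + o) * s for c, o, s in zip(coord, offset, strides))
        addresses.append(base + linear * elem_size)
    return addresses


def contiguous_runs(addresses, elem_size):
    """Merge consecutive element addresses into (start, length_in_bytes) runs."""
    runs = []
    for addr in addresses:
        if runs and addr == runs[-1][0] + runs[-1][1]:
            runs[-1][1] += elem_size
        else:
            runs.append([addr, elem_size])
    return [tuple(run) for run in runs]


# A row-major 8 x 16 tensor of 4-byte elements at 0x1000; request a 2 x 3 box
# starting at row 2, column 4.
addrs = tile_addresses(0x1000, strides=(16, 1), elem_size=4, coord=(2, 4), box=(2, 3))
print(contiguous_runs(addrs, elem_size=4))  # [(4240, 12), (4304, 12)]
```

Grouping the addresses into runs shows how a single coordinate request can expand into a few wide row segments rather than one access per element; whether hardware issues requests per run is an assumption of this model.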

2.

Published invention application
METHOD AND APPARATUS FOR EFFICIENT ACCESS TO MULTIDIMENSIONAL DATA STRUCTURES AND/OR OTHER LARGE DATA BLOCKS (status: pending, published)

Publication (announcement) number: US20230289292A1

Publication (announcement) date: 2023-09-14

Application number: US17691422

Application date: 2022-03-10

Applicant: NVIDIA Corporation

Inventors: Alexander L. Minkin, Alan Kaatz, Olivier Giroux, Jack Choquette, Shirish Gadre, Manan Patel, John Tran, Ronny Krashinsky, Jeff Schottmiller

IPC: G06F12/0875

CPC classification number: G06F12/0875, G06F2212/62, G06F2212/452

Abstract: A parallel processing unit comprises a plurality of processors each being coupled to a memory access hardware circuitry. Each memory access hardware circuitry is configured to receive, from the coupled processor, a memory access request specifying a coordinate of a multidimensional data structure, wherein the memory access hardware circuit is one of a plurality of memory access circuitry each coupled to a respective one of the processors; and, in response to the memory access request, translate the coordinate of the multidimensional data structure into plural memory addresses for the multidimensional data structure and using the plural memory addresses, asynchronously transfer at least a portion of the multidimensional data structure for processing by at least the coupled processor. The memory locations may be in the shared memory of the coupled processor and/or an external memory.

3.

Granted invention patent
Techniques for efficiently transferring data to a processor (status: in force)

Publication (announcement) number: US11080051B2

Publication (announcement) date: 2021-08-03

Application number: US16712083

Application date: 2019-12-12

Applicant: NVIDIA Corporation

Inventors: Andrew Kerr, Jack Choquette, Xiaogang Qiu, Omkar Paranjape, Poornachandra Rao, Shirish Gadre, Steven J. Heinrich, Manan Patel, Olivier Giroux, Alan Kaatz

IPC: G06F9/30, G06F12/0808, G06F12/0888

Abstract: A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor needing data stored in global memory can request and store the needed data in on-chip shared memory, which can be accessed by the threads multiple times. The data can be loaded from global memory and stored in shared memory using an instruction which directs the data into the shared memory without storing the data in registers and/or cache memory of the multiprocessor during the data transfer.
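The difference between staging data through registers and directing it straight into shared memory can be sketched with a toy Python model. The class and method names (`SharedMemoryModel`, `load_via_registers`, `copy_async`, `wait_all`) are hypothetical. The model only counts register staging and defers completion of a direct copy until an explicit wait, which is an assumed simplification of how threads would know the data has arrived before reading it.

```python
class SharedMemoryModel:
    """Toy model of on-chip shared memory fed from a global-memory list."""

    def __init__(self, size):
        self.smem = [0] * size
        self.pending = []          # queued copies: (gmem, src, dst, count)
        self.register_writes = 0   # values staged through registers

    def load_via_registers(self, gmem, src, dst, count):
        # Conventional path: load each value into a register, then store it.
        for i in range(count):
            reg = gmem[src + i]
            self.register_writes += 1
            self.smem[dst + i] = reg

    def copy_async(self, gmem, src, dst, count):
        # Direct path: only the request is recorded; no register is used.
        self.pending.append((gmem, src, dst, count))

    def wait_all(self):
        # Complete every outstanding copy before threads read shared memory.
        for gmem, src, dst, count in self.pending:
            self.smem[dst:dst + count] = gmem[src:src + count]
        self.pending.clear()


gmem = list(range(100, 116))
sm = SharedMemoryModel(8)
sm.copy_async(gmem, src=4, dst=0, count=8)
print(sm.smem)             # [0, 0, 0, 0, 0, 0, 0, 0]  (copy still in flight)
sm.wait_all()
print(sm.smem)             # [104, 105, 106, 107, 108, 109, 110, 111]
print(sm.register_writes)  # 0
```

Once the copy completes, every thread can reread the shared-memory copy as often as needed, which is the reuse the abstract relies on.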

4.

Invention application
TECHNIQUES FOR COMPREHENSIVELY SYNCHRONIZING EXECUTION THREADS (status: pending, published)

Publication (announcement) number: US20180314520A1

Publication (announcement) date: 2018-11-01

Application number: US15499843

Application date: 2017-04-27

Applicant: NVIDIA Corporation

Inventors: Ajay Sudarshan Tirumala, Olivier Giroux, Peter Nelson, Jack Choquette

IPC: G06F9/30, G06F9/38

CPC classification number: G06F9/3009, G06F9/30087, G06F9/3851, G06F9/46

Abstract: In one embodiment, a synchronization instruction causes a processor to ensure that specified threads included within a warp concurrently execute a single subsequent instruction. The specified threads include at least a first thread and a second thread. In operation, the first thread arrives at the synchronization instruction. The processor determines that the second thread has not yet arrived at the synchronization instruction and configures the first thread to stop executing instructions. After issuing at least one instruction for the second thread, the processor determines that all the specified threads have arrived at the synchronization instruction. The processor then causes all the specified threads to execute the subsequent instruction. Advantageously, unlike conventional approaches to synchronizing threads, the synchronization instruction enables the processor to reliably and properly execute code that includes complex control flows and/or instructions that presuppose that threads are converged.
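The arrive, park, and release behavior described in the abstract can be simulated in Python. This is a simplified scheduler model, not NVIDIA's hardware: it assumes one instruction is issued per step, that the lowest-numbered runnable thread is chosen, that only the specified threads contain the synchronization instruction, and that those threads share the same instruction after it. `run_warp` and the `"SYNC"` marker are hypothetical names.

```python
def run_warp(programs, mask):
    """Issue instructions for a warp and return the issue trace.

    programs: dict mapping thread id -> list of instruction names, where
              "SYNC" marks the synchronization instruction (assumed to appear
              only in threads named by `mask`).
    mask:     set of thread ids that must all arrive before any continues.
    Each trace entry is (tuple of thread ids, instruction).
    """
    if not mask:
        raise ValueError("mask must name at least one thread")
    pc = {t: 0 for t in programs}
    trace = []

    def at_sync(t):
        return pc[t] < len(programs[t]) and programs[t][pc[t]] == "SYNC"

    while any(pc[t] < len(programs[t]) for t in programs):
        if all(at_sync(t) for t in mask):
            # Every specified thread has arrived: step past SYNC and issue the
            # subsequent instruction once for the whole converged group.
            for t in mask:
                pc[t] += 1
            group = sorted(t for t in mask if pc[t] < len(programs[t]))
            if group:
                trace.append((tuple(group), programs[group[0]][pc[group[0]]]))
                for t in group:
                    pc[t] += 1
            continue
        # Otherwise issue one instruction for a thread that is not parked.
        runnable = [t for t in sorted(programs)
                    if pc[t] < len(programs[t]) and not at_sync(t)]
        if not runnable:
            raise RuntimeError("deadlock: a specified thread never reaches SYNC")
        t = runnable[0]
        trace.append(((t,), programs[t][pc[t]]))
        pc[t] += 1
    return trace


# Thread 0 reaches SYNC first and is parked while thread 1 catches up.
programs = {0: ["a", "SYNC", "x"], 1: ["b", "c", "SYNC", "x"]}
for step in run_warp(programs, mask={0, 1}):
    print(step)
# ((0,), 'a')
# ((1,), 'b')
# ((1,), 'c')
# ((0, 1), 'x')
```

The last trace entry shows the point of the mechanism: `x` is issued once for both threads together, so code after the synchronization point can assume the threads are converged.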

5.

Granted invention patent
Techniques for efficiently transferring data to a processor (status: in force)

Publication (announcement) number: US11907717B2

Publication (announcement) date: 2024-02-20

Application number: US18107374

Application date: 2023-02-08

Applicant: NVIDIA Corporation

Inventors: Andrew Kerr, Jack Choquette, Xiaogang Qiu, Omkar Paranjape, Poornachandra Rao, Shirish Gadre, Steven J. Heinrich, Manan Patel, Olivier Giroux, Alan Kaatz

IPC: G06F9/30, G06F9/52, G06F12/0808, G06F12/0888

CPC classification number: G06F9/30043, G06F9/3009, G06F9/522, G06F12/0808, G06F12/0888, G06F9/3004

Abstract: A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor needing data stored in global memory can request and store the needed data in on-chip shared memory, which can be accessed by the threads multiple times. The data can be loaded from global memory and stored in shared memory using an instruction which directs the data into the shared memory without storing the data in registers and/or cache memory of the multiprocessor during the data transfer.

6.

Published invention application
METHOD AND APPARATUS FOR EFFICIENT ACCESS TO MULTIDIMENSIONAL DATA STRUCTURES AND/OR OTHER LARGE DATA BLOCKS (status: pending, published)

Publication (announcement) number: US20230289304A1

Publication (announcement) date: 2023-09-14

Application number: US17691276

Application date: 2022-03-10

Applicant: NVIDIA Corporation

Inventors: Alexander L. Minkin, Alan Kaatz, Oliver Giroux, Jack Choquette, Shirish Gadre, Manan Patel, John Tran, Ronny Krashinsky, Jeff Schottmiller

IPC: G06F13/16

CPC classification number: G06F13/1663

Abstract: A parallel processing unit comprises a plurality of processors each being coupled to a memory access hardware circuitry. Each memory access hardware circuitry is configured to receive, from the coupled processor, a memory access request specifying a coordinate of a multidimensional data structure, wherein the memory access hardware circuit is one of a plurality of memory access circuitry each coupled to a respective one of the processors; and, in response to the memory access request, translate the coordinate of the multidimensional data structure into plural memory addresses for the multidimensional data structure and using the plural memory addresses, asynchronously transfer at least a portion of the multidimensional data structure for processing by at least the coupled processor. The memory locations may be in the shared memory of the coupled processor and/or an external memory.

7.

Granted invention patent
Techniques for efficiently transferring data to a processor (status: in force)

Publication (announcement) number: US11604649B2

Publication (announcement) date: 2023-03-14

Application number: US17363561

Application date: 2021-06-30

Applicant: NVIDIA Corporation

Inventors: Andrew Kerr, Jack Choquette, Xiaogang Qiu, Omkar Paranjape, Poornachandra Rao, Shirish Gadre, Steven J. Heinrich, Manan Patel, Olivier Giroux, Alan Kaatz

IPC: G06F9/30, G06F12/0808, G06F12/0888, G06F9/32, G06F9/38, G06F9/52, G06F9/54

Abstract: A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor needing data stored in global memory can request and store the needed data in on-chip shared memory, which can be accessed by the threads multiple times. The data can be loaded from global memory and stored in shared memory using an instruction which directs the data into the shared memory without storing the data in registers and/or cache memory of the multiprocessor during the data transfer.

8.

Invention application
Techniques for maintaining atomicity and ordering for pixel shader operations (status: pending, published)

Publication (announcement) number: US20180374185A1

Publication (announcement) date: 2018-12-27

Application number: US15999185

Application date: 2018-08-17

Applicant: NVIDIA Corporation

Inventors: Ziyad Hakura, Eric Lum, Dale Kirkland, Jack Choquette, Patrick R. Brown, Yury Y. Uralsky, Jeffrey Bolz

IPC: G06T1/20, G06T11/40, G06T1/60

CPC classification number: G06T1/20, G06T1/60, G06T11/40

Abstract: A tile coalescer within a graphics processing pipeline coalesces coverage data into tiles. The coverage data indicates, for a set of XY positions, whether a graphics primitive covers those XY positions. The tile indicates, for a larger set of XY positions, whether one or more graphics primitives cover those XY positions. The tile coalescer includes coverage data in the tile only once for each XY position, thereby allowing the API ordering of the graphics primitives covering each XY position to be preserved. The tile is then distributed to a set of streaming multiprocessors for shading and blending operations. The different streaming multiprocessors execute thread groups to process the tile. In doing so, those thread groups may perform read-modify-write operations with data stored in memory. Each such thread group is scheduled to execute via atomic operations, and according to the API order of the associated graphics primitives.
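The ordering property, that a tile never holds coverage for the same XY position twice, can be illustrated in Python. The flush policy used here (close the current tile as soon as an incoming primitive overlaps a position it already holds) is an assumption chosen to satisfy that property; the abstract does not say how the coalescer decides when to emit a tile. `coalesce` is a hypothetical name.

```python
def coalesce(primitives):
    """Coalesce per-primitive coverage into tiles, in API order.

    primitives: list of (primitive_id, set of (x, y) positions covered).
    Returns a list of tiles; each tile maps (x, y) -> primitive_id and holds
    at most one entry per position.
    """
    tiles, current = [], {}
    for prim_id, coverage in primitives:
        if any(xy in current for xy in coverage):
            # Adding this primitive would put a second entry at an occupied
            # position, so emit the current tile first (assumed policy).
            tiles.append(current)
            current = {}
        for xy in coverage:
            current[xy] = prim_id
    if current:
        tiles.append(current)
    return tiles


# Primitive 2 overlaps primitive 0 at (1, 0), so it starts a new tile.
prims = [(0, {(0, 0), (1, 0)}), (1, {(2, 0)}), (2, {(1, 0), (3, 0)})]
tiles = coalesce(prims)
print(len(tiles))                          # 2
print(tiles[0][(1, 0)], tiles[1][(1, 0)])  # 0 2
```

Because primitive 0 and primitive 2 land in different tiles, their writes at (1, 0) stay in API order as long as later stages also process tiles in order; the abstract attributes that guarantee to scheduling the thread groups via atomic operations.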

9.

Granted invention patent
Robust, efficient multiprocessor-coprocessor interface (status: in force)

Publication (announcement) number: US11966737B2

Publication (announcement) date: 2024-04-23

Application number: US17465234

Application date: 2021-09-02

Applicant: NVIDIA Corporation

Inventors: Ronald Charles Babich, Jr., John Burgess, Jack Choquette, Tero Karras, Samuli Laine, Ignacio Llamas, Gregory Muthler, William Parsons Newhall, Jr.

IPC: G06F9/30, G06F9/38, G06F9/48, G06F15/163, G06T1/20, G06T1/60

CPC classification number: G06F9/3004, G06F9/3877, G06F9/4843, G06F15/163, G06T1/20, G06T1/60, G06T2200/28

Abstract: Systems and methods for an efficient and robust multiprocessor-coprocessor interface that may be used between a streaming multiprocessor and an acceleration coprocessor in a GPU are provided. According to an example implementation, in order to perform an acceleration of a particular operation using the coprocessor, the multiprocessor: issues a series of write instructions to write input data for the operation into coprocessor-accessible storage locations, issues an operation instruction to cause the coprocessor to execute the particular operation; and then issues a series of read instructions to read result data of the operation from coprocessor-accessible storage locations to multiprocessor-accessible storage locations.
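The three-phase protocol in the abstract (a series of writes, one operation instruction, a series of reads) can be sketched in Python. `Coprocessor`, `accelerate`, and the `"dot"` operation are hypothetical stand-ins, since the abstract does not name the accelerated operation; a real interface would also have to handle completion and errors, which this model ignores.

```python
class Coprocessor:
    """Toy coprocessor with its own input and output storage slots."""

    def __init__(self, slots):
        self.inputs = [0] * slots
        self.outputs = [0] * slots

    def execute(self, op):
        # Hypothetical operation: dot product of the two halves of the inputs.
        if op != "dot":
            raise ValueError(f"unsupported operation: {op}")
        half = len(self.inputs) // 2
        self.outputs[0] = sum(a * b for a, b in
                              zip(self.inputs[:half], self.inputs[half:]))


def accelerate(cop, op, input_data, n_results):
    # 1. A series of writes places the operands in coprocessor storage.
    for i, value in enumerate(input_data):
        cop.inputs[i] = value
    # 2. A single operation instruction starts the coprocessor.
    cop.execute(op)
    # 3. A series of reads copies results back to multiprocessor storage.
    return [cop.outputs[i] for i in range(n_results)]


cop = Coprocessor(slots=6)
print(accelerate(cop, "dot", [1, 2, 3, 4, 5, 6], n_results=1))  # [32]
```

Splitting the exchange into plain writes, one trigger, and plain reads keeps the operation instruction itself small; the model makes no claim about how the real interface sizes or names its storage locations.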

10.

Published invention application
TECHNIQUES FOR EFFICIENTLY TRANSFERRING DATA TO A PROCESSOR (status: pending, published)

Publication (announcement) number: US20230185570A1

Publication (announcement) date: 2023-06-15

Application number: US18107374

Application date: 2023-02-08

Applicant: NVIDIA Corporation

Inventors: Andrew Kerr, Jack Choquette, Xiaogang Qiu, Omkar Paranjape, Poornachandra Rao, Shirish Gadre, Steven J. Heinrich, Manan Patel, Olivier Giroux, Alan Kaatz

IPC: G06F9/30, G06F12/0808, G06F12/0888, G06F9/32, G06F9/38, G06F9/52, G06F9/54

CPC classification number: G06F9/30043, G06F12/0808, G06F12/0888, G06F9/3009, G06F9/321, G06F9/3871, G06F9/522, G06F9/542, G06F9/544, G06F9/546, G06F9/3838, G06F2212/621, G06F9/3004

Abstract: A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor needing data stored in global memory can request and store the needed data in on-chip shared memory, which can be accessed by the threads multiple times. The data can be loaded from global memory and stored in shared memory using an instruction which directs the data into the shared memory without storing the data in registers and/or cache memory of the multiprocessor during the data transfer.
