-
Publication Number: US11347668B2
Publication Date: 2022-05-31
Application Number: US16921795
Application Date: 2020-07-06
Applicant: NVIDIA Corporation
Inventor: Xiaogang Qiu , Ronny Krashinsky , Steven Heinrich , Shirish Gadre , John Edmondson , Jack Choquette , Mark Gebhart , Ramesh Jandhyala , Poornachandra Rao , Omkar Paranjape , Michael Siu
IPC: G06F13/28 , G06F12/0891 , G06F12/0811 , G06F12/084 , G06F12/0895 , G06F12/122 , G11C7/10
Abstract: A unified cache subsystem includes a data memory configured as both a shared memory and a local cache memory. The unified cache subsystem processes different types of memory transactions using different data pathways. To process memory transactions that target shared memory, the unified cache subsystem includes a direct pathway to the data memory. To process memory transactions that do not target shared memory, the unified cache subsystem includes a tag processing pipeline configured to identify cache hits and cache misses. When the tag processing pipeline identifies a cache hit for a given memory transaction, the transaction is rerouted to the direct pathway to the data memory. When the tag processing pipeline identifies a cache miss for a given memory transaction, the transaction is pushed into a first-in first-out (FIFO) buffer until miss data is returned from external memory. The tag processing pipeline is also configured to process texture-oriented memory transactions.
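The abstract describes routing logic rather than an algorithm, but that logic is simple to model. Below is a minimal host-side sketch of the decision path; every name in it (Transaction, UnifiedCacheModel, the 128-byte line size) is invented for illustration and is not taken from the patent.

```cuda
// Toy model of the routing described above -- not NVIDIA's implementation.
#include <cstdint>
#include <cstdio>
#include <queue>
#include <unordered_set>

struct Transaction {
    uint64_t addr;
    bool     targets_shared;  // shared-memory transactions skip tag lookup
};

class UnifiedCacheModel {
    std::unordered_set<uint64_t> tags;       // resident lines, by line address
    std::queue<Transaction>      miss_fifo;  // misses wait here for fill data

public:
    void issue(const Transaction& t) {
        if (t.targets_shared) {
            access_data_memory(t);           // direct pathway: no tag lookup
        } else if (tags.count(line(t.addr))) {
            access_data_memory(t);           // hit: reroute onto direct pathway
        } else {
            miss_fifo.push(t);               // miss: park in FIFO, await fill
        }
    }

    // External memory returned a line; drain matching entries at the head
    // of the FIFO (simplified in-order replay).
    void fill_returned(uint64_t line_addr) {
        tags.insert(line_addr);
        while (!miss_fifo.empty() && line(miss_fifo.front().addr) == line_addr) {
            access_data_memory(miss_fifo.front());
            miss_fifo.pop();
        }
    }

private:
    static uint64_t line(uint64_t addr) { return addr >> 7; }  // 128-byte lines
    void access_data_memory(const Transaction& t) {
        std::printf("data memory access at %#llx\n",
                    static_cast<unsigned long long>(t.addr));
    }
};
```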
-
Publication Number: US20200034143A1
Publication Date: 2020-01-30
Application Number: US16595398
Application Date: 2019-10-07
Applicant: NVIDIA Corporation
Inventor: Ajay Sudarshan Tirumala , Olivier Giroux , Peter Nelson , Jack Choquette
Abstract: In one embodiment, a synchronization instruction causes a processor to ensure that specified threads included within a warp concurrently execute a single subsequent instruction. The specified threads include at least a first thread and a second thread. In operation, the first thread arrives at the synchronization instruction. The processor determines that the second thread has not yet arrived at the synchronization instruction and configures the first thread to stop executing instructions. After issuing at least one instruction for the second thread, the processor determines that all the specified threads have arrived at the synchronization instruction. The processor then causes all the specified threads to execute the subsequent instruction. Advantageously, unlike conventional approaches to synchronizing threads, the synchronization instruction enables the processor to reliably and properly execute code that includes complex control flows and/or instructions that presuppose that threads are converged.
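CUDA's __syncwarp() intrinsic behaves much like the instruction described here: it names the participating lanes with a mask and holds early arrivals until the rest catch up. A minimal sketch, assuming one full 32-lane warp, that reconverges divergent threads before a shuffle reduction that presupposes convergence:

```cuda
#include <cstdio>

__global__ void divergentReduce(const int* in, int* out) {
    const unsigned mask = 0xffffffffu;  // all 32 lanes participate
    int lane = threadIdx.x & 31;
    int v = in[lane];

    if (lane & 1) v *= 2;               // odd lanes take one path...
    else          v += 1;               // ...even lanes take the other

    __syncwarp(mask);                   // hold until every named lane arrives

    // Safe only because the lanes above have reconverged.
    for (int offset = 16; offset > 0; offset >>= 1)
        v += __shfl_down_sync(mask, v, offset);

    if (lane == 0) *out = v;
}

int main() {
    int h_in[32], h_out, *d_in, *d_out;
    for (int i = 0; i < 32; ++i) h_in[i] = i;
    cudaMalloc(&d_in, sizeof h_in);
    cudaMalloc(&d_out, sizeof h_out);
    cudaMemcpy(d_in, h_in, sizeof h_in, cudaMemcpyHostToDevice);
    divergentReduce<<<1, 32>>>(d_in, d_out);
    cudaMemcpy(&h_out, d_out, sizeof h_out, cudaMemcpyDeviceToHost);
    std::printf("warp sum = %d\n", h_out);
    cudaFree(d_in); cudaFree(d_out);
    return 0;
}
```

The __shfl_down_sync() calls synchronize their named lanes as well; the explicit __syncwarp() simply makes the reconvergence point visible.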
-
Publication Number: US10459861B2
Publication Date: 2019-10-29
Application Number: US15716461
Application Date: 2017-09-26
Applicant: NVIDIA Corporation
Inventor: Xiaogang Qiu , Ronny Krashinsky , Steven Heinrich , Shirish Gadre , John Edmondson , Jack Choquette , Mark Gebhart , Ramesh Jandhyala , Poornachandra Rao , Omkar Paranjape , Michael Siu
IPC: G06F12/02 , G06F13/28 , G06F12/0891 , G06F12/0811 , G06F12/084
Abstract: A unified cache subsystem includes a data memory configured as both a shared memory and a local cache memory. The unified cache subsystem processes different types of memory transactions using different data pathways. To process memory transactions that target shared memory, the unified cache subsystem includes a direct pathway to the data memory. To process memory transactions that do not target shared memory, the unified cache subsystem includes a tag processing pipeline configured to identify cache hits and cache misses. When the tag processing pipeline identifies a cache hit for a given memory transaction, the transaction is rerouted to the direct pathway to the data memory. When the tag processing pipeline identifies a cache miss for a given memory transaction, the transaction is pushed into a first-in first-out (FIFO) buffer until miss data is returned from external memory. The tag processing pipeline is also configured to process texture-oriented memory transactions.
-
Publication Number: US10019776B2
Publication Date: 2018-07-10
Application Number: US14924624
Application Date: 2015-10-27
Applicant: NVIDIA Corporation
Inventor: Ziyad Hakura , Eric Lum , Dale Kirkland , Jack Choquette , Patrick R. Brown , Yury Y. Uralsky , Jeffrey Bolz
Abstract: A tile coalescer within a graphics processing pipeline coalesces coverage data into tiles. The coverage data indicates, for a set of XY positions, whether a graphics primitive covers those XY positions. The tile indicates, for a larger set of XY positions, whether one or more graphics primitives cover those XY positions. The tile coalescer includes coverage data in the tile only once for each XY position, thereby allowing the API ordering of the graphics primitives covering each XY position to be preserved. The tile is then distributed to a set of streaming multiprocessors for shading and blending operations. The different streaming multiprocessors execute thread groups to process the tile. In doing so, those thread groups may perform read-modify-write operations with data stored in memory. Each such thread group is scheduled to execute via atomic operations, and according to the API order of the associated graphics primitives.
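One plausible reading of including coverage "only once for each XY position" is that a primitive whose coverage conflicts with the open tile forces that tile to be emitted first. The sketch below models exactly that policy; the 16x16 tile, the types, and the emit hook are assumptions, not the patent's hardware.

```cuda
// Illustrative model of tile coalescing with API-order preservation.
constexpr int TILE_W = 16, TILE_H = 16;

struct Tile {
    bool covered[TILE_H][TILE_W] = {};  // XY positions already in this tile

    bool overlaps(const bool cov[TILE_H][TILE_W]) const {
        for (int y = 0; y < TILE_H; ++y)
            for (int x = 0; x < TILE_W; ++x)
                if (cov[y][x] && covered[y][x]) return true;
        return false;
    }
    void merge(const bool cov[TILE_H][TILE_W]) {
        for (int y = 0; y < TILE_H; ++y)
            for (int x = 0; x < TILE_W; ++x)
                covered[y][x] |= cov[y][x];
    }
};

struct TileCoalescer {
    Tile open_tile;

    // Primitives arrive in API order. A coverage conflict flushes the open
    // tile, so no XY position appears twice within one emitted tile and
    // tiles reach the streaming multiprocessors in submission order.
    void submit(const bool cov[TILE_H][TILE_W]) {
        if (open_tile.overlaps(cov)) flush();
        open_tile.merge(cov);
    }
    void flush() {
        emit_tile(open_tile);       // hypothetical hand-off to the SMs
        open_tile = Tile{};
    }
    void emit_tile(const Tile&) {}  // stub for the model
};
```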
-
Publication Number: US12020035B2
Publication Date: 2024-06-25
Application Number: US17691288
Application Date: 2022-03-10
Applicant: NVIDIA Corporation
Inventor: Apoorv Parle , Ronny Krashinsky , John Edmondson , Jack Choquette , Shirish Gadre , Steve Heinrich , Manan Patel , Prakash Bangalore Prabhakar, Jr. , Ravi Manyam , Wish Gandhi , Lacky Shah , Alexander L. Minkin
IPC: G06F5/06 , G06F9/38 , G06F9/48 , G06F9/52 , G06F13/16 , G06F13/40 , G06T1/20 , G06T1/60 , H04L49/101
CPC classification number: G06F9/3887 , G06F9/522 , G06F13/1689 , G06F13/4022 , G06T1/20 , G06T1/60 , H04L49/101
Abstract: This specification describes a programmatic multicast technique enabling one thread (for example, in a cooperative group array (CGA) on a GPU) to request data on behalf of one or more other threads (for example, executing on respective processor cores of the GPU). The multicast is supported by tracking circuitry that interfaces between multicast requests received from processor cores and the available memory. The multicast is designed to reduce cache (for example, level 2 cache) bandwidth utilization, enabling strong scaling and smaller tile sizes.
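The closest public CUDA feature is Hopper's thread block clusters with distributed shared memory (CUDA 12, sm_90). A sketch using only documented cooperative-groups calls, assuming blockDim.x == 256 and a cluster of four blocks, in which one block fetches data once and its peers read it in place instead of each going through L2:

```cuda
#include <cooperative_groups.h>
namespace cg = cooperative_groups;

__global__ void __cluster_dims__(4, 1, 1)
clusterBroadcast(const float* src, float* dst) {
    cg::cluster_group cluster = cg::this_cluster();
    __shared__ float staged[256];

    // Rank 0 fetches from global memory on behalf of the whole cluster.
    if (cluster.block_rank() == 0)
        staged[threadIdx.x] = src[threadIdx.x];
    cluster.sync();                      // staged[] now visible cluster-wide

    // Every block reads rank 0's shared memory directly.
    float* peer = cluster.map_shared_rank(staged, 0);
    dst[cluster.block_rank() * blockDim.x + threadIdx.x] = peer[threadIdx.x];

    cluster.sync();                      // keep staged[] alive until all reads finish
}
```

Launched as, say, clusterBroadcast<<<4, 256>>>(src, dst), the four blocks form one cluster; whether this path exercises the tracking circuitry the abstract describes is not something the abstract confirms.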
-
Publication Number: US20240118899A1
Publication Date: 2024-04-11
Application Number: US18105679
Application Date: 2023-02-03
Applicant: NVIDIA Corporation
Inventor: Aditya Avinash Atluri , Jack Choquette , Carter Edwards , Olivier Giroux , Praveen Kumar Kaushik , Ronny Krashinsky , Rishkul Kulkarni , Konstantinos Kyriakopoulos
IPC: G06F9/38
CPC classification number: G06F9/3851
Abstract: Apparatuses, systems, and techniques to adapt instructions in a SIMT architecture for execution on serial execution units. In at least one embodiment, a set of one or more threads is selected from a group of active threads associated with an instruction and the instruction is executed for the set of one or more threads on a serial execution unit.
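The core loop such an adapter needs (select a set of threads from the active mask, execute the instruction for that set, retire it) fits in a few lines. A toy host-side model, with a one-lane set size and a per-lane callback standing in for the instruction:

```cuda
#include <cstdint>
#include <cstdio>

using LaneFn = void (*)(int lane);  // stand-in for "the instruction"

// Serialize one SIMT instruction: peel one active thread at a time from the
// mask and run the instruction for just that thread on the serial unit.
void executeSerially(uint32_t active_mask, LaneFn instr) {
    while (active_mask) {
        int lane = __builtin_ctz(active_mask);  // lowest active lane (GCC/Clang builtin)
        instr(lane);                            // execute for this set only
        active_mask &= active_mask - 1;         // retire it from the mask
    }
}

int main() {
    executeSerially(0b10100110u, [](int lane) {
        std::printf("serial unit executing lane %d\n", lane);
    });
    return 0;
}
```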
-
Publication Number: US11803380B2
Publication Date: 2023-10-31
Application Number: US16712236
Application Date: 2019-12-12
Applicant: NVIDIA Corporation
Inventor: Olivier Giroux , Jack Choquette , Ronny Krashinsky , Steve Heinrich , Xiaogang Qiu , Shirish Gadre
IPC: G06F7/08 , G06F9/30 , G06F12/0808 , G06F12/0888 , G06F9/32 , G06F9/38 , G06F9/52 , G06F9/54
CPC classification number: G06F9/30043 , G06F9/3009 , G06F9/321 , G06F9/3838 , G06F9/3871 , G06F9/522 , G06F9/542 , G06F9/544 , G06F9/546 , G06F12/0808 , G06F12/0888 , G06F9/3004 , G06F2212/621
Abstract: To synchronize operations of a computing system, a new type of synchronization barrier is disclosed. In one embodiment, the disclosed synchronization barrier provides for certain synchronization mechanisms, such as "Arrive" and "Wait," to be split, allowing greater flexibility and efficiency in coordinating synchronization. In another embodiment, the disclosed synchronization barrier allows hardware components, such as dedicated copy or direct-memory-access (DMA) engines, to be synchronized with software-based threads.
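The split "Arrive"/"Wait" design and the copy-engine participation both surface in libcu++'s cuda::barrier, which is a reasonable public stand-in here (the patent itself names no API). A sketch assuming compute capability 8.0+ and blockDim.x == 256:

```cuda
#include <cooperative_groups.h>
#include <cuda/barrier>
#include <cuda/std/utility>
namespace cg = cooperative_groups;

__global__ void stagedCopy(const float* src, float* dst) {
    __shared__ float buf[256];
    __shared__ cuda::barrier<cuda::thread_scope_block> bar;
    auto block = cg::this_thread_block();

    if (block.thread_rank() == 0)
        init(&bar, block.size());        // one expected arrival per thread
    block.sync();

    // The asynchronous copy (a hardware copy path where available) becomes
    // an extra participant in the barrier's current phase.
    cuda::memcpy_async(block, buf, src, sizeof(buf), bar);

    auto token = bar.arrive();           // "Arrive": free to do other work...
    bar.wait(cuda::std::move(token));    // "Wait": buf[] is now filled

    dst[block.thread_rank()] = buf[block.thread_rank()] * 2.0f;
}
```

Splitting arrive() from wait(token) is what lets a thread overlap independent work between the two calls, which is the flexibility the abstract highlights.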
-
Publication Number: US11367160B2
Publication Date: 2022-06-21
Application Number: US16053341
Application Date: 2018-08-02
Applicant: NVIDIA Corporation
Inventor: Rajballav Dash , Gregory Palmer , Gentaro Hirota , Lacky Shah , Jack Choquette , Emmett Kilgariff , Sriharsha Niverty , Milton Lei , Shirish Gadre , Omkar Paranjape , Lei Yang , Rouslan Dimitrov
Abstract: A parallel processing unit (e.g., a GPU), in some examples, includes a hardware scheduler and hardware arbiter that launch graphics and compute work for simultaneous execution on a SIMD/SIMT processing unit. Each processing unit (e.g., a streaming multiprocessor) of the parallel processing unit operates in a graphics-greedy mode or a compute-greedy mode at respective times. The hardware arbiter, in response to a result of a comparison of at least one monitored performance or utilization metric to a user-configured threshold, can selectively cause the processing unit to run one or more compute work items from a compute queue when the processing unit is operating in the graphics-greedy mode, and cause the processing unit to run one or more graphics work items from a graphics queue when the processing unit is operating in the compute-greedy mode. Associated methods and systems are also described.
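The arbitration rule reduces to one comparison per decision. A simplified model with invented field names and no claim to match the hardware's real metrics:

```cuda
// Toy model of the greedy-mode arbiter described above.
struct SmState {
    bool  graphics_greedy;  // current mode of this streaming multiprocessor
    float utilization;      // monitored metric, 0.0 .. 1.0 (invented)
};

enum class Launch { Graphics, Compute, None };

// A greedy SM keeps draining its preferred queue; when the monitored metric
// falls below the user-configured threshold, the arbiter borrows work from
// the other queue.
Launch arbitrate(const SmState& sm, float user_threshold,
                 bool graphics_pending, bool compute_pending) {
    const bool starved = sm.utilization < user_threshold;
    if (sm.graphics_greedy)
        return (starved && compute_pending) ? Launch::Compute
               : graphics_pending           ? Launch::Graphics
                                            : Launch::None;
    return (starved && graphics_pending) ? Launch::Graphics
           : compute_pending             ? Launch::Compute
                                         : Launch::None;
}
```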
-
Publication Number: US11138009B2
Publication Date: 2021-10-05
Application Number: US16101247
Application Date: 2018-08-10
Applicant: NVIDIA Corporation
Inventor: Ronald Charles Babich, Jr. , John Burgess , Jack Choquette , Tero Karras , Samuli Laine , Ignacio Llamas , Gregory Muthler , William Parsons Newhall, Jr.
Abstract: Systems and methods for an efficient and robust multiprocessor-coprocessor interface that may be used between a streaming multiprocessor and an acceleration coprocessor in a GPU are provided. According to an example implementation, in order to perform an acceleration of a particular operation using the coprocessor, the multiprocessor issues a series of write instructions to write input data for the operation into coprocessor-accessible storage locations; issues an operation instruction to cause the coprocessor to execute the particular operation; and then issues a series of read instructions to read result data of the operation from coprocessor-accessible storage locations to multiprocessor-accessible storage locations.
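The three-step protocol (write inputs, trigger the operation, read results) can be modeled over a plain register block. Everything below (the register layout, the opcode value, the completion flag) is hypothetical; the real SM-to-coprocessor interface is not public.

```cuda
#include <cstdint>

struct CoprocessorRegs {          // stand-in for coprocessor-accessible storage
    volatile uint64_t input[4];
    volatile uint64_t opcode;     // writing this triggers the operation
    volatile uint64_t done;       // coprocessor raises this on completion
    volatile uint64_t result[2];
};

void accelerate(CoprocessorRegs& co, const uint64_t in[4], uint64_t out[2]) {
    for (int i = 0; i < 4; ++i)
        co.input[i] = in[i];      // 1. series of writes: stage the input data
    co.done = 0;
    co.opcode = 0x2A;             // 2. operation instruction (id is made up)
    while (!co.done) { }          //    wait for the coprocessor to finish
    for (int i = 0; i < 2; ++i)
        out[i] = co.result[i];    // 3. series of reads: collect the results
}
```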
-
Publication Number: US10032245B2
Publication Date: 2018-07-24
Application Number: US14924628
Application Date: 2015-10-27
Applicant: NVIDIA Corporation
Inventor: Ziyad Hakura , Eric Lum , Dale Kirkland , Jack Choquette , Patrick R. Brown , Yury Y. Uralsky , Jeffrey Bolz
Abstract: A tile coalescer within a graphics processing pipeline coalesces coverage data into tiles. The coverage data indicates, for a set of XY positions, whether a graphics primitive covers those XY positions. The tile indicates, for a larger set of XY positions, whether one or more graphics primitives cover those XY positions. The tile coalescer includes coverage data in the tile only once for each XY position, thereby allowing the API ordering of the graphics primitives covering each XY position to be preserved. The tile is then distributed to a set of streaming multiprocessors for shading and blending operations. The different streaming multiprocessors execute thread groups to process the tile. In doing so, those thread groups may perform read-modify-write operations with data stored in memory. Each such thread group is scheduled to execute via atomic operations, and according to the API order of the associated graphics primitives.