Patent search ap:("NVIDIA CORPORATION") AND inv:"Jack Choquette" Page 3

21.

发明授权
Distributed shared memory 有权

公开(公告)号：US12248788B2

公开(公告)日：2025-03-11

申请号：US17691690

申请日：2022-03-10

Applicant: NVIDIA Corporation

Inventor： Prakash Bangalore Prabhakar , Gentaro Hirota , Ronny Krashinsky , Ze Long , Brian Pharris , Rajballav Dash , Jeff Tuckey , Jerome F. Duluk, Jr. , Lacky Shah , Luke Durant , Jack Choquette , Eric Werness , Naman Govil , Manan Patel , Shayani Deb , Sandeep Navada , John Edmondson , Greg Palmer , Wish Gandhi , Ravi Manyam , Apoorv Parle , Olivier Giroux , Shirish Gadre , Steve Heinrich

IPC: G06F9/30 , G06F9/38 , G06F9/52 , G06F9/54 , G06F13/16 , G06T1/60

Abstract: Distributed shared memory (DSMEM) comprises blocks of memory that are distributed or scattered across a processor (such as a GPU). Threads executing on a processing core local to one memory block are able to access a memory block local to a different processing core. In one embodiment, shared access to these DSMEM allocations distributed across a collection of processing cores is implemented by communications between the processing cores. Such distributed shared memory provides very low latency memory access for processing cores located in proximity to the memory blocks, and also provides a way for more distant processing cores to also access the memory blocks in a manner and using interconnects that do not interfere with the processing cores' access to main or global memory such as hacked by an L2 cache. Such distributed shared memory supports cooperative parallelism and strong scaling across multiple processing cores by permitting data sharing and communications previously possible only within the same processing core.

22.

发明申请
TECHNIQUES FOR EFFICIENTLY TRANSFERRING DATA TO A PROCESSOR 有权

公开(公告)号：US20210124582A1

公开(公告)日：2021-04-29

申请号：US16712083

申请日：2019-12-12

Applicant: NVIDIA Corporation

Inventor： Andrew Kerr , Jack Choquette , Xiaogang Qiu , Omkar Paranjape , Poornachandra Rao , Shirish Gadre , Steven J. Heinrich , Manan Patel , Olivier Giroux , Alan Kaatz

IPC: G06F9/30 , G06F12/0888 , G06F12/0808

Abstract: A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor needing data stored in global memory can request and store the needed data in on-chip shared memory, which can be accessed by the threads multiple times. The data can be loaded from global memory and stored in shared memory using an instruction which directs the data into the shared memory without storing the data in registers and/or cache memory of the multiprocessor during the data transfer.

23.

发明授权
Techniques for comprehensively synchronizing execution threads 有权

公开(公告)号：US10977037B2

公开(公告)日：2021-04-13

申请号：US16595398

申请日：2019-10-07

Applicant: NVIDIA Corporation

Inventor： Ajay Sudarshan Tirumala , Olivier Giroux , Peter Nelson , Jack Choquette

IPC: G06F9/30 , G06F9/38 , G06F9/46

Abstract: In one embodiment, a synchronization instruction causes a processor to ensure that specified threads included within a warp concurrently execute a single subsequent instruction. The specified threads include at least a first thread and a second thread. In operation, the first thread arrives at the synchronization instruction. The processor determines that the second thread has not yet arrived at the synchronization instruction and configures the first thread to stop executing instructions. After issuing at least one instruction for the second thread, the processor determines that all the specified threads have arrived at the synchronization instruction. The processor then causes all the specified threads to execute the subsequent instruction. Advantageously, unlike conventional approaches to synchronizing threads, the synchronization instruction enables the processor to reliably and properly execute code that includes complex control flows and/or instructions that presuppose that threads are converged.

24.

发明授权
Thread-level sleep in a multithreaded architecture 有权

公开(公告)号：US10817295B2

公开(公告)日：2020-10-27

申请号：US15582549

申请日：2017-04-28

Applicant: NVIDIA CORPORATION

Inventor： Olivier Giroux , Peter Nelson , Jack Choquette , Ajay Sudarshan Tirumala

IPC: G06F9/30 , G06F9/38 , G06F9/46

Abstract: A streaming multiprocessor (SM) includes a nanosleep (NS) unit configured to cause individual threads executing on the SM to sleep for a programmer-specified interval of time. For a given thread, the NS unit parses a NANOSLEEP instruction and extracts a sleep time. The NS unit then maps the sleep time to a single bit of a timer and causes the thread to sleep. When the timer bit changes, the sleep time expires, and the NS unit awakens the thread. The thread may then continue executing. The SM also includes a nanotrap (NT) unit configured to issue traps using a similar timing mechanism to that described above. For a given thread, the NT unit parses a NANOTRAP instruction and extracts a trap time. The NT unit then maps the trap time to a single bit of a timer. When the timer bit changes, the NT unit issues a trap.

25.

发明授权
Unified cache for diverse memory traffic 有权

公开(公告)号：US10705994B2

公开(公告)日：2020-07-07

申请号：US15587213

申请日：2017-05-04

Applicant: NVIDIA Corporation

Inventor： Xiaogang Qiu , Ronny Krashinsky , Steven Heinrich , Shirish Gadre , John Edmondson , Jack Choquette , Mark Gebhart , Ramesh Jandhyala , Poornachandra Rao , Omkar Paranjape , Michael Siu

IPC: G06F12/084 , G06F13/28 , G06F12/0891 , G06F12/0811 , G06F12/0895 , G06F12/122 , G11C7/10

Abstract: A unified cache subsystem includes a data memory configured as both a shared memory and a local cache memory. The unified cache subsystem processes different types of memory transactions using different data pathways. To process memory transactions that target shared memory, the unified cache subsystem includes a direct pathway to the data memory. To process memory transactions that do not target shared memory, the unified cache subsystem includes a tag processing pipeline configured to identify cache hits and cache misses. When the tag processing pipeline identifies a cache hit for a given memory transaction, the transaction is rerouted to the direct pathway to data memory. When the tag processing pipeline identifies a cache miss for a given memory transaction, the transaction is pushed into a first-in first-out (FIFO) until miss data is returned from external memory. The tag processing pipeline is also configured to process texture-oriented memory transactions.

26.

发明授权
Techniques for maintaining atomicity and ordering for pixel shader operations 有权

公开(公告)号：US10453168B2

公开(公告)日：2019-10-22

申请号：US15999185

申请日：2018-08-17

Applicant: NVIDIA Corporation

Inventor： Ziyad Hakura , Eric Lum , Dale Kirkland , Jack Choquette , Patrick R. Brown , Yury Y. Uralsky , Jeffrey Bolz

IPC: G06T1/20 , G06T1/60 , G06T11/40

Abstract: A tile coalescer within a graphics processing pipeline coalesces coverage data into tiles. The coverage data indicates, for a set of XY positions, whether a graphics primitive covers those XY positions. The tile indicates, for a larger set of XY positions, whether one or more graphics primitives cover those XY positions. The tile coalescer includes coverage data in the tile only once for each XY position, thereby allowing the API ordering of the graphics primitives covering each XY position to be preserved. The tile is then distributed to a set of streaming multiprocessors for shading and blending operations. The different streaming multiprocessors execute thread groups to process the tile. In doing so, those thread groups may perform read-modify-write operations with data stored in memory. Each such thread group is scheduled to execute via atomic operations, and according to the API order of the associated graphics primitives.

27.

发明授权
Techniques for comprehensively synchronizing execution threads 有权

公开(公告)号：US10437593B2

公开(公告)日：2019-10-08

申请号：US15499843

申请日：2017-04-27

Applicant: NVIDIA Corporation

Inventor： Ajay Sudarshan Tirumala , Olivier Giroux , Peter Nelson , Jack Choquette

IPC: G06F9/30 , G06F9/38 , G06F9/46

Abstract: A synchronization instruction causes a processor to ensure that specified threads included within a warp concurrently execute a single subsequent instruction. The specified threads include at least a first thread and a second thread. In operation, the first thread arrives at the synchronization instruction. The processor determines that the second thread has not yet arrived at the synchronization instruction and configures the first thread to stop executing instructions. After issuing at least one instruction for the second thread, the processor determines that all the specified threads have arrived at the synchronization instruction. The processor then causes all the specified threads to execute the subsequent instruction. Advantageously, unlike conventional approaches to synchronizing threads, the synchronization instruction enables the processor to reliably and properly execute code that includes complex control flows and/or instructions that presuppose that threads are converged.

28.

发明授权
Techniques for maintaining atomicity and ordering for pixel shader operations 有权

公开(公告)号：US10055806B2

公开(公告)日：2018-08-21

申请号：US14924618

申请日：2015-10-27

Applicant: NVIDIA CORPORATION

Inventor： Ziyad Hakura , Eric Lum , Dale Kirkland , Jack Choquette , Patrick R. Brown , Yury Y. Uralsky , Jeffrey Bolz

IPC: G06T1/20 , G06T1/60

CPC classification number: G06T1/20 , G06T1/60 , G06T11/40

Abstract: A tile coalescer within a graphics processing pipeline coalesces coverage data into tiles. The coverage data indicates, for a set of XY positions, whether a graphics primitive covers those XY positions. The tile indicates, for a larger set of XY positions, whether one or more graphics primitives cover those XY positions. The tile coalescer includes coverage data in the tile only once for each XY position, thereby allowing the API ordering of the graphics primitives covering each XY position to be preserved. The tile is then distributed to a set of streaming multiprocessors for shading and blending operations. The different streaming multiprocessors execute thread groups to process the tile. In doing so, those thread groups may perform read-modify-write operations with data stored in memory. Each such thread group is scheduled to execute via atomic operations, and according to the API order of the associated graphics primitives.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification