Patent search ap:("NVIDIA Corporation") AND inv:"John EDMONDSON"

1.

Invention publication
PROGRAMMATICALLY CONTROLLED DATA MULTICASTING ACROSS MULTIPLE COMPUTE ENGINES (Status: under examination, published)

Publication (announcement) number: US20230289190A1

Publication (announcement) date: 2023-09-14

Application number: US17691288

Application date: 2022-03-10

Applicant: NVIDIA Corporation

Inventors: Apoorv PARLE, Ronny KRASHINSKY, John EDMONDSON, Jack CHOQUETTE, Shirish GADRE, Steve HEINRICH, Manan PATEL, Prakash Bangalore PRABHAKAR, JR., Ravi MANYAM, Wish GANDHI, Lacky SHAH, Alexander L. Minkin

IPC: G06F9/38, G06F9/52, G06F13/40, G06F13/16, H04L49/101, G06T1/20, G06T1/60

CPC classification number: G06F9/3887, G06F9/522, G06F13/4022, G06F13/1689, H04L49/101, G06T1/20, G06T1/60

Abstract: This specification describes a programmatic multicast technique enabling one thread (for example, in a cooperative group array (CGA) on a GPU) to request data on behalf of one or more other threads (for example, executing on respective processor cores of the GPU). The multicast is supported by tracking circuitry that interfaces between multicast requests received from processor cores and the available memory. The multicast is designed to reduce cache (for example, level 2 cache) bandwidth utilization, enabling strong scaling and smaller tile sizes.
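The bandwidth argument in this abstract can be illustrated with a small Python model. It is a sketch under stated assumptions, not NVIDIA's design: `L2Model`, `MulticastTracker`, `unicast_l2_bytes`, and `multicast_l2_bytes` are hypothetical names, and the model assumes every participating core needs the identical tile, which is the case multicast targets.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class L2Model:
    """Counts bytes served by a (hypothetical) L2 cache."""
    bytes_read: int = 0

    def read(self, tile_bytes: int) -> bytes:
        self.bytes_read += tile_bytes
        return bytes(tile_bytes)  # zero-filled placeholder payload


@dataclass
class MulticastTracker:
    """Hypothetical stand-in for the tracking circuitry between cores and memory."""
    l2: L2Model
    delivered: dict[int, int] = field(default_factory=dict)  # core id -> bytes received

    def multicast(self, tile_bytes: int, dest_cores: list[int]) -> None:
        data = self.l2.read(tile_bytes)  # a single L2 read ...
        for core in dest_cores:          # ... is fanned out to every destination core
            self.delivered[core] = self.delivered.get(core, 0) + len(data)


def unicast_l2_bytes(num_cores: int, tile_bytes: int) -> int:
    """Baseline: every core fetches the shared tile on its own."""
    l2 = L2Model()
    for _ in range(num_cores):
        l2.read(tile_bytes)
    return l2.bytes_read


def multicast_l2_bytes(num_cores: int, tile_bytes: int) -> int:
    """One thread requests the tile on behalf of all cores."""
    l2 = L2Model()
    tracker = MulticastTracker(l2)
    tracker.multicast(tile_bytes, list(range(num_cores)))
    return l2.bytes_read
```

With four cores sharing a 16 KiB tile, `unicast_l2_bytes(4, 16384)` returns 65536 while `multicast_l2_bytes(4, 16384)` returns 16384: in this model, L2 traffic for a shared tile falls by a factor equal to the number of destination cores.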

2.

Invention publication
PROGRAMMATICALLY CONTROLLED DATA MULTICASTING ACROSS MULTIPLE COMPUTE ENGINES (Status: under examination, published)

Publication (announcement) number: US20240289132A1

Publication (announcement) date: 2024-08-29

Application number: US18660763

Application date: 2024-05-10

Applicant: NVIDIA Corporation

Inventors: Apoorv PARLE, Ronny KRASHINSKY, John EDMONDSON, Jack CHOQUETTE, Shirish GADRE, Steve HEINRICH, Manan PATEL, Prakash Bangalore PRABHAKAR, JR., Ravi MANYAM, Wish GANDHI, Lacky SHAH, Alexander L. Minkin

IPC: G06F9/38, G06F9/52, G06F13/16, G06F13/40, G06T1/20, G06T1/60, H04L49/101

CPC classification number: G06F9/3887, G06F9/522, G06F13/1689, G06F13/4022, G06T1/20, G06T1/60, H04L49/101

Abstract: This specification describes a programmatic multicast technique enabling one thread (for example, in a cooperative group array (CGA) on a GPU) to request data on behalf of one or more other threads (for example, executing on respective processor cores of the GPU). The multicast is supported by tracking circuitry that interfaces between multicast requests received from processor cores and the available memory. The multicast is designed to reduce cache (for example, level 2 cache) bandwidth utilization, enabling strong scaling and smaller tile sizes.

3.

Invention publication
Distributed Shared Memory (Status: under examination, published)

Publication (announcement) number: US20230289189A1

Publication (announcement) date: 2023-09-14

Application number: US17691690

Application date: 2022-03-10

Applicant: NVIDIA Corporation

Inventors: Prakash BANGALORE PRABHAKAR, Gentaro HIROTA, Ronny KRASHINSKY, Ze LONG, Brian PHARRIS, Rajballav DASH, Jeff TUCKEY, Jerome F. DULUK, JR., Lacky SHAH, Luke DURANT, Jack CHOQUETTE, Eric WERNESS, Naman GOVIL, Manan PATEL, Shayani DEB, Sandeep NAVADA, John EDMONDSON, Greg PALMER, Wish GANDHI, Ravi MANYAM, Apoorv PARLE, Olivier GIROUX, Shirish GADRE, Steve HEINRICH

IPC: G06F3/06

CPC classification number: G06F3/064, G06F3/0604, G06F3/0679

Abstract: Distributed shared memory (DSMEM) comprises blocks of memory that are distributed or scattered across a processor (such as a GPU). Threads executing on a processing core local to one memory block are able to access a memory block local to a different processing core. In one embodiment, shared access to these DSMEM allocations distributed across a collection of processing cores is implemented by communications between the processing cores. Such distributed shared memory provides very low latency memory access for processing cores located in proximity to the memory blocks, and also provides a way for more distant processing cores to also access the memory blocks in a manner and using interconnects that do not interfere with the processing cores' access to main or global memory such as backed by an L2 cache. Such distributed shared memory supports cooperative parallelism and strong scaling across multiple processing cores by permitting data sharing and communications previously possible only within the same processing core.
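The access pattern described here, where a thread on one core loads or stores another core's local block directly, can be sketched as a toy Python model. The `Cluster` class, its counters, and `ring_exchange` are hypothetical; the model only counts which accesses would cross the core-to-core interconnect and ignores latency, capacity, and synchronization.

```python
from __future__ import annotations


class Cluster:
    """Hypothetical model: each core owns a local shared-memory block, and any
    core in the cluster may load/store another core's block directly."""

    def __init__(self, num_cores: int, smem_words: int):
        self.smem = [[0] * smem_words for _ in range(num_cores)]  # one block per core
        self.remote_accesses = 0  # accesses carried by the core-to-core interconnect

    def _touch(self, src: int, dst: int) -> None:
        if src != dst:
            self.remote_accesses += 1  # not routed through global memory / L2

    def store(self, src: int, dst: int, offset: int, value: int) -> None:
        self._touch(src, dst)
        self.smem[dst][offset] = value

    def load(self, src: int, dst: int, offset: int) -> int:
        self._touch(src, dst)
        return self.smem[dst][offset]


def ring_exchange(num_cores: int) -> tuple[list[int], Cluster]:
    """Core r writes its rank into slot 0 of core (r + 1) % n's block,
    then every core reads its own slot 0 locally."""
    cluster = Cluster(num_cores, smem_words=4)
    for r in range(num_cores):
        cluster.store(src=r, dst=(r + 1) % num_cores, offset=0, value=r)
    received = [cluster.load(src=r, dst=r, offset=0) for r in range(num_cores)]
    return received, cluster
```

For four cores, `ring_exchange(4)` yields `[3, 0, 1, 2]`: each core sees its neighbor's rank after four remote stores and four purely local loads, with no access to global memory.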

4.

Invention publication
HARDWARE ACCELERATED SYNCHRONIZATION WITH ASYNCHRONOUS TRANSACTION SUPPORT (Status: under examination, published)

Publication (announcement) number: US20230289242A1

Publication (announcement) date: 2023-09-14

Application number: US17691296

Application date: 2022-03-10

Applicant: NVIDIA Corporation

Inventors: Timothy GUO, Jack CHOQUETTE, Shirish GADRE, Olivier GIROUX, Carter EDWARDS, John EDMONDSON, Manan PATEL, Raghavan MADHAVAN, JR., Jessie HUANG, Peter NELSON, Ronny KRASHINSKY

IPC: G06F9/52

CPC classification number: G06F9/522, G06F2209/521

Abstract: A new transaction barrier synchronization primitive enables executing threads and asynchronous transactions to synchronize across parallel processors. The asynchronous transactions may include transactions resulting from, for example, hardware data movement units such as direct memory units, etc. A hardware synchronization circuit may provide for the synchronization primitive to be stored in a cache memory so that barrier operations may be accelerated by the circuit. A new wait mechanism reduces software overhead associated with waiting on a barrier.
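A transaction barrier differs from an ordinary barrier in that a phase waits on both thread arrivals and bytes delivered by asynchronous units. The single-threaded Python model below is a sketch under assumptions: the class and method names are hypothetical, and although the arrival-count plus transaction-count bookkeeping loosely resembles the `mbarrier` transaction count in NVIDIA's PTX ISA, this is not that API and it does not model hardware caching of the barrier.

```python
from __future__ import annotations


class TransactionBarrier:
    """Hypothetical single-threaded model of a transaction barrier.

    A phase completes only when (a) the expected number of threads have
    arrived and (b) every byte announced with expect_tx has been reported
    by an asynchronous unit via complete_tx.
    """

    def __init__(self, expected_arrivals: int):
        if expected_arrivals < 1:
            raise ValueError("expected_arrivals must be at least 1")
        self.expected_arrivals = expected_arrivals
        self.pending_arrivals = expected_arrivals
        self.pending_tx = 0  # bytes still outstanding in the current phase
        self.phase = 0       # number of completed phases

    def _maybe_complete(self) -> None:
        if self.pending_arrivals == 0 and self.pending_tx == 0:
            self.phase += 1
            self.pending_arrivals = self.expected_arrivals  # re-arm for the next phase

    def expect_tx(self, nbytes: int) -> None:
        self.pending_tx += nbytes
        self._maybe_complete()

    def arrive(self) -> int:
        """Thread arrival; returns a token naming the phase it arrived in."""
        token = self.phase
        self.pending_arrivals -= 1
        self._maybe_complete()
        return token

    def arrive_expect_tx(self, nbytes: int) -> int:
        self.pending_tx += nbytes  # register expected bytes before arriving
        return self.arrive()

    def complete_tx(self, nbytes: int) -> None:
        """Called on behalf of an asynchronous data mover when bytes land."""
        self.pending_tx -= nbytes
        self._maybe_complete()

    def test_wait(self, token: int) -> bool:
        """Non-blocking check: has the phase named by token completed?"""
        return self.phase > token
```

In a typical use of this model, one thread calls `arrive_expect_tx(1024)` before launching an asynchronous copy, the other threads call `arrive()`, and the phase flips only after the copy unit has reported all 1024 bytes through `complete_tx`, so waiting threads never observe a partially filled buffer.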

5.

Invention publication
Cooperative Group Arrays (Status: under examination, published)

Publication (announcement) number: US20230289215A1

Publication (announcement) date: 2023-09-14

Application number: US17691621

Application date: 2022-03-10

Applicant: NVIDIA Corporation

Inventors: Greg PALMER, Gentaro HIROTA, Ronny KRASHINSKY, Ze LONG, Brian PHARRIS, Rajballav DASH, Jeff TUCKEY, Jerome F. DULUK, JR., Lacky SHAH, Luke DURANT, Jack CHOQUETTE, Eric WERNESS, Naman GOVIL, Manan PATEL, Shayani DEB, Sandeep NAVADA, John EDMONDSON, Prakash BANGALORE PRABHAKAR, Wish GANDHI, Ravi MANYAM, Apoorv PARLE, Olivier GIROUX, Shirish GADRE, Steve HEINRICH

IPC: G06F9/48, G06F9/38, G06F9/30, G06F9/54

CPC classification number: G06F9/4881, G06F9/3851, G06F9/3009, G06F9/544

Abstract: A new level (or levels) of hierarchy, Cooperative Group Arrays (CGAs), and an associated new hardware-based work distribution/execution model are described. A CGA is a grid of thread blocks (also referred to as cooperative thread arrays (CTAs)). CGAs provide co-scheduling, e.g., control over where CTAs are placed/executed in a processor (such as a GPU), relative to the memory required by an application and relative to each other. Hardware support for such CGAs guarantees concurrency and enables applications to see more data locality, reduced latency, and better synchronization between all the threads in tightly cooperating collections of CTAs programmably distributed across different (e.g., hierarchical) hardware domains or partitions.
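The co-scheduling guarantee can be illustrated as an all-or-nothing placement rule: either every CTA of a CGA fits into one hardware partition at the same time, or none is launched. The Python sketch below assumes a greedy first-fit policy and uses hypothetical names (`Core`, `place_cga`); it is not the hardware work distributor described in the patent.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Core:
    free_slots: int  # how many more CTAs this processing core can host right now


def place_cga(partitions: list[list[Core]], cga_size: int) -> tuple[int, list[int]] | None:
    """Place all CTAs of one CGA inside a single partition, or nothing at all.

    Returns (partition_index, core_index_per_cta) on success, or None if no
    partition can currently host the whole CGA. Greedy first-fit (hypothetical).
    """
    for p, cores in enumerate(partitions):
        if sum(core.free_slots for core in cores) < cga_size:
            continue  # this partition cannot co-schedule the whole CGA
        assignment: list[int] = []
        for c, core in enumerate(cores):
            take = min(core.free_slots, cga_size - len(assignment))
            assignment.extend([c] * take)
            if len(assignment) == cga_size:
                break
        for c in assignment:
            cores[c].free_slots -= 1  # commit only once the whole CGA fits
        return p, assignment
    return None
```

With partitions `[[Core(1), Core(1)], [Core(2), Core(2)]]`, a three-CTA CGA skips the first partition (only two free slots) and lands entirely in the second; a second three-CTA CGA then returns `None` rather than being split, which is the concurrency guarantee the abstract refers to.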

6.

Invention publication
FAST DATA SYNCHRONIZATION IN PROCESSORS AND MEMORY (Status: under examination, published)

Publication (announcement) number: US20230315655A1

Publication (announcement) date: 2023-10-05

Application number: US17691303

Application date: 2022-03-10

Applicant: NVIDIA Corporation

Inventors: Jack CHOQUETTE, Ronny KRASHINSKY, Timothy GUO, Carter EDWARDS, Steve HEINRICH, John EDMONDSON, Prakash Bangalore PRABHAKAR, Apoorv PARLE, JR., Manan PATEL, Olivier GIROUX, Michael PELLAUER

IPC: G06F13/16

CPC classification number: G06F13/1689, G06F13/1673

Abstract: A new synchronization system synchronizes data exchanges between producer processes and consumer processes, which may be on the same or different processors in a multiprocessor system. The synchronization incurs less than one round trip of latency: in some implementations, approximately 0.5 round-trip times. A key aspect of the fast synchronization is that the producer's data store is followed without delay by an update of the barrier on which the consumer is waiting.
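Where the 0.5 round-trip figure can come from is shown by a simple latency model in Python. Everything here is an assumption for illustration: a single in-order link with a fixed one-way latency (half a round trip), one producer, one consumer waiting on a barrier that expects one arrival, and a hypothetical baseline in which the producer waits for an acknowledgment of its store before signaling.

```python
from __future__ import annotations


def simulate(one_way: float, fused: bool) -> tuple[float, int]:
    """Return (time the consumer's barrier completes, value the consumer reads).

    Hypothetical model: messages on the link arrive in send order after
    `one_way` time units; the barrier expects exactly one arrival.
    """
    events: list[tuple[float, str, int]] = []  # (arrival time at consumer, kind, payload)
    if fused:
        # Store and barrier update leave back to back; in-order delivery keeps
        # the data ahead of the arrival.
        events.append((one_way, "store", 42))
        events.append((one_way, "arrive", 0))
    else:
        # Baseline: store, wait for the acknowledgment (one full round trip),
        # then send a separate barrier update.
        events.append((one_way, "store", 42))
        ack_received = 2 * one_way
        events.append((ack_received + one_way, "arrive", 0))

    data = None
    for t, kind, payload in events:  # events are already in arrival order
        if kind == "store":
            data = payload
        elif data is not None:
            return t, data  # barrier completes with the data already visible
        else:
            raise RuntimeError("barrier completed before data landed")
    raise RuntimeError("barrier never completed")
```

With `one_way = 1.0` (so a round trip is 2.0), the fused path returns `(1.0, 42)`, or 0.5 round trips, while the hypothetical baseline returns `(3.0, 42)`, or 1.5 round trips. The saving in this model comes entirely from not waiting for an acknowledgment between the store and the barrier update.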

7.

Invention application
UNIFIED CACHE FOR DIVERSE MEMORY TRAFFIC (Status: under examination, published)

Publication (announcement) number: US20180322078A1

Publication (announcement) date: 2018-11-08

Application number: US15716461

Application date: 2017-09-26

Applicant: NVIDIA Corporation

Inventors: Xiaogang QIU, Ronny KRASHINSKY, Steven HEINRICH, Shirish GADRE, John EDMONDSON, Jack CHOQUETTE, Mark GEBHART, Ramesh JANDHYALA, Poornachandra RAO, Omkar PARANJAPE, Michael SIU

IPC: G06F13/28, G06F12/0811, G06F12/0891, G06F12/084

Abstract: A unified cache subsystem includes a data memory configured as both a shared memory and a local cache memory. The unified cache subsystem processes different types of memory transactions using different data pathways. To process memory transactions that target shared memory, the unified cache subsystem includes a direct pathway to the data memory. To process memory transactions that do not target shared memory, the unified cache subsystem includes a tag processing pipeline configured to identify cache hits and cache misses. When the tag processing pipeline identifies a cache hit for a given memory transaction, the transaction is rerouted to the direct pathway to data memory. When the tag processing pipeline identifies a cache miss for a given memory transaction, the transaction is pushed into a first-in first-out (FIFO) buffer until miss data is returned from external memory. The tag processing pipeline is also configured to process texture-oriented memory transactions.
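The three routing cases in this abstract (shared-memory access, tag hit, tag miss) can be sketched as a small Python routing model. The `UnifiedCache` class and its method names are hypothetical; the model tracks tags as an unbounded set with no eviction and does not model texture-oriented transactions.

```python
from __future__ import annotations

from collections import deque


class UnifiedCache:
    """Hypothetical routing model of one data memory serving both shared
    memory and a local cache."""

    def __init__(self) -> None:
        self.tags: set[int] = set()           # line addresses currently cached
        self.miss_fifo: deque[int] = deque()  # misses waiting on external memory
        self.direct_path_log: list[tuple[str, int]] = []  # traffic on the direct pathway

    def submit(self, addr: int, is_shared: bool) -> str:
        if is_shared:
            self.direct_path_log.append(("shared", addr))  # skips the tag pipeline
            return "direct"
        if addr in self.tags:                               # tag pipeline: hit
            self.direct_path_log.append(("hit", addr))      # rerouted to direct pathway
            return "hit"
        self.miss_fifo.append(addr)                         # tag pipeline: miss
        return "miss"

    def fill_from_external(self) -> int:
        """Miss data returns: install the oldest missing line and serve it."""
        addr = self.miss_fifo.popleft()  # FIFO order: oldest miss first
        self.tags.add(addr)
        self.direct_path_log.append(("fill", addr))
        return addr
```

In this model a shared-memory access never touches the tags, two cache misses wait in arrival order, and once `fill_from_external()` returns the first missing line, a repeat access to that line is reported as a hit on the direct pathway.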

8.

Invention application
UNIFIED CACHE FOR DIVERSE MEMORY TRAFFIC (Status: under examination, published)

Publication (announcement) number: US20180322077A1

Publication (announcement) date: 2018-11-08

Application number: US15587213

Application date: 2017-05-04

Applicant: NVIDIA Corporation

Inventors: Xiaogang QIU, Ronny KRASHINSKY, Steven HEINRICH, Shirish GADRE, John EDMONDSON, Jack CHOQUETTE, Mark GEBHART, Ramesh JANDHYALA, Poornachandra RAO, Omkar PARANJAPE, Michael SIU

IPC: G06F13/28, G06F12/0891, G06F12/0811, G06F12/084

Abstract: A unified cache subsystem includes a data memory configured as both a shared memory and a local cache memory. The unified cache subsystem processes different types of memory transactions using different data pathways. To process memory transactions that target shared memory, the unified cache subsystem includes a direct pathway to the data memory. To process memory transactions that do not target shared memory, the unified cache subsystem includes a tag processing pipeline configured to identify cache hits and cache misses. When the tag processing pipeline identifies a cache hit for a given memory transaction, the transaction is rerouted to the direct pathway to data memory. When the tag processing pipeline identifies a cache miss for a given memory transaction, the transaction is pushed into a first-in first-out (FIFO) buffer until miss data is returned from external memory. The tag processing pipeline is also configured to process texture-oriented memory transactions.
