-
Publication No.: US12248788B2
Publication Date: 2025-03-11
Application No.: US17691690
Filing Date: 2022-03-10
Applicant: NVIDIA Corporation
Inventor: Prakash Bangalore Prabhakar , Gentaro Hirota , Ronny Krashinsky , Ze Long , Brian Pharris , Rajballav Dash , Jeff Tuckey , Jerome F. Duluk, Jr. , Lacky Shah , Luke Durant , Jack Choquette , Eric Werness , Naman Govil , Manan Patel , Shayani Deb , Sandeep Navada , John Edmondson , Greg Palmer , Wish Gandhi , Ravi Manyam , Apoorv Parle , Olivier Giroux , Shirish Gadre , Steve Heinrich
Abstract: Distributed shared memory (DSMEM) comprises blocks of memory that are distributed or scattered across a processor (such as a GPU). Threads executing on a processing core local to one memory block are able to access a memory block local to a different processing core. In one embodiment, shared access to these DSMEM allocations distributed across a collection of processing cores is implemented by communications between the processing cores. Such distributed shared memory provides very low latency memory access for processing cores located in proximity to the memory blocks, and also provides a way for more distant processing cores to access the memory blocks using interconnects that do not interfere with the processing cores' access to main or global memory such as backed by an L2 cache. Such distributed shared memory supports cooperative parallelism and strong scaling across multiple processing cores by permitting data sharing and communications previously possible only within the same processing core.
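Functionality of this kind is exposed in CUDA 12 through thread block clusters on Hopper-class GPUs, where cooperative_groups lets one block map a peer block's shared memory. A minimal sketch, assuming an sm_90 device and a two-block cluster; kernel and buffer names are illustrative:

```cuda
// One thread block reads a peer block's shared memory through the
// CUDA 12 cooperative_groups cluster API (requires sm_90 / Hopper-class).
#include <cooperative_groups.h>
namespace cg = cooperative_groups;

__global__ void __cluster_dims__(2, 1, 1) dsmem_exchange(int *out)
{
    __shared__ int local[32];                        // this block's shared memory
    cg::cluster_group cluster = cg::this_cluster();
    unsigned int rank = cluster.block_rank();

    local[threadIdx.x] = rank * 1000 + threadIdx.x;  // fill our own block's SMEM
    cluster.sync();                                  // peers' SMEM is now valid

    // Map the same variable in the neighboring block's shared memory
    // (the distributed-shared-memory access the abstract describes).
    unsigned int peer = (rank + 1) % cluster.num_blocks();
    int *remote = cluster.map_shared_rank(local, peer);
    out[rank * 32 + threadIdx.x] = remote[threadIdx.x];

    cluster.sync();                                  // keep SMEM alive for peers
}
// Launch: dsmem_exchange<<<dim3(2), dim3(32)>>>(d_out);
```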
-
Publication No.: US20210191754A1
Publication Date: 2021-06-24
Application No.: US16722422
Filing Date: 2019-12-20
Applicant: Nvidia Corporation
Inventor: Jonathon Evans , Lacky Shah , Phil Johnson , Jonah Alben , Brian Pharris , Greg Palmer , Brian Fahs
Abstract: Apparatuses, systems, and techniques to optimize processor resources at a user-defined level. In at least one embodiment, the priority of one or more tasks is adjusted to prevent one or more other dependent tasks from entering an idle state due to lack of resources to consume.
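The patented adjustment happens inside the GPU's scheduler; the nearest user-visible analogue in standard CUDA is stream priority. A minimal sketch, with illustrative producer/consumer kernels, that raises the producer's priority so the dependent consumer is not starved:

```cuda
// Give the producer stream higher priority so a dependent consumer
// does not sit idle waiting on its output. Kernel names are illustrative.
#include <cuda_runtime.h>

__global__ void producer(float *buf) { buf[threadIdx.x] = threadIdx.x; }
__global__ void consumer(const float *buf, float *out) { out[threadIdx.x] = 2.0f * buf[threadIdx.x]; }

int main()
{
    int least, greatest;                        // numerically lower = higher priority
    cudaDeviceGetStreamPriorityRange(&least, &greatest);

    cudaStream_t sProd, sCons;
    cudaStreamCreateWithPriority(&sProd, cudaStreamNonBlocking, greatest);
    cudaStreamCreateWithPriority(&sCons, cudaStreamNonBlocking, least);

    float *buf, *out;
    cudaMalloc(&buf, 32 * sizeof(float));
    cudaMalloc(&out, 32 * sizeof(float));

    cudaEvent_t ready;
    cudaEventCreate(&ready);
    producer<<<1, 32, 0, sProd>>>(buf);         // prioritized producer task
    cudaEventRecord(ready, sProd);
    cudaStreamWaitEvent(sCons, ready, 0);       // consumer depends on producer
    consumer<<<1, 32, 0, sCons>>>(buf, out);

    cudaDeviceSynchronize();
    cudaFree(buf); cudaFree(out);
    return 0;
}
```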
-
Publication No.: US09934145B2
Publication Date: 2018-04-03
Application No.: US14925922
Filing Date: 2015-10-28
Applicant: NVIDIA CORPORATION
Inventor: Praveen Krishnamurthy , Peter B. Holmquist , Wishwesh Gandhi , Timothy Purcell , Karan Mehra , Lacky Shah
IPC: G06F12/08 , G06F12/0802 , G06F3/06
CPC classification number: G06F12/0802 , G06F3/0608 , G06F3/064 , G06F3/0673 , G06F12/0842 , G06F12/0844 , G06F12/0848 , G06F12/0851 , G06F12/0853 , G06F12/0895 , G06F2212/1016 , G06F2212/401 , G06F2212/608
Abstract: In one embodiment of the present invention a cache unit organizes data stored in an attached memory to optimize accesses to compressed data. In operation, the cache unit introduces a layer of indirection between a physical address associated with a memory access request and groups of blocks in the attached memory. The layer of indirection—virtual tiles—enables the cache unit to selectively store compressed data that would conventionally be stored in separate physical tiles included in a group of blocks in a single physical tile. Because the cache unit stores compressed data associated with multiple physical tiles in a single physical tile and, more specifically, in adjacent locations within the single physical tile, the cache unit coalesces the compressed data into contiguous blocks. Subsequently, upon performing a read operation, the cache unit may retrieve the compressed data conventionally associated with separate physical tiles in a single read operation.
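The virtual-tile indirection can be illustrated with a small host-side model; the structure and sizes below are hypothetical stand-ins, not the patented on-chip format. Compressed data from several logical tiles is packed into one backing physical tile so a single read services all of them:

```cuda
// Illustrative model of a "virtual tile" indirection table: several
// compressed tiles share one backing physical tile. All names and
// sizes here are hypothetical.
#include <cstdint>
#include <cstdio>
#include <vector>

struct TileSlot {
    uint32_t backingTile;   // physical tile that actually holds the data
    uint32_t byteOffset;    // where inside that tile the compressed bytes start
    uint32_t byteLength;    // compressed size
};

int main()
{
    constexpr uint32_t kTileBytes = 256;
    std::vector<TileSlot> table(4);     // indirection: virtual tile -> packed location
    uint32_t cursor = 0;

    // Pack four compressed tiles back-to-back into physical tile 0.
    uint32_t compressedSizes[4] = {48, 64, 32, 80};
    for (uint32_t v = 0; v < 4; ++v) {
        table[v] = {0, cursor, compressedSizes[v]};
        cursor += compressedSizes[v];
    }
    // One read of physical tile 0 now returns data that would
    // conventionally require four separate tile reads.
    printf("packed %u bytes of 4 tiles into tile 0 (capacity %u)\n",
           cursor, kTileBytes);
    return 0;
}
```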
-
Publication No.: US12020035B2
Publication Date: 2024-06-25
Application No.: US17691288
Filing Date: 2022-03-10
Applicant: NVIDIA Corporation
Inventor: Apoorv Parle , Ronny Krashinsky , John Edmondson , Jack Choquette , Shirish Gadre , Steve Heinrich , Manan Patel , Prakash Bangalore Prabhakar, Jr. , Ravi Manyam , Wish Gandhi , Lacky Shah , Alexander L. Minkin
IPC: G06F5/06 , G06F9/38 , G06F9/48 , G06F9/52 , G06F13/16 , G06F13/40 , G06T1/20 , G06T1/60 , H04L49/101
CPC classification number: G06F9/3887 , G06F9/522 , G06F13/1689 , G06F13/4022 , G06T1/20 , G06T1/60 , H04L49/101
Abstract: This specification describes a programmatic multicast technique enabling one thread (for example, in a cooperative group array (CGA) on a GPU) to request data on behalf of one or more other threads (for example, executing on respective processor cores of the GPU). The multicast is supported by tracking circuitry that interfaces between multicast requests received from processor cores and the available memory. The multicast is designed to reduce cache (for example, layer 2 cache) bandwidth utilization, enabling strong scaling and smaller tile sizes.
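The request-tracking idea can be modeled in a few lines of host code; the tracker structure below is a hypothetical stand-in for the patented circuitry. Requests for the same line from different cores are coalesced so memory is read once and the data fanned out:

```cuda
// Hypothetical model of multicast request tracking: when several cores
// ask for the same cache line, record all requesters, fetch the line
// once, and multicast the data to each of them.
#include <cstdint>
#include <cstdio>
#include <map>
#include <vector>

struct PendingMulticast {
    std::vector<int> requesters;   // cores waiting on this line
};

int main()
{
    std::map<uint64_t, PendingMulticast> tracker;  // line address -> waiters

    // Four cores request two distinct lines.
    struct { int core; uint64_t line; } reqs[] =
        {{0, 0x1000}, {1, 0x1000}, {2, 0x1000}, {3, 0x2000}};
    for (auto &r : reqs)
        tracker[r.line].requesters.push_back(r.core);  // coalesce by address

    // Each line is fetched once, regardless of how many cores asked,
    // which is what cuts L2 bandwidth in the abstract's scheme.
    for (auto &[line, pend] : tracker)
        printf("fetch line 0x%llx once, multicast to %zu cores\n",
               (unsigned long long)line, pend.requesters.size());
    return 0;
}
```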
-
Publication No.: US11367160B2
Publication Date: 2022-06-21
Application No.: US16053341
Filing Date: 2018-08-02
Applicant: NVIDIA Corporation
Inventor: Rajballav Dash , Gregory Palmer , Gentaro Hirota , Lacky Shah , Jack Choquette , Emmett Kilgariff , Sriharsha Niverty , Milton Lei , Shirish Gadre , Omkar Paranjape , Lei Yang , Rouslan Dimitrov
Abstract: A parallel processing unit (e.g., a GPU), in some examples, includes a hardware scheduler and hardware arbiter that launch graphics and compute work for simultaneous execution on a SIMD/SIMT processing unit. Each processing unit (e.g., a streaming multiprocessor) of the parallel processing unit operates in a graphics-greedy mode or a compute-greedy mode at respective times. The hardware arbiter, in response to a result of a comparison of at least one monitored performance or utilization metric to a user-configured threshold, can selectively cause the processing unit to run one or more compute work items from a compute queue when the processing unit is operating in the graphics-greedy mode, and cause the processing unit to run one or more graphics work items from a graphics queue when the processing unit is operating in the compute-greedy mode. Associated methods and systems are also described.
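The arbitration rule reduces to comparing a monitored metric against a user-configured threshold. The sketch below is a hypothetical model of that decision, not the hardware arbiter's actual interface; all names and numbers are illustrative:

```cuda
// Hypothetical model of the greedy-mode arbitration rule: an SM prefers
// one work type, but the arbiter admits the other queue's work when a
// monitored utilization metric falls below a user-configured threshold.
#include <cstdio>

enum class Mode { GraphicsGreedy, ComputeGreedy };

// Which queue should the SM service next?
const char *nextQueue(Mode mode, float utilization, float threshold)
{
    bool admitOther = utilization < threshold;   // metric vs. user threshold
    if (mode == Mode::GraphicsGreedy)
        return admitOther ? "compute" : "graphics";
    return admitOther ? "graphics" : "compute";
}

int main()
{
    float utilization = 0.62f;   // monitored metric (fraction of peak)
    float threshold   = 0.75f;   // user-configured admission threshold
    printf("graphics-greedy SM services: %s queue\n",
           nextQueue(Mode::GraphicsGreedy, utilization, threshold));
    return 0;
}
```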
-
Publication No.: US10402323B2
Publication Date: 2019-09-03
Application No.: US14925920
Filing Date: 2015-10-28
Applicant: NVIDIA CORPORATION
Inventor: Praveen Krishnamurthy , Peter B. Holmquist , Wishwesh Gandhi , Timothy Purcell , Karan Mehra , Lacky Shah
IPC: G06F12/08 , G06F12/0802 , G06F3/06
Abstract: In one embodiment of the present invention a cache unit organizes data stored in an attached memory to optimize accesses to compressed data. In operation, the cache unit introduces a layer of indirection between a physical address associated with a memory access request and groups of blocks in the attached memory. The layer of indirection—virtual tiles—enables the cache unit to selectively store compressed data that would conventionally be stored in separate physical tiles included in a group of blocks in a single physical tile. Because the cache unit stores compressed data associated with multiple physical tiles in a single physical tile and, more specifically, in adjacent locations within the single physical tile, the cache unit coalesces the compressed data into contiguous blocks. Subsequently, upon performing a read operation, the cache unit may retrieve the compressed data conventionally associated with separate physical tiles in a single read operation.
-
Publication No.: US11954518B2
Publication Date: 2024-04-09
Application No.: US16722422
Filing Date: 2019-12-20
Applicant: Nvidia Corporation
Inventor: Jonathon Evans , Lacky Shah , Phil Johnson , Jonah Alben , Brian Pharris , Greg Palmer , Brian Fahs
CPC classification number: G06F9/4831 , G06N3/08
Abstract: Apparatuses, systems, and techniques to optimize processor resources at a user-defined level. In at least one embodiment, the priority of one or more tasks is adjusted to prevent one or more other dependent tasks from entering an idle state due to lack of resources to consume.