-
Publication No.: US10817338B2
Publication Date: 2020-10-27
Application No.: US15885761
Filing Date: 2018-01-31
Applicant: NVIDIA Corporation
Inventor: Jerome F. Duluk, Jr. , Luke Durant , Ramon Matas Navarro , Alan Menezes , Jeffrey Tuckey , Gentaro Hirota , Brian Pharris
Abstract: Embodiments of the present invention set forth techniques for allocating execution resources to groups of threads within a graphics processing unit. A compute work distributor included in the graphics processing unit receives an indication from a process that a first group of threads is to be launched. The compute work distributor determines that a first subcontext associated with the process has at least one processor credit. In some embodiments, CTAs may be launched even when there are no processor credits, if one of the TPCs that was already acquired has sufficient space. The compute work distributor identifies a first processor included in a plurality of processors that has a processing load that is less than or equal to the processor loads associated with all other processors included in the plurality of processors. The compute work distributor launches the first group of threads to execute on the first processor.
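The credit-gated, load-balanced launch described in this abstract can be sketched as follows. This is an illustrative simulation only, not NVIDIA's implementation; the `Subcontext` and `WorkDistributor` names and the integer load counters are assumptions made for the example.

```python
# Sketch: a distributor launches a thread group only if the requesting
# subcontext holds at least one processor credit, then selects a processor
# whose load is <= every other processor's load.

class Subcontext:
    def __init__(self, credits):
        self.credits = credits  # processor credits granted to this subcontext

class WorkDistributor:
    def __init__(self, num_processors):
        self.loads = [0] * num_processors  # thread groups resident per processor

    def launch(self, subcontext, group_size=1):
        # Gate the launch on the subcontext having a processor credit.
        if subcontext.credits < 1:
            return None  # launch deferred: no credits available
        subcontext.credits -= 1
        # Pick a least-loaded processor (ties broken by lowest index).
        target = min(range(len(self.loads)), key=lambda i: self.loads[i])
        self.loads[target] += group_size
        return target
```

Successive launches spread across processors because each launch increments the chosen processor's load, making it less attractive for the next selection.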
-
Publication No.: US20190235924A1
Publication Date: 2019-08-01
Application No.: US15885761
Filing Date: 2018-01-31
Applicant: NVIDIA Corporation
Inventor: Jerome F. Duluk, Jr. , Luke Durant , Ramon Matas Navarro , Alan Menezes , Jeffrey Tuckey , Gentaro Hirota , Brian Pharris
Abstract: Embodiments of the present invention set forth techniques for allocating execution resources to groups of threads within a graphics processing unit. A compute work distributor included in the graphics processing unit receives an indication from a process that a first group of threads is to be launched. The compute work distributor determines that a first subcontext associated with the process has at least one processor credit. In some embodiments, CTAs may be launched even when there are no processor credits, if one of the TPCs that was already acquired has sufficient space. The compute work distributor identifies a first processor included in a plurality of processors that has a processing load that is less than or equal to the processor loads associated with all other processors included in the plurality of processors. The compute work distributor launches the first group of threads to execute on the first processor.
-
Publication No.: US11954518B2
Publication Date: 2024-04-09
Application No.: US16722422
Filing Date: 2019-12-20
Applicant: Nvidia Corporation
Inventor: Jonathon Evans , Lacky Shah , Phil Johnson , Jonah Alben , Brian Pharris , Greg Palmer , Brian Fahs
CPC classification number: G06F9/4831 , G06N3/08
Abstract: Apparatuses, systems, and techniques to optimize processor resources at a user-defined level. In at least one embodiment, priority of one or more tasks are adjusted to prevent one or more other dependent tasks from entering an idle state due to lack of resources to consume.
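The priority adjustment this abstract describes, raising a producer task so its dependent task is not starved of input, can be illustrated with a small scheduler sketch. The `Task` structure, the `produces_for` link, and the min-based boost policy are all assumptions for illustration, not the patented mechanism itself.

```python
import heapq

class Task:
    def __init__(self, name, priority, produces_for=None):
        self.name = name
        self.priority = priority          # lower value = scheduled sooner
        self.produces_for = produces_for  # dependent task fed by this one, if any

def schedule(tasks):
    # Boost any producer whose dependent has higher urgency, so the
    # dependent does not sit idle waiting for the producer's output.
    for t in tasks:
        if t.produces_for is not None:
            t.priority = min(t.priority, t.produces_for.priority)
    heap = [(t.priority, i, t.name) for i, t in enumerate(tasks)]
    heapq.heapify(heap)
    return [name for _, _, name in
            (heapq.heappop(heap) for _ in range(len(heap)))]
```

Without the boost, a low-priority producer could be scheduled after unrelated work, leaving its high-priority consumer idle for lack of data to consume.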
-
Publication No.: US11182207B2
Publication Date: 2021-11-23
Application No.: US16450508
Filing Date: 2019-06-24
Applicant: NVIDIA CORPORATION
Inventor: Gentaro Hirota , Brian Pharris , Jeff Tuckey , Robert Overman , Stephen Jones
Abstract: Techniques are disclosed for reducing the latency between the completion of a producer task and the launch of a consumer task dependent on the producer task. Such latency exists when the information needed to launch the consumer task is unavailable when the producer task completes. Thus, various techniques are disclosed, where a task management unit initiates the retrieval of the information needed to launch the consumer task from memory in parallel with the producer task being launched. Because the retrieval of such information is initiated in parallel with the launch of the producer task, the information is often available when the producer task completes, thus allowing for the consumer task to be launched without delay. The disclosed techniques, therefore, enable the latency between completing the producer task and launching the consumer task to be reduced.
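The overlap described above, fetching the consumer task's launch information in parallel with the producer task's execution, can be sketched with threads standing in for the hardware units. The function name, the simulated latencies, and the thread-based model are illustrative assumptions.

```python
import threading
import time

def run_pipeline(fetch_latency=0.05, producer_time=0.1):
    launch_info = {}

    def fetch_consumer_info():
        time.sleep(fetch_latency)    # simulated memory latency
        launch_info["ready"] = True  # consumer launch metadata now resident

    def producer():
        time.sleep(producer_time)    # simulated producer task work

    # Initiate the metadata fetch in parallel with the producer launch.
    fetcher = threading.Thread(target=fetch_consumer_info)
    worker = threading.Thread(target=producer)
    fetcher.start(); worker.start()
    worker.join()   # producer task completes
    fetcher.join()  # when fetch_latency <= producer_time, this join is a no-op
    return launch_info.get("ready", False)
```

Because the fetch runs concurrently with the producer, its latency is hidden whenever it is shorter than the producer's runtime, so the consumer can launch without delay.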
-
Publication No.: US12248788B2
Publication Date: 2025-03-11
Application No.: US17691690
Filing Date: 2022-03-10
Applicant: NVIDIA Corporation
Inventor: Prakash Bangalore Prabhakar , Gentaro Hirota , Ronny Krashinsky , Ze Long , Brian Pharris , Rajballav Dash , Jeff Tuckey , Jerome F. Duluk, Jr. , Lacky Shah , Luke Durant , Jack Choquette , Eric Werness , Naman Govil , Manan Patel , Shayani Deb , Sandeep Navada , John Edmondson , Greg Palmer , Wish Gandhi , Ravi Manyam , Apoorv Parle , Olivier Giroux , Shirish Gadre , Steve Heinrich
Abstract: Distributed shared memory (DSMEM) comprises blocks of memory that are distributed or scattered across a processor (such as a GPU). Threads executing on a processing core local to one memory block are able to access a memory block local to a different processing core. In one embodiment, shared access to these DSMEM allocations distributed across a collection of processing cores is implemented by communications between the processing cores. Such distributed shared memory provides very low latency memory access for processing cores located in proximity to the memory blocks, and also provides a way for more distant processing cores to access the memory blocks in a manner and using interconnects that do not interfere with the processing cores' access to main or global memory such as backed by an L2 cache. Such distributed shared memory supports cooperative parallelism and strong scaling across multiple processing cores by permitting data sharing and communications previously possible only within the same processing core.
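The core idea, each core owning a local block while remote cores reach that block over a dedicated interconnect rather than the global-memory path, can be modeled at a very high level. This is a conceptual sketch only; the `Core` and `Interconnect` classes are hypothetical stand-ins for hardware, not an API from the patent.

```python
class Core:
    """A processing core with its local slice of distributed shared memory."""
    def __init__(self, core_id, block_size):
        self.core_id = core_id
        self.shared_block = [0] * block_size  # this core's DSMEM block

class Interconnect:
    """Routes loads/stores between cores without using the global-memory path."""
    def __init__(self, cores):
        self.cores = {c.core_id: c for c in cores}

    def store(self, target_core, offset, value):
        # A remote core writes directly into another core's shared block.
        self.cores[target_core].shared_block[offset] = value

    def load(self, target_core, offset):
        # A remote core reads directly from another core's shared block.
        return self.cores[target_core].shared_block[offset]
```

The point of the model is that remote accesses travel over `Interconnect` rather than competing with ordinary global-memory traffic, which is what enables cooperative parallelism across cores.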
-
Publication No.: US11307903B2
Publication Date: 2022-04-19
Application No.: US15885751
Filing Date: 2018-01-31
Applicant: NVIDIA Corporation
Inventor: Jerome F. Duluk, Jr. , Luke Durant , Ramon Matas Navarro , Alan Menezes , Jeffrey Tuckey , Gentaro Hirota , Brian Pharris
Abstract: Embodiments of the present invention set forth techniques for allocating execution resources to groups of threads within a graphics processing unit. A compute work distributor included in the graphics processing unit receives an indication from a process that a first group of threads is to be launched. The compute work distributor determines that a first subcontext associated with the process has at least one processor credit. In some embodiments, CTAs may be launched even when there are no processor credits, if one of the TPCs that was already acquired has sufficient space. The compute work distributor identifies a first processor included in a plurality of processors that has a processing load that is less than or equal to the processor loads associated with all other processors included in the plurality of processors. The compute work distributor launches the first group of threads to execute on the first processor.
-
Publication No.: US20210191754A1
Publication Date: 2021-06-24
Application No.: US16722422
Filing Date: 2019-12-20
Applicant: Nvidia Corporation
Inventor: Jonathon Evans , Lacky Shah , Phil Johnson , Jonah Alben , Brian Pharris , Greg Palmer , Brian Fahs
Abstract: Apparatuses, systems, and techniques to optimize processor resources at a user-defined level. In at least one embodiment, priority of one or more tasks are adjusted to prevent one or more other dependent tasks from entering an idle state due to lack of resources to consume.
-
Publication No.: US20190235928A1
Publication Date: 2019-08-01
Application No.: US15885751
Filing Date: 2018-01-31
Applicant: NVIDIA Corporation
Inventor: Jerome F. Duluk, Jr. , Luke Durant , Ramon Matas Navarro , Alan Menezes , Jeffrey Tuckey , Gentaro Hirota , Brian Pharris
CPC classification number: G06F9/5061 , G06F9/45558 , G06F9/4881 , G06F9/505 , G06F2209/5018
Abstract: Embodiments of the present invention set forth techniques for allocating execution resources to groups of threads within a graphics processing unit. A compute work distributor included in the graphics processing unit receives an indication from a process that a first group of threads is to be launched. The compute work distributor determines that a first subcontext associated with the process has at least one processor credit. In some embodiments, CTAs may be launched even when there are no processor credits, if one of the TPCs that was already acquired has sufficient space. The compute work distributor identifies a first processor included in a plurality of processors that has a processing load that is less than or equal to the processor loads associated with all other processors included in the plurality of processors. The compute work distributor launches the first group of threads to execute on the first processor.
-