-
Publication No.: US12248788B2
Publication Date: 2025-03-11
Application No.: US17691690
Filing Date: 2022-03-10
Applicant: NVIDIA Corporation
Inventor: Prakash Bangalore Prabhakar , Gentaro Hirota , Ronny Krashinsky , Ze Long , Brian Pharris , Rajballav Dash , Jeff Tuckey , Jerome F. Duluk, Jr. , Lacky Shah , Luke Durant , Jack Choquette , Eric Werness , Naman Govil , Manan Patel , Shayani Deb , Sandeep Navada , John Edmondson , Greg Palmer , Wish Gandhi , Ravi Manyam , Apoorv Parle , Olivier Giroux , Shirish Gadre , Steve Heinrich
Abstract: Distributed shared memory (DSMEM) comprises blocks of memory that are distributed or scattered across a processor (such as a GPU). Threads executing on a processing core local to one memory block are able to access a memory block local to a different processing core. In one embodiment, shared access to these DSMEM allocations distributed across a collection of processing cores is implemented by communications between the processing cores. Such distributed shared memory provides very low latency memory access for processing cores located in proximity to the memory blocks, and also provides a way for more distant processing cores to also access the memory blocks in a manner and using interconnects that do not interfere with the processing cores' access to main or global memory such as backed by an L2 cache. Such distributed shared memory supports cooperative parallelism and strong scaling across multiple processing cores by permitting data sharing and communications previously possible only within the same processing core.
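As a rough illustration of the access pattern the abstract describes (not the patented hardware mechanism; all names here are hypothetical), a toy host-side model: each "core" fills its local block, all cores reach a synchronization point, and each core then reads a peer's block directly rather than through global memory. On real hardware this pattern is what a cluster-wide barrier plus peer shared-memory mapping provides.

```cpp
#include <vector>

// Toy sequential model of distributed shared memory: the two phases
// stand in for "write the core-local block, cluster-wide sync, then
// read a peer core's block directly".
std::vector<int> run_dsmem_model(int cores, int words) {
    std::vector<std::vector<int>> blocks(cores, std::vector<int>(words));
    for (int rank = 0; rank < cores; ++rank)          // each core fills its local block
        for (int i = 0; i < words; ++i)
            blocks[rank][i] = rank * 100 + i;
    // -- a cluster-wide sync would go here on real hardware --
    std::vector<int> out(cores * words);
    for (int rank = 0; rank < cores; ++rank) {
        int peer = (rank + 1) % cores;                // map the neighbor's block
        for (int i = 0; i < words; ++i)
            out[rank * words + i] = blocks[peer][i];  // direct peer-block read
    }
    return out;
}
```

The point of the model is only the topology: every core reads another core's local block without a round-trip through a shared global memory.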
-
Publication No.: US20210124582A1
Publication Date: 2021-04-29
Application No.: US16712083
Filing Date: 2019-12-12
Applicant: NVIDIA Corporation
Inventor: Andrew Kerr , Jack Choquette , Xiaogang Qiu , Omkar Paranjape , Poornachandra Rao , Shirish Gadre , Steven J. Heinrich , Manan Patel , Olivier Giroux , Alan Kaatz
IPC: G06F9/30 , G06F12/0888 , G06F12/0808
Abstract: A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor needing data stored in global memory can request and store the needed data in on-chip shared memory, which can be accessed by the threads multiple times. The data can be loaded from global memory and stored in shared memory using an instruction which directs the data into the shared memory without storing the data in registers and/or cache memory of the multiprocessor during the data transfer.
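A minimal sketch of the usage pattern, not the disclosed instruction itself (function and variable names are hypothetical): a tile of "global memory" is staged into a small "shared memory" buffer once, then consumed repeatedly, which is what makes bypassing per-thread registers during the one bulk transfer worthwhile.

```cpp
#include <vector>
#include <cstddef>

// Toy model: one bulk copy stages a tile from "global" into "shared"
// memory; the tile is then reused across passes without refetching.
// The patented instruction performs the staging copy without routing
// the data through the multiprocessor's registers or cache.
long sum_with_staging(const std::vector<int> &global_mem,
                      std::size_t offset, std::size_t tile) {
    std::vector<int> shared(global_mem.begin() + offset,
                            global_mem.begin() + offset + tile);  // one bulk copy
    long total = 0;
    for (std::size_t pass = 0; pass < 2; ++pass)  // data reused multiple times
        for (int v : shared) total += v;
    return total;
}
```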
-
Publication No.: US12020035B2
Publication Date: 2024-06-25
Application No.: US17691288
Filing Date: 2022-03-10
Applicant: NVIDIA Corporation
Inventor: Apoorv Parle , Ronny Krashinsky , John Edmondson , Jack Choquette , Shirish Gadre , Steve Heinrich , Manan Patel , Prakash Bangalore Prabhakar, Jr. , Ravi Manyam , Wish Gandhi , Lacky Shah , Alexander L. Minkin
IPC: G06F5/06 , G06F9/38 , G06F9/48 , G06F9/52 , G06F13/16 , G06F13/40 , G06T1/20 , G06T1/60 , H04L49/101
CPC classification number: G06F9/3887 , G06F9/522 , G06F13/1689 , G06F13/4022 , G06T1/20 , G06T1/60 , H04L49/101
Abstract: This specification describes a programmatic multicast technique enabling one thread (for example, in a cooperative group array (CGA) on a GPU) to request data on behalf of one or more other threads (for example, executing on respective processor cores of the GPU). The multicast is supported by tracking circuitry that interfaces between multicast requests received from processor cores and the available memory. The multicast is designed to reduce cache (for example, layer 2 cache) bandwidth utilization enabling strong scaling and smaller tile sizes.
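The bandwidth argument can be made concrete with a toy counting model (all names hypothetical, and the real tracking circuitry is hardware, not software): when several consumers need the same data, a multicast turns N cache reads into one.

```cpp
#include <vector>

// Toy model of the L2-bandwidth saving: N consumers want the same
// tile. Without multicast each would issue its own L2 read; with
// multicast a single tracked request is fanned out to all of them,
// so the cache services exactly one access.
struct L2Model {
    int accesses = 0;
    int read(int addr) { ++accesses; return addr * 10; }  // fake cache line
};

int fanout_multicast(L2Model &l2, int addr, int consumers,
                     std::vector<int> &out) {
    int value = l2.read(addr);     // one request on behalf of all threads
    out.assign(consumers, value);  // "tracking circuitry" fans it out
    return l2.accesses;            // accesses stays 1 regardless of consumers
}
```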
-
Publication No.: US12141082B2
Publication Date: 2024-11-12
Application No.: US17691276
Filing Date: 2022-03-10
Applicant: NVIDIA Corporation
Inventor: Alexander L. Minkin , Alan Kaatz , Olivier Giroux , Jack Choquette , Shirish Gadre , Manan Patel , John Tran , Ronny Krashinsky , Jeff Schottmiller
IPC: G06F13/16
Abstract: A parallel processing unit comprises a plurality of processors each being coupled to a memory access hardware circuitry. Each memory access hardware circuitry is configured to receive, from the coupled processor, a memory access request specifying a coordinate of a multidimensional data structure, wherein the memory access hardware circuit is one of a plurality of memory access circuitry each coupled to a respective one of the processors; and, in response to the memory access request, translate the coordinate of the multidimensional data structure into plural memory addresses for the multidimensional data structure and using the plural memory addresses, asynchronously transfer at least a portion of the multidimensional data structure for processing by at least the coupled processor. The memory locations may be in the shared memory of the coupled processor and/or an external memory.
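The core translation step — one tile coordinate expanding into plural memory addresses — can be sketched in host code (the struct layout and names are hypothetical; the patent describes this being done in hardware, asynchronously):

```cpp
#include <cstdint>
#include <cstddef>

// Hypothetical descriptor for a row-major 2-D tensor.
struct TensorDesc {
    std::uintptr_t base;       // address of element (0,0)
    std::size_t width, elem;   // tensor width in elements, element size in bytes
    std::size_t tile_h, tile_w;
};

// Translate tile coordinate (tx, ty) into plural addresses: one per
// tile row, each the start of a contiguous run of tile_w elements.
// Hardware would then transfer those runs asynchronously.
std::size_t tile_addresses(const TensorDesc &d, std::size_t tx,
                           std::size_t ty, std::uintptr_t *addrs) {
    for (std::size_t r = 0; r < d.tile_h; ++r)
        addrs[r] = d.base +
                   ((ty * d.tile_h + r) * d.width + tx * d.tile_w) * d.elem;
    return d.tile_h;  // number of contiguous runs produced
}
```

Letting the processor issue a coordinate rather than raw addresses is what allows the address arithmetic, bounds handling, and the transfer itself to be offloaded from the processor.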
-
Publication No.: US20230289292A1
Publication Date: 2023-09-14
Application No.: US17691422
Filing Date: 2022-03-10
Applicant: NVIDIA Corporation
Inventor: Alexander L. Minkin , Alan Kaatz , Olivier Giroux , Jack Choquette , Shirish Gadre , Manan Patel , John Tran , Ronny Krashinsky , Jeff Schottmiller
IPC: G06F12/0875
CPC classification number: G06F12/0875 , G06F2212/62 , G06F2212/452
Abstract: A parallel processing unit comprises a plurality of processors each being coupled to a memory access hardware circuitry. Each memory access hardware circuitry is configured to receive, from the coupled processor, a memory access request specifying a coordinate of a multidimensional data structure, wherein the memory access hardware circuit is one of a plurality of memory access circuitry each coupled to a respective one of the processors; and, in response to the memory access request, translate the coordinate of the multidimensional data structure into plural memory addresses for the multidimensional data structure and using the plural memory addresses, asynchronously transfer at least a portion of the multidimensional data structure for processing by at least the coupled processor. The memory locations may be in the shared memory of the coupled processor and/or an external memory.
-
Publication No.: US11080051B2
Publication Date: 2021-08-03
Application No.: US16712083
Filing Date: 2019-12-12
Applicant: NVIDIA Corporation
Inventor: Andrew Kerr , Jack Choquette , Xiaogang Qiu , Omkar Paranjape , Poornachandra Rao , Shirish Gadre , Steven J. Heinrich , Manan Patel , Olivier Giroux , Alan Kaatz
IPC: G06F9/30 , G06F12/0808 , G06F12/0888
Abstract: A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor needing data stored in global memory can request and store the needed data in on-chip shared memory, which can be accessed by the threads multiple times. The data can be loaded from global memory and stored in shared memory using an instruction which directs the data into the shared memory without storing the data in registers and/or cache memory of the multiprocessor during the data transfer.
-
Publication No.: US12105960B2
Publication Date: 2024-10-01
Application No.: US17900808
Filing Date: 2022-08-31
Applicant: NVIDIA CORPORATION
Inventor: Srinivas Santosh Kumar Madugula , Olivier Giroux , Wishwesh Anil Gandhi , Michael Allen Parker , Raghuram L , Ivan Tanasic , Manan Patel , Mark Hummel , Alexander L. Minkin
IPC: G06F3/06
CPC classification number: G06F3/0611 , G06F3/0659 , G06F3/0673
Abstract: Various embodiments include techniques for performing self-synchronizing remote memory operations in a multiprocessor computing system. During a remote memory operation in the multiprocessor computing system, a source processing unit transmits multiple segments of data to a destination processing unit. For each segment of data, the source processing unit transmits a remote memory operation to the destination processing unit that includes associated metadata that identifies the memory location of a corresponding synchronization object. The remote memory operation along with the metadata is transmitted as a single unit to the destination processing unit. The destination processing unit splits the operation into the remote memory operation and the memory synchronization operation. As a result, the source processing unit avoids the need to perform a separate memory synchronization operation, thereby reducing inter-processor communications and increasing performance of remote memory operations.
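The split performed at the destination can be modeled in a few lines (the message layout and names are hypothetical; in hardware this is a single transmitted unit, not two software steps at the source):

```cpp
#include <cstddef>
#include <vector>

// Toy model: one message carries the payload plus metadata naming the
// synchronization object, so the source never sends a separate sync.
struct RemoteWrite {
    std::size_t addr;       // where the data segment lands
    int data;               // the payload itself
    std::size_t flag_addr;  // metadata: location of the sync object
};

// The destination splits the single unit into the remote memory
// operation and the synchronization update.
void deliver(std::vector<int> &mem, const RemoteWrite &op) {
    mem.at(op.addr) = op.data;  // the remote memory operation...
    mem.at(op.flag_addr) += 1;  // ...and its synchronization side effect
}
```

A waiter polling the synchronization object sees it change only after the corresponding data segment is in place, which is the self-synchronizing property.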
-
Publication No.: US11720440B2
Publication Date: 2023-08-08
Application No.: US17373678
Filing Date: 2021-07-12
Applicant: NVIDIA CORPORATION
Inventor: Naveen Cherukuri , Saurabh Hukerikar , Paul Racunas , Nirmal Raj Saxena , David Charles Patrick , Yiyang Feng , Abhijeet Ghadge , Steven James Heinrich , Adam Hendrickson , Gentaro Hirota , Praveen Joginipally , Vaishali Kulkarni , Peter C. Mills , Sandeep Navada , Manan Patel , Liang Yin
IPC: G06F11/07 , G06F11/10 , G06F12/1018 , G06F11/14 , G06F12/1027
CPC classification number: G06F11/1016 , G06F11/0772 , G06F11/0793 , G06F11/1407 , G06F12/1018 , G06F12/1027
Abstract: Various embodiments include a parallel processing computer system that detects memory errors as a memory client loads data from memory and disables the memory client from storing data to memory, thereby reducing the likelihood that the memory error propagates to other memory clients. The memory client initiates a stall sequence, while other memory clients continue to execute instructions and the memory continues to service memory load and store operations. When a memory error is detected, a specific bit pattern is stored in conjunction with the data associated with the memory error. When the data is copied from one memory to another memory, the specific bit pattern is also copied, in order to identify the data as having a memory error.
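The poison-on-error idea can be sketched as follows; the specific bit pattern here is an assumption for illustration, not the pattern the hardware actually uses:

```cpp
#include <cstdint>

// Assumed poison encoding (NOT the hardware's real pattern): a reserved
// word stored in place of data that was read with an uncorrectable
// error, so the error is re-detected wherever the value is consumed.
const std::uint32_t POISON = 0xFFDEADu;

std::uint32_t tag_error(std::uint32_t /*corrupt_data*/) { return POISON; }

bool is_poisoned(std::uint32_t word) { return word == POISON; }

// Copies propagate the pattern verbatim: a poisoned source yields a
// poisoned destination, so the error travels with the data between
// memories without any extra bookkeeping.
void copy_word(const std::uint32_t *src, std::uint32_t *dst) { *dst = *src; }
```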
-
Publication No.: US20230185570A1
Publication Date: 2023-06-15
Application No.: US18107374
Filing Date: 2023-02-08
Applicant: NVIDIA Corporation
Inventor: Andrew Kerr , Jack Choquette , Xiaogang Qiu , Omkar Paranjape , Poornachandra Rao , Shirish Gadre , Steven J. Heinrich , Manan Patel , Olivier Giroux , Alan Kaatz
IPC: G06F9/30 , G06F12/0808 , G06F12/0888 , G06F9/32 , G06F9/38 , G06F9/52 , G06F9/54
CPC classification number: G06F9/30043 , G06F12/0808 , G06F12/0888 , G06F9/3009 , G06F9/321 , G06F9/3871 , G06F9/522 , G06F9/542 , G06F9/544 , G06F9/546 , G06F9/3838 , G06F2212/621 , G06F9/3004
Abstract: A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor needing data stored in global memory can request and store the needed data in on-chip shared memory, which can be accessed by the threads multiple times. The data can be loaded from global memory and stored in shared memory using an instruction which directs the data into the shared memory without storing the data in registers and/or cache memory of the multiprocessor during the data transfer.
-
Publication No.: US11379708B2
Publication Date: 2022-07-05
Application No.: US16514078
Filing Date: 2019-07-17
Applicant: NVIDIA Corporation
Inventor: Sachin Idgunji , Ming Y. Siu , Alex Gu , James Reilley , Manan Patel , Rajeshwaran Selvanesan , Ewa Kubalska
IPC: G06F9/30 , G06N3/04 , G06N3/08 , G06F9/38 , G06F1/3206
Abstract: An integrated circuit such as, for example, a graphics processing unit (GPU), includes a dynamic power controller for adjusting operating voltage and/or frequency. The controller may receive current power used by the integrated circuit and a predicted power determined based on instructions pending in a plurality of processors. The controller determines adjustments that need to be made to the operating voltage and/or frequency to minimize the difference between the current power and the predicted power. An in-system reinforcement learning mechanism is included to self-tune parameters of the controller.
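The control loop reduces to a simple feedback step; the sketch below is a bare proportional controller with assumed gain and clock limits, not the disclosed design (which additionally self-tunes its parameters online):

```cpp
// Illustrative proportional controller (gain and clamp values are
// assumptions): nudge the clock so measured power tracks the power
// predicted from the pending instruction mix.
struct DvfsController {
    double freq_mhz;  // current operating frequency
    double gain;      // in the patent, tuned online by reinforcement learning

    double step(double current_power_w, double predicted_power_w) {
        double error = predicted_power_w - current_power_w;
        freq_mhz += gain * error;             // raise clocks if headroom, lower if over
        if (freq_mhz < 300.0)  freq_mhz = 300.0;   // hypothetical floor
        if (freq_mhz > 2000.0) freq_mhz = 2000.0;  // hypothetical ceiling
        return freq_mhz;
    }
};
```

Calling `step()` each control interval with fresh current/predicted power readings drives the error toward zero while the clamps keep the clock inside its legal range.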