Patent search ap:("NVIDIA CORPORATION") AND inv:"Xiaogang QIU" Page 1

1.

发明申请
HIGH PERFORMANCE SYNCHRONIZATION MECHANISMS FOR COORDINATING OPERATIONS ON A COMPUTER SYSTEM 有权

公开(公告)号：US20210124627A1

公开(公告)日：2021-04-29

申请号：US16712236

申请日：2019-12-12

Applicant: NVIDIA Corporation

Inventor： Olivier GIROUX , Jack CHOQUETTE , Ronny KRASHINSKY , Steve HEINRICH , Xiaogang QIU , Shirish GADRE

IPC: G06F9/52 , G06F9/54 , G06F9/30 , G06F9/38 , G06F9/32

Abstract: To synchronize operations of a computing system, a new type of synchronization barrier is disclosed. In one embodiment, the disclosed synchronization barrier provides for certain synchronization mechanisms such as, for example, “Arrive” and “Wait” to be split to allow for greater flexibility and efficiency in coordinating synchronization. In another embodiment, the disclosed synchronization barrier allows for hardware components such as, for example, dedicated copy or direct-memory-access (DMA) engines to be synchronized with software-based threads.

2.

发明申请
REORDERING BUFFER FOR MEMORY ACCESS LOCALITY 有权
Title translation: 用于存储访问本地的后台缓冲区

公开(公告)号：US20140164743A1

公开(公告)日：2014-06-12

申请号：US13710004

申请日：2012-12-10

Applicant: NVIDIA CORPORATION

Inventor： Olivier GIROUX , Jack Hilaire CHOQUETTE , Xiaogang QIU , Robert J. STOLL

IPC: G06F9/30

CPC classification number: G06F9/30145 , G06F9/30152 , G06F9/3808 , G06F9/3851 , G06F9/3855

Abstract: Systems and methods for scheduling instructions for execution on a multi-core processor reorder the execution of different threads to ensure that instructions specified as having localized memory access behavior are executed over one or more sequential clock cycles to benefit from memory access locality. At compile time, code sequences including memory access instructions that may be localized are delineated into separate batches. A scheduling unit ensures that multiple parallel threads are processed over one or more sequential scheduling cycles to execute the batched instructions. The scheduling unit waits to schedule execution of instructions that are not included in the particular batch until execution of the batched instructions is done so that memory access locality is maintained for the particular batch. In between the separate batches, instructions that are not included in a batch are scheduled so that threads executing non-batched instructions are also processed and not starved.

Abstract translation: 用于调度用于在多核处理器上执行的指令的系统和方法重新排序不同线程的执行，以确保指定为具有局部存储器访问行为的指令在一个或多个顺序时钟周期上执行以受益于存储器访问位置。在编译时，包括可能本地化的存储器访问指令的代码序列被划分为单独的批处理。调度单元确保通过一个或多个顺序调度周期处理多个并行线程以执行批量指令。调度单元等待执行不包括在特定批中的指令，直到完成批处理指令的执行，以便为特定批次维护存储器访问位置。在单独批次之间，调度不包含在批处理中的指令，以便执行非批处理指令的线程也被处理并且不会被饿死。

3.

发明申请
TECHNIQUES FOR EFFICIENTLY TRANSFERRING DATA TO A PROCESSOR 有权

公开(公告)号：US20210326137A1

公开(公告)日：2021-10-21

申请号：US17363561

申请日：2021-06-30

Applicant: NVIDIA Corporation

Inventor： Andrew KERR , Jack CHOQUETTE , Xiaogang QIU , Omkar PARANJAPE , Poornachandra RAO , Shirish GADRE , Steven J. HEINRICH , Manan PATEL , Olivier GIROUX , Alan KAATZ

IPC: G06F9/30 , G06F12/0808 , G06F12/0888 , G06F9/32 , G06F9/38 , G06F9/52 , G06F9/54

Abstract: A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor needing data stored in global memory can request and store the needed data in on-chip shared memory, which can be accessed by the threads multiple times. The data can be loaded from global memory and stored in shared memory using an instruction which directs the data into the shared memory without storing the data in registers and/or cache memory of the multiprocessor during the data transfer.

4.

发明申请
UNIFIED CACHE FOR DIVERSE MEMORY TRAFFIC 审中-公开

公开(公告)号：US20180322078A1

公开(公告)日：2018-11-08

申请号：US15716461

申请日：2017-09-26

Applicant: NVIDIA Corporation

Inventor： Xiaogang QIU , Ronny KRASHINSKY , Steven HEINRICH , Shirish GADRE , John EDMONDSON , Jack CHOQUETTE , Mark GEBHART , Ramesh JANDHYALA , Poornachandra RAO , Omkar PARANJAPE , Michael SIU

IPC: G06F13/28 , G06F12/0811 , G06F12/0891 , G06F12/084

Abstract: A unified cache subsystem includes a data memory configured as both a shared memory and a local cache memory. The unified cache subsystem processes different types of memory transactions using different data pathways. To process memory transactions that target shared memory, the unified cache subsystem includes a direct pathway to the data memory. To process memory transactions that do not target shared memory, the unified cache subsystem includes a tag processing pipeline configured to identify cache hits and cache misses. When the tag processing pipeline identifies a cache hit for a given memory transaction, the transaction is rerouted to the direct pathway to data memory. When the tag processing pipeline identifies a cache miss for a given memory transaction, the transaction is pushed into a first-in first-out (FIFO) until miss data is returned from external memory. The tag processing pipeline is also configured to process texture-oriented memory transactions.

5.

发明申请
UNIFIED CACHE FOR DIVERSE MEMORY TRAFFIC 审中-公开

公开(公告)号：US20180322077A1

公开(公告)日：2018-11-08

申请号：US15587213

申请日：2017-05-04

Applicant: NVIDIA Corporation

Inventor： Xiaogang QIU , Ronny KRASHINSKY , Steven HEINRICH , Shirish GADRE , John EDMONDSON , Jack CHOQUETTE , Mark GEBHART , Ramesh JANDHYALA , Poornachandra RAO , Omkar PARANJAPE , Michael SIU

IPC: G06F13/28 , G06F12/0891 , G06F12/0811 , G06F12/084

Abstract: A unified cache subsystem includes a data memory configured as both a shared memory and a local cache memory. The unified cache subsystem processes different types of memory transactions using different data pathways. To process memory transactions that target shared memory, the unified cache subsystem includes a direct pathway to the data memory. To process memory transactions that do not target shared memory, the unified cache subsystem includes a tag processing pipeline configured to identify cache hits and cache misses. When the tag processing pipeline identifies a cache hit for a given memory transaction, the transaction is rerouted to the direct pathway to data memory. When the tag processing pipeline identifies a cache miss for a given memory transaction, the transaction is pushed into a first-in first-out (FIFO) until miss data is returned from external memory. The tag processing pipeline is also configured to process texture-oriented memory transactions.

6.

发明申请
HIERARCHICAL STAGING AREAS FOR SCHEDULING THREADS FOR EXECUTION 审中-公开
Title translation: 用于调度执行螺纹的分级分区

公开(公告)号：US20150113538A1

公开(公告)日：2015-04-23

申请号：US14061170

申请日：2013-10-23

Applicant: NVIDIA CORPORATION

Inventor： Olivier GIROUX , Jack Hilaire CHOQUETTE , Robert J. STOLL , Xiaogang QIU , Michael Alan FETTERMAN

IPC: G06F9/50

CPC classification number: G06F9/5011 , G06F2209/507

Abstract: One embodiment of the present invention is a computer-implemented method for scheduling a thread group for execution on a processing engine that includes identifying a first thread group included in a first set of thread groups that can be issued for execution on the processing engine, where the first thread group includes one or more threads. The method also includes transferring the first thread group from the first set of thread groups to a second set of thread groups, allocating hardware resources to the first thread group, and selecting the first thread group from the second set of thread groups for execution on the processing engine. One advantage of the disclosed technique is that a scheduler only allocates limited hardware resources to thread groups that are, in fact, ready to be issued for execution, thereby conserving those resources in a manner that is generally more efficient than conventional techniques.

Abstract translation: 本发明的一个实施例是一种用于在处理引擎上调度用于执行的线程组的计算机实现的方法，该处理引擎包括识别包括在可被发行用于在处理引擎上执行的第一组线程组中的第一线程组，其中第一个线程组包括一个或多个线程。该方法还包括将第一线程组从第一组线程组传送到第二组线程组，向第一线程组分配硬件资源，以及从第二组线程组中选择第一线程组以在处理引擎。所公开技术的一个优点是调度器仅将有限的硬件资源分配给事实上准备被发行用于执行的线程组，从而以通常比常规技术更有效的方式来保存那些资源。

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification