Patent search ap:("NVIDIA CORPORATION") AND inv:"Gregory Frederick Diamos" Page 1

1.

Granted invention patent
System, method, and computer program product for managing divergences and synchronization points during thread block execution by using a double sided queue for token storage (status: in force)

Publication number: US09459876B2

Publication date: 2016-10-04

Application number: US13945842

Application date: 2013-07-18

Applicant: NVIDIA Corporation

Inventor: Olivier Giroux, Gregory Frederick Diamos

IPC: G06F9/38, G06F9/52, G06F9/30

CPC classification number: G06F9/38, G06F9/30087, G06F9/3009, G06F9/3851, G06F9/524

Abstract: A system, method, and computer program product for ensuring forward progress of threads that implement divergent operations in a single-instruction, multiple data (SIMD) architecture is disclosed. The method includes the steps of allocating a queue data structure to a thread block including a plurality of threads, determining that a current instruction specifies a yield operation, pushing a token onto the second side of the queue data structure, disabling any active threads in the thread block, popping a next pending token from the first side of the queue data structure, and activating one or more threads in the thread block according to a mask included in the next pending token.
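
The yield sequence in this abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the Token fields, the ThreadBlock class, the pc bookkeeping, and the diverge helper (including its choice to push onto the first side) are all hypothetical names and assumptions made for illustration. Only the order of operations in yield_ follows the abstract: push onto the second side, disable active threads, pop from the first side, activate by mask.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Token:
    mask: int  # bit i set means thread i is activated by this token
    pc: int    # hypothetical resume address (not named in the abstract)


class ThreadBlock:
    def __init__(self, num_threads, start_pc=0):
        self.queue = deque()                     # queue allocated to the thread block
        self.active_mask = (1 << num_threads) - 1
        self.pc = start_pc

    def diverge(self, taken_mask, other_pc):
        # Assumption: threads that leave the current path are parked as a
        # token on the FIRST side (left end) of the queue.
        pending = self.active_mask & ~taken_mask
        if pending:
            self.queue.appendleft(Token(pending, other_pc))
        self.active_mask &= taken_mask

    def yield_(self, resume_pc):
        # 1. Push a token for the yielding threads onto the SECOND side.
        self.queue.append(Token(self.active_mask, resume_pc))
        # 2. Disable any active threads in the thread block.
        self.active_mask = 0
        # 3. Pop the next pending token from the FIRST side.
        nxt = self.queue.popleft()
        # 4. Activate threads according to the mask in that token.
        self.active_mask = nxt.mask
        self.pc = nxt.pc
        return nxt


tb = ThreadBlock(4)
tb.diverge(0b0011, other_pc=40)  # threads 0-1 continue, threads 2-3 wait at pc 40
tb.yield_(resume_pc=10)          # threads 0-1 yield, threads 2-3 become active
print(bin(tb.active_mask), tb.pc)
```

Because the yielding threads re-enter at the opposite end from where pending tokens are taken, every queued token is reached before the yielding threads run again, which is the forward-progress property the abstract describes.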

2.

Invention patent application
SYSTEM, METHOD, AND COMPUTER PROGRAM PRODUCT FOR COOPERATIVE MULTI-THREADING FOR VECTOR THREADS (status: in force)

Publication number: US20150026438A1

Publication date: 2015-01-22

Application number: US13945842

Application date: 2013-07-18

Applicant: Nvidia Corporation

Inventor: Olivier Giroux, Gregory Frederick Diamos

IPC: G06F9/38

CPC classification number: G06F9/38, G06F9/30087, G06F9/3009, G06F9/3851, G06F9/524

Abstract: A system, method, and computer program product for ensuring forward progress of threads that implement divergent operations in a single-instruction, multiple data (SIMD) architecture is disclosed. The method includes the steps of allocating a queue data structure to a thread block including a plurality of threads, determining that a current instruction specifies a yield operation, pushing a token onto the second side of the queue data structure, disabling any active threads in the thread block, popping a next pending token from the first side of the queue data structure, and activating one or more threads in the thread block according to a mask included in the next pending token.

3.

Granted invention patent
System, method, and computer program product for bulk synchronous binary program translation and optimization (status: in force)

Publication number: US09207919B2

Publication date: 2015-12-08

Application number: US14158749

Application date: 2014-01-17

Applicant: NVIDIA Corporation

Inventor: Gregory Frederick Diamos

IPC: G06F9/45, G06F9/30

CPC classification number: G06F8/41, G06F9/30087, G06F9/30181, G06F9/45516

Abstract: A system, method, and computer program product are provided for bulk synchronous binary program translation and optimization. The method includes the steps of executing a block of translated binary instructions by multiple threads and gathering profiling data during execution of the block of translated binary instructions. The multiple threads are then synchronized at a barrier instruction associated with the block of translated binary instructions and the block of translated binary instructions is replaced with optimized binary instructions, where the optimized binary instructions are produced based on the profiling data.
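
The flow in this abstract (execute, profile, synchronize at a barrier, swap in optimized code) can be sketched in Python with threading.Barrier, whose action callback runs in exactly one thread before any waiter is released. This is a toy analogy under stated assumptions, not the patented method: translated_block, make_optimized, and run are hypothetical names, Python functions stand in for binary instruction blocks, and the "optimization" only reorders a branch test based on the hotter path.

```python
import threading
from collections import Counter


def translated_block(x, profile):
    # Stand-in for a block of translated binary instructions,
    # instrumented to record which branch each thread takes.
    if x % 2 == 0:
        profile["even"] += 1
        return x // 2
    profile["odd"] += 1
    return 3 * x + 1


def make_optimized(hot):
    # Hypothetical optimizer: test the hotter branch first, no profiling.
    def optimized(x, profile):
        if hot == "odd":
            return 3 * x + 1 if x % 2 else x // 2
        return x // 2 if x % 2 == 0 else 3 * x + 1
    return optimized


def run(num_threads=4):
    code = {"block": translated_block}
    profiles = [Counter() for _ in range(num_threads)]
    results = [None] * num_threads

    def optimize():
        # Runs once, after every thread has arrived and before any is
        # released, so no thread can be inside the block being replaced.
        merged = sum(profiles, Counter())
        hot = merged.most_common(1)[0][0]
        code["block"] = make_optimized(hot)

    barrier = threading.Barrier(num_threads, action=optimize)

    def worker(tid):
        first = code["block"](tid, profiles[tid])     # profiled execution
        barrier.wait()                                # bulk synchronization
        second = code["block"](first, profiles[tid])  # optimized execution
        results[tid] = (first, second)

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, code["block"]


results, block = run()
print(results)
```

Doing the replacement inside the barrier action is the point of the design: the barrier guarantees that all threads are quiescent with respect to the old block at the moment it is swapped.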

4.

Invention patent application
SYSTEM, METHOD, AND COMPUTER PROGRAM PRODUCT FOR BULK SYNCHRONOUS BINARY PROGRAM TRANSLATION AND OPTIMIZATION (status: in force)

Publication number: US20150205586A1

Publication date: 2015-07-23

Application number: US14158749

Application date: 2014-01-17

Applicant: NVIDIA Corporation

Inventor: Gregory Frederick Diamos

IPC: G06F9/45, G06F9/30

CPC classification number: G06F8/41, G06F9/30087, G06F9/30181, G06F9/45516

Abstract: A system, method, and computer program product are provided for bulk synchronous binary program translation and optimization. The method includes the steps of executing a block of translated binary instructions by multiple threads and gathering profiling data during execution of the block of translated binary instructions. The multiple threads are then synchronized at a barrier instruction associated with the block of translated binary instructions and the block of translated binary instructions is replaced with optimized binary instructions, where the optimized binary instructions are produced based on the profiling data.

5.

Granted invention patent
Execution of divergent threads using a convergence barrier (status: in force)

Publication number: US10067768B2

Publication date: 2018-09-04

Application number: US14798265

Application date: 2015-07-13

Applicant: NVIDIA Corporation

Inventor: Gregory Frederick Diamos, Richard Craig Johnson, Vinod Grover, Olivier Giroux, Jack H. Choquette, Michael Alan Fetterman, Ajay S. Tirumala, Peter Nelson, Ronny Meir Krashinsky

IPC: G06F9/38, G06F9/52, G06F9/30

Abstract: A method, system, and computer program product for executing divergent threads using a convergence barrier are disclosed. A first instruction in a program is executed by a plurality of threads, where the first instruction, when executed by a particular thread, indicates to a scheduler unit that the thread participates in a convergence barrier. A first path through the program is executed by a first divergent portion of the participating threads and a second path through the program is executed by a second divergent portion of the participating threads. The first divergent portion of the participating threads executes a second instruction in the program and transitions to a blocked state at the convergence barrier. The scheduler unit determines that all of the participating threads are synchronized at the convergence barrier and the convergence barrier is cleared.
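
The convergence-barrier protocol in this abstract can be modeled as a small state machine. The sketch below is an illustration only: the ConvergenceBarrier class and its add and wait methods are hypothetical names standing in for the abstract's "first instruction" (declare participation) and "second instruction" (block at the barrier), and the scheduler unit is reduced to the check inside wait.

```python
class ConvergenceBarrier:
    def __init__(self):
        self.participants = set()  # threads that executed the first instruction
        self.blocked = set()       # threads currently blocked at the barrier

    def add(self, tid):
        # "First instruction": the thread tells the scheduler it participates.
        self.participants.add(tid)

    def wait(self, tid):
        # "Second instruction": the thread transitions to a blocked state.
        if tid not in self.participants:
            raise ValueError(f"thread {tid} does not participate")
        self.blocked.add(tid)
        # Scheduler check: are all participating threads synchronized?
        if self.blocked == self.participants:
            released = sorted(self.blocked)
            self.participants.clear()  # the barrier is cleared
            self.blocked.clear()
            return released            # these threads resume together
        return []                      # still waiting on a divergent path


barrier = ConvergenceBarrier()
for tid in range(4):
    barrier.add(tid)

# The first divergent portion (threads 0-1) finishes its path and blocks.
for tid in (0, 1):
    print(tid, barrier.wait(tid))

# The second divergent portion (threads 2-3) arrives; the last arrival
# finds every participant blocked and the barrier is cleared.
for tid in (2, 3):
    print(tid, barrier.wait(tid))
```

Tracking participation explicitly, rather than assuming every thread in a group must arrive, is what lets threads on unrelated paths skip the barrier without stalling the others.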

6.

Invention patent application
EXECUTION OF DIVERGENT THREADS USING A CONVERGENCE BARRIER (status: under examination, published)

Publication number: US20160019066A1

Publication date: 2016-01-21

Application number: US14798265

Application date: 2015-07-13

Applicant: NVIDIA CORPORATION

Inventor: Gregory Frederick Diamos, Richard Craig Johnson, Vinod Grover, Olivier Giroux, Jack H. Choquette, Michael Alan Fetterman, Ajay S. Tirumala, Peter Nelson, Ronny Meir Krashinsky

IPC: G06F9/38, G06F9/30

CPC classification number: G06F9/522, G06F9/30087, G06F9/3009, G06F9/3851, G06F9/3887

Abstract: A method, system, and computer program product for executing divergent threads using a convergence barrier are disclosed. A first instruction in a program is executed by a plurality of threads, where the first instruction, when executed by a particular thread, indicates to a scheduler unit that the thread participates in a convergence barrier. A first path through the program is executed by a first divergent portion of the participating threads and a second path through the program is executed by a second divergent portion of the participating threads. The first divergent portion of the participating threads executes a second instruction in the program and transitions to a blocked state at the convergence barrier. The scheduler unit determines that all of the participating threads are synchronized at the convergence barrier and the convergence barrier is cleared.
