Patent search ap:"Olivier GIROUX" Page 1

1.

Granted patent
Pre-scheduled replays of divergent operations (in force)

Publication No.: US10152329B2

Publication date: 2018-12-11

Application No.: US13370173

Filing date: 2012-02-09

Applicant: Michael Fetterman, Stewart Glenn Carlton, Jack Hilaire Choquette, Shirish Gadre, Olivier Giroux, Douglas J. Hahn, Steven James Heinrich, Eric Lyell Hill, Charles McCarver, Omkar Paranjape, Anjana Rajendran, Rajeshwaran Selvanesan

Inventor: Michael Fetterman, Stewart Glenn Carlton, Jack Hilaire Choquette, Shirish Gadre, Olivier Giroux, Douglas J. Hahn, Steven James Heinrich, Eric Lyell Hill, Charles McCarver, Omkar Paranjape, Anjana Rajendran, Rajeshwaran Selvanesan

IPC classification: G06F9/38

Abstract: One embodiment of the present disclosure sets forth an optimized way to execute pre-scheduled replay operations for divergent operations in a parallel processing subsystem. Specifically, a streaming multiprocessor (SM) includes a multi-stage pipeline configured to insert pre-scheduled replay operations into a multi-stage pipeline. A pre-scheduled replay unit detects whether the operation associated with the current instruction is accessing a common resource. If the threads are accessing data which are distributed across multiple cache lines, then the pre-scheduled replay unit inserts pre-scheduled replay operations behind the current instruction. The multi-stage pipeline executes the instruction and the associated pre-scheduled replay operations sequentially. If additional threads remain unserviced after execution of the instruction and the pre-scheduled replay operations, then additional replay operations are inserted via the replay loop, until all threads are serviced. One advantage of the disclosed technique is that divergent operations requiring one or more replay operations execute with reduced latency.
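
The flow in this abstract can be pictured with a small software model. The sketch below is illustrative only and is not taken from the patent: the 128-byte line size, the cap of two pre-scheduled replays, and the names `group_by_cache_line` and `plan_passes` are all assumptions. It groups threads by the cache line they touch, services the first group with the instruction itself, places a bounded number of groups directly behind it as pre-scheduled replays, and leaves the rest to the replay loop.

```python
from collections import OrderedDict

CACHE_LINE_BYTES = 128  # assumed line size; the abstract does not state one
MAX_PRESCHEDULED = 2    # assumed cap on replays inserted behind the instruction


def group_by_cache_line(addresses):
    """Map each cache-line index to the ids of the threads that touch it."""
    lines = OrderedDict()
    for thread_id, addr in enumerate(addresses):
        lines.setdefault(addr // CACHE_LINE_BYTES, []).append(thread_id)
    return lines


def plan_passes(addresses):
    """Split thread groups into (initial, pre-scheduled, replay-loop) passes."""
    groups = list(group_by_cache_line(addresses).values())
    initial = groups[:1]                             # serviced by the instruction
    prescheduled = groups[1:1 + MAX_PRESCHEDULED]    # inserted right behind it
    replay_loop = groups[1 + MAX_PRESCHEDULED:]      # handled by later replays
    return initial, prescheduled, replay_loop
```

For example, `plan_passes([0, 4, 128, 132, 256, 512])` puts threads 0 and 1 in the initial pass, two groups in the pre-scheduled replays, and the thread at address 512 in the replay loop.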

2.

Granted patent
Method and system for resolving thread divergences (in force)

Publication No.: US09606808B2

Publication date: 2017-03-28

Application No.: US13348544

Filing date: 2012-01-11

Applicant: Jack Choquette, Xiaogang Qiu, Jeff Tuckey, Michael (Ming Yiu) Siu, Robert J. Stoll, Olivier Giroux

Inventor: Jack Choquette, Xiaogang Qiu, Jeff Tuckey, Michael (Ming Yiu) Siu, Robert J. Stoll, Olivier Giroux

IPC classification: G06F9/38

CPC classification: G06F9/3887, G06F9/3851

Abstract: A computing device detects divergences between threads in a thread group executing on a parallel processing unit. The computing device includes an address divergence unit that identifies a subset of non-divergent threads included in the thread group. The address divergence unit stores instructions related to the subset of non-divergent threads in a multi-issue queue. The address divergence unit causes the instructions related to the subset of non-divergent threads to be retrieved from the multi-issue queue when the parallel processing unit is available. The address divergence unit causes the subset of non-divergent threads to be issued for execution on the parallel processing unit. The address divergence unit repeats the identifying, storing, and causing steps for the remaining threads in the thread group that are not included in the subset of non-divergent threads.
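
The identify, store, issue, and repeat loop described here can be sketched as follows. This is a toy model under stated assumptions, not the patented design: "non-divergent" is taken to mean "targets the same address as the lowest-numbered remaining thread", and `resolve_divergence` and its input format are hypothetical.

```python
from collections import deque


def resolve_divergence(thread_addresses):
    """Return (address, thread ids) pairs in the order they would be issued.

    thread_addresses maps a thread id to its target address (assumed input).
    """
    remaining = dict(thread_addresses)
    multi_issue_queue = deque()
    while remaining:
        # Identify: threads sharing the lead thread's address form the subset.
        target = remaining[min(remaining)]
        subset = sorted(t for t, addr in remaining.items() if addr == target)
        # Store: queue the work for that subset.
        multi_issue_queue.append((target, subset))
        # Repeat with the threads that were not part of the subset.
        for t in subset:
            del remaining[t]
    # Issue: drain the queue as the modelled processing unit becomes available.
    issued = []
    while multi_issue_queue:
        issued.append(multi_issue_queue.popleft())
    return issued
```

With four threads where threads 0 and 2 share an address, the model issues three subsets: one with both of those threads and one for each of the others.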

3.

Patent application
RELAXED COHERENCY BETWEEN DIFFERENT CACHES (in force)

Publication No.: US20140025891A1

Publication date: 2014-01-23

Application No.: US13555048

Filing date: 2012-07-20

Applicant: Joel James MCCORMACK, Rajesh KOTA, Olivier GIROUX, Emmett M. KILGARIFF

Inventor: Joel James MCCORMACK, Rajesh KOTA, Olivier GIROUX, Emmett M. KILGARIFF

IPC classification: G06F12/08

CPC classification: G06F12/0837, G06F12/0815

Abstract: One embodiment sets forth a technique for ensuring relaxed coherency between different caches. Two different execution units may be configured to access different caches that may store one or more cache lines corresponding to the same memory address. During time periods between memory barrier instructions, relaxed coherency is maintained between the different caches. More specifically, writes to a cache line in a first cache that corresponds to a particular memory address are not necessarily propagated to a cache line in a second cache before the second cache receives a read or write request that also corresponds to the particular memory address. Therefore, the first cache and the second are not necessarily coherent during time periods of relaxed coherency. Execution of a memory barrier instruction ensures that the different caches will be coherent before a new period of relaxed coherency begins.
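
A toy model makes the visibility rule concrete. The sketch below is an assumption-laden illustration, not the patented mechanism: it models two write-back caches over one backing store, and the barrier is implemented as "flush dirty lines, then invalidate both caches", which is only one possible way to restore coherence. The class and method names are hypothetical.

```python
class RelaxedCaches:
    """Two caches over one backing memory, coherent only at barriers."""

    def __init__(self):
        self.memory = {}
        self.caches = [{}, {}]
        self.dirty = [set(), set()]

    def write(self, cache_id, addr, value):
        # The write stays local; nothing is propagated to the other cache.
        self.caches[cache_id][addr] = value
        self.dirty[cache_id].add(addr)

    def read(self, cache_id, addr):
        cache = self.caches[cache_id]
        if addr not in cache:
            cache[addr] = self.memory.get(addr, 0)  # fill on miss
        return cache[addr]

    def memory_barrier(self):
        # Flush dirty lines (if both caches wrote one address, the
        # higher-numbered cache wins in this toy model), then invalidate.
        for cache, dirty in zip(self.caches, self.dirty):
            for addr in dirty:
                self.memory[addr] = cache[addr]
            dirty.clear()
        for cache in self.caches:
            cache.clear()
```

Between barriers, a reader in the second cache can keep seeing a stale value after the first cache has written; after `memory_barrier()` both caches observe the same value.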

4.

Patent application
METHODS AND APPARATUS TO AVOID SURGES IN DI/DT BY THROTTLING GPU EXECUTION PERFORMANCE (in force)

Publication No.: US20130262831A1

Publication date: 2013-10-03

Application No.: US13437765

Filing date: 2012-04-02

Applicant: Peter Michael NELSON, Jack Hilaire Choquette, Olivier Giroux

Inventor: Peter Michael NELSON, Jack Hilaire Choquette, Olivier Giroux

IPC classification: G06F9/30

CPC classification: G06F9/3836, G06F1/26, G06F1/305, G06F1/3203, G06F1/324, G06F1/329, G06F9/30, G06F9/30109, G06F9/3851, G06F9/3887, G06T1/20, Y02D10/24

Abstract: Systems and methods for throttling GPU execution performance to avoid surges in DI/DT. A processor includes one or more execution units coupled to a scheduling unit configured to select instructions for execution by the one or more execution units. The execution units may be connected to one or more decoupling capacitors that store power for the circuits of the execution units. The scheduling unit is configured to throttle the instruction issue rate of the execution units based on a moving average issue rate over a large number of scheduling periods. The number of instructions issued during the current scheduling period is less than or equal to a throttling rate maintained by the scheduling unit that is greater than or equal to a minimum throttling issue rate. The throttling rate is set equal to the moving average plus an offset value at the end of each scheduling period.
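
The update rule in the last two sentences can be written down directly. The sketch below is a minimal model under assumptions: the window length, offset, and minimum rate are made-up values, the moving average is a simple integer average over a sliding window, and `IssueThrottle` is a hypothetical name.

```python
from collections import deque


class IssueThrottle:
    """Caps per-period instruction issue at (moving average + offset)."""

    def __init__(self, window=4, offset=2, min_rate=4):
        self.history = deque(maxlen=window)  # issue counts of recent periods
        self.offset = offset
        self.min_rate = min_rate
        self.rate = min_rate                 # current throttling rate

    def run_period(self, requested):
        """Issue up to the throttling rate, then update the rate."""
        issued = min(requested, self.rate)
        self.history.append(issued)
        moving_average = sum(self.history) // len(self.history)
        # End of period: rate = moving average + offset, never below minimum.
        self.rate = max(self.min_rate, moving_average + self.offset)
        return issued
```

Under a sudden sustained demand of 32 instructions per period, this model ramps the issue count up gradually instead of jumping to 32 at once, which is the behavior the abstract associates with avoiding current surges.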

5.

Patent application
METHODS AND APPARATUS FOR SCHEDULING INSTRUCTIONS USING PRE-DECODE DATA (in force)

Publication No.: US20130166881A1

Publication date: 2013-06-27

Application No.: US13333879

Filing date: 2011-12-21

Applicant: Jack Hilaire CHOQUETTE, Robert J. Stoll, Olivier Giroux

Inventor: Jack Hilaire CHOQUETTE, Robert J. Stoll, Olivier Giroux

IPC classification: G06F9/30, G06F9/312

CPC classification: G06F9/3851, G06F9/3802, G06F9/382

Abstract: Systems and methods for scheduling instructions using pre-decode data corresponding to each instruction. In one embodiment, a multi-core processor includes a scheduling unit in each core for selecting instructions from two or more threads each scheduling cycle for execution on that particular core. As threads are scheduled for execution on the core, instructions from the threads are fetched into a buffer without being decoded. The pre-decode data is determined by a compiler and is extracted by the scheduling unit during runtime and used to control selection of threads for execution. The pre-decode data may specify a number of scheduling cycles to wait before scheduling the instruction. The pre-decode data may also specify a scheduling priority for the instruction. Once the scheduling unit selects an instruction to issue for execution, a decode unit fully decodes the instruction.
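
One way to picture the selection step is a scheduler that looks only at the pre-decode fields and defers full decoding until after it has chosen. The sketch below is a simplified model: the `Fetched` record, the convention that a larger `priority` wins, the interpretation of `wait_cycles` as counted from cycle 0, and the text stand-in for undecoded instruction bits are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Fetched:
    thread_id: int
    wait_cycles: int  # pre-decode: cycles to wait before scheduling
    priority: int     # pre-decode: larger value is selected first (assumed)
    raw: str          # stand-in for the still-undecoded instruction


def decode(raw):
    """Full decode, performed only after the scheduler has selected."""
    opcode, *operands = raw.split()
    return opcode, operands


def schedule_cycle(buffer, cycle):
    """Pick one ready instruction using pre-decode data only."""
    ready = [entry for entry in buffer if entry.wait_cycles <= cycle]
    if not ready:
        return None  # nothing may issue this cycle
    chosen = max(ready, key=lambda entry: entry.priority)
    buffer.remove(chosen)
    return chosen.thread_id, decode(chosen.raw)
```

In this model a high-priority instruction with a two-cycle wait is skipped in cycles 0 and 1, so a lower-priority instruction from another thread issues first.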

6.

Granted patent
Speculative execution and rollback (in force)

Publication No.: US09830158B2

Publication date: 2017-11-28

Application No.: US13289643

Filing date: 2011-11-04

Applicant: Jack Hilaire Choquette, Olivier Giroux, Robert J. Stoll, Xiaogang Qiu

Inventor: Jack Hilaire Choquette, Olivier Giroux, Robert J. Stoll, Xiaogang Qiu

IPC classification: G06F9/38

CPC classification: G06F9/3842, G06F9/3851, G06F9/3861, G06F9/3887

Abstract: One embodiment of the present invention sets forth a technique for speculatively issuing instructions to allow a processing pipeline to continue to process some instructions during rollback of other instructions. A scheduler circuit issues instructions for execution assuming that, several cycles later, when the instructions reach multithreaded execution units, dependencies between the instructions will be resolved, resources will be available, operand data will be available, and other conditions will not prevent execution of the instructions. When a rollback condition exists at the point of execution for an instruction for a particular thread group, the instruction is not dispatched to the multithreaded execution units. However, other instructions issued by the scheduler circuit for execution by different thread groups, and for which a rollback condition does not exist, are executed by the multithreaded execution units. The instruction incurring the rollback condition is reissued after the rollback condition no longer exists.
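
The key property, that a rollback in one thread group does not stall the others, can be shown with a small round-robin model. This is a loose illustration rather than the patented pipeline: the rollback condition is reduced to "this group cannot execute before cycle N", one issue slot is consumed per attempt, and `run_pipeline` and its arguments are hypothetical.

```python
from collections import deque


def run_pipeline(per_group, blocked_until):
    """Return (cycle, group, instruction) tuples in execution order.

    per_group maps a thread group to its instructions in program order.
    blocked_until maps a group to the first cycle at which its rollback
    condition no longer exists (groups not listed are never blocked).
    """
    queues = {group: deque(instrs) for group, instrs in per_group.items()}
    executed = []
    cycle = 0
    while any(queues.values()):
        for group, queue in queues.items():
            if not queue:
                continue
            if cycle >= blocked_until.get(group, 0):
                executed.append((cycle, group, queue.popleft()))
            # Otherwise: rollback. The instruction is not dispatched and
            # stays at the head of its queue to be reissued later.
            cycle += 1
    return executed
```

With group A blocked until cycle 3, group B's instructions execute in the meantime, and A's instructions are reissued in program order once the condition clears.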

7.

Granted patent
Relaxed coherency between different caches (in force)

Publication No.: US08930636B2

Publication date: 2015-01-06

Application No.: US13555048

Filing date: 2012-07-20

Applicant: Joel James McCormack, Rajesh Kota, Olivier Giroux, Emmett M. Kilgariff

Inventor: Joel James McCormack, Rajesh Kota, Olivier Giroux, Emmett M. Kilgariff

IPC classification: G06F12/08

CPC classification: G06F12/0837, G06F12/0815

Abstract: One embodiment sets forth a technique for ensuring relaxed coherency between different caches. Two different execution units may be configured to access different caches that may store one or more cache lines corresponding to the same memory address. During time periods between memory barrier instructions, relaxed coherency is maintained between the different caches. More specifically, writes to a cache line in a first cache that corresponds to a particular memory address are not necessarily propagated to a cache line in a second cache before the second cache receives a read or write request that also corresponds to the particular memory address. Therefore, the first cache and the second are not necessarily coherent during time periods of relaxed coherency. Execution of a memory barrier instruction ensures that the different caches will be coherent before a new period of relaxed coherency begins.

8.

Patent application
SPECULATIVE EXECUTION AND ROLLBACK (in force)

Publication No.: US20130117541A1

Publication date: 2013-05-09

Application No.: US13289643

Filing date: 2011-11-04

Applicant: Jack Hilaire CHOQUETTE, Olivier Giroux, Robert J. Stoll, Xiaogang Qiu

Inventor: Jack Hilaire CHOQUETTE, Olivier Giroux, Robert J. Stoll, Xiaogang Qiu

IPC classification: G06F9/30

CPC classification: G06F9/3842, G06F9/3851, G06F9/3861, G06F9/3887

Abstract: One embodiment of the present invention sets forth a technique for speculatively issuing instructions to allow a processing pipeline to continue to process some instructions during rollback of other instructions. A scheduler circuit issues instructions for execution assuming that, several cycles later, when the instructions reach multithreaded execution units, dependencies between the instructions will be resolved, resources will be available, operand data will be available, and other conditions will not prevent execution of the instructions. When a rollback condition exists at the point of execution for an instruction for a particular thread group, the instruction is not dispatched to the multithreaded execution units. However, other instructions issued by the scheduler circuit for execution by different thread groups, and for which a rollback condition does not exist, are executed by the multithreaded execution units. The instruction incurring the rollback condition is reissued after the rollback condition no longer exists.

9.

Granted patent
Methods and apparatus for scheduling instructions using pre-decode data (in force)

Publication No.: US09798548B2

Publication date: 2017-10-24

Application No.: US13333879

Filing date: 2011-12-21

Applicant: Jack Hilaire Choquette, Robert J. Stoll, Olivier Giroux

Inventor: Jack Hilaire Choquette, Robert J. Stoll, Olivier Giroux

IPC classification: G06F15/00, G06F9/30, G06F9/40, G06F9/38

CPC classification: G06F9/3851, G06F9/3802, G06F9/382

Abstract: Systems and methods for scheduling instructions using pre-decode data corresponding to each instruction. In one embodiment, a multi-core processor includes a scheduling unit in each core for selecting instructions from two or more threads each scheduling cycle for execution on that particular core. As threads are scheduled for execution on the core, instructions from the threads are fetched into a buffer without being decoded. The pre-decode data is determined by a compiler and is extracted by the scheduling unit during runtime and used to control selection of threads for execution. The pre-decode data may specify a number of scheduling cycles to wait before scheduling the instruction. The pre-decode data may also specify a scheduling priority for the instruction. Once the scheduling unit selects an instruction to issue for execution, a decode unit fully decodes the instruction.

10.

Granted patent
Method and system for memory overlays for portable function pointers (in force)

Publication No.: US09405561B2

Publication date: 2016-08-02

Application No.: US13570155

Filing date: 2012-08-08

Applicant: Olivier Giroux

Inventor: Olivier Giroux

IPC classification: G06F9/30, G06F9/38, G06F9/445, G06F9/44

CPC classification: G06F9/44547, G06F9/449, G06F2209/463

Abstract: A system and method for implementing memory overlays for portable pointer variables. The method includes providing a program executable by a heterogeneous processing system comprising a plurality of processors running a plurality of instruction set architectures (ISAs). The method also includes providing a plurality of processor specific functions associated with a function pointer in the program. The method includes executing the program by a first processor. The method includes dereferencing the function pointer by mapping the function pointer to a corresponding processor specific feature based on which processor in the plurality of processors is executing the program.
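
The dereferencing step can be approximated in software as a lookup keyed by the executing processor's ISA. The sketch below only illustrates that mapping idea; it does not model memory overlays themselves, and the ISA labels, the `PortableFunctionPointer` class, and the two `saxpy_*` functions are hypothetical.

```python
class PortableFunctionPointer:
    """One logical pointer backed by one implementation per ISA."""

    def __init__(self, implementations):
        # implementations maps an ISA label to its processor-specific function.
        self._implementations = dict(implementations)

    def dereference(self, executing_isa):
        """Resolve to the function built for the processor that is running."""
        return self._implementations[executing_isa]


def saxpy_cpu(a, x, y):
    """Stand-in for the CPU build of the function."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def saxpy_gpu(a, x, y):
    """Stand-in for the GPU build; same result, separate code."""
    return list(map(lambda pair: a * pair[0] + pair[1], zip(x, y)))


saxpy_ptr = PortableFunctionPointer({"x86_64": saxpy_cpu, "gpu_isa": saxpy_gpu})
```

The same `saxpy_ptr` value can be passed between processors; `saxpy_ptr.dereference("gpu_isa")` and `saxpy_ptr.dereference("x86_64")` resolve to different functions that compute the same result.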
