Patent search: ap:("Peter Michael NELSON" OR "Jack Hilaire Choquette" OR "Olivier Giroux") AND inv:"Jack Hilaire Choquette" (page 1)

1.

Patent application
METHODS AND APPARATUS TO AVOID SURGES IN DI/DT BY THROTTLING GPU EXECUTION PERFORMANCE (Active)

Publication No.: US20130262831A1

Publication date: 2013-10-03

Application No.: US13437765

Filing date: 2012-04-02

Applicant: Peter Michael NELSON, Jack Hilaire Choquette, Olivier Giroux

Inventor: Peter Michael NELSON, Jack Hilaire Choquette, Olivier Giroux

IPC classification: G06F9/30

CPC classification: G06F9/3836, G06F1/26, G06F1/305, G06F1/3203, G06F1/324, G06F1/329, G06F9/30, G06F9/30109, G06F9/3851, G06F9/3887, G06T1/20, Y02D10/24

Abstract: Systems and methods for throttling GPU execution performance to avoid surges in DI/DT. A processor includes one or more execution units coupled to a scheduling unit configured to select instructions for execution by the one or more execution units. The execution units may be connected to one or more decoupling capacitors that store power for the circuits of the execution units. The scheduling unit is configured to throttle the instruction issue rate of the execution units based on a moving average issue rate over a large number of scheduling periods. The number of instructions issued during the current scheduling period is less than or equal to a throttling rate maintained by the scheduling unit that is greater than or equal to a minimum throttling issue rate. The throttling rate is set equal to the moving average plus an offset value at the end of each scheduling period.

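Illustrative sketch (not taken from the patent): the throttling rule in the abstract above can be modeled in a few lines of Python. The class name `IssueThrottle`, the parameters `window`, `offset` and `min_rate`, the integer averaging, and the all-zero starting history are assumptions made for this example; the abstract only states that the rate equals the moving average plus an offset, updated at the end of each scheduling period, and never falls below a minimum.

```python
from collections import deque


class IssueThrottle:
    """Caps instructions issued per scheduling period (illustrative only)."""

    def __init__(self, window: int, offset: int, min_rate: int) -> None:
        # Assumption: the unit starts idle, so the history is all zeros.
        self.history = deque([0] * window, maxlen=window)
        self.offset = offset
        self.min_rate = min_rate
        self.rate = min_rate  # current throttling rate

    def run_period(self, requested: int) -> int:
        """Issue up to `requested` instructions, then update the rate."""
        issued = min(requested, self.rate)
        self.history.append(issued)
        # End of period: rate = moving average + offset, floored at min_rate.
        average = sum(self.history) // len(self.history)
        self.rate = max(self.min_rate, average + self.offset)
        return issued


throttle = IssueThrottle(window=4, offset=2, min_rate=4)
issued_per_period = [throttle.run_period(32) for _ in range(6)]
print(issued_per_period)
```

With a constant demand of 32 instructions per period, the issued count in this model climbs a little each period instead of jumping straight to 32, which is the gradual ramp in current draw that the abstract is after.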

2.

Granted patent
Throttling instruction issue rate based on updated moving average to avoid surges in DI/DT (Active)

Publication No.: US09430242B2

Publication date: 2016-08-30

Application No.: US13437765

Filing date: 2012-04-02

Applicant: Peter Michael Nelson, Jack Hilaire Choquette, Olivier Giroux

Inventor: Peter Michael Nelson, Jack Hilaire Choquette, Olivier Giroux

IPC classification: G06F9/30, G06F9/38, G06F1/32, G06T1/20, G06F1/26, G06F1/30

CPC classification: G06F9/3836, G06F1/26, G06F1/305, G06F1/3203, G06F1/324, G06F1/329, G06F9/30, G06F9/30109, G06F9/3851, G06F9/3887, G06T1/20, Y02D10/24

Abstract: Systems and methods for throttling GPU execution performance to avoid surges in DI/DT. A processor includes one or more execution units coupled to a scheduling unit configured to select instructions for execution by the one or more execution units. The execution units may be connected to one or more decoupling capacitors that store power for the circuits of the execution units. The scheduling unit is configured to throttle the instruction issue rate of the execution units based on a moving average issue rate over a large number of scheduling periods. The number of instructions issued during the current scheduling period is less than or equal to a throttling rate maintained by the scheduling unit that is greater than or equal to a minimum throttling issue rate. The throttling rate is set equal to the moving average plus an offset value at the end of each scheduling period.


3.

Granted patent
Methods and apparatus for scheduling instructions using pre-decode data (Active)

Publication No.: US09798548B2

Publication date: 2017-10-24

Application No.: US13333879

Filing date: 2011-12-21

Applicant: Jack Hilaire Choquette, Robert J. Stoll, Olivier Giroux

Inventor: Jack Hilaire Choquette, Robert J. Stoll, Olivier Giroux

IPC classification: G06F15/00, G06F9/30, G06F9/40, G06F9/38

CPC classification: G06F9/3851, G06F9/3802, G06F9/382

Abstract: Systems and methods for scheduling instructions using pre-decode data corresponding to each instruction. In one embodiment, a multi-core processor includes a scheduling unit in each core for selecting instructions from two or more threads each scheduling cycle for execution on that particular core. As threads are scheduled for execution on the core, instructions from the threads are fetched into a buffer without being decoded. The pre-decode data is determined by a compiler and is extracted by the scheduling unit during runtime and used to control selection of threads for execution. The pre-decode data may specify a number of scheduling cycles to wait before scheduling the instruction. The pre-decode data may also specify a scheduling priority for the instruction. Once the scheduling unit selects an instruction to issue for execution, a decode unit fully decodes the instruction.
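Illustrative sketch (not taken from the patent): one way to picture scheduling from pre-decode data is a selector that looks only at two compiler-provided fields, a wait count and a priority, and never at the instruction itself. The `Pending` record, its field names, the "larger number wins" priority convention, and issuing a single instruction per cycle are all assumptions for this example; the abstract describes selection from two or more threads per cycle and does not fix an encoding.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pending:
    thread: int
    wait: int       # scheduling cycles to wait before eligible (assumed field)
    priority: int   # larger means more urgent (assumed convention)
    raw_bits: str   # undecoded instruction; the selector never inspects it


def select(pending: List[Pending]) -> Optional[Pending]:
    """Run one scheduling cycle and return the instruction to issue, if any."""
    ready = [p for p in pending if p.wait == 0]
    # Instructions that are still waiting count down by one cycle.
    for p in pending:
        if p.wait > 0:
            p.wait -= 1
    if not ready:
        return None
    # Full decode would happen only after this selection.
    return max(ready, key=lambda p: p.priority)


pending = [
    Pending(thread=0, wait=2, priority=5, raw_bits="FMUL"),
    Pending(thread=1, wait=0, priority=1, raw_bits="LD"),
    Pending(thread=2, wait=0, priority=3, raw_bits="IADD"),
]
order = []
while pending:
    chosen = select(pending)
    if chosen is not None:
        pending.remove(chosen)
        order.append(chosen.thread)
print(order)
```

Thread 0 has the highest priority but issues last in this model, because its wait count keeps it out of the ready set for the first two cycles.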

4.

Patent application
PRE-SCHEDULED REPLAYS OF DIVERGENT OPERATIONS (Pending, published)

Publication No.: US20130212364A1

Publication date: 2013-08-15

Application No.: US13370173

Filing date: 2012-02-09

Applicant: Michael FETTERMAN, Stewart Glenn Carlton, Jack Hilaire Choquette, Shirish Gadre, Olivier Giroux, Douglas J. Hahn, Steven James Heinrich, Eric Lyell Hill, Charles McCarver, Omkar Paranjape, Anjana Rajendran, Rajeshwaran Selvanesan

Inventor: Michael FETTERMAN, Stewart Glenn Carlton, Jack Hilaire Choquette, Shirish Gadre, Olivier Giroux, Douglas J. Hahn, Steven James Heinrich, Eric Lyell Hill, Charles McCarver, Omkar Paranjape, Anjana Rajendran, Rajeshwaran Selvanesan

IPC classification: G06F9/38, G06F9/312

CPC classification: G06F9/3861, G06F9/3836, G06F9/3851, G06F9/3887

Abstract: One embodiment of the present disclosure sets forth an optimized way to execute pre-scheduled replay operations for divergent operations in a parallel processing subsystem. Specifically, a streaming multiprocessor (SM) includes a multi-stage pipeline configured to insert pre-scheduled replay operations into a multi-stage pipeline. A pre-scheduled replay unit detects whether the operation associated with the current instruction is accessing a common resource. If the threads are accessing data which are distributed across multiple cache lines, then the pre-scheduled replay unit inserts pre-scheduled replay operations behind the current instruction. The multi-stage pipeline executes the instruction and the associated pre-scheduled replay operations sequentially. If additional threads remain unserviced after execution of the instruction and the pre-scheduled replay operations, then additional replay operations are inserted via the replay loop, until all threads are serviced. One advantage of the disclosed technique is that divergent operations requiring one or more replay operations execute with reduced latency.

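Illustrative sketch (not taken from the patent): the split between pre-scheduled replays and replays inserted later through the replay loop can be estimated from the addresses the threads touch. The function name `plan_replays`, the 128-byte line size, the cap `max_prescheduled`, and the assumption that each pipeline pass services exactly one cache line are all hypothetical choices for this example.

```python
from typing import Iterable, Tuple


def plan_replays(
    addresses: Iterable[int],
    line_bytes: int = 128,       # assumed cache-line size
    max_prescheduled: int = 2,   # assumed cap on replays inserted up front
) -> Tuple[int, int]:
    """Return (pre-scheduled replays, replays left for the replay loop)."""
    # Threads whose addresses fall in the same cache line share one pass.
    lines = {address // line_bytes for address in addresses}
    # Assumption: the original instruction services one line by itself.
    needed = max(len(lines) - 1, 0)
    prescheduled = min(needed, max_prescheduled)
    return prescheduled, needed - prescheduled


coalesced = plan_replays([0, 4, 8, 12])
divergent = plan_replays([0, 128, 256, 384, 512])
print(coalesced, divergent)
```

In this model a fully coalesced access needs no replay, while an access spread over five lines gets two replays queued directly behind the instruction and two more through the replay loop.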

5.

Granted patent
Pre-scheduled replays of divergent operations (Active)

Publication No.: US10152329B2

Publication date: 2018-12-11

Application No.: US13370173

Filing date: 2012-02-09

Applicant: Michael Fetterman, Stewart Glenn Carlton, Jack Hilaire Choquette, Shirish Gadre, Olivier Giroux, Douglas J. Hahn, Steven James Heinrich, Eric Lyell Hill, Charles McCarver, Omkar Paranjape, Anjana Rajendran, Rajeshwaran Selvanesan

Inventor: Michael Fetterman, Stewart Glenn Carlton, Jack Hilaire Choquette, Shirish Gadre, Olivier Giroux, Douglas J. Hahn, Steven James Heinrich, Eric Lyell Hill, Charles McCarver, Omkar Paranjape, Anjana Rajendran, Rajeshwaran Selvanesan

IPC classification: G06F9/38

Abstract: One embodiment of the present disclosure sets forth an optimized way to execute pre-scheduled replay operations for divergent operations in a parallel processing subsystem. Specifically, a streaming multiprocessor (SM) includes a multi-stage pipeline configured to insert pre-scheduled replay operations into a multi-stage pipeline. A pre-scheduled replay unit detects whether the operation associated with the current instruction is accessing a common resource. If the threads are accessing data which are distributed across multiple cache lines, then the pre-scheduled replay unit inserts pre-scheduled replay operations behind the current instruction. The multi-stage pipeline executes the instruction and the associated pre-scheduled replay operations sequentially. If additional threads remain unserviced after execution of the instruction and the pre-scheduled replay operations, then additional replay operations are inserted via the replay loop, until all threads are serviced. One advantage of the disclosed technique is that divergent operations requiring one or more replay operations execute with reduced latency.

6.

Granted patent
Speculative execution and rollback (Active)

Publication No.: US09830158B2

Publication date: 2017-11-28

Application No.: US13289643

Filing date: 2011-11-04

Applicant: Jack Hilaire Choquette, Olivier Giroux, Robert J. Stoll, Xiaogang Qiu

Inventor: Jack Hilaire Choquette, Olivier Giroux, Robert J. Stoll, Xiaogang Qiu

IPC classification: G06F9/38

CPC classification: G06F9/3842, G06F9/3851, G06F9/3861, G06F9/3887

Abstract: One embodiment of the present invention sets forth a technique for speculatively issuing instructions to allow a processing pipeline to continue to process some instructions during rollback of other instructions. A scheduler circuit issues instructions for execution assuming that, several cycles later, when the instructions reach multithreaded execution units, that dependencies between the instructions will be resolved, resources will be available, operand data will be available, and other conditions will not prevent execution of the instructions. When a rollback condition exists at the point of execution for an instruction for a particular thread group, the instruction is not dispatched to the multithreaded execution units. However, other instructions issued by the scheduler circuit for execution by different thread groups, and for which a rollback condition does not exist, are executed by the multithreaded execution units. The instruction incurring the rollback condition is reissued after the rollback condition no longer exists.

7.

Patent application
THREAD GROUP SCHEDULER FOR COMPUTING ON A PARALLEL THREAD PROCESSOR (Active)

Publication No.: US20120110586A1

Publication date: 2012-05-03

Application No.: US13247819

Filing date: 2011-09-28

Applicant: Brett W. Coon, John R. Nickolls, John Erik Lindholm, Robert J. Stoll, Nicholas Wang, Jack Hilaire Choquette, Kathleen Elliott Nickolls

Inventor: Brett W. Coon, John R. Nickolls, John Erik Lindholm, Robert J. Stoll, Nicholas Wang, Jack Hilaire Choquette, Kathleen Elliott Nickolls

IPC classification: G06F9/46

CPC classification: G06F9/4881, G06F2209/483

Abstract: A parallel thread processor executes thread groups belonging to multiple cooperative thread arrays (CTAs). At each cycle of the parallel thread processor, an instruction scheduler selects a thread group to be issued for execution during a subsequent cycle. The instruction scheduler selects a thread group to issue for execution by (i) identifying a pool of available thread groups, (ii) identifying a CTA that has the greatest seniority value, and (iii) selecting the thread group that has the greatest credit value from within the CTA with the greatest seniority value.

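Illustrative sketch (not taken from the patent): the three selection steps (i) to (iii) in the abstract above map onto a short function. The `ThreadGroup` record, the `seniority` dictionary keyed by CTA id, and the choice to consider only CTAs that have at least one available thread group are assumptions for this example; the abstract does not say how seniority and credit values are computed or how ties are broken.

```python
from typing import Dict, List, NamedTuple, Optional


class ThreadGroup(NamedTuple):
    gid: int
    cta: int
    available: bool
    credit: int


def pick(groups: List[ThreadGroup], seniority: Dict[int, int]) -> Optional[ThreadGroup]:
    """Select the next thread group to issue, or None if none is available."""
    # (i) Pool of available thread groups.
    pool = [g for g in groups if g.available]
    if not pool:
        return None
    # (ii) CTA with the greatest seniority among CTAs present in the pool.
    senior_cta = max(pool, key=lambda g: seniority[g.cta]).cta
    # (iii) Greatest credit value within that CTA.
    return max((g for g in pool if g.cta == senior_cta), key=lambda g: g.credit)


groups = [
    ThreadGroup(gid=0, cta=0, available=True, credit=5),
    ThreadGroup(gid=1, cta=0, available=True, credit=9),
    ThreadGroup(gid=2, cta=1, available=True, credit=20),
    ThreadGroup(gid=3, cta=1, available=False, credit=50),
]
older_cta0 = pick(groups, {0: 7, 1: 3})
older_cta1 = pick(groups, {0: 1, 1: 3})
print(older_cta0.gid, older_cta1.gid)
```

Seniority is applied before credit, so in the first call group 1 wins even though group 2 in the younger CTA holds more credit.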

8.

Patent application
SYSTEM AND METHOD FOR PERFORMING SHAPED MEMORY ACCESS OPERATIONS (Pending, published)

Publication No.: US20130145124A1

Publication date: 2013-06-06

Application No.: US13312954

Filing date: 2011-12-06

Applicant: Xiaogang Qiu, Jack Hilaire Choquette, Manuel Olivier Gautho, Ming Y. (Michael) Siu

Inventor: Xiaogang Qiu, Jack Hilaire Choquette, Manuel Olivier Gautho, Ming Y. (Michael) Siu

IPC classification: G06F9/30

CPC classification: G06F15/167, G06F9/3012, G06F9/3455, G06F9/383, G06F9/3851, G06F9/3887

Abstract: One embodiment of the present invention sets forth a technique that provides an efficient way to retrieve operands from a register file. Specifically, the instruction dispatch unit receives one or more instructions, each of which includes one or more operands. Collectively, the operands are organized into one or more operand groups from which a shaped access may be formed. The operands are retrieved from the register file and stored in a collector. Once all operands are read and collected in the collector, the instruction dispatch unit transmits the instructions and corresponding operands to functional units within the streaming multiprocessor for execution. One advantage of the present invention is that multiple operands are retrieved from the register file in a single register access operation without resource conflict. Performance in retrieving operands from the register file is improved by forming shaped accesses that efficiently retrieve operands exhibiting recognized memory access patterns.


9.

Granted patent
Multi-level instruction cache prefetching (Active)

Publication No.: US09110810B2

Publication date: 2015-08-18

Application No.: US13312962

Filing date: 2011-12-06

Applicant: Nicholas Wang, Jack Hilaire Choquette

Inventor: Nicholas Wang, Jack Hilaire Choquette

IPC classification: G06F13/00, G06F13/28, G06F12/08, G06F9/38

CPC classification: G06F12/0862, G06F9/3802, G06F9/3875, G06F2212/6026

Abstract: One embodiment of the present invention sets forth an improved way to prefetch instructions in a multi-level cache. Fetch unit initiates a prefetch operation to transfer one of a set of multiple cache lines, based on a function of a pseudorandom number generator and the sector corresponding to the current instruction L1 cache line. The fetch unit selects a prefetch target from the set of multiple cache lines according to some probability function. If the current instruction L1 cache 370 is located within the first sector of the corresponding L1.5 cache line, then the selected prefetch target is located at a sector within the next L1.5 cache line. The result is that the instruction L1 cache hit rate is improved and instruction fetch latency is reduced, even where the processor consumes instructions in the instruction L1 cache at a fast rate.


10.

Granted patent
Methods and apparatus for source operand collector caching (Active)

Publication No.: US08639882B2

Publication date: 2014-01-28

Application No.: US13326183

Filing date: 2011-12-14

Applicant: Jack Hilaire Choquette, Manuel Olivier Gautho, John Erik Lindholm

Inventor: Jack Hilaire Choquette, Manuel Olivier Gautho, John Erik Lindholm

IPC classification: G06F12/00

CPC classification: G06F9/3009, G06F9/3012, G06F9/3832

Abstract: Methods and apparatus for source operand collector caching. In one embodiment, a processor includes a register file that may be coupled to storage elements (i.e., an operand collector) that provide inputs to the datapath of the processor core for executing an instruction. In order to reduce bandwidth between the register file and the operand collector, operands may be cached and reused in subsequent instructions. A scheduling unit maintains a cache table for monitoring which register values are currently stored in the operand collector. The scheduling unit may also configure the operand collector to select the particular storage elements that are coupled to the inputs to the datapath for a given instruction.

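Illustrative sketch (not taken from the patent): the cache table described in the abstract above can be modeled as a list that records which register currently occupies each collector slot. The class name `CollectorCacheTable`, the round-robin replacement policy, and the requirement that the collector has at least as many slots as an instruction has source operands are assumptions for this example; the abstract does not specify a replacement policy.

```python
from typing import List, Optional, Tuple


class CollectorCacheTable:
    """Tracks which register value sits in each operand-collector slot."""

    def __init__(self, slots: int) -> None:
        self.slots: List[Optional[str]] = [None] * slots
        self.next_victim = 0  # round-robin replacement (assumed policy)

    def schedule(self, source_regs: List[str]) -> Tuple[List[int], int]:
        """Return (slot feeding each operand, register-file reads needed)."""
        chosen: List[int] = []
        reads = 0
        for reg in source_regs:
            if reg in self.slots:
                slot = self.slots.index(reg)  # hit: reuse the cached operand
            else:
                # Miss: skip slots already feeding this instruction.
                while self.next_victim in chosen:
                    self.next_victim = (self.next_victim + 1) % len(self.slots)
                slot = self.next_victim
                self.slots[slot] = reg
                self.next_victim = (slot + 1) % len(self.slots)
                reads += 1
            chosen.append(slot)
        return chosen, reads


table = CollectorCacheTable(slots=4)
first = table.schedule(["r1", "r2", "r3"])
second = table.schedule(["r1", "r2", "r4"])
third = table.schedule(["r4", "r5"])
print(first, second, third)
```

The second instruction reuses `r1` and `r2` from the collector and needs only one register-file read, which is the bandwidth saving the abstract describes; the returned slot list stands in for the storage elements the scheduling unit would couple to the datapath inputs.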
