System and method for performing shaped memory access operations

    Publication No.: US10255228B2

    Publication Date: 2019-04-09

    Application No.: US13312954

    Filing Date: 2011-12-06

    Abstract: One embodiment of the present invention sets forth a technique that provides an efficient way to retrieve operands from a register file. Specifically, the instruction dispatch unit receives one or more instructions, each of which includes one or more operands. Collectively, the operands are organized into one or more operand groups from which a shaped access may be formed. The operands are retrieved from the register file and stored in a collector. Once all operands are read and collected in the collector, the instruction dispatch unit transmits the instructions and corresponding operands to functional units within the streaming multiprocessor for execution. One advantage of the present invention is that multiple operands are retrieved from the register file in a single register access operation without resource conflict. Performance in retrieving operands from the register file is improved by forming shaped accesses that efficiently retrieve operands exhibiting recognized memory access patterns.
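The core idea above — grouping operand reads so that one register-file access services several operands without a resource conflict — can be sketched as a toy model. Everything here (the bank count, the interleaving rule, the function names) is illustrative and not taken from the patent:

```python
# Minimal model of forming "shaped" register-file accesses: operand reads
# that map to distinct banks can be serviced in one access; reads that
# collide on an already-used bank must go in a separate access.

NUM_BANKS = 4  # assumed bank count, purely illustrative

def bank_of(reg: int) -> int:
    """Assume registers are interleaved across banks by index."""
    return reg % NUM_BANKS

def form_shaped_accesses(operand_regs):
    """Greedily group operand register reads into conflict-free accesses."""
    accesses = []  # each access maps bank -> register read in that cycle
    for reg in operand_regs:
        b = bank_of(reg)
        for access in accesses:
            if b not in access:        # no bank conflict: join this access
                access[b] = reg
                break
        else:
            accesses.append({b: reg})  # conflicts everywhere: new access
    return accesses

# Reading R0..R3 touches four distinct banks, so one access suffices;
# adding R4 (same bank as R0) forces a second access.
print(len(form_shaped_accesses([0, 1, 2, 3])))     # 1
print(len(form_shaped_accesses([0, 1, 2, 3, 4])))  # 2
```

The greedy grouping is only one possible policy; the abstract leaves the exact pattern-recognition mechanism unspecified.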

    Speculative execution and rollback

    Publication No.: US09830158B2

    Publication Date: 2017-11-28

    Application No.: US13289643

    Filing Date: 2011-11-04

    IPC Classes: G06F9/38

    Abstract: One embodiment of the present invention sets forth a technique for speculatively issuing instructions to allow a processing pipeline to continue to process some instructions during rollback of other instructions. A scheduler circuit issues instructions for execution assuming that, several cycles later, when the instructions reach the multithreaded execution units, dependencies between the instructions will be resolved, resources will be available, operand data will be available, and other conditions will not prevent execution of the instructions. When a rollback condition exists at the point of execution for an instruction for a particular thread group, the instruction is not dispatched to the multithreaded execution units. However, other instructions issued by the scheduler circuit for execution by different thread groups, and for which a rollback condition does not exist, are executed by the multithreaded execution units. The instruction incurring the rollback condition is reissued after the rollback condition no longer exists.
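The key behavior — a rolled-back thread group is reissued later while other thread groups keep executing — can be illustrated with a small event-driven sketch. The reissue delay and data shapes are assumptions for illustration, not details from the patent:

```python
from collections import deque

def run(issued, rollback_until, reissue_delay=3):
    """Toy model of speculative issue with per-thread-group rollback.

    issued: list of (cycle, group) issue events, in issue order.
    rollback_until: group -> last cycle at which its rollback condition
    still holds. Returns the (cycle, group) events that actually execute.
    """
    pending = deque(issued)
    executed = []
    while pending:
        cycle, group = pending.popleft()
        if cycle <= rollback_until.get(group, -1):
            # Rollback condition holds at the point of execution: do not
            # dispatch; reissue the instruction some cycles later.
            pending.append((cycle + reissue_delay, group))
        else:
            executed.append((cycle, group))
    return executed

# Group 'A' hits a rollback condition through cycle 2, so its cycle-0
# issue is reissued; group 'B' is unaffected and executes first.
print(run([(0, 'A'), (1, 'B')], {'A': 2}))  # [(1, 'B'), (3, 'A')]
```

The point of the sketch is that 'B' is never stalled behind 'A': only the thread group incurring the rollback condition pays the reissue latency.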

    BATCHED REPLAYS OF DIVERGENT OPERATIONS

    Publication No.: US20130159684A1

    Publication Date: 2013-06-20

    Application No.: US13329066

    Filing Date: 2011-12-16

    IPC Classes: G06F9/38 G06F9/312

    CPC Classes: G06F9/3851 G06F9/3861

    Abstract: One embodiment of the present invention sets forth an optimized way to execute replay operations for divergent operations in a parallel processing subsystem. Specifically, the streaming multiprocessor (SM) includes a multistage pipeline configured to batch two or more replay operations for processing via a replay loop. A logic element within the multistage pipeline detects whether the current pipeline stage is accessing a shared resource, such as loading data from a shared memory. If the threads are accessing data distributed across multiple cache lines, then the multistage pipeline batches two or more replay operations, where the replay operations are inserted into the pipeline back-to-back. Advantageously, divergent operations requiring two or more replay operations operate with reduced latency. Where memory access operations require transfer of more than two cache lines to service all threads, the number of clock cycles required to complete all replay operations is reduced.
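The latency argument can be made concrete with a toy calculation: a divergent access touching N cache lines needs N-1 replays, and batching replays back-to-back reduces the number of trips around the replay loop. The cache-line size and batch width below are assumptions for illustration only:

```python
LINE = 128   # assumed cache-line size in bytes
BATCH = 2    # replays inserted back-to-back per trip around the replay loop

def lines_touched(addrs):
    """Number of distinct cache lines covered by the threads' addresses."""
    return len({a // LINE for a in addrs})

def replay_loop_trips(addrs):
    """Trips around the replay loop when replays are batched.

    The first issue services one cache line; each remaining line needs a
    replay, and each loop trip carries up to BATCH replays back-to-back.
    """
    remaining = lines_touched(addrs) - 1
    trips = 0
    while remaining > 0:
        remaining -= BATCH
        trips += 1
    return trips

# Four cache lines -> 3 replays -> 2 loop trips with batching,
# versus 3 trips if each replay made its own trip around the loop.
print(replay_loop_trips([0, 128, 256, 384]))  # 2
```

Each loop trip carries the full loop latency, so halving the trip count for heavily divergent accesses is where the cycle savings come from.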

    SPECULATIVE EXECUTION AND ROLLBACK

    Publication No.: US20130117541A1

    Publication Date: 2013-05-09

    Application No.: US13289643

    Filing Date: 2011-11-04

    IPC Classes: G06F9/30

    Abstract: One embodiment of the present invention sets forth a technique for speculatively issuing instructions to allow a processing pipeline to continue to process some instructions during rollback of other instructions. A scheduler circuit issues instructions for execution assuming that, several cycles later, when the instructions reach the multithreaded execution units, dependencies between the instructions will be resolved, resources will be available, operand data will be available, and other conditions will not prevent execution of the instructions. When a rollback condition exists at the point of execution for an instruction for a particular thread group, the instruction is not dispatched to the multithreaded execution units. However, other instructions issued by the scheduler circuit for execution by different thread groups, and for which a rollback condition does not exist, are executed by the multithreaded execution units. The instruction incurring the rollback condition is reissued after the rollback condition no longer exists.

    Methods and apparatus for scheduling instructions using pre-decode data

    Publication No.: US09798548B2

    Publication Date: 2017-10-24

    Application No.: US13333879

    Filing Date: 2011-12-21

    Abstract: Systems and methods for scheduling instructions using pre-decode data corresponding to each instruction. In one embodiment, a multi-core processor includes a scheduling unit in each core for selecting instructions from two or more threads each scheduling cycle for execution on that particular core. As threads are scheduled for execution on the core, instructions from the threads are fetched into a buffer without being decoded. The pre-decode data is determined by a compiler and is extracted by the scheduling unit during runtime and used to control selection of threads for execution. The pre-decode data may specify a number of scheduling cycles to wait before scheduling the instruction. The pre-decode data may also specify a scheduling priority for the instruction. Once the scheduling unit selects an instruction to issue for execution, a decode unit fully decodes the instruction.
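The two pre-decode fields named above — a wait count and a scheduling priority — are enough to drive a minimal scheduler model. The dictionary shape and priority convention (smaller number = higher priority) are assumptions for illustration:

```python
def schedule(instrs, num_cycles):
    """Toy scheduler driven only by compiler-supplied pre-decode data.

    instrs: list of dicts with 'name', 'wait' (cycles to hold the
    instruction back before it is eligible) and 'prio' (smaller = higher).
    Each cycle, issue the highest-priority eligible instruction.
    Returns the (cycle, name) issue order.
    """
    pending = {i['name']: i for i in instrs}
    order = []
    for cycle in range(num_cycles):
        ready = [n for n, i in pending.items() if i['wait'] <= cycle]
        if ready:
            pick = min(ready, key=lambda n: pending[n]['prio'])
            order.append((cycle, pick))
            del pending[pick]
    return order

# 'C' carries wait=2 pre-decode data (e.g. a known latency), so the
# scheduler fills cycles 0-1 with 'B' (higher priority) and 'A'.
instrs = [{'name': 'A', 'wait': 0, 'prio': 1},
          {'name': 'B', 'wait': 0, 'prio': 0},
          {'name': 'C', 'wait': 2, 'prio': 0}]
print(schedule(instrs, 4))  # [(0, 'B'), (1, 'A'), (2, 'C')]
```

Note that no decode happens anywhere in the selection loop: the scheduler reads only the pre-decode fields, matching the abstract's claim that full decode is deferred until after selection.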

    METHODS AND APPARATUS FOR SCHEDULING INSTRUCTIONS WITHOUT INSTRUCTION DECODE

    Publication No.: US20130166882A1

    Publication Date: 2013-06-27

    Application No.: US13335872

    Filing Date: 2011-12-22

    IPC Classes: G06F9/30 G06F9/38 G06F9/312

    CPC Classes: G06F9/3851 G06F9/382

    Abstract: Systems and methods for scheduling instructions without instruction decode. In one embodiment, a multi-core processor includes a scheduling unit in each core for scheduling instructions from two or more threads scheduled for execution on that particular core. As threads are scheduled for execution on the core, instructions from the threads are fetched into a buffer without being decoded. The scheduling unit includes a macro-scheduler unit for performing a priority sort of the two or more threads and a micro-scheduler arbiter for determining the highest-order thread that is ready to execute. The macro-scheduler unit and the micro-scheduler arbiter use pre-decode data to implement the scheduling algorithm. The pre-decode data may be generated by decoding only a small portion of the instruction or received along with the instruction. Once the micro-scheduler arbiter has selected an instruction to dispatch to the execution unit, a decode unit fully decodes the instruction.
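The two-level split described above — a macro-scheduler that priority-sorts threads and a micro-scheduler arbiter that picks the highest-order ready thread — can be sketched in a few lines. Field names and the priority convention are illustrative assumptions:

```python
def macro_sort(threads):
    """Macro-scheduler: stable priority sort of the candidate threads,
    using only pre-decode priority data (smaller = higher priority)."""
    return sorted(threads, key=lambda t: t['prio'])

def micro_arbitrate(sorted_threads):
    """Micro-scheduler arbiter: pick the highest-order thread in the
    sorted list whose next instruction is ready to execute."""
    for t in sorted_threads:
        if t['ready']:
            return t['name']
    return None  # nothing ready this cycle

# t1 has the best priority but is not ready, so the arbiter skips it
# and dispatches t2, the next thread in priority order.
threads = [{'name': 't0', 'prio': 2, 'ready': True},
           {'name': 't1', 'prio': 0, 'ready': False},
           {'name': 't2', 'prio': 1, 'ready': True}]
print(micro_arbitrate(macro_sort(threads)))  # t2
```

Splitting the sort (slow, coarse-grained) from the per-cycle readiness pick (fast, fine-grained) is what lets the arbiter run every cycle without re-sorting.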

    METHODS AND APPARATUS FOR SOURCE OPERAND COLLECTOR CACHING

    Publication No.: US20130159628A1

    Publication Date: 2013-06-20

    Application No.: US13326183

    Filing Date: 2011-12-14

    IPC Classes: G06F12/08

    Abstract: Methods and apparatus for source operand collector caching. In one embodiment, a processor includes a register file that may be coupled to storage elements (i.e., an operand collector) that provide inputs to the datapath of the processor core for executing an instruction. In order to reduce bandwidth between the register file and the operand collector, operands may be cached and reused in subsequent instructions. A scheduling unit maintains a cache table for monitoring which register values are currently stored in the operand collector. The scheduling unit may also configure the operand collector to select the particular storage elements that are coupled to the inputs to the datapath for a given instruction.
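The bandwidth-saving mechanism — a cache table tracking which register values already sit in collector storage, so repeated operands skip the register file — can be modeled as a tiny LRU cache. Slot count and replacement policy are assumptions; the patent does not pin these down in the abstract:

```python
class OperandCollector:
    """Toy model of source operand collector caching: register-file reads
    happen only when an operand is not already in a collector slot."""

    def __init__(self, num_slots):
        self.num_slots = num_slots
        self.slots = []      # cached registers, least recently used first
        self.rf_reads = 0    # register-file bandwidth actually consumed

    def gather(self, regs):
        """Gather the source operands of one instruction."""
        for r in regs:
            if r in self.slots:
                self.slots.remove(r)   # hit: reuse, no register-file access
            else:
                self.rf_reads += 1     # miss: read from the register file
                if len(self.slots) >= self.num_slots:
                    self.slots.pop(0)  # evict least recently used register
            self.slots.append(r)       # mark r most recently used

# Back-to-back instructions sharing R2 and R3: the second gather reads
# only R4 from the register file (4 reads total instead of 6).
oc = OperandCollector(num_slots=4)
oc.gather([1, 2, 3])
oc.gather([2, 3, 4])
print(oc.rf_reads)  # 4
```

In hardware the "cache table" lives in the scheduling unit rather than in the collector itself, but the accounting is the same: every hit is a register-file port cycle saved.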

    PRE-SCHEDULED REPLAYS OF DIVERGENT OPERATIONS

    Publication No.: US20130212364A1

    Publication Date: 2013-08-15

    Application No.: US13370173

    Filing Date: 2012-02-09

    IPC Classes: G06F9/38 G06F9/312

    Abstract: One embodiment of the present disclosure sets forth an optimized way to execute pre-scheduled replay operations for divergent operations in a parallel processing subsystem. Specifically, a streaming multiprocessor (SM) includes a multi-stage pipeline configured to insert pre-scheduled replay operations into the multi-stage pipeline. A pre-scheduled replay unit detects whether the operation associated with the current instruction is accessing a common resource. If the threads are accessing data distributed across multiple cache lines, then the pre-scheduled replay unit inserts pre-scheduled replay operations behind the current instruction. The multi-stage pipeline executes the instruction and the associated pre-scheduled replay operations sequentially. If additional threads remain unserviced after execution of the instruction and the pre-scheduled replay operations, then additional replay operations are inserted via the replay loop, until all threads are serviced. One advantage of the disclosed technique is that divergent operations requiring one or more replay operations execute with reduced latency.
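The split between replays inserted up-front behind the instruction and leftover replays handled by the replay loop can be illustrated with a small calculation. The cache-line size and the number of pre-scheduled slots are assumptions for illustration only:

```python
LINE = 128        # assumed cache-line size in bytes
PRESCHEDULED = 1  # replays inserted directly behind the instruction

def service(addrs):
    """Toy accounting for one divergent memory operation.

    Returns (prescheduled_replays, loop_replays): the instruction itself
    services one cache line; pre-scheduled replays (cheap, no loop
    latency) cover the next lines; any remainder goes around the
    replay loop.
    """
    lines = len({a // LINE for a in addrs})
    needed = max(lines - 1, 0)
    pre = min(needed, PRESCHEDULED)
    return pre, needed - pre

# Three cache lines: one pre-scheduled replay follows the instruction
# sequentially, and only one replay must take the full loop latency.
print(service([0, 128, 256]))  # (1, 1)
print(service([0, 8]))         # (0, 0) - no divergence, no replays
```

Compared to the batched-replay scheme above, the pre-scheduled variant pays no loop latency at all for the common mildly divergent case, falling back to the loop only when divergence exceeds the prediction.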
