Methods and apparatus for scheduling instructions using pre-decode data

    Publication No.: US09798548B2

    Publication date: 2017-10-24

    Application No.: US13333879

    Filing date: 2011-12-21

    Abstract: Systems and methods for scheduling instructions using pre-decode data corresponding to each instruction. In one embodiment, a multi-core processor includes a scheduling unit in each core for selecting instructions from two or more threads each scheduling cycle for execution on that particular core. As threads are scheduled for execution on the core, instructions from the threads are fetched into a buffer without being decoded. The pre-decode data is determined by a compiler and is extracted by the scheduling unit during runtime and used to control selection of threads for execution. The pre-decode data may specify a number of scheduling cycles to wait before scheduling the instruction. The pre-decode data may also specify a scheduling priority for the instruction. Once the scheduling unit selects an instruction to issue for execution, a decode unit fully decodes the instruction.
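
    The scheduling idea in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware design: the names `Instr`, `Thread`, `decode`, and `schedule` are hypothetical, one instruction is issued per cycle for simplicity, and the tie-break by lower thread id is an arbitrary choice. The pre-decode hints (`wait_cycles`, `priority`) are read without decoding the instruction, and `decode` runs only after an instruction has been selected.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Instr:
    opcode: str           # stands in for the undecoded instruction bits
    wait_cycles: int = 0  # pre-decode hint: cycles to wait before scheduling
    priority: int = 0     # pre-decode hint: higher values are scheduled first


@dataclass
class Thread:
    tid: int
    buffer: deque  # fetched but not yet decoded instructions


def decode(instr):
    # Placeholder for the full decode that happens only after selection.
    return f"decoded({instr.opcode})"


def schedule(threads, max_cycles=100):
    """Return a list of (cycle, thread id, decoded instruction) tuples."""
    # Cycle at which each thread's head instruction becomes eligible.
    ready_at = {t.tid: t.buffer[0].wait_cycles for t in threads if t.buffer}
    issued = []
    for cycle in range(max_cycles):
        candidates = [t for t in threads
                      if t.buffer and ready_at[t.tid] <= cycle]
        if candidates:
            # Highest pre-decode priority wins; lower thread id breaks ties.
            chosen = max(candidates,
                         key=lambda t: (t.buffer[0].priority, -t.tid))
            instr = chosen.buffer.popleft()
            issued.append((cycle, chosen.tid, decode(instr)))
            if chosen.buffer:
                nxt = chosen.buffer[0]
                ready_at[chosen.tid] = cycle + 1 + nxt.wait_cycles
        if not any(t.buffer for t in threads):
            break
    return issued
```

    With two threads, a thread whose next instruction carries a non-zero wait hint is skipped for that many cycles while the other thread keeps issuing.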

    METHODS AND APPARATUS FOR SCHEDULING INSTRUCTIONS USING PRE-DECODE DATA
    2.
    Type: Invention application
    Status: In force

    Publication No.: US20130166881A1

    Publication date: 2013-06-27

    Application No.: US13333879

    Filing date: 2011-12-21

    IPC classification: G06F9/30 G06F9/312

    Abstract: Systems and methods for scheduling instructions using pre-decode data corresponding to each instruction. In one embodiment, a multi-core processor includes a scheduling unit in each core for selecting instructions from two or more threads each scheduling cycle for execution on that particular core. As threads are scheduled for execution on the core, instructions from the threads are fetched into a buffer without being decoded. The pre-decode data is determined by a compiler and is extracted by the scheduling unit during runtime and used to control selection of threads for execution. The pre-decode data may specify a number of scheduling cycles to wait before scheduling the instruction. The pre-decode data may also specify a scheduling priority for the instruction. Once the scheduling unit selects an instruction to issue for execution, a decode unit fully decodes the instruction.


    Speculative execution and rollback

    Publication No.: US09830158B2

    Publication date: 2017-11-28

    Application No.: US13289643

    Filing date: 2011-11-04

    IPC classification: G06F9/38

    Abstract: One embodiment of the present invention sets forth a technique for speculatively issuing instructions to allow a processing pipeline to continue to process some instructions during rollback of other instructions. A scheduler circuit issues instructions for execution assuming that, several cycles later, when the instructions reach multithreaded execution units, dependencies between the instructions will be resolved, resources will be available, operand data will be available, and other conditions will not prevent execution of the instructions. When a rollback condition exists at the point of execution for an instruction for a particular thread group, the instruction is not dispatched to the multithreaded execution units. However, other instructions issued by the scheduler circuit for execution by different thread groups, and for which a rollback condition does not exist, are executed by the multithreaded execution units. The instruction incurring the rollback condition is reissued after the rollback condition no longer exists.
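
    A toy simulation may help picture the issue, rollback, and reissue flow described above. It is a sketch under several assumptions that are not in the abstract: the function name `run`, a fixed `pipeline_depth` between issue and execution, one issue per cycle, and "operands not yet available" as the only rollback condition. Ordering between instructions of the same thread group is not modeled.

```python
from collections import deque


def run(issue_order, operand_ready_at, pipeline_depth=2, max_cycles=50):
    """Simulate speculative issue with rollback and reissue.

    issue_order: list of (thread_group, instr_name) in scheduler order.
    operand_ready_at: instr_name -> first cycle its operands are available
        (missing names are treated as always ready).
    Returns (executed, rollbacks), each a list of (cycle, group, name).
    """
    to_issue = deque(issue_order)
    in_flight = deque()  # (arrival_cycle, group, name), in arrival order
    executed, rollbacks = [], []
    for cycle in range(max_cycles):
        # Point of execution: check the instruction arriving this cycle.
        while in_flight and in_flight[0][0] == cycle:
            _, group, name = in_flight.popleft()
            if operand_ready_at.get(name, 0) > cycle:
                # Rollback condition: do not dispatch, queue for reissue.
                rollbacks.append((cycle, group, name))
                to_issue.append((group, name))
            else:
                executed.append((cycle, group, name))
        # Scheduler issues speculatively, without knowing the outcome.
        if to_issue:
            group, name = to_issue.popleft()
            in_flight.append((cycle + pipeline_depth, group, name))
        if not to_issue and not in_flight:
            break
    return executed, rollbacks
```

    In a run where only the instruction of thread group 0 hits the rollback condition, the instructions of the other thread groups still execute on schedule, and the rolled-back instruction executes once it is reissued after its operands become available.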

    SPECULATIVE EXECUTION AND ROLLBACK
    4.
    Type: Invention application
    Status: In force

    Publication No.: US20130117541A1

    Publication date: 2013-05-09

    Application No.: US13289643

    Filing date: 2011-11-04

    IPC classification: G06F9/30

    Abstract: One embodiment of the present invention sets forth a technique for speculatively issuing instructions to allow a processing pipeline to continue to process some instructions during rollback of other instructions. A scheduler circuit issues instructions for execution assuming that, several cycles later, when the instructions reach multithreaded execution units, dependencies between the instructions will be resolved, resources will be available, operand data will be available, and other conditions will not prevent execution of the instructions. When a rollback condition exists at the point of execution for an instruction for a particular thread group, the instruction is not dispatched to the multithreaded execution units. However, other instructions issued by the scheduler circuit for execution by different thread groups, and for which a rollback condition does not exist, are executed by the multithreaded execution units. The instruction incurring the rollback condition is reissued after the rollback condition no longer exists.


    Thread group scheduler for computing on a parallel thread processor
    6.
    Type: Granted invention patent
    Status: In force

    Publication No.: US08732713B2

    Publication date: 2014-05-20

    Application No.: US13247819

    Filing date: 2011-09-28

    IPC classification: G06F9/46

    CPC classification: G06F9/4881 G06F2209/483

    Abstract: A parallel thread processor executes thread groups belonging to multiple cooperative thread arrays (CTAs). At each cycle of the parallel thread processor, an instruction scheduler selects a thread group to be issued for execution during a subsequent cycle. The instruction scheduler selects a thread group to issue for execution by (i) identifying a pool of available thread groups, (ii) identifying a CTA that has the greatest seniority value, and (iii) selecting the thread group that has the greatest credit value from within the CTA with the greatest seniority value.

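
    The three selection steps (i) to (iii) can be sketched in a few lines. The names `ThreadGroup` and `select_thread_group` are hypothetical, and the sketch assumes that only CTAs with at least one available thread group compete in step (ii), which the abstract does not state. How seniority and credit values are assigned or updated is not covered here.

```python
from dataclasses import dataclass


@dataclass
class ThreadGroup:
    gid: int
    cta: int         # id of the cooperative thread array it belongs to
    credit: int      # credit value of this thread group
    available: bool  # whether it can be issued this cycle


def select_thread_group(groups, seniority):
    """Pick the thread group to issue, or None if nothing is available.

    seniority: dict mapping CTA id -> seniority value.
    """
    # (i) Identify the pool of available thread groups.
    pool = [g for g in groups if g.available]
    if not pool:
        return None
    # (ii) Identify the CTA with the greatest seniority value
    #      (assumption: only CTAs represented in the pool are considered).
    senior_cta = max(sorted({g.cta for g in pool}),
                     key=lambda c: seniority[c])
    # (iii) Within that CTA, select the group with the greatest credit.
    return max((g for g in pool if g.cta == senior_cta),
               key=lambda g: g.credit)
```

    Note that a thread group with a higher credit value in a less senior CTA is passed over: seniority is compared first, credit only within the chosen CTA.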

    System and method for storing states used to configure a processing pipeline in a graphics processing unit
    9.
    Type: Granted invention patent
    Status: In force

    Publication No.: US07725688B1

    Publication date: 2010-05-25

    Application No.: US11470013

    Filing date: 2006-09-05

    IPC classification: G06F9/00 G06T1/20

    Abstract: States that are used in configuring a processing pipeline are passed down through a separate pipeline in parallel with the data transmitted down through the processing pipeline. With this separate pipeline, the states for configuring any one stage of the processing pipeline are continuously available in the corresponding stage of the state pipeline, and new states for configuring the processing pipeline can be transmitted down the state pipeline without flushing the processing pipeline. The processing pipeline and the separate pipeline for the states can be divided into multiple sections so that the width of the separate pipeline for the states can be reduced.

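
    The following sketch models a state pipeline that advances in lockstep with the data pipeline. It is an illustrative software analogy only: `run_pipeline`, `state_changes`, and the dictionary-based states are hypothetical, and the division of the pipelines into sections is not modeled. Each stage reads its configuration from the matching stage of the state pipeline, so a new state can enter at the top while older data items further down are still processed with the old state.

```python
def run_pipeline(items, state_changes, stage_fns, initial_state):
    """Process items through stage_fns with a parallel state pipeline.

    items: input data values (None is reserved for pipeline bubbles).
    state_changes: dict mapping input index -> new state that enters the
        state pipeline together with that input.
    stage_fns: one function f(data, state) per pipeline stage.
    """
    n = len(stage_fns)
    data = [None] * n            # data register of each stage
    state = [initial_state] * n  # state register of each stage
    outputs = []
    current = initial_state
    # n extra iterations push bubbles in so the pipeline drains.
    for idx in range(len(items) + n):
        if data[-1] is not None:
            outputs.append(data[-1])
        # Advance both pipelines by one stage, last stage first.
        for i in range(n - 1, 0, -1):
            state[i] = state[i - 1]
            prev = data[i - 1]
            data[i] = None if prev is None else stage_fns[i](prev, state[i])
        # A new state may enter the state pipeline here; nothing is flushed.
        current = state_changes.get(idx, current)
        state[0] = current
        item = items[idx] if idx < len(items) else None
        data[0] = None if item is None else stage_fns[0](item, current)
    return outputs
```

    For example, with a first stage that adds `state["offset"]` and a second stage that multiplies by `state["scale"]`, changing the state at the third input leaves the first two inputs processed entirely with the old state, even though the second is still in flight when the new state arrives.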

    Color-compression using automatic reduction of multi-sampled pixels
    10.
    Type: Granted invention patent
    Status: In force

    Publication No.: US08233004B1

    Publication date: 2012-07-31

    Application No.: US11557068

    Filing date: 2006-11-06

    IPC classification: G09G5/00

    Abstract: One embodiment of the present invention sets forth a technique for improving graphics rendering efficiency by processing pixels in a compressed format whenever possible within a multi-sampling graphics pipeline. Each geometric primitive is rasterized into fragments, corresponding to screen space pixels covered at least partially by the geometric primitive. Fragment coverage represents the pixel area covered by the geometric primitive and determines the weighted contribution of a fragment color to the corresponding screen space pixel. Samples associated with a given fragment are called sibling samples and have the same color value. The property of sibling samples having the same color value is exploited to compress and process multiple samples, thereby reducing the size of the associated logic and the amount of data written to and read from the frame buffer.

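
    The sibling-sample property can be shown with a simplified pixel store. This is a sketch under assumptions and not the patented logic: the `("compressed", color)` and `("expanded", [colors])` representations, the function names, the 4-sample default, and the equal-weight resolve are all hypothetical. A pixel whose samples share one color keeps a single color value; per-sample storage is used only when a partially covering fragment makes the samples differ.

```python
def write_fragment(pixel, coverage, color, num_samples=4):
    """Apply a fragment to a pixel and return the new pixel.

    pixel: ("compressed", color) or ("expanded", [color per sample]).
    coverage: bit mask, bit s set if the fragment covers sample s.
    color: (r, g, b) tuple shared by all sibling samples of the fragment.
    """
    full = (1 << num_samples) - 1
    if coverage == full:
        # Fully covered: every sample takes the fragment color.
        return ("compressed", color)
    kind, payload = pixel
    if kind == "compressed":
        samples = [payload] * num_samples
    else:
        samples = list(payload)
    for s in range(num_samples):
        if coverage & (1 << s):
            samples[s] = color
    if all(c == samples[0] for c in samples):
        return ("compressed", samples[0])
    return ("expanded", samples)


def resolve(pixel, num_samples=4):
    """Average the samples into the final pixel color."""
    kind, payload = pixel
    if kind == "compressed":
        return payload  # one stored color already is the average
    return tuple(sum(channel) / num_samples for channel in zip(*payload))
```

    In this model a fully covered pixel reads and writes one color instead of `num_samples` colors, which is the kind of frame buffer traffic reduction the abstract refers to.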