Subdividing a shader program
    1.
    Granted invention patent (in force)

    Publication No.: US08159496B1

    Publication date: 2012-04-17

    Application No.: US12476137

    Filing date: 2009-06-01

    Abstract: Methods and apparatus for subdividing a shader program into regions or “phases” of instructions identifiable by phase identifiers (IDs) inserted into the shader program are provided. The phase IDs may be used to constrain execution of the shader program to prohibit texture fetches in later phases from being executed before a texture fetch in a current phase has completed. Other operations (e.g., math operations) within the current phase, however, may be allowed to execute while waiting for the current phase texture fetch to complete.
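
    The issue constraint described in the abstract can be sketched as a small check. This is only an illustration under stated assumptions: the `Instr` record, the `can_issue` function, and the list of in-flight fetch phases are hypothetical names, and data dependencies between instructions are ignored.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    kind: str   # "tex" for a texture fetch, "math" for anything else
    phase: int  # phase ID assumed to be attached to the instruction


def can_issue(instr, pending_tex_phases):
    """Return True if instr may issue given the phases of in-flight texture fetches.

    Hypothetical helper: the abstract does not specify how the check is implemented.
    """
    if instr.kind != "tex":
        # Math work may proceed while texture fetches are outstanding.
        return True
    # A fetch is held back while any fetch from an earlier phase is in flight.
    return all(instr.phase <= p for p in pending_tex_phases)


in_flight = [0]  # one phase-0 texture fetch has not returned yet
print(can_issue(Instr("mul", "math", 0), in_flight))
print(can_issue(Instr("tex1", "tex", 1), in_flight))
```

    In this sketch a phase-1 fetch becomes eligible only once the list of in-flight phase-0 fetches is empty, while math instructions are never blocked by the phase rule.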

    Scheduler in multi-threaded processor prioritizing instructions passing qualification rule
    2.
    Granted invention patent (in force)

    Publication No.: US07949855B1

    Publication date: 2011-05-24

    Application No.: US12110942

    Filing date: 2008-04-28

    IPC classification: G06F9/38

    Abstract: A processor buffers asynchronous threads. Instructions requiring operations provided by a plurality of execution units are divided into phases, each phase having at least one computation operation and at least one memory access operation. Instructions within each phase are qualified and prioritized. The instructions may be qualified based on the status of the execution unit needed to execute one or more of the current instructions. The instructions may also be qualified based on an age of each instruction, status of the execution units, a divergence potential, locality, thread diversity, and resource requirements. Qualified instructions may be prioritized based on execution units needed to execute instructions and the execution units in use. One or more of the prioritized instructions is issued per cycle to the plurality of execution units.
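
    The qualify-then-prioritize flow in the abstract can be sketched as follows. The abstract does not give the actual rules, so everything here is an assumption made for illustration: the `Instr` fields, the `schedule` function, qualification by free execution unit only, and a priority that favors the least-used unit with age as a tie-breaker.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    thread: int
    unit: str  # execution unit the instruction needs, e.g. "math" or "mem"
    age: int   # cycles the instruction has waited in the buffer


def schedule(buffered, free_units, in_use, max_issue=1):
    """Pick up to max_issue instructions for this cycle (hypothetical policy)."""
    # Qualification (assumed rule): the needed execution unit has a free slot.
    qualified = [i for i in buffered if free_units.get(i.unit, 0) > 0]
    # Prioritization (assumed rule): least-used unit first, then oldest first.
    ranked = sorted(qualified, key=lambda i: (in_use.get(i.unit, 0), -i.age))
    remaining = dict(free_units)
    issued = []
    for instr in ranked:
        if len(issued) == max_issue:
            break
        if remaining[instr.unit] > 0:
            remaining[instr.unit] -= 1
            issued.append(instr)
    return issued


buffered = [Instr(0, "math", 5), Instr(1, "mem", 2), Instr(2, "tex", 9)]
picked = schedule(buffered, {"math": 1, "mem": 1, "tex": 0}, {"math": 3, "mem": 1})
print([i.thread for i in picked])
```

    The other qualification factors named in the abstract (divergence potential, locality, thread diversity, resource requirements) are left out of the sketch.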

    Subdividing a shader program
    3.
    Granted invention patent (in force)

    Publication No.: US07542043B1

    Publication date: 2009-06-02

    Application No.: US11136346

    Filing date: 2005-05-23

    IPC classification: G06T1/20 G06F5/80

    Abstract: Methods and apparatus for subdividing a shader program into regions or “phases” of instructions identifiable by phase identifiers (IDs) inserted into the shader program are provided. The phase IDs may be used to constrain execution of the shader program to prohibit texture fetches in later phases from being executed before a texture fetch in a current phase has completed. Other operations (e.g., math operations) within the current phase, however, may be allowed to execute while waiting for the current phase texture fetch to complete.

    Scheduling instructions from multi-thread instruction buffer based on phase boundary qualifying rule for phases of math and data access operations with better caching
    4.
    Granted invention patent (in force)

    Publication No.: US07366878B1

    Publication date: 2008-04-29

    Application No.: US11404196

    Filing date: 2006-04-13

    IPC classification: G06F9/50

    Abstract: A processor buffers asynchronous threads. Current instructions requiring operations provided by a plurality of execution units are divided into phases, each phase having at least one math operation and at least one texture cache access operation. Instructions within each phase are qualified and prioritized, with texture cache access operations in a subsequent phase not qualified until all of the texture cache access operations in a current phase have completed. The instructions may be qualified based on the status of the execution unit needed to execute one or more of the instructions. The instructions may also be qualified based on an age of each instruction, a divergence potential, locality, thread diversity, and resource requirements. Qualified instructions may be prioritized based on execution units needed to execute current instructions and the execution units in use. One or more of the prioritized instructions is issued per cycle to the plurality of execution units.

    Thread group scheduler for computing on a parallel thread processor
    5.
    Granted invention patent (in force)

    Publication No.: US08732713B2

    Publication date: 2014-05-20

    Application No.: US13247819

    Filing date: 2011-09-28

    IPC classification: G06F9/46

    CPC classification: G06F9/4881 G06F2209/483

    Abstract: A parallel thread processor executes thread groups belonging to multiple cooperative thread arrays (CTAs). At each cycle of the parallel thread processor, an instruction scheduler selects a thread group to be issued for execution during a subsequent cycle. The instruction scheduler selects a thread group to issue for execution by (i) identifying a pool of available thread groups, (ii) identifying a CTA that has the greatest seniority value, and (iii) selecting the thread group that has the greatest credit value from within the CTA with the greatest seniority value.
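
    The three selection steps (i) to (iii) can be sketched directly. The `ThreadGroup` record, the `ready` flag, the `cta_seniority` mapping, and the choice to consider only CTAs that have an available thread group are assumptions for illustration; the abstract does not say how seniority or credit values are computed.

```python
from dataclasses import dataclass


@dataclass
class ThreadGroup:
    gid: int
    cta: int      # ID of the cooperative thread array the group belongs to
    credit: int   # assumed to be maintained elsewhere
    ready: bool   # True if the group is available to issue


def select_thread_group(groups, cta_seniority):
    """Return the thread group to issue next, or None if none is available."""
    # (i) Identify the pool of available thread groups.
    pool = [g for g in groups if g.ready]
    if not pool:
        return None
    # (ii) Identify the CTA with the greatest seniority value
    #      (assumption: only CTAs represented in the pool are considered).
    senior_cta = max(sorted({g.cta for g in pool}), key=lambda c: cta_seniority[c])
    # (iii) Within that CTA, select the group with the greatest credit value.
    return max((g for g in pool if g.cta == senior_cta), key=lambda g: g.credit)


groups = [
    ThreadGroup(0, cta=0, credit=5, ready=True),
    ThreadGroup(1, cta=0, credit=9, ready=False),
    ThreadGroup(2, cta=1, credit=7, ready=True),
    ThreadGroup(3, cta=1, credit=3, ready=True),
]
print(select_thread_group(groups, {0: 10, 1: 4}).gid)
```

    Note that seniority is applied before credit: in the usage above, group 2 has more credit than group 0 but belongs to the less senior CTA.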

    Processing an indirect branch instruction in a SIMD architecture
    6.
    Granted invention patent (in force)

    Publication No.: US07761697B1

    Publication date: 2010-07-20

    Application No.: US11557082

    Filing date: 2006-11-06

    IPC classification: G06F7/38 G06F9/00 G06F9/44

    Abstract: One embodiment of a computing system configured to manage divergent threads in a thread group includes a stack configured to store at least one token and a multithreaded processing unit. The multithreaded processing unit is configured to perform the steps of fetching a program instruction, determining that the program instruction is an indirect branch instruction, and processing the indirect branch instruction as a sequence of two-way branches to execute an indirect branch instruction with multiple branch addresses. Indirect branch instructions may be used to allow greater flexibility since the branch address or multiple branch addresses do not need to be determined at compile time.
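
    One way to picture an indirect branch handled as a sequence of two-way branches is the sketch below. It is not taken from the patent: the function name, the use of boolean masks as stack tokens, and the rule of following the first active thread's target are all assumptions for illustration.

```python
def split_indirect_branch(active_mask, targets):
    """Resolve per-thread branch targets as a series of two-way splits.

    active_mask: list of bools, one per thread in the group.
    targets: per-thread branch address (only meaningful for active threads).
    Returns a list of (target, mask) passes in execution order.
    """
    if not any(active_mask):
        return []
    stack = [list(active_mask)]  # tokens: masks of threads still to be resolved
    passes = []
    while stack:
        mask = stack.pop()
        # Follow the target of the first active thread in this mask.
        lead = next(t for t, on in enumerate(mask) if on)
        target = targets[lead]
        # Two-way decision: threads that share the target versus the rest.
        taken = [on and targets[t] == target for t, on in enumerate(mask)]
        rest = [on and not tk for on, tk in zip(mask, taken)]
        if any(rest):
            stack.append(rest)  # deferred threads are revisited later
        passes.append((target, taken))
    return passes


for target, mask in split_indirect_branch([True, True, True, False],
                                          [0x40, 0x80, 0x40, 0x80]):
    print(hex(target), mask)
```

    Each loop iteration acts as one two-way branch, so a group whose active threads hold k distinct targets is resolved in k passes.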

    Structured programming control flow using a disable mask in a SIMD architecture
    7.
    Granted invention patent (in force)

    Publication No.: US07617384B1

    Publication date: 2009-11-10

    Application No.: US11669513

    Filing date: 2007-01-31

    IPC classification: G06F15/80

    Abstract: One embodiment of a computing system configured to manage divergent threads in a SIMD thread group includes a stack configured to store state information for processing control instructions. A parallel processing unit is configured to perform the steps of determining if one or more threads diverge during execution of a conditional control instruction. Threads that exit a program are identified as idle by a disable mask. Other threads that are disabled may be enabled once the divergent threads reach an instruction that enables the disabled threads. Use of the disable mask allows for the use of conditional return and break instructions in a multithreaded SIMD architecture.
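
    The role of the disable mask can be illustrated with a toy model. The `SimdGroup` class, its method names, and the per-thread "reason" encoding are hypothetical and chosen for this sketch; the abstract does not describe the mask layout or the stack handling.

```python
class SimdGroup:
    """Toy model of a SIMD thread group with a per-thread disable mask."""

    def __init__(self, num_threads):
        # None means enabled; otherwise the reason the thread is disabled.
        self.disabled = [None] * num_threads

    def active(self):
        return [reason is None for reason in self.disabled]

    def _disable(self, cond, reason):
        for t, hit in enumerate(cond):
            if hit and self.disabled[t] is None:
                self.disabled[t] = reason

    def cond_return(self, cond):
        # Threads that exit the program stay idle for the rest of execution.
        self._disable(cond, "return")

    def cond_break(self, cond):
        # Threads that break wait until the loop's end re-enables them.
        self._disable(cond, "break")

    def end_loop(self):
        # The enabling instruction: wake threads that were disabled by a break.
        self.disabled = [None if r == "break" else r for r in self.disabled]


group = SimdGroup(4)
group.cond_return([True, False, False, False])
group.cond_break([True, True, False, False])
print(group.active())
group.end_loop()
print(group.active())
```

    In the usage above, thread 0 has returned and stays idle after `end_loop`, while thread 1, which only broke out of the loop, is enabled again.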

    Credit-based streaming multiprocessor warp scheduling
    9.
    Granted invention patent (in force)

    Publication No.: US09189242B2

    Publication date: 2015-11-17

    Application No.: US12885299

    Filing date: 2010-09-17

    IPC classification: G06F9/50 G06F9/38

    Abstract: One embodiment of the present invention sets forth a technique for ensuring cache access instructions are scheduled for execution in a multi-threaded system to improve cache locality and system performance. A credit-based technique may be used to control instruction-by-instruction scheduling for each warp in a group so that the group of warps is processed uniformly. A credit is computed for each warp and the credit contributes to a weight for each warp. The weight is used to select instructions for the warps that are issued for execution.

    Programmable graphics processor for multithreaded execution of programs
    10.
    Granted invention patent (in force)

    Publication No.: US08405665B2

    Publication date: 2013-03-26

    Application No.: US13466043

    Filing date: 2012-05-07

    CPC classification: G06T15/005

    Abstract: A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.