Thread group scheduler for computing on a parallel thread processor
    2.
    发明授权
    Thread group scheduler for computing on a parallel thread processor 有权
    线程组调度程序,用于在并行线程处理器上进行计算

    公开(公告)号:US08732713B2

    公开(公告)日:2014-05-20

    申请号:US13247819

    申请日:2011-09-28

    IPC分类号: G06F9/46

    CPC分类号: G06F9/4881 G06F2209/483

    摘要: A parallel thread processor executes thread groups belonging to multiple cooperative thread arrays (CTAs). At each cycle of the parallel thread processor, an instruction scheduler selects a thread group to be issued for execution during a subsequent cycle. The instruction scheduler selects a thread group to issue for execution by (i) identifying a pool of available thread groups, (ii) identifying a CTA that has the greatest seniority value, and (iii) selecting the thread group that has the greatest credit value from within the CTA with the greatest seniority value.

    摘要翻译: 并行线程处理器执行属于多个协作线程数组(CTA)的线程组。 在并行线程处理器的每个周期,指令调度器在随后的周期中选择要发行的线程组以执行。 指令调度器通过(i)识别可用线程组的池,(ii)识别具有最大资历值的CTA来选择要执行的线程组,以及(iii)选择具有最大信用值的线程组 从具有最高资历价值的CTA内。

    EFFICIENT IMPLEMENTATION OF ARRAYS OF STRUCTURES ON SIMT AND SIMD ARCHITECTURES
    3.
    发明申请
    EFFICIENT IMPLEMENTATION OF ARRAYS OF STRUCTURES ON SIMT AND SIMD ARCHITECTURES 有权
    对SIMT和SIMD建筑结构的有效实施

    公开(公告)号:US20120089792A1

    公开(公告)日:2012-04-12

    申请号:US13247855

    申请日:2011-09-28

    IPC分类号: G06F12/00

    摘要: One embodiment of the present invention sets forth a technique providing an optimized way to allocate and access memory across a plurality of thread/data lanes. Specifically, the device driver receives an instruction targeted to a memory set up as an array of structures of arrays. The device driver computes an address within the memory using information about the number of thread/data lanes and parameters from the instruction itself. The result is a memory allocation and access approach where the device driver properly computes the target address in the memory. Advantageously, processing efficiency is improved where memory in a parallel processing subsystem is internally stored and accessed as an array of structures of arrays, proportional to the SIMT/SIMD group width (the number of threads or lanes per execution group).

    摘要翻译: 本发明的一个实施例提出了一种技术,其提供了一种在多个线程/数据通道上分配和访问存储器的优化方式。 具体来说,设备驱动程序接收到作为阵列结构的阵列设置的存储器的指令。 设备驱动程序使用关于指令本身的线程/数据通道数和参数的信息来计算存储器中的地址。 结果是存储器分配和访问方法,其中设备驱动器正确地计算存储器中的目标地址。 有利的是,处理效率得到改善,其中并行处理子系统中的存储器被内部存储和访问为与SIMT / SIMD组宽度(每个执行组的线程或通道数)成比例的阵列结构的阵列。

    Register based queuing for texture requests
    4.
    发明授权
    Register based queuing for texture requests 有权
    基于注册排队的纹理请求

    公开(公告)号:US07456835B2

    公开(公告)日:2008-11-25

    申请号:US11339937

    申请日:2006-01-25

    CPC分类号: G06T11/60 G09G5/363

    摘要: A graphics processing unit can queue a large number of texture requests to balance out the variability of texture requests without the need for a large texture request buffer. A dedicated texture request buffer queues the relatively small texture commands and parameters. Additionally, for each queued texture command, an associated set of texture arguments, which are typically much larger than the texture command, are stored in a general purpose register. The texture unit retrieves texture commands from the texture request buffer and then fetches the associated texture arguments from the appropriate general purpose register. The texture arguments may be stored in the general purpose register designated as the destination of the final texture value computed by the texture unit. Because the destination register must be allocated for the final texture value as texture commands are queued, storing the texture arguments in this register does not consume any additional registers.

    摘要翻译: 图形处理单元可以排队大量纹理请求,以平衡纹理请求的可变性,而不需要大的纹理请求缓冲区。 专用纹理请求缓冲区排队相对较小的纹理命令和参数。 另外,对于每个排队的纹理命令,通常比纹理命令大得多的一组相关的纹理参数存储在通用寄存器中。 纹理单元从纹理请求缓冲区中检索纹理命令,然后从相应的通用寄存器获取相关的纹理参数。 纹理参数可以存储在指定为由纹理单元计算的最终纹理值的目的地的通用寄存器中。 因为当纹理命令排队时,必须为目标寄存器分配最终纹理值,所以将纹理参数存储在该寄存器中不消耗任何其他寄存器。

    Structured programming control flow in a SIMD architecture
    5.
    发明授权
    Structured programming control flow in a SIMD architecture 有权
    SIMD架构中的结构化编程控制流程

    公开(公告)号:US07877585B1

    公开(公告)日:2011-01-25

    申请号:US11845429

    申请日:2007-08-27

    摘要: One embodiment of a computing system configured to manage divergent threads in a SIMD thread group includes a stack configured to store state information for processing control instructions. A parallel processing unit is configured to perform the steps of determining if one or more threads diverge during execution of a conditional control instruction. A disable mask allows for the use of conditional return and break instructions in a multithreaded SIMD architecture. Additional control instructions are used to set up thread processing target addresses for synchronization, breaks, and returns.

    摘要翻译: 被配置为管理SIMD线程组中的发散线程的计算系统的一个实施例包括被配置为存储用于处理控制指令的状态信息的堆栈。 并行处理单元被配置为执行在执行条件控制指令期间确定一个或多个线程是否发散的步骤。 禁用掩码允许在多线程SIMD架构中使用条件返回和中断指令。 附加控制指令用于设置线程处理目标地址以进行同步,中断和返回。

    Register based queuing for texture requests

    公开(公告)号:US07027062B2

    公开(公告)日:2006-04-11

    申请号:US10789735

    申请日:2004-02-27

    IPC分类号: G06T11/40

    CPC分类号: G06T11/60 G09G5/363

    摘要: A graphics processing unit can queue a large number of texture requests to balance out the variability of texture requests without the need for a large texture request buffer. A dedicated texture request buffer queues the relatively small texture commands and parameters. Additionally, for each queued texture command, an associated set of texture arguments, which are typically much larger than the texture command, are stored in a general purpose register. The texture unit retrieves texture commands from the texture request buffer and then fetches the associated texture arguments from the appropriate general purpose register. The texture arguments may be stored in the general purpose register designated as the destination of the final texture value computed by the texture unit. Because the destination register must be allocated for the final texture value as texture commands are queued, storing the texture arguments in this register does not consume any additional registers.

    Processing an indirect branch instruction in a SIMD architecture
    8.
    发明授权
    Processing an indirect branch instruction in a SIMD architecture 有权
    在SIMD架构中处理间接分支指令

    公开(公告)号:US07761697B1

    公开(公告)日:2010-07-20

    申请号:US11557082

    申请日:2006-11-06

    IPC分类号: G06F7/38 G06F9/00 G06F9/44

    摘要: One embodiment of a computing system configured to manage divergent threads in a thread group includes a stack configured to store at least one token and a multithreaded processing unit. The multithreaded processing unit is configured to perform the steps of fetching a program instruction, determining that the program instruction is an indirect branch instruction, and processing the indirect branch instruction as a sequence of two-way branches to execute an indirect branch instruction with multiple branch addresses. Indirect branch instructions may be used to allow greater flexibility since the branch address or multiple branch addresses do not need to be determined at compile time.

    摘要翻译: 被配置为管理线程组中的发散线程的计算系统的一个实施例包括配置成存储至少一个令牌和多线程处理单元的堆栈。 多线程处理单元被配置为执行以下步骤:获取程序指令,确定程序指令是间接分支指令,以及将间接分支指令处理为双向分支序列,以执行具有多个分支的间接分支指令 地址 可以使用间接分支指令来允许更大的灵活性,因为在编译时不需要确定分支地址或多个分支地址。

    Register based queuing for texture requests
    10.
    发明授权
    Register based queuing for texture requests 有权
    基于注册排队的纹理请求

    公开(公告)号:US07864185B1

    公开(公告)日:2011-01-04

    申请号:US12256848

    申请日:2008-10-23

    CPC分类号: G06T11/60 G09G5/363

    摘要: A graphics processing unit can queue a large number of texture requests to balance out the variability of texture requests without the need for a large texture request buffer. A dedicated texture request buffer queues the relatively small texture commands and parameters. Additionally, for each queued texture command, an associated set of texture arguments, which are typically much larger than the texture command, are stored in a general purpose register. The texture unit retrieves texture commands from the texture request buffer and then fetches the associated texture arguments from the appropriate general purpose register. The texture arguments may be stored in the general purpose register designated as the destination of the final texture value computed by the texture unit. Because the destination register must be allocated for the final texture value as texture commands are queued, storing the texture arguments in this register does not consume any additional registers.

    摘要翻译: 图形处理单元可以排队大量纹理请求,以平衡纹理请求的可变性,而不需要大的纹理请求缓冲区。 专用纹理请求缓冲区排队相对较小的纹理命令和参数。 另外,对于每个排队的纹理命令,通常比纹理命令大得多的一组相关的纹理参数存储在通用寄存器中。 纹理单元从纹理请求缓冲区中检索纹理命令,然后从相应的通用寄存器获取相关的纹理参数。 纹理参数可以存储在指定为由纹理单元计算的最终纹理值的目的地的通用寄存器中。 因为当纹理命令排队时,必须为目标寄存器分配最终纹理值,所以将纹理参数存储在该寄存器中不消耗任何其他寄存器。