Dynamic load balancing of instructions for execution by heterogeneous processing engines
    91.
    Granted patent
    Dynamic load balancing of instructions for execution by heterogeneous processing engines (in force)

    Publication No.: US08578387B1

    Publication date: 2013-11-05

    Application No.: US11831873

    Filing date: 2007-07-31

    IPC classification: G06F9/46

    Abstract: An embodiment of a computing system is configured to process data using a multithreaded SIMD architecture that includes heterogeneous processing engines to execute a program. The program is constructed of various program instructions. A first type of the program instructions can only be executed by a first type of processing engine and a third type of program instructions can only be executed by a second type of processing engine. A second type of program instructions can be executed by the first and the second type of processing engines. An assignment unit may be configured to dynamically determine which of the two processing engines executes any program instructions of the second type in order to balance the workload between the heterogeneous processing engines.
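The assignment rule described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Engine` class, the `assign` function, and the use of queue length as the workload measure are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Engine:
    """Hypothetical processing engine with a queue of pending instructions."""
    name: str
    queue: list = field(default_factory=list)

    def load(self) -> int:
        # Assumption: workload is approximated by the number of queued instructions.
        return len(self.queue)


def assign(instructions, first, second):
    """Route each (kind, op) pair to an engine.

    kind 1 runs only on the first engine, kind 3 only on the second,
    and kind 2 goes to whichever engine currently has the lighter load.
    """
    for kind, op in instructions:
        if kind == 1:
            target = first
        elif kind == 3:
            target = second
        elif kind == 2:
            target = first if first.load() <= second.load() else second
        else:
            raise ValueError(f"unknown instruction type: {kind}")
        target.queue.append(op)


first = Engine("first")
second = Engine("second")
program = [(1, "a"), (1, "b"), (2, "c"), (3, "d"), (2, "e"), (2, "f")]
assign(program, first, second)
print(first.queue, second.queue)
```

In this sketch the type-2 instructions fill in behind whichever engine is less busy at the moment they are assigned, so both queues end up the same length.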

    PROGRAMMABLE GRAPHICS PROCESSOR FOR MULTITHREADED EXECUTION OF PROGRAMS
    93.
    Patent application
    PROGRAMMABLE GRAPHICS PROCESSOR FOR MULTITHREADED EXECUTION OF PROGRAMS (in force)

    Publication No.: US20120218267A1

    Publication date: 2012-08-30

    Application No.: US13466043

    Filing date: 2012-05-07

    IPC classification: G06T17/20 G06T1/20 G06T1/00

    CPC classification: G06T15/005

    Abstract: A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.

    Subdividing a shader program
    94.
    Granted patent
    Subdividing a shader program (in force)

    Publication No.: US08159496B1

    Publication date: 2012-04-17

    Application No.: US12476137

    Filing date: 2009-06-01

    Abstract: Methods and apparatus for subdividing a shader program into regions or "phases" of instructions identifiable by phase identifiers (IDs) inserted into the shader program are provided. The phase IDs may be used to constrain execution of the shader program to prohibit texture fetches in later phases from being executed before a texture fetch in a current phase has completed. Other operations (e.g., math operations) within the current phase, however, may be allowed to execute while waiting for the current phase texture fetch to complete.
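The phase constraint in this abstract can be sketched as a small predicate in Python. The function `may_issue`, the tuple layout of the instruction list, and the instruction names are hypothetical. The sketch models only the single rule stated in the abstract and ignores data dependencies.

```python
def may_issue(instr_phase, kind, current_phase, fetch_pending):
    """Return True if an instruction may issue under the phase rule.

    A texture fetch ("tex") from a later phase is held back while a fetch
    in the current phase is still outstanding. Everything else is allowed,
    which is an assumption of this sketch.
    """
    if kind == "tex" and instr_phase > current_phase and fetch_pending:
        return False
    return True


# Remaining instructions as (phase ID, kind, name); names are illustrative only.
remaining = [
    (0, "math", "scale normal"),
    (0, "math", "add bias"),
    (1, "tex", "fetch detail map"),
]

# While the phase-0 texture fetch is outstanding, only the math ops can issue.
while_waiting = [n for p, k, n in remaining if may_issue(p, k, 0, True)]
# Once that fetch has completed, the phase-1 fetch is released as well.
after_fetch = [n for p, k, n in remaining if may_issue(p, k, 0, False)]
print(while_waiting, after_fetch)
```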

    Two-Level Scheduler for Multi-Threaded Processing
    95.
    Patent application
    Two-Level Scheduler for Multi-Threaded Processing (in force)

    Publication No.: US20120079503A1

    Publication date: 2012-03-29

    Application No.: US13151094

    Filing date: 2011-06-01

    IPC classification: G06F9/48

    Abstract: One embodiment of the present invention sets forth a technique for scheduling thread execution in a multi-threaded processing environment. A two-level scheduler maintains a small set of active threads called strands to hide function unit pipeline latency and local memory access latency. The strands are a sub-set of a larger set of pending threads that is also maintained by the two-level scheduler. Pending threads are promoted to strands and strands are demoted to pending threads based on latency characteristics. The two-level scheduler selects strands for execution based on strand state. The longer latency of the pending threads is hidden by selecting strands for execution. When the latency for a pending thread is expired, the pending thread may be promoted to a strand and begin (or resume) execution. When a strand encounters a latency event, the strand may be demoted to a pending thread while the latency is incurred.
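The promote/demote cycle in this abstract can be sketched as a toy simulation in Python. The class `TwoLevelScheduler`, its method names, the round-robin strand selection, and the cycle-counted latencies are assumptions for illustration; the abstract does not specify any of them.

```python
from collections import deque


class TwoLevelScheduler:
    """Toy model: a small active set (strands) backed by a pending pool."""

    def __init__(self, max_strands):
        self.max_strands = max_strands
        self.strands = deque()   # active thread IDs, eligible for issue
        self.pending = {}        # thread ID -> remaining latency in cycles

    def add_thread(self, tid):
        self.pending[tid] = 0    # new threads have no outstanding latency

    def promote(self):
        """Move pending threads whose latency has expired into free strand slots."""
        for tid in list(self.pending):
            if len(self.strands) >= self.max_strands:
                break
            if self.pending[tid] == 0:
                del self.pending[tid]
                self.strands.append(tid)

    def demote(self, tid, latency):
        """A strand hit a long-latency event: park it until the latency expires."""
        self.strands.remove(tid)
        self.pending[tid] = latency

    def select(self):
        """Pick the next strand to issue (round-robin is an assumption)."""
        if not self.strands:
            return None
        tid = self.strands[0]
        self.strands.rotate(-1)
        return tid

    def tick(self):
        """Advance one cycle: age pending latencies, then refill strand slots."""
        for tid in self.pending:
            self.pending[tid] = max(0, self.pending[tid] - 1)
        self.promote()


sched = TwoLevelScheduler(max_strands=2)
for tid in (0, 1, 2):
    sched.add_thread(tid)
sched.promote()                 # threads 0 and 1 become strands
issued = sched.select()         # thread 0 issues
sched.demote(issued, latency=2) # thread 0 hits a latency event
sched.tick()                    # thread 2 takes the freed strand slot
print(list(sched.strands), sched.pending)
```

After the tick, thread 2 has taken the freed strand slot while thread 0 waits out its remaining latency in the pending pool.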

    Credit-Based Streaming Multiprocessor Warp Scheduling
    96.
    Patent application
    Credit-Based Streaming Multiprocessor Warp Scheduling (in force)

    Publication No.: US20110072244A1

    Publication date: 2011-03-24

    Application No.: US12885299

    Filing date: 2010-09-17

    IPC classification: G06F9/38 G06F9/312

    Abstract: One embodiment of the present invention sets forth a technique for ensuring cache access instructions are scheduled for execution in a multi-threaded system to improve cache locality and system performance. A credit-based technique may be used to control instruction-by-instruction scheduling for each warp in a group so that the group of warps is processed uniformly. A credit is computed for each warp and the credit contributes to a weight for each warp. The weight is used to select instructions for the warps that are issued for execution.
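The idea of a per-warp credit that feeds a selection weight can be sketched as follows. The abstract does not give the credit formula, so everything here is an assumption: the `schedule` function, the fixed credit grant per round, the optional `bias` term, and the rule that issuing one instruction costs one credit.

```python
def schedule(warps, issue_slots, credits_per_round=2, bias=None):
    """Return the order in which warps issue instructions.

    Assumptions of this sketch: every warp starts a round with the same
    number of credits, issuing costs one credit, and the weight of a warp
    is its remaining credit plus an optional static bias.
    """
    bias = bias or {}
    credits = {w: credits_per_round for w in warps}
    issued = []
    for _ in range(issue_slots):
        # Start a new round once every warp has spent its credits.
        if all(c == 0 for c in credits.values()):
            credits = {w: credits_per_round for w in warps}
        # The credit contributes to the weight; the highest weight wins.
        # Ties fall to the earliest warp in the list.
        chosen = max(warps, key=lambda w: credits[w] + bias.get(w, 0))
        credits[chosen] -= 1
        issued.append(chosen)
    return issued


order = schedule(["w0", "w1", "w2"], issue_slots=6)
print(order)
```

Because a warp that has just issued holds fewer credits than its peers, no warp in the group runs ahead of the others, which is one way to obtain the uniform progress the abstract describes.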

    Register based queuing for texture requests
    97.
    Granted patent
    Register based queuing for texture requests (in force)

    Publication No.: US07864185B1

    Publication date: 2011-01-04

    Application No.: US12256848

    Filing date: 2008-10-23

    CPC classification: G06T11/60 G09G5/363

    Abstract: A graphics processing unit can queue a large number of texture requests to balance out the variability of texture requests without the need for a large texture request buffer. A dedicated texture request buffer queues the relatively small texture commands and parameters. Additionally, for each queued texture command, an associated set of texture arguments, which are typically much larger than the texture command, are stored in a general purpose register. The texture unit retrieves texture commands from the texture request buffer and then fetches the associated texture arguments from the appropriate general purpose register. The texture arguments may be stored in the general purpose register designated as the destination of the final texture value computed by the texture unit. Because the destination register must be allocated for the final texture value as texture commands are queued, storing the texture arguments in this register does not consume any additional registers.
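The split between a small command queue and arguments parked in the destination register can be sketched in Python. The names `registers`, `request_buffer`, `queue_texture_request`, `texture_unit_step`, and the stand-in sampler are hypothetical; a dictionary stands in for the general purpose register file.

```python
from collections import deque

registers = {}            # stand-in for the general purpose register file
request_buffer = deque()  # holds only the small (command, destination) pairs


def queue_texture_request(command, dest_reg, args):
    """Queue a texture command and park its arguments in the destination register."""
    registers[dest_reg] = args               # reuses the already-allocated register
    request_buffer.append((command, dest_reg))


def texture_unit_step(sample):
    """Process one queued command: read args from the register, overwrite with the result."""
    command, dest_reg = request_buffer.popleft()
    args = registers[dest_reg]
    registers[dest_reg] = sample(command, args)


def fake_sample(command, uv):
    # Stand-in for a real texture lookup: just combines the coordinates.
    return uv[0] + uv[1]


queue_texture_request("tex2d", "r4", (0.25, 0.5))
queued_args = registers["r4"]   # the register holds the arguments while queued
texture_unit_step(fake_sample)
print(queued_args, registers["r4"])
```

The request buffer never stores the arguments themselves, so its entries stay small no matter how large the argument set is.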

    System, method and article of manufacture for a programmable processing model with instruction set
    100.
    Granted patent
    System, method and article of manufacture for a programmable processing model with instruction set (in force)

    Publication No.: US07697008B1

    Publication date: 2010-04-13

    Application No.: US11680125

    Filing date: 2007-02-28

    IPC classification: G06T1/00

    CPC classification: G06T15/005 G06T15/503

    Abstract: A system, method and article of manufacture are provided for programmable processing in a computer graphics pipeline. Initially, data is received from a source buffer. Thereafter, programmable operations are performed on the data in order to generate output. The operations are programmable in that a user may utilize instructions from a predetermined instruction set for generating the same. Such output is stored in a register. During operation, the output stored in the register is used in performing the programmable operations on the data.
