Dynamic load balancing of instructions for execution by heterogeneous processing engines
    1.
    Granted patent
    Dynamic load balancing of instructions for execution by heterogeneous processing engines (status: active)

    Publication number: US08578387B1

    Publication date: 2013-11-05

    Application number: US11831873

    Filing date: 2007-07-31

    IPC classification: G06F9/46

    Abstract: An embodiment of a computing system is configured to process data using a multithreaded SIMD architecture that includes heterogeneous processing engines to execute a program. The program is constructed of various program instructions. A first type of program instruction can be executed only by a first type of processing engine, and a third type of program instruction can be executed only by a second type of processing engine. A second type of program instruction can be executed by both the first and the second type of processing engine. An assignment unit may be configured to dynamically determine which of the two processing engines executes any program instructions of the second type in order to balance the workload between the heterogeneous processing engines.
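
The assignment step described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the names `AssignmentUnit`, `engine_a`, and `engine_b` are hypothetical, and using the count of pending instructions as the workload measure is an assumption made only for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical instruction classes mirroring the three types in the abstract.
TYPE_1 = 1  # executable only by the first engine type
TYPE_2 = 2  # executable by either engine type
TYPE_3 = 3  # executable only by the second engine type


@dataclass
class AssignmentUnit:
    # Assumption: workload is measured as the number of instructions
    # already assigned to each engine.
    pending: dict = field(
        default_factory=lambda: {"engine_a": 0, "engine_b": 0}
    )

    def assign(self, instr_type: int) -> str:
        if instr_type == TYPE_1:
            engine = "engine_a"
        elif instr_type == TYPE_3:
            engine = "engine_b"
        elif instr_type == TYPE_2:
            # Dynamic choice: send the flexible instruction to the less
            # loaded engine (ties go to engine_a).
            engine = min(("engine_a", "engine_b"),
                         key=lambda e: self.pending[e])
        else:
            raise ValueError(f"unknown instruction type: {instr_type}")
        self.pending[engine] += 1
        return engine


unit = AssignmentUnit()
stream = [TYPE_1, TYPE_1, TYPE_2, TYPE_3, TYPE_2, TYPE_2]
assignments = [unit.assign(t) for t in stream]
```

In this run the three flexible instructions are split so that both engines end with three instructions each, even though the fixed-type instructions alone would have loaded them unevenly.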

    Operand collector architecture
    2.
    Granted patent
    Operand collector architecture (status: active)

    Publication number: US07834881B2

    Publication date: 2010-11-16

    Application number: US11555649

    Filing date: 2006-11-01

    IPC classification: G09G5/36 G09G5/39 G06F15/80

    Abstract: An apparatus and method for simulating a multi-ported memory using lower port count memories as banks. Collector units gather source operands from the banks as needed to process program instructions. The collector units also gather constants that are used as operands. When all of the source operands needed to process a program instruction have been gathered, a collector unit outputs the source operands to an execution unit while avoiding writeback conflicts to registers specified by the program instruction that may be accessed by other execution units.
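
The gathering behaviour described in this abstract can be sketched as a small cycle-by-cycle simulation. This is an illustrative model under stated assumptions, not the patented design: each bank is assumed to serve one read per cycle, registers are assumed to be interleaved across banks by `reg % NUM_BANKS`, constants are assumed to need no bank read, and writeback-conflict avoidance is not modelled. The names `CollectorUnit` and `simulate` are hypothetical.

```python
from collections import deque

NUM_BANKS = 4  # assumption: four single-ported banks


def bank_of(reg: int) -> int:
    # Assumed interleaved register-to-bank mapping.
    return reg % NUM_BANKS


class CollectorUnit:
    def __init__(self, name, src_regs, constants=()):
        self.name = name
        self.waiting = deque(src_regs)    # register operands still to fetch
        self.collected = list(constants)  # constants are gathered directly

    def ready(self) -> bool:
        return not self.waiting


def simulate(collectors, register_file):
    """Return (cycle, name, operands) tuples in dispatch order."""
    dispatched = []
    active = list(collectors)
    cycle = 0
    while active:
        cycle += 1
        busy_banks = set()
        for cu in active:
            if cu.waiting:
                reg = cu.waiting[0]
                bank = bank_of(reg)
                if bank not in busy_banks:  # one read per bank per cycle
                    busy_banks.add(bank)
                    cu.waiting.popleft()
                    cu.collected.append(register_file[reg])
        still_waiting = []
        for cu in active:
            if cu.ready():
                # All source operands gathered: hand off to execution.
                dispatched.append((cycle, cu.name, cu.collected))
            else:
                still_waiting.append(cu)
        active = still_waiting
    return dispatched


register_file = {r: r * 10 for r in range(8)}
result = simulate(
    [CollectorUnit("add", [0, 1]),
     CollectorUnit("mul", [4, 2], constants=(7,))],
    register_file,
)
# result == [(2, "add", [0, 10]), (3, "mul", [7, 40, 20])]
```

Registers 0 and 4 share bank 0, so the second collector stalls for one cycle; the two collectors together behave like a memory with more read ports than any single bank provides.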

    OPERAND COLLECTOR ARCHITECTURE
    3.
    Patent application
    OPERAND COLLECTOR ARCHITECTURE (status: active)

    Publication number: US20080109611A1

    Publication date: 2008-05-08

    Application number: US11555649

    Filing date: 2006-11-01

    IPC classification: G06F13/00

    Abstract: An apparatus and method for simulating a multi-ported memory using lower port count memories as banks. Collector units gather source operands from the banks as needed to process program instructions. The collector units also gather constants that are used as operands. When all of the source operands needed to process a program instruction have been gathered, a collector unit outputs the source operands to an execution unit while avoiding writeback conflicts to registers specified by the program instruction that may be accessed by other execution units.

    Using a pixel offset for evaluating a plane equation
    4.
    Granted patent
    Using a pixel offset for evaluating a plane equation (status: active)

    Publication number: US09058672B2

    Publication date: 2015-06-16

    Application number: US12898537

    Filing date: 2010-10-05

    IPC classification: G06K9/32 G06T3/40

    CPC classification: G06T3/4007

    Abstract: One embodiment of the present invention sets forth a technique for controlling the pixel location at which the plane equation is evaluated. Multiple pixel offsets (dx, dy) may be specified, each of which defines a sub-pixel sample position. Attributes are then calculated for each sub-pixel sample position that is covered by a geometric primitive. One advantage of the technique is that anti-aliasing quality may be improved since high frequency color components may be selectively supersampled for particular geometric primitives.
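
The idea of evaluating a plane equation at offset sample positions can be sketched as follows. The attribute is assumed to take the usual form A*x + B*y + C; the function names, the coefficient values, and the half-plane used as a stand-in for primitive coverage are all hypothetical and chosen only for illustration.

```python
def evaluate_plane(coeffs, x, y, dx=0.0, dy=0.0):
    """Evaluate A*x + B*y + C at pixel (x, y) shifted by offset (dx, dy)."""
    a, b, c = coeffs
    return a * (x + dx) + b * (y + dy) + c


def shade_samples(coeffs, x, y, offsets, covered):
    """Compute the attribute only at sub-pixel samples the primitive covers.

    `covered(sx, sy)` is a caller-supplied coverage test (an assumption;
    the abstract does not say how coverage is determined).
    """
    values = {}
    for dx, dy in offsets:
        if covered(x + dx, y + dy):
            values[(dx, dy)] = evaluate_plane(coeffs, x, y, dx, dy)
    return values


# Hypothetical plane coefficients and two sub-pixel sample offsets.
coeffs = (2.0, -1.0, 0.5)
offsets = [(0.25, 0.25), (0.75, 0.75)]

# Stand-in coverage test: a half-plane instead of a real primitive.
inside = lambda sx, sy: sx + sy <= 8.0

samples = shade_samples(coeffs, 3, 4, offsets, inside)
# samples == {(0.25, 0.25): 2.75}
```

Only the first sample lies inside the stand-in primitive, so only that position receives an attribute value; supplying more offsets for selected primitives is what supersampling amounts to in this sketch.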

    Reordering operands assigned to each one of read request ports concurrently accessing multibank register file to avoid bank conflict
    5.
    Granted patent
    Reordering operands assigned to each one of read request ports concurrently accessing multibank register file to avoid bank conflict (status: active)

    Publication number: US08533435B2

    Publication date: 2013-09-10

    Application number: US12875843

    Filing date: 2010-09-03

    IPC classification: G06F9/34

    Abstract: One embodiment of the present invention sets forth a technique for collecting operands specified by an instruction. As a sequence of instructions is received, the operands specified by the instructions are assigned to ports, so that each one of the operands specified by a single instruction is assigned to a different port. Reading of the operands from a multi-bank register file is scheduled by selecting an operand from each one of the different ports to produce an operand read request and ensuring that no two of the selected operands are stored in the same bank of the multi-bank register file. The operands specified by the operand read request are read from the multi-bank register file in a single clock cycle. Each instruction is then executed as the operands specified by the instruction are read from the multi-bank register file and collected over one or more clock cycles.
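
The port assignment and conflict-free read scheduling described in this abstract can be sketched as below. This is a simplified model, not the patented scheduler: the greedy selection policy, the bank mapping `reg % NUM_BANKS`, and the names `assign_to_ports` and `schedule_reads` are assumptions made for illustration.

```python
NUM_BANKS = 4  # assumption
NUM_PORTS = 3  # assumption: at most three source operands per instruction


def bank_of(reg: int) -> int:
    # Assumed interleaved register-to-bank mapping.
    return reg % NUM_BANKS


def assign_to_ports(instructions):
    """Place each operand of an instruction on a different port."""
    ports = [[] for _ in range(NUM_PORTS)]
    for instr_id, regs in instructions:
        if len(regs) > NUM_PORTS:
            raise ValueError("more operands than ports")
        for port, reg in zip(ports, regs):
            port.append((instr_id, reg))
    return ports


def schedule_reads(ports):
    """Build one read request per cycle with at most one operand per port
    and no two operands in the same bank."""
    ports = [list(p) for p in ports]  # work on a copy
    requests = []
    while any(ports):
        used_banks = set()
        request = []
        for port in ports:
            # Greedy reordering: take the oldest operand on this port
            # whose bank is still free in this cycle.
            for i, (instr_id, reg) in enumerate(port):
                if bank_of(reg) not in used_banks:
                    used_banks.add(bank_of(reg))
                    request.append(port.pop(i))
                    break
        requests.append(request)
    return requests


instructions = [(0, [0, 4]), (1, [1, 5])]
requests = schedule_reads(assign_to_ports(instructions))
# requests == [[(0, 0), (1, 5)], [(1, 1), (0, 4)]]
```

Registers 0 and 4 both map to bank 0, so reading instruction 0's operands strictly in order would stall. Letting the second port read register 5 first keeps both ports busy, and all four operands are read in two cycles.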

    Unified Collector Structure for Multi-Bank Register File
    6.
    Patent application
    Unified Collector Structure for Multi-Bank Register File (status: active)

    Publication number: US20110072243A1

    Publication date: 2011-03-24

    Application number: US12875843

    Filing date: 2010-09-03

    IPC classification: G06F9/30

    Abstract: One embodiment of the present invention sets forth a technique for collecting operands specified by an instruction. As a sequence of instructions is received, the operands specified by the instructions are assigned to ports, so that each one of the operands specified by a single instruction is assigned to a different port. Reading of the operands from a multi-bank register file is scheduled by selecting an operand from each one of the different ports to produce an operand read request and ensuring that no two of the selected operands are stored in the same bank of the multi-bank register file. The operands specified by the operand read request are read from the multi-bank register file in a single clock cycle. Each instruction is then executed as the operands specified by the instruction are read from the multi-bank register file and collected over one or more clock cycles.

    Credit-based streaming multiprocessor warp scheduling
    7.
    Granted patent
    Credit-based streaming multiprocessor warp scheduling (status: active)

    Publication number: US09189242B2

    Publication date: 2015-11-17

    Application number: US12885299

    Filing date: 2010-09-17

    IPC classification: G06F9/50 G06F9/38

    Abstract: One embodiment of the present invention sets forth a technique for ensuring cache access instructions are scheduled for execution in a multi-threaded system to improve cache locality and system performance. A credit-based technique may be used to control instruction-by-instruction scheduling for each warp in a group so that the group of warps is processed uniformly. A credit is computed for each warp and the credit contributes to a weight for each warp. The weight is used to select instructions for the warps that are issued for execution.
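
The relationship between credit, weight, and instruction selection can be illustrated with a toy scheduler. The abstract does not say how credits are computed or replenished, so everything here is an assumption: the replenishment rule, the constant `CREDITS_PER_ROUND`, the `base_priority` term, and the names `Warp` and `issue_order` are hypothetical.

```python
from dataclasses import dataclass

CREDITS_PER_ROUND = 2  # assumption: credits granted when all warps run out


@dataclass
class Warp:
    wid: int
    pending: int            # instructions still to issue
    credit: int = 0
    base_priority: int = 0  # hypothetical non-credit term of the weight


def weight(warp: Warp) -> int:
    # The credit contributes to the weight used for selection.
    return warp.base_priority + warp.credit


def issue_order(warps):
    """Return the warp id chosen at each issue slot."""
    order = []
    while any(w.pending for w in warps):
        ready = [w for w in warps if w.pending]
        if all(w.credit == 0 for w in ready):
            for w in ready:
                w.credit = CREDITS_PER_ROUND
        chosen = max(ready, key=weight)  # ties go to the earliest warp
        chosen.pending -= 1
        chosen.credit = max(0, chosen.credit - 1)  # issuing spends a credit
        order.append(chosen.wid)
    return order


warps = [Warp(wid=i, pending=3) for i in range(3)]
order = issue_order(warps)
# order == [0, 1, 2, 0, 1, 2, 0, 1, 2]
```

Because a warp that has just issued loses weight relative to its peers, the group advances in lockstep rather than letting one warp run ahead, which is the uniform processing the abstract refers to.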

    Programmable graphics processor for multithreaded execution of programs
    8.
    Granted patent
    Programmable graphics processor for multithreaded execution of programs (status: active)

    Publication number: US08405665B2

    Publication date: 2013-03-26

    Application number: US13466043

    Filing date: 2012-05-07

    CPC classification: G06T15/005

    Abstract: A processing unit includes multiple execution pipelines. Each pipeline is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing, and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.

    USING A PIXEL OFFSET FOR EVALUATING A PLANE EQUATION
    9.
    Patent application
    USING A PIXEL OFFSET FOR EVALUATING A PLANE EQUATION (status: active)

    Publication number: US20110081100A1

    Publication date: 2011-04-07

    Application number: US12898537

    Filing date: 2010-10-05

    IPC classification: G06K9/32

    CPC classification: G06T3/4007

    Abstract: One embodiment of the present invention sets forth a technique for controlling the pixel location at which the plane equation is evaluated. Multiple pixel offsets (dx, dy) may be specified, each of which defines a sub-pixel sample position. Attributes are then calculated for each sub-pixel sample position that is covered by a geometric primitive. One advantage of the technique is that anti-aliasing quality may be improved since high frequency color components may be selectively supersampled for particular geometric primitives.
