Method and apparatus for multithreaded processing of data in a programmable graphics processor
    21.
    Type: Granted invention patent
    Legal status: In force

    Publication No.: US07015913B1

    Publication date: 2006-03-21

    Application No.: US10608346

    Filing date: 2003-06-27

    Abstract: A graphics processor and method for executing a graphics program as a plurality of threads, where each sample to be processed by the program is assigned to a thread. Although threads share processing resources within the programmable graphics processor, the execution of each thread can proceed independently of any other thread. For example, instructions in a second thread are scheduled for execution while execution of instructions in a first thread is stalled waiting for source data. Consequently, a first received sample (assigned to the first thread) may be processed after a second received sample (assigned to the second thread). A benefit of independently executing each thread is improved performance, because a stalled thread does not prevent the execution of other threads.

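The out-of-order completion described in this abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the patented design: the `SampleThread` class, the `waits` list, and the one-instruction-per-cycle issue rule are all hypothetical names and simplifications introduced here.

```python
from dataclasses import dataclass


@dataclass
class SampleThread:
    sample_id: int
    waits: list          # waits[i]: cycles instruction i waits for source data (assumed model)
    pc: int = 0          # index of the next instruction to issue
    ready_at: int = 0    # first cycle at which the next instruction may issue

    def __post_init__(self):
        self.ready_at = self.waits[0] if self.waits else 0

    def done(self):
        return self.pc >= len(self.waits)


def run(threads):
    """Issue at most one instruction per cycle, skipping any stalled thread."""
    cycle = 0
    completion_order = []
    while any(not t.done() for t in threads):
        for t in threads:
            if not t.done() and t.ready_at <= cycle:
                t.pc += 1                      # the instruction issues this cycle
                if t.done():
                    completion_order.append(t.sample_id)
                else:
                    # The next instruction may still have to wait for its source data.
                    t.ready_at = cycle + 1 + t.waits[t.pc]
                break                          # one issue slot per cycle
        cycle += 1
    return completion_order


# Sample 0 arrives first but its second instruction stalls for 5 cycles.
order = run([SampleThread(0, [0, 5, 0]), SampleThread(1, [0, 0, 0])])
print(order)
```

In this run the thread for sample 1 uses the issue slots that the stalled thread for sample 0 cannot, so sample 1 finishes first even though it was received second.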

    Method, apparatus and article of manufacture for a vertex attribute buffer in a graphics processor
    25.
    Type: Granted invention patent
    Legal status: In force

    Publication No.: US06515671B1

    Publication date: 2003-02-04

    Application No.: US09454525

    Filing date: 1999-12-06

    IPC classification: G06T1/20

    CPC classification: G06T1/60 G06T15/005

    Abstract: A method, apparatus and article of manufacture are provided for managing vertex data in a vertex buffer. First, vertex data is received and stored in the vertex buffer. Thereafter, the vertex data is output from the vertex buffer to a processing module. During operation, a plurality of command bits is passed from the vertex buffer for determining a manner in which the vertex data is input and processed in the input buffer of the processing module. Such command bits are received from a command bit source. Further, a plurality of mode bits indicative of a status of a plurality of modes of process operations is passed. Such mode bits are received from a mode bit source. The mode bits are adapted for determining a manner in which the vertex data is processed in the processing module.


    Dispatching of instructions for execution by heterogeneous processing engines
    27.
    Type: Granted invention patent
    Legal status: In force

    Publication No.: US09304775B1

    Publication date: 2016-04-05

    Application No.: US11935266

    Filing date: 2007-11-05

    IPC classification: G06F9/38

    Abstract: An embodiment of a computing system is configured to process data using a multithreaded SIMD architecture that includes heterogeneous processing engines to execute a program. The program is constructed of various program instructions. A first type of the program instructions can only be executed by a first type of processing engine and a second type of program instructions can only be executed by a second type of processing engine. A third type of program instructions can be executed by the first and the second type of processing engines. An instruction dispatcher is configured to identify and remove program instruction execution conflicts for the heterogeneous processing engines to improve instruction execution throughput.

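The abstract does not say how the dispatcher removes conflicts, so the following is only one plausible policy, sketched under assumptions: two engines named `"A"` and `"B"`, one pending instruction per thread, and at most one instruction per engine per cycle. The `dispatch` function and the `"A"`/`"B"`/`"AB"` kind labels are hypothetical.

```python
def dispatch(pending):
    """Pick up to one instruction per engine for the current cycle.

    pending: list of (thread_id, kind) pairs, where kind is "A" (engine A only),
    "B" (engine B only) or "AB" (either engine).
    Returns a dict mapping engine name to the thread_id chosen for it.
    """
    assignment = {}
    # Pass 1: instructions restricted to a single engine claim that engine first.
    for thread_id, kind in pending:
        if kind in ("A", "B") and kind not in assignment:
            assignment[kind] = thread_id
    # Pass 2: flexible instructions fill whichever engine is still idle.
    for thread_id, kind in pending:
        if kind == "AB":
            for engine in ("A", "B"):
                if engine not in assignment:
                    assignment[engine] = thread_id
                    break
    return assignment


# Thread 0 could run on either engine; thread 1 needs engine A.
print(dispatch([(0, "AB"), (1, "A")]))
```

A strictly in-order dispatcher would place thread 0 on engine A and leave thread 1 waiting. Placing restricted instructions first lets both engines issue in the same cycle, which is the kind of throughput gain the abstract refers to.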

    Credit-based streaming multiprocessor warp scheduling
    28.
    Type: Granted invention patent
    Legal status: In force

    Publication No.: US09189242B2

    Publication date: 2015-11-17

    Application No.: US12885299

    Filing date: 2010-09-17

    IPC classification: G06F9/50 G06F9/38

    Abstract: One embodiment of the present invention sets forth a technique for ensuring cache access instructions are scheduled for execution in a multi-threaded system to improve cache locality and system performance. A credit-based technique may be used to control instruction-by-instruction scheduling for each warp in a group so that the group of warps is processed uniformly. A credit is computed for each warp and the credit contributes to a weight for each warp. The weight is used to select instructions for the warps that are issued for execution.

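A toy model can show how per-warp credits keep a group of warps progressing together. The abstract does not give the credit or weight formula, so everything here is an assumption: the weight is simply the remaining credit, each issued instruction costs one credit, and credits are refilled once every runnable warp has spent them. The `schedule` function and its parameters are hypothetical.

```python
def schedule(num_warps, instructions_per_warp, credits_per_round=2):
    """Return the order in which warps issue instructions under a simple credit scheme."""
    remaining = [instructions_per_warp] * num_warps
    credits = [credits_per_round] * num_warps
    issue_order = []
    while any(remaining):
        candidates = [w for w in range(num_warps) if remaining[w] > 0]
        # Refill once every runnable warp has spent its credits (assumed policy).
        if all(credits[w] == 0 for w in candidates):
            for w in candidates:
                credits[w] = credits_per_round
        # Weight = remaining credit; ties go to the lowest warp id.
        chosen = max(candidates, key=lambda w: (credits[w], -w))
        credits[chosen] -= 1
        remaining[chosen] -= 1
        issue_order.append(chosen)
    return issue_order


print(schedule(num_warps=2, instructions_per_warp=3))
```

In this run the two warps alternate, so neither gets more than one instruction ahead of the other. Keeping warps close together in the program is what gives them a chance to touch the same cache lines at about the same time.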

    Parallel array architecture for a graphics processor
    30.
    Type: Granted invention patent
    Legal status: In force

    Publication No.: US08730249B2

    Publication date: 2014-05-20

    Application No.: US13269462

    Filing date: 2011-10-07

    Abstract: A parallel array architecture for a graphics processor includes a multithreaded core array including a plurality of processing clusters, each processing cluster including at least one processing core operable to execute a pixel shader program that generates pixel data from coverage data; a rasterizer configured to generate coverage data for each of a plurality of pixels; and pixel distribution logic configured to deliver the coverage data from the rasterizer to one of the processing clusters in the multithreaded core array. A crossbar coupled to each of the processing clusters is configured to deliver pixel data from the processing clusters to a frame buffer having a plurality of partitions.

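The data flow in this abstract (rasterizer, pixel distribution logic, processing clusters, crossbar, partitioned frame buffer) can be traced with a short sketch. The policies are assumptions made for illustration only: covered pixels are handed to clusters round-robin, and the crossbar picks a frame-buffer partition by interleaving on the x coordinate. The `render` function and its parameters are hypothetical.

```python
def render(coverage, num_clusters, num_partitions, pixel_shader):
    """Route covered pixels through clusters and into frame-buffer partitions.

    coverage: list of ((x, y), covered) pairs, as a rasterizer stage might emit.
    pixel_shader: function (x, y) -> pixel value, run by the processing clusters.
    """
    # Pixel distribution logic: deliver each covered pixel to exactly one cluster.
    cluster_work = [[] for _ in range(num_clusters)]
    next_cluster = 0
    for (x, y), covered in coverage:
        if covered:
            cluster_work[next_cluster].append((x, y))
            next_cluster = (next_cluster + 1) % num_clusters

    # Each cluster shades its pixels; the crossbar lets any cluster write to any partition.
    partitions = [{} for _ in range(num_partitions)]
    for pixels in cluster_work:
        for x, y in pixels:
            partitions[x % num_partitions][(x, y)] = pixel_shader(x, y)
    return cluster_work, partitions


coverage = [((0, 0), True), ((1, 0), False), ((2, 0), True), ((3, 0), True)]
work, parts = render(coverage, num_clusters=2, num_partitions=2,
                     pixel_shader=lambda x, y: x + y)
print(work)
print(parts)
```

Note that cluster choice and partition choice are independent here: the first cluster ends up writing to both partitions, which is why a crossbar, rather than a fixed cluster-to-partition wiring, sits between them.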