Method and apparatus for multi-thread accumulation buffering in a computation engine
    1.
    Patent type: granted invention patent
    Legal status: lapsed

    Publication No.: US07111156B1

    Publication date: 2006-09-19

    Application No.: US09556473

    Filing date: 2000-04-21

    IPC classification: G06F9/00

    Abstract: A method and apparatus for enhancing flexibility of instruction ordering in a multi-thread processing system that performs multiply and accumulate operations is presented. A plurality of accumulation registers is provided for storing the results of an adder, wherein each of the plurality of accumulation registers corresponds to a different thread of the plurality of threads. The contents of each of the plurality of accumulation registers can be selected as an input to the adder such that the present accumulated value can be added to a subsequently calculated product to generate a new accumulated value.
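
    The per-thread accumulation described above can be illustrated with a minimal sketch. The class name `MultiThreadMac`, its method `mac`, and the `accumulate` flag are hypothetical names chosen for illustration; the abstract does not specify an interface.

```python
class MultiThreadMac:
    """Multiply-accumulate unit with one accumulation register per thread."""

    def __init__(self, num_threads):
        # One accumulation register per thread (assumed to reset to zero).
        self.acc = [0.0] * num_threads

    def mac(self, thread_id, a, b, accumulate=True):
        product = a * b
        # Select this thread's register as the second adder input,
        # or feed zero to start a new running sum.
        addend = self.acc[thread_id] if accumulate else 0.0
        self.acc[thread_id] = product + addend
        return self.acc[thread_id]


unit = MultiThreadMac(num_threads=2)
# Two running sums are interleaved; neither thread disturbs the other's value.
unit.mac(0, 1.0, 2.0, accumulate=False)    # thread 0 starts: 2
unit.mac(1, 10.0, 10.0, accumulate=False)  # thread 1 starts: 100
unit.mac(0, 3.0, 4.0)                      # thread 0: 2 + 12
unit.mac(1, 1.0, 1.0)                      # thread 1: 100 + 1
```

    Because each thread owns its register, the instructions of the two threads can be issued in any interleaving without changing either final sum.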

    Vector engine with pre-accumulation buffer and method therefore
    2.
    Patent type: granted invention patent
    Legal status: in force

    Publication No.: US06731294B1

    Publication date: 2004-05-04

    Application No.: US09556472

    Filing date: 2000-04-21

    IPC classification: G09G5/36

    CPC classification: G06T15/005

    Abstract: A method and apparatus for reducing latency in pipelined circuits that process dependent operations is presented. In order to reduce latency for dependent operations, a pre-accumulation register is included in an operation pipeline between a first operation unit and a second operation unit. The pre-accumulation register stores a first result produced by the first operation unit during a first operation. When the first operation unit completes a second operation to produce a second result, the first result stored in the pre-accumulation register is presented to the second operation unit along with the second result as input operands.
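
    A minimal sketch of the pre-accumulation idea follows, assuming the first operation unit is a multiplier and the second is an adder. The class `PreAccumPipeline` and the `hold` flag are hypothetical and only illustrate the data flow.

```python
class PreAccumPipeline:
    """Multiplier followed by an adder, with a holding register in between."""

    def __init__(self):
        self.pre_acc = None  # pre-accumulation register

    def multiply(self, a, b, hold=False):
        """First operation unit. With hold=True the product is parked in the
        pre-accumulation register instead of being passed on."""
        product = a * b
        if hold:
            self.pre_acc = product
            return None
        # Second operation unit: the held product and the new product
        # arrive together as the two input operands of the adder.
        held, self.pre_acc = self.pre_acc, None
        return product if held is None else held + product


pipe = PreAccumPipeline()
pipe.multiply(2, 3, hold=True)  # first result (6) waits in the register
result = pipe.multiply(4, 5)    # second result (20) is added to the first
```

    The dependent sum `2*3 + 4*5` is formed without first writing the partial product to an accumulator and reading it back.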

    Method and apparatus for memory latency avoidance in a processing system
    3.
    Patent type: granted invention patent
    Legal status: in force

    Publication No.: US06728869B1

    Publication date: 2004-04-27

    Application No.: US09556471

    Filing date: 2000-04-21

    IPC classification: G06F9/312

    Abstract: A method and apparatus for avoiding latency in a processing system that includes a memory for storing intermediate results is presented. The processing system stores results produced by an operation unit in memory, where the results may be used by subsequent dependent operations. In order to avoid the latency of the memory, the output of the operation unit may be routed directly back into the operation unit as a subsequent operand. Furthermore, one or more memory bypass registers are included such that the results produced by the operation unit during recent operations that have not yet satisfied the latency requirements of the memory are also available. A first memory bypass register may thus provide the result of an operation that completed one cycle earlier, a second memory bypass register may provide the result of an operation that completed two cycles earlier, etc.
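
    The bypass registers can be modelled as a short queue sitting in front of the memory. The sketch below assumes one result per cycle and a memory latency of two cycles; `BypassedResultStore` and its methods are hypothetical names, not taken from the patent.

```python
from collections import deque


class BypassedResultStore:
    """Result memory with bypass registers covering its write latency."""

    def __init__(self, memory_latency=2):
        self.memory = {}
        # bypass[0] holds the most recent result, bypass[1] the one before, ...
        self.bypass = deque(maxlen=memory_latency)

    def write(self, addr, value):
        """Called once per cycle with the operation unit's result."""
        if len(self.bypass) == self.bypass.maxlen:
            old_addr, old_value = self.bypass[-1]
            self.memory[old_addr] = old_value  # oldest result has reached memory
        self.bypass.appendleft((addr, value))

    def read(self, addr):
        # Check the bypass registers first, newest result first.
        for bypass_addr, value in self.bypass:
            if bypass_addr == addr:
                return value
        return self.memory[addr]


store = BypassedResultStore(memory_latency=2)
store.write("r0", 1)
store.write("r1", 2)
early = store.read("r0")  # served from a bypass register, not from memory
store.write("r2", 3)      # r0 now retires into memory
```

    A dependent operation can therefore read `r0` immediately, even though the memory itself would not yet return it.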

    Lighting effect computation circuit and method therefore
    4.
    Patent type: granted invention patent
    Legal status: in force

    Publication No.: US06567084B1

    Publication date: 2003-05-20

    Application No.: US09626657

    Filing date: 2000-07-27

    IPC classification: G06T15/60

    CPC classification: G06T15/506 G06T15/005

    Abstract: A lighting effect computation block and method therefor are presented. The lighting effect computation block separates lighting effect calculations for video graphics primitives into a number of simpler calculations that are performed in parallel but accumulated in an order-dependent manner. Each of the individual calculations is managed by a separate thread controller, where lighting effect calculations for a vertex of a primitive may be performed using a single parent light thread controller and a number of sub-light thread controllers. Each thread controller manages a thread of operation codes related to determination of the lighting parameters for the particular vertex. The thread controllers submit operation codes to an arbitration module based on the expected latency and interdependency between the various operation codes. The arbitration module determines which operation code is executed during a particular cycle, and provides that operation code to a computation engine. The computation engine performs calculations based on the operation code and stores results either in a memory or in an accumulation buffer corresponding to the particular vertex lighting effect block. In order to ensure that the order-dependent operations are properly performed, each of the sub-light thread controllers determines whether or not the accumulation operations for the preceding threads have been initiated before it submits its own final operation code that results in the performance of a subsequent accumulation operation.
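
    The ordering rule in the last sentence can be sketched as follows. The function `accumulate_in_order`, its arguments, and the use of thread numbers as the required order are assumptions made for illustration; the abstract does not describe how the check is implemented.

```python
def accumulate_in_order(ready_order, contributions):
    """ready_order: order in which sub-light threads finish their own work.
    contributions: per-thread lighting contribution, indexed by thread number.
    Returns the order in which final accumulation op codes are submitted,
    plus the accumulated total."""
    initiated = [False] * len(contributions)
    waiting = set()
    submitted = []
    total = 0.0
    for thread in ready_order:
        waiting.add(thread)
        progress = True
        while progress:
            progress = False
            for t in sorted(waiting):
                # A thread submits only once every preceding thread
                # has already initiated its accumulation.
                if all(initiated[:t]):
                    initiated[t] = True
                    waiting.remove(t)
                    submitted.append(t)
                    total += contributions[t]
                    progress = True
                    break
    return submitted, total


# Thread 2 finishes first but must wait for threads 0 and 1.
order, total = accumulate_in_order([2, 0, 1], [1.0, 2.0, 4.0])
```

    The calculations themselves run in whatever order they finish; only the final accumulation step is serialized.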

    Geometric engine including a computational module without memory contention
    5.
    Patent type: granted invention patent
    Legal status: in force

    Publication No.: US06675285B1

    Publication date: 2004-01-06

    Application No.: US09556470

    Filing date: 2000-04-21

    IPC classification: G06F15/16

    Abstract: A method and apparatus for eliminating memory contention in a computation module is presented. The method includes, for a current operation being performed by a computation engine of the computation module, processing that begins by identifying one of a plurality of threads for which the current operation is being performed. The plurality of threads constitutes an application (e.g., geometric primitive applications, video graphic applications, drawing applications, etc.). The processing continues by identifying an operation code from a set of operation codes corresponding to the one of the plurality of threads. As such, one of the operation codes of the identified thread is identified for the current operation. The processing then continues by determining a particular location of a particular one of a plurality of data flow memory devices based on the particular thread and the particular operation code for storing the result of the current operation. The processing then continues by producing a result for the current operation and storing the result at the particular location of the particular one of the data flow memory devices.
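
    One way to picture the destination lookup is a fixed table keyed by thread and operation code. The table contents, the thread and op code names, and the `execute` helper below are all hypothetical; the abstract states only that the location is determined from the thread and the operation code.

```python
# Hypothetical routing table: (thread, op code) -> (memory device, address).
ROUTING = {
    ("transform", "mul_row0"): ("vector_mem", 0),
    ("transform", "mul_row1"): ("vector_mem", 1),
    ("lighting", "dot_normal"): ("scalar_mem", 0),
}

# Hypothetical data flow memory devices, modelled as address -> value maps.
memories = {"vector_mem": {}, "scalar_mem": {}}


def execute(thread, opcode, compute):
    # The destination is fixed by (thread, op code) before the result exists.
    device, address = ROUTING[(thread, opcode)]
    result = compute()
    memories[device][address] = result
    return device, address


execute("transform", "mul_row0", lambda: 2.0 * 3.0)
execute("lighting", "dot_normal", lambda: 0.5 + 0.25)
```

    Since no two (thread, op code) pairs share a location in this table, two results never compete for the same slot.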

    Method and apparatus for shared microcode in a multi-thread computation engine
    6.
    Patent type: granted invention patent
    Legal status: in force

    Publication No.: US06624818B1

    Publication date: 2003-09-23

    Application No.: US09556485

    Filing date: 2000-04-21

    IPC classification: G06T15/00

    Abstract: A method and apparatus for supporting shared microcode in a multi-thread computation engine is presented. Each of a plurality of thread controllers controls a thread of a plurality of threads that are included in the system. Rather than storing the operation codes associated with their respective threads and providing those operation codes to an arbitration module for execution, each of the thread controllers stores operation code identifiers that are submitted to the arbitration module. Once the arbitration module has determined which operation code should be executed, it passes the operation code identifiers corresponding to that operation code to a microcode generation block. The microcode generation block uses the operation code identifiers to generate a set of input parameters that are provided to a computation engine for execution, where the input parameters correspond to those for the operation code encoded by the operation code identifiers received by the microcode generation block.
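
    A minimal sketch of the identifier-to-parameters expansion follows. The identifiers `DOT3` and `SCALE`, the parameter fields, and the function `generate_microcode` are hypothetical; the abstract does not list the actual parameters.

```python
# Shared table: op code identifier -> parameters for the computation engine.
# Identifiers and fields are illustrative assumptions.
MICROCODE = {
    "DOT3": {"unit": "vector", "operation": "dot", "width": 3},
    "SCALE": {"unit": "scalar", "operation": "mul", "width": 1},
}


def generate_microcode(op_id, thread_id):
    """Expand a compact identifier into the engine's input parameters."""
    params = dict(MICROCODE[op_id])  # copy so the shared table is never mutated
    params["thread"] = thread_id
    return params


# Two thread controllers submit the same identifier; both expansions come
# from the single shared table entry.
first = generate_microcode("DOT3", thread_id=0)
second = generate_microcode("DOT3", thread_id=1)
```

    Each thread controller then needs to hold only short identifiers, while the full operation description is stored once.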

    Configurable vertex blending circuit and method therefore
    7.
    Patent type: granted invention patent
    Legal status: in force

    Publication No.: US06552733B1

    Publication date: 2003-04-22

    Application No.: US09552931

    Filing date: 2000-04-20

    IPC classification: G09G5/39

    CPC classification: G06T15/00 G06T11/203

    Abstract: A configurable vertex blending circuit that allows both morphing and skinning operations to be supported in dedicated hardware is presented. Such a configurable vertex blending circuit includes a matrix array that is used for storing the matrices associated with the various portions of the vertex blending operations. Vertex data that is received is stored in an input vertex buffer that includes multiple position buffers such that the multiple positions associated with morphing operations can be stored. Similarly, the single position typically associated with skinning operations can be stored in one of the position buffers. The input vertex buffer further stores blending weights associated with the various component operations that are included in the overall vertex blending operation. An arithmetic unit, which is configured and controlled by a transform controller, performs the calculations required for each of a plurality of component operations included in the overall vertex blending operation. The results of each of these component operations are then combined to produce a blended vertex.
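
    The sketch below assumes the conventional weighted-sum form of vertex blending, in which each component operation transforms a position by a matrix and scales the result by a blending weight. The abstract does not state the formula, so the functions `transform` and `blend` are illustrative only.

```python
def transform(matrix, position):
    """Multiply a square matrix (list of rows) by a position vector."""
    return [sum(m * p for m, p in zip(row, position)) for row in matrix]


def blend(matrices, positions, weights):
    """One component operation per (matrix, position, weight) triple;
    the weighted components are summed into the blended vertex."""
    blended = [0.0] * len(positions[0])
    for matrix, position, weight in zip(matrices, positions, weights):
        component = transform(matrix, position)
        blended = [b + weight * c for b, c in zip(blended, component)]
    return blended


identity = [[1.0, 0.0], [0.0, 1.0]]
double = [[2.0, 0.0], [0.0, 2.0]]
p = [1.0, 2.0]

# Skinning: one position, several matrices.
skinned = blend([identity, double], [p, p], [0.5, 0.5])
# Morphing: several positions, here sharing one matrix.
morphed = blend([identity, identity], [[0.0, 0.0], [2.0, 4.0]], [0.25, 0.75])
```

    The same loop serves both modes; only the contents of the matrix and position inputs change, which is the sense in which such a circuit is configurable.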

    Method and apparatus for arbitrating access to a computational engine for use in a video graphics controller
    8.
    Patent type: granted invention patent
    Legal status: in force

    Publication No.: US06640299B1

    Publication date: 2003-10-28

    Application No.: US09556475

    Filing date: 2000-04-21

    IPC classification: G06F9/44

    Abstract: A method and apparatus for arbitrating access to a computation engine includes processing that begins by determining, for a given clock cycle of the computation engine, whether at least one operation code is pending. When at least one operation code is pending, the processing continues by providing the operation code to the computation engine. When multiple operation codes are pending for the given clock cycle, the processing determines a priority operation code from the multiple pending operation codes based on an application specific prioritization scheme. The application specific prioritization scheme is dependent on the application and may include a two level prioritization scheme. At the first level, the prioritization scheme prioritizes certain threads over other threads such that the throughput through the computation module is maximized. At the second level, the prioritization scheme prioritizes operation codes within a set of threads of equal priority based on the length of time the data for the operation codes has been in the processing pipeline. The processing then continues by shifting the remaining operation codes of the multiple operation codes to a subsequent clock cycle of the computation engine.
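
    The two-level scheme can be sketched as a selection over the pending operation codes for one clock cycle. The tuple layout, the numeric priorities, and the rule that waiting op codes age by one cycle are assumptions for illustration.

```python
def arbitrate(pending):
    """pending: list of (thread_priority, age_in_cycles, op_code).
    Returns (winner, remaining); remaining op codes shift to the next cycle."""
    if not pending:
        return None, []
    # Level 1: higher thread priority wins.
    # Level 2: among equal priorities, the op code whose data has been
    # in the pipeline longest wins.
    index = max(range(len(pending)),
                key=lambda i: (pending[i][0], pending[i][1]))
    winner = pending[index]
    # Assumption: everything left waiting is one cycle older next time.
    remaining = [(prio, age + 1, code)
                 for i, (prio, age, code) in enumerate(pending)
                 if i != index]
    return winner, remaining


pending = [(1, 5, "clip"), (2, 1, "light_a"), (2, 3, "light_b")]
winner, remaining = arbitrate(pending)
```

    Here `light_b` wins: its thread priority ties with `light_a` and beats `clip`, and its data has waited longer than that of `light_a`.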

    Geometric engine including a computational module for use in a video graphics controller
    9.
    Patent type: granted invention patent
    Legal status: in force

    Publication No.: US06630935B1

    Publication date: 2003-10-07

    Application No.: US09556474

    Filing date: 2000-04-21

    IPC classification: G06T15/00

    Abstract: A computation module and/or geometric engine for use in a video graphics processing circuit includes memory, a computation engine, a plurality of thread controllers, and an arbitration module. The computation engine is operably coupled to perform an operation based on an operation code and to provide a corresponding result to the memory as indicated by the operation code. Each of the plurality of thread controllers manages at least one corresponding thread of a plurality of threads. The plurality of threads constitutes an application. The arbitration module is coupled to the plurality of thread controllers and utilizes an application specific prioritization scheme to provide operation codes from the plurality of thread controllers to the computation engine such that idle time of the computation engine is minimized. The prioritization scheme prioritizes certain threads over other threads such that the throughput through the computation module is maximized.
