    31. Anisotropic Texture Filtering with Texture Data Prefetching
    Patent application; legal status: in force

    Publication number: US20090315908A1

    Publication date: 2009-12-24

    Application number: US12110045

    Filing date: 2008-04-25

    CPC classification number: G06T15/04 G06T2200/12

    Abstract: A circuit arrangement and method utilize texture data prefetching to prefetch texture data used by an anisotropic filtering algorithm. In particular, stride-based prefetching may be used to prefetch texture data for use in anisotropic filtering, where the value of the stride, or difference between successive accesses, is based upon a distance in a memory address space between sample points taken along the line of anisotropy used in an anisotropic filtering algorithm.
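
    A minimal sketch of the stride computation described above, under simplifying assumptions that are not from the patent: a linear row-major texture layout, integer texel steps between successive sample points, and one texel fetched per sample. Under those assumptions the address difference between successive samples along the line of anisotropy is constant, so a stride prefetcher that knows it can issue every later sample address after the first access. All function names are hypothetical.

```python
# Sketch: stride-based prefetching of texels sampled along a line of anisotropy.
# Assumptions (hypothetical, not from the patent): row-major linear layout,
# integer texel steps between samples, one texel fetched per sample point.


def texel_address(base: int, width: int, texel_bytes: int, u: int, v: int) -> int:
    """Byte address of texel (u, v) in a row-major texture."""
    return base + (v * width + u) * texel_bytes


def anisotropy_stride(width: int, texel_bytes: int, step: tuple[int, int]) -> int:
    """Address difference between successive samples along the line of anisotropy."""
    du, dv = step
    return (dv * width + du) * texel_bytes


def sample_addresses(base: int, width: int, texel_bytes: int,
                     start: tuple[int, int], step: tuple[int, int], n: int) -> list[int]:
    """Demand addresses of the n samples taken along the line of anisotropy."""
    (u0, v0), (du, dv) = start, step
    return [texel_address(base, width, texel_bytes, u0 + k * du, v0 + k * dv)
            for k in range(n)]


def prefetch_addresses(first_addr: int, stride: int, n: int) -> list[int]:
    """Addresses a stride prefetcher would issue after the first sample."""
    return [first_addr + k * stride for k in range(1, n)]
```

    With fractional sample spacing or a tiled texture layout, successive addresses are only approximately equidistant, so this sketch should be read as the idealized case.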

    32. Reciprocal estimate computation methods and apparatus
    Granted patent; legal status: in force

    Publication number: US07634527B2

    Publication date: 2009-12-15

    Application number: US11282032

    Filing date: 2005-11-17

    Abstract: In a first aspect, a first method of reciprocal estimate computation using floating-point pipeline logic is provided. The first method includes the steps of (1) receiving an input value on which a reciprocal estimate computation is to be performed, the input value having an exponent and a mantissa when represented as a floating-point number; (2) determining whether the exponent is one of a plurality of predetermined numbers; and (3) if the exponent is one of the plurality of predetermined numbers, adjusting at least one of the exponent and a plurality of modified mantissa bits (e.g., mantissa bits internal to leading zero anticipator (LZA) logic) so as to prevent an underflow result of the reciprocal estimate computation. Numerous other aspects are provided.
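
    A sketch of where such predetermined exponents can come from, assuming IEEE 754 binary32 inputs. For a normal input with biased exponent e, the reciprocal's biased exponent is 254 - e (significand exactly 1) or 253 - e (otherwise), so only e = 253 and e = 254 can push the result below the normal range. The patent adjusts modified mantissa bits inside the LZA logic; this sketch instead models the adjustment, as an assumption, by denormalizing the estimate rather than letting it underflow. The 12-bit estimate precision and all names are hypothetical.

```python
import struct

FRAC_BITS = 23  # binary32 fraction width
BIAS = 127      # binary32 exponent bias
# Assumed "predetermined" exponents: the only normal binary32 inputs whose
# reciprocal can fall below the smallest normal number.
CANDIDATE_EXPONENTS = (253, 254)


def unpack(x: float) -> tuple[int, int, int]:
    """Split a binary32 value into (sign, biased exponent, fraction)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits >> 31, (bits >> FRAC_BITS) & 0xFF, bits & ((1 << FRAC_BITS) - 1)


def pack(sign: int, exp: int, frac: int) -> float:
    """Assemble a binary32 value from its fields."""
    bits = (sign << 31) | (exp << FRAC_BITS) | frac
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def reciprocal_estimate(x: float, est_bits: int = 12) -> float:
    """Low-precision 1/x for a normal binary32 x, keeping est_bits fraction bits."""
    sign, exp, frac = unpack(x)
    if not 0 < exp < 255:
        raise ValueError("sketch handles normal inputs only")
    r = 1.0 / (1.0 + frac / (1 << FRAC_BITS))  # reciprocal of significand, in (0.5, 1]
    if r < 1.0:
        r *= 2.0                               # renormalize into [1, 2)
        res_exp = 2 * BIAS - 1 - exp           # 253 - exp
    else:
        res_exp = 2 * BIAS - exp               # 254 - exp
    res_frac = int((r - 1.0) * (1 << est_bits)) << (FRAC_BITS - est_bits)
    if exp in CANDIDATE_EXPONENTS and res_exp <= 0:
        # Adjust mantissa and exponent: expose the hidden bit and shift right so
        # the estimate is returned as a subnormal instead of underflowing.
        mantissa = ((1 << FRAC_BITS) | res_frac) >> (1 - res_exp)
        return pack(sign, 0, mantissa)
    return pack(sign, res_exp, res_frac)
```

    For example, `reciprocal_estimate(2.0 ** 127)` returns the subnormal 2.0 ** -127 rather than zero.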

    33. Early Exit Processing of Iterative Refinement Algorithm Using Register Dependency Disable
    Patent application; legal status: lapsed

    Publication number: US20090228690A1

    Publication date: 2009-09-10

    Application number: US12045313

    Filing date: 2008-03-10

    Abstract: An “early exit” of an iterative refinement algorithm is implemented by effectively disabling read-after-write dependency stalls of newer instructions, as well as the register write enable of these instructions, for the remainder of the algorithm. By doing so, the latency of the algorithm is reduced and performance is increased, without the complexity and potentially poor performance of the compare and branch instructions that might otherwise be required.
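
    A software model of the idea, assuming a Newton-Raphson reciprocal refinement unrolled into a fixed number of steps and a hypothetical timing model: each dependent step waits `latency` cycles for its predecessor, while a step issued after the early exit costs one cycle. Once the exit condition holds, the remaining steps neither stall on their read-after-write dependency nor write their result, so no compare-and-branch is needed to skip them. The exit condition and cycle figures are assumptions.

```python
def refine_reciprocal(a: float, x0: float, steps: int = 4,
                      latency: int = 6, tol: float = 0.0) -> tuple[float, int]:
    """Unrolled Newton-Raphson refinement of 1/a with a hardware-style early exit.

    Returns (result, cycles). The latency and cycle accounting are a
    hypothetical timing model, not figures from the patent.
    """
    x = x0
    exit_taken = False
    cycles = 0
    for _ in range(steps):
        if exit_taken:
            # Early exit: the RAW dependency check and the register write
            # enable are both disabled, so the instruction issues without
            # stalling and its result is never written back.
            cycles += 1
            continue
        # Normal case: stall until the previous write of x completes.
        cycles += latency
        x = x * (2.0 - a * x)        # Newton-Raphson step for 1/a
        if abs(1.0 - a * x) <= tol:  # assumed fixed early-exit condition
            exit_taken = True
    return x, cycles
```

    With an exact starting estimate, `refine_reciprocal(2.0, 0.5)` exits after the first step and takes 9 modelled cycles instead of 24.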

    34. Early Exit Processing of Iterative Refinement Algorithm Using Register Dependency Disable and Programmable Early Exit Condition
    Patent application; legal status: lapsed

    Publication number: US20090228689A1

    Publication date: 2009-09-10

    Application number: US12045243

    Filing date: 2008-03-10

    Abstract: A programmable “early exit” of an iterative refinement algorithm is implemented by effectively disabling read-after-write dependency stalls of newer instructions, as well as the register write enable of these instructions, for the remainder of the algorithm. In addition, programmable logic is provided to enable a custom early exit condition to be specified for the iterative refinement algorithm, so that the underlying hardware can be configured for optimal execution of particular iterative refinement algorithms. By doing so, the latency of the algorithm is reduced and performance is increased, without the complexity and potentially poor performance of the compare and branch instructions that might otherwise be required.
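
    A sketch of the programmable variant, assuming the custom exit condition can be modelled as a predicate over the previous and the new iterate. Both the refinement step and the condition are parameters, so the same fixed, unrolled sequence serves different algorithms. The reciprocal square root step and the conditions used below are illustrative choices, not taken from the patent.

```python
from typing import Callable


def run_refinement(step: Callable[[float], float], x0: float, steps: int,
                   exit_condition: Callable[[float, float], bool]) -> tuple[float, int]:
    """Run a fixed, unrolled refinement sequence with a programmable early exit.

    Returns (result, number of steps whose result was written back).
    """
    x = x0
    exit_taken = False
    writes = 0
    for _ in range(steps):
        if exit_taken:
            # Remaining steps issue with RAW stalls and write enable disabled.
            continue
        new_x = step(x)
        exit_taken = exit_condition(x, new_x)  # programmed exit condition
        x = new_x                              # write enabled for this step
        writes += 1
    return x, writes


def rsqrt_step(a: float) -> Callable[[float], float]:
    """Newton-Raphson step for 1/sqrt(a)."""
    return lambda x: x * (1.5 - 0.5 * a * x * x)
```

    For example, `run_refinement(rsqrt_step(4.0), 0.5, 4, lambda old, new: old == new)` returns `(0.5, 1)`, because 0.5 is already the exact result and the condition fires after the first write.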

    35. Processing Unit Incorporating Multirate Execution Unit
    Patent application; legal status: lapsed

    Publication number: US20090182987A1

    Publication date: 2009-07-16

    Application number: US11972746

    Filing date: 2008-01-11

    Abstract: A multirate execution unit is capable of being operated in a plurality of modes, in which the execution unit can be clocked at multiple different rates relative to a multithreaded issue unit. In applications where maximum performance is desired, the execution unit can be clocked at a rate faster than the clock rate of the multithreaded issue unit; in applications where a lower power profile is desired, the execution unit can be throttled back to a slower rate to reduce its power consumption. When the execution unit is clocked faster than the multithreaded issue unit, the issue unit is permitted to issue more instructions per cycle than when the execution unit is throttled to the slower rate, increasing overall instruction throughput.
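
    A toy issue-schedule model of the two modes. It assumes, hypothetically, that the execution unit runs at twice the issue-unit clock in the performance mode and at the same clock when throttled, and that one instruction can be issued per execution-unit cycle, chosen round-robin across hardware threads. Power is not modelled.

```python
from collections import deque


def issue_schedule(threads: list[list[str]], exec_clock_ratio: int) -> list[list[str]]:
    """Instructions issued in each issue-unit cycle.

    exec_clock_ratio: execution-unit cycles per issue-unit cycle (assumed 2 in
    a high-performance mode, 1 when throttled for low power). One instruction
    may issue per execution-unit cycle, picked round-robin across threads.
    """
    queues = [deque(t) for t in threads]
    next_thread = 0
    schedule = []
    while any(queues):
        issued = []
        for _slot in range(exec_clock_ratio):
            # Probe each thread at most once to fill this issue slot.
            for _probe in range(len(queues)):
                q = queues[next_thread]
                next_thread = (next_thread + 1) % len(queues)
                if q:
                    issued.append(q.popleft())
                    break
        schedule.append(issued)
    return schedule
```

    With two threads of two instructions each, the fast mode drains both threads in two issue-unit cycles and the throttled mode takes four.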

    36. Full Vector Width Cross Product Using Recirculation for Area Optimization
    Patent application; legal status: pending (published)

    Publication number: US20090063608A1

    Publication date: 2009-03-05

    Application number: US11849495

    Filing date: 2007-09-04

    Abstract: Embodiments of the invention are generally related to the field of image processing, and more specifically to vector units for supporting image processing. A vector unit may comprise a plurality of operand multiplexers associated with each vector processing lane of the vector unit. The operand multiplexers may select vector operands from one or more register files for performing a cross product operation. A first multiply operation may be performed in a first pipeline stage by multiplying a first set of operands in a multiplier. In a second pipeline stage, a second multiply operation may be performed by multiplying a second set of operands. The results of the first multiply operation and the second multiply operation may be transferred to an adder to complete the cross product instruction.
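
    A functional sketch of the two-pass cross product, assuming three lanes and operand multiplexers that can route any component to any lane. The first pass forms a.yzx * b.zxy, the second reuses the same multipliers for a.zxy * b.yzx, and the adder forms the lane-wise difference, which is a × b. The selector encoding and names are hypothetical.

```python
Vec3 = tuple[float, float, float]

# Hypothetical operand-multiplexer selections: lane i reads component SEL[i].
YZX = (1, 2, 0)
ZXY = (2, 0, 1)


def mux(v: Vec3, sel: tuple[int, int, int]) -> Vec3:
    """Route vector components to lanes, as the per-lane operand muxes would."""
    return tuple(v[i] for i in sel)


def multiply_lanes(x: Vec3, y: Vec3) -> Vec3:
    """One pass through the per-lane multipliers."""
    return tuple(p * q for p, q in zip(x, y))


def cross_recirculated(a: Vec3, b: Vec3) -> Vec3:
    first = multiply_lanes(mux(a, YZX), mux(b, ZXY))    # pipeline stage 1
    second = multiply_lanes(mux(a, ZXY), mux(b, YZX))   # stage 2, same multipliers
    return tuple(p - q for p, q in zip(first, second))  # adder completes the result
```

    Reusing one set of multipliers across two pipeline stages is what trades latency for area here; a fully parallel design would need twice the multipliers.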

    37. Operand Multiplexor Control Modifier Instruction in a Fine Grain Multithreaded Vector Microprocessor
    Patent application; legal status: pending (published)

    Publication number: US20080126745A1

    Publication date: 2008-05-29

    Application number: US11925278

    Filing date: 2007-10-26

    CPC classification number: G06T1/20

    Abstract: The present invention is generally related to integrated circuit devices, and more particularly to methods, systems, and design structures for image processing, specifically an instruction set for processing images. Vector processing may involve rearranging vector operands in one or more source registers prior to performing vector operations. Typically, operands in source registers are rearranged by issuing a plurality of permute instructions that require excessive use of temporary registers. Furthermore, the permute instructions may cause dependencies between instructions executing in a pipeline, thereby adversely affecting performance. Embodiments of the invention provide a level of multiplexing between a register file and a vector unit that allows vector operands in source registers to be rearranged before the operands are provided to the vector unit, thereby obviating the need for permute instructions.
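
    A sketch of the operand-multiplexing idea, assuming a hypothetical mux-control encoding that names, for each lane, a source register and element. The operand is assembled on its way from the register file to the vector unit, so no permute instruction runs and no temporary register is written.

```python
RegisterFile = dict[str, list[float]]
# Hypothetical mux-control encoding: for each lane, (source register, element).
MuxControl = list[tuple[str, int]]


def read_operand(regs: RegisterFile, control: MuxControl) -> list[float]:
    """Assemble a vector operand lane by lane between register file and vector unit."""
    return [regs[reg][elem] for reg, elem in control]


def vadd(x: list[float], y: list[float]) -> list[float]:
    """Lane-wise vector add performed by the vector unit."""
    return [p + q for p, q in zip(x, y)]
```

    For example, with `v1 = [1.0, 2.0, 3.0, 4.0]` and `v2 = [10.0, 20.0, 30.0, 40.0]`, the control `[("v1", 3), ("v1", 2), ("v2", 1), ("v2", 0)]` feeds `[4.0, 3.0, 20.0, 10.0]` to the vector unit while both source registers stay unchanged.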

    38. Area Optimized Full Vector Width Vector Cross Product
    Patent application; legal status: pending (published)

    Publication number: US20080079713A1

    Publication date: 2008-04-03

    Application number: US11536156

    Filing date: 2006-09-28

    CPC classification number: G06T15/06 G06T2200/28

    Abstract: The present invention is generally related to the field of image processing, and more specifically to vector units for supporting image processing. A dual vector unit implementation is described wherein two vector units are configured to receive data from a common register file. The vector units may independently and simultaneously process instructions. Furthermore, the vector units may be adapted to perform scalar operations, thereby integrating vector and scalar processing. The vector units may also be configured to share resources to perform an operation, for example, a cross product operation.
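
    A sketch of one way two vector units could share resources for a cross product, offered as an assumption rather than the patent's stated scheme: instead of recirculating through one unit's multipliers, both units read the common register file and multiply in the same cycle, and one unit's adders subtract the products forwarded by the other. Names are hypothetical.

```python
Vec3 = tuple[float, float, float]


def unit_multiply(x: Vec3, y: Vec3) -> Vec3:
    """Per-lane multipliers of one vector unit."""
    return tuple(p * q for p, q in zip(x, y))


def cross_shared(a: Vec3, b: Vec3) -> Vec3:
    ax, ay, az = a
    bx, by, bz = b
    # Same cycle: each unit reads the common register file and multiplies.
    unit0 = unit_multiply((ay, az, ax), (bz, bx, by))
    unit1 = unit_multiply((az, ax, ay), (by, bz, bx))
    # unit1 forwards its products; unit0's adders form the differences.
    return tuple(p - q for p, q in zip(unit0, unit1))
```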

    40. Method and apparatus for implementing a multiple operand vector floating point summation to scalar function
    Granted patent; legal status: lapsed

    Publication number: US08239438B2

    Publication date: 2012-08-07

    Application number: US11840277

    Filing date: 2007-08-17

    Abstract: Embodiments of the invention provide methods and apparatus for executing a multiple operand instruction. Executing the multiple operand instruction comprises computing an arithmetic result of a pair of operands in each processing lane of a vector unit. The arithmetic results generated in each processing lane of the vector unit may be transferred to a dot product unit. The dot product unit may compute an arithmetic result from the arithmetic results computed by the processing lanes of the vector unit, thereby generating an arithmetic result of more than two operands.
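
    A functional sketch of the multiple operand summation, assuming four lanes and modelling the dot product unit's reduction as a pairwise adder tree (an assumption; the abstract only says the dot product unit combines the lane results). With eight inputs, the result is the sum of all eight operands rather than of a single pair. Names are hypothetical.

```python
def reduce_tree(values: list[float]) -> float:
    """Pairwise adder tree standing in for the dot product unit's reduction."""
    if not values:
        raise ValueError("need at least one lane result")
    while len(values) > 1:
        if len(values) % 2:
            values = values + [0.0]  # pad an odd level with a zero operand
        values = [values[i] + values[i + 1] for i in range(0, len(values), 2)]
    return values[0]


def multi_operand_sum(a: list[float], b: list[float]) -> float:
    """Sum 2 * lanes operands: one add per lane, then a cross-lane reduction."""
    lane_sums = [x + y for x, y in zip(a, b)]  # one pair per vector lane
    return reduce_tree(lane_sums)
```

    For example, `multi_operand_sum([1, 2, 3, 4], [5, 6, 7, 8])` forms lane sums `[6, 8, 10, 12]` and reduces them to 36.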