Simultaneous multi-thread instructions issue to execution units while substitute injecting sequence of instructions for long latency sequencer instruction via multiplexer
    41.
    Granted invention patent
    Status: Lapsed

    Publication number: US07941644B2

    Publication date: 2011-05-10

    Application number: US12252541

    Filing date: 2008-10-16

    CPC classification number: G06F9/3885 G06F9/22 G06F9/3009 G06F9/3851 G06F9/3867

    Abstract: A processing unit includes multiple execution units and sequencer logic that is disposed downstream of instruction buffer logic, and that is responsive to a sequencer instruction present in an instruction stream. In response to such an instruction, the sequencer logic issues a plurality of instructions associated with a long latency operation to one execution unit, while blocking instructions from the instruction buffer logic from being issued to that execution unit. In addition, the blocking of instructions from being issued to the execution unit does not affect the issuance of instructions to any other execution unit, and as such, other instructions from the instruction buffer logic are still capable of being issued to and executed by other execution units even while the sequencer logic is issuing the plurality of instructions associated with the long latency operation.
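
    Illustrative sketch (not from the patent): the toy Python model below shows the issue behavior the abstract describes. The per-unit queues, the `SEQ` marker, and the names `simulate_issue` and `seq_unit` are assumptions made for illustration; the actual design is hardware logic downstream of the instruction buffer.

```python
from collections import deque


def simulate_issue(buffered, sequence_len, num_units=2, seq_unit=0):
    """Return a per-cycle issue log for a toy multi-unit core.

    buffered: list of (name, unit) pairs held in the instruction buffer.
    The name "SEQ" marks a sequencer instruction (hypothetical encoding).
    """
    # Assumption: instructions are pre-sorted into one queue per execution unit.
    queues = [deque(n for n, u in buffered if u == i) for i in range(num_units)]
    remaining = 0  # injected instructions still to be issued to seq_unit
    log = []
    while any(queues) or remaining:
        cycle = []
        for unit, q in enumerate(queues):
            # A sequencer instruction at the head of the queue starts injection.
            if unit == seq_unit and remaining == 0 and q and q[0] == "SEQ":
                q.popleft()
                remaining = sequence_len
            if unit == seq_unit and remaining:
                # The mux selects the sequencer; buffered instructions for this
                # unit are blocked until the injected sequence completes.
                cycle.append(f"seq{sequence_len - remaining}")
                remaining -= 1
            elif q:
                # Other units keep issuing from the buffer, unaffected.
                cycle.append(q.popleft())
            else:
                cycle.append(None)
        log.append(cycle)
    return log


program = [("a0", 0), ("SEQ", 0), ("a1", 0),
           ("b0", 1), ("b1", 1), ("b2", 1), ("b3", 1)]
for cycle, issued in enumerate(simulate_issue(program, sequence_len=3)):
    print(cycle, issued)
```

    In this model, unit 1 issues `b1` through `b3` while unit 0 is occupied by the three injected instructions, and `a1` issues only after the sequence finishes.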

    Processing unit incorporating special purpose register for use with instruction-based persistent vector multiplexer control
    42.
    Granted invention patent
    Status: Lapsed

    Publication number: US07904700B2

    Publication date: 2011-03-08

    Application number: US12045222

    Filing date: 2008-03-10

    CPC classification number: G06F9/30032 G06F9/30036 G06F9/30109 G06F9/30123

    Abstract: A software-accessible special purpose register is architected into a processing unit in order to implement persistent vector multiplexer control of a vector-based execution unit. A persistent swizzle instruction is defined in an instruction set for the vector-based execution unit and is used to cause state information to be stored in the special purpose register such that the operand vectors processed by subsequent vector instructions executed by the vector-based execution unit will be selectively shuffled using the persisted state information. As a result, when multiple vector instructions require a common custom word ordering for one or more operand vectors, a single persistent swizzle instruction may be used to select the desired custom word ordering for all of the vector instructions.
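
    Illustrative sketch (not from the patent): a minimal Python model of a persistent swizzle. The class `VectorUnit`, the method names `pswizzle`, `vadd`, and `vmul`, the four-word vector width, and the choice to shuffle only the second operand are all assumptions; the abstract does not specify an encoding.

```python
class VectorUnit:
    """Toy 4-lane vector unit with a persistent swizzle register."""

    def __init__(self):
        # Special purpose register: lane i of operand B reads word swizzle_spr[i].
        self.swizzle_spr = (0, 1, 2, 3)  # identity ordering by default

    def pswizzle(self, pattern):
        """Hypothetical persistent swizzle instruction: store state in the SPR."""
        if len(pattern) != 4 or any(i not in range(4) for i in pattern):
            raise ValueError("pattern must be four word indices in 0..3")
        self.swizzle_spr = tuple(pattern)

    def _shuffle(self, v):
        # Operand multiplexers reorder words using the persisted state.
        return tuple(v[i] for i in self.swizzle_spr)

    def vadd(self, a, b):
        return tuple(x + y for x, y in zip(a, self._shuffle(b)))

    def vmul(self, a, b):
        return tuple(x * y for x, y in zip(a, self._shuffle(b)))


vu = VectorUnit()
vu.pswizzle((1, 2, 0, 3))  # one instruction sets the ordering ...
a, b = (1, 2, 3, 4), (10, 20, 30, 40)
print(vu.vadd(a, b))       # ... and every later vector instruction uses it
print(vu.vmul(a, b))
```

    The point of the design is that the word ordering is set once and then applies to each following vector instruction, rather than being repeated per instruction.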

    Anisotropic Texture Filtering with Texture Data Prefetching
    43.
    Invention patent application
    Status: In force

    Publication number: US20090315908A1

    Publication date: 2009-12-24

    Application number: US12110045

    Filing date: 2008-04-25

    CPC classification number: G06T15/04 G06T2200/12

    Abstract: A circuit arrangement and method utilize texture data prefetching to prefetch texture data used by an anisotropic filtering algorithm. In particular, stride-based prefetching may be used to prefetch texture data for use in anisotropic filtering, where the value of the stride, or difference between successive accesses, is based upon a distance in a memory address space between sample points taken along the line of anisotropy used in an anisotropic filtering algorithm.
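
    Illustrative sketch (not from the patent): the Python below computes a prefetch stride as the address distance between successive sample points on the line of anisotropy. It assumes a linear, row-major texture layout with a fixed texel size; real texture memory is often tiled, and the function names and parameters are hypothetical.

```python
def texel_address(base, x, y, width, bytes_per_texel=4):
    """Address of texel (x, y) in a row-major texture (assumed layout)."""
    return base + (y * width + x) * bytes_per_texel


def anisotropic_prefetch_addresses(base, start, step, count, width,
                                   bytes_per_texel=4):
    """Return (stride, addresses) for `count` samples along the line.

    start: (x, y) of the first sample; step: (dx, dy) between samples,
    both in whole texels.
    """
    x0, y0 = start
    dx, dy = step
    first = texel_address(base, x0, y0, width, bytes_per_texel)
    second = texel_address(base, x0 + dx, y0 + dy, width, bytes_per_texel)
    # The stride is the address-space distance between two successive samples.
    stride = second - first
    return stride, [first + k * stride for k in range(count)]


stride, addrs = anisotropic_prefetch_addresses(
    base=0x1000, start=(10, 20), step=(2, 1), count=4, width=256)
print(stride, [hex(a) for a in addrs])
```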

    Reciprocal estimate computation methods and apparatus
    44.
    Granted invention patent
    Status: In force

    Publication number: US07634527B2

    Publication date: 2009-12-15

    Application number: US11282032

    Filing date: 2005-11-17

    Abstract: In a first aspect, a first method of reciprocal estimate computation using floating point pipeline logic is provided. The first method includes the steps of (1) receiving an input value having an exponent and a mantissa when represented as a floating point number on which a reciprocal estimate computation is to be performed; (2) determining whether the exponent is one of a plurality of predetermined numbers; and (3) if the exponent is one of the plurality of predetermined numbers, adjusting at least one of a plurality of modified mantissa bits (e.g., mantissa bits internal to leading zero anticipator (LZA) logic) and the exponent so as to prevent an underflow result of the reciprocal estimate computation. Numerous other aspects are provided.
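
    Illustrative sketch (not from the patent): the abstract does not say which exponents are the predetermined numbers or how the LZA mantissa bits are adjusted. The Python below only shows why some exponents are special in IEEE 754 single precision: for biased exponents 253 and 254 the reciprocal is at or below the smallest normal value, 2^-126. The set `UNDERFLOW_RISK_EXPONENTS` and the function names are assumptions.

```python
import struct


def f32_fields(x):
    """Split x, rounded to IEEE 754 single precision, into (sign, exponent, mantissa)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


# Assumption: biased exponents whose reciprocal can land in the subnormal range.
UNDERFLOW_RISK_EXPONENTS = {253, 254}


def reciprocal_needs_adjustment(x):
    """True if 1/x may underflow the normal range of a 32-bit float."""
    _, exponent, _ = f32_fields(x)
    return exponent in UNDERFLOW_RISK_EXPONENTS


for value in (1.5, 1.5 * 2.0**126, 2.0**127):
    sign, exponent, mantissa = f32_fields(1.0 / value)
    print(value, reciprocal_needs_adjustment(value), exponent, hex(mantissa))
```

    A result exponent field of 0 with a nonzero mantissa marks a subnormal, which is the case a reciprocal estimate unit has to handle specially or avoid.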

    Early Exit Processing of Iterative Refinement Algorithm Using Register Dependency Disable
    45.
    Invention patent application
    Status: Lapsed

    Publication number: US20090228690A1

    Publication date: 2009-09-10

    Application number: US12045313

    Filing date: 2008-03-10

    Abstract: An “early exit” of an iterative refinement algorithm is implemented by effectively disabling read-after-write dependency stalls of newer instructions, as well as disabling the register write enable of these instructions, for the remainder of the algorithm. By doing so, the latency of the algorithm is reduced and the performance is increased without the complexity and potentially poor performance of compare and branch instructions that might otherwise be required.
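
    Illustrative sketch (not from the patent): a software analogy in Python using Newton-Raphson reciprocal refinement. The loop has a fixed number of steps and no branch out of it; once the estimate converges, a `write_enable` flag is cleared so the remaining steps no longer update the result. The function name, the tolerance, and the choice of algorithm are assumptions.

```python
def refine_reciprocal(d, x0, steps=8, tol=1e-9):
    """Approximate 1/d with a fixed-length refinement sequence.

    Returns (estimate, writes), where writes counts the steps whose
    result was actually committed.
    """
    x = x0
    write_enable = True
    writes = 0
    for _ in range(steps):
        candidate = x * (2.0 - d * x)  # one Newton-Raphson step
        if not write_enable:
            # After the early exit the step still flows through, but its
            # register write is suppressed and nothing waits on its result.
            continue
        converged = abs(candidate - x) <= tol
        x = candidate
        writes += 1
        if converged:
            write_enable = False  # early exit for the rest of the sequence
    return x, writes


estimate, writes = refine_reciprocal(4.0, 0.2)
print(estimate, writes)
```

    In hardware the benefit comes from the later instructions completing without dependency stalls; this model only shows which steps commit.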

    Early Exit Processing of Iterative Refinement Algorithm Using Register Dependency Disable and Programmable Early Exit Condition
    46.
    Invention patent application
    Status: Lapsed

    Publication number: US20090228689A1

    Publication date: 2009-09-10

    Application number: US12045243

    Filing date: 2008-03-10

    Abstract: A programmable “early exit” of an iterative refinement algorithm is implemented by effectively disabling read-after-write dependency stalls of newer instructions, as well as disabling the register write enable of these instructions, for the remainder of the algorithm. In addition, programmable logic is provided to enable a custom early exit condition to be specified for the iterative refinement algorithm so that the underlying hardware can be configured for optimal execution of particular iterative refinement algorithms. By doing so, the latency of the algorithm is reduced and the performance is increased without the complexity and potentially poor performance of compare and branch instructions that might otherwise be required.

    Processing Unit Incorporating Multirate Execution Unit
    47.
    Invention patent application
    Status: Lapsed

    Publication number: US20090182987A1

    Publication date: 2009-07-16

    Application number: US11972746

    Filing date: 2008-01-11

    Abstract: A multirate execution unit is capable of being operated in a plurality of modes, with the execution unit being capable of being clocked at multiple different rates relative to a multithreaded issue unit. In applications where maximum performance is desired, the execution unit can be clocked at a rate that is faster than the clock rate for the multithreaded issue unit, and in applications where a lower power profile is desired, the execution unit can be throttled back to a slower rate to reduce the power consumption of the execution unit. When the execution unit is clocked at a faster rate than the multithreaded issue unit, the issue unit is permitted to issue more instructions per cycle than when the execution unit is throttled to the slower rate, which increases overall instruction throughput.

    Full Vector Width Cross Product Using Recirculation for Area Optimization
    48.
    Invention patent application
    Status: Under examination (published)

    Publication number: US20090063608A1

    Publication date: 2009-03-05

    Application number: US11849495

    Filing date: 2007-09-04

    Abstract: Embodiments of the invention are generally related to the field of image processing, and more specifically to vector units for supporting image processing. A vector unit may comprise a plurality of operand multiplexers associated with each vector processing lane of the vector unit. The operand multiplexers may select vector operands from one or more register files for performing a cross product operation. A first multiply operation may be performed in a first pipeline stage by multiplying a first set of operands in a multiplier. In a second pipeline stage, a second multiply operation may be performed by multiplying a second set of operands. The results of the first multiply operation and the second multiply operation may be transferred to an adder to complete the cross product instruction.
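
    Illustrative sketch (not from the patent): the two multiply passes and the final add can be written out in Python as below. The grouping of operands into the two passes follows the standard cross product formula; the exact operand selection in the claimed hardware and the function name are assumptions.

```python
def cross_product_two_pass(a, b):
    """Cross product of two 3-vectors as two multiply passes plus one add."""
    ax, ay, az = a
    bx, by, bz = b
    # First pipeline stage: operand multiplexers select the first set of operands.
    first = (ay * bz, az * bx, ax * by)
    # Second pipeline stage: the same multipliers are reused on a second set.
    second = (az * by, ax * bz, ay * bx)
    # Adder stage: subtract the second products from the first, lane by lane.
    return tuple(p - q for p, q in zip(first, second))


print(cross_product_two_pass((1, 0, 0), (0, 1, 0)))
print(cross_product_two_pass((1, 2, 3), (4, 5, 6)))
```

    Reusing one set of multipliers across two passes is what trades a cycle of latency for reduced area.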

    Operand Multiplexor Control Modifier Instruction in a Fine Grain Multithreaded Vector Microprocessor
    49.
    Invention patent application
    Status: Under examination (published)

    Publication number: US20080126745A1

    Publication date: 2008-05-29

    Application number: US11925278

    Filing date: 2007-10-26

    CPC classification number: G06T1/20

    Abstract: The present invention is generally related to integrated circuit devices, and more particularly, to methods, systems and design structures for the field of image processing, and more specifically to an instruction set for processing images. Vector processing may involve rearranging vector operands in one or more source registers prior to performing vector operations. Typically, rearranging of operands in source registers is done by issuing a plurality of permute instructions that require excessive usage of temporary registers. Furthermore, the permute instructions may cause dependencies between instructions executing in a pipeline, thereby adversely affecting performance. Embodiments of the invention provide a level of muxing between a register file and a vector unit that allows for rearrangement of vector operands in source registers prior to providing the operands to the vector unit, thereby obviating the need for permute instructions.

    Area Optimized Full Vector Width Vector Cross Product
    50.
    Invention patent application
    Status: Under examination (published)

    Publication number: US20080079713A1

    Publication date: 2008-04-03

    Application number: US11536156

    Filing date: 2006-09-28

    CPC classification number: G06T15/06 G06T2200/28

    Abstract: The present invention is generally related to the field of image processing, and more specifically to vector units for supporting image processing. A dual vector unit implementation is described wherein two vector units are configured to receive data from a common register file. The vector units may independently and simultaneously process instructions. Furthermore, the vector units may be adapted to perform scalar operations, thereby integrating the vector and scalar processing. The vector units may also be configured to share resources to perform an operation, for example, a cross product operation.
