ANISOTROPIC TEXTURE FILTERING WITH TEXTURE DATA PREFETCHING
    41.
    Invention application
    Status: Active

    Publication number: US20120169755A1

    Publication date: 2012-07-05

    Application number: US13421169

    Filing date: 2012-03-15

    CPC classification number: G06T15/04 G06T2200/12

    Abstract: A circuit arrangement and method utilize texture data prefetching to prefetch texture data used by an anisotropic filtering algorithm. In particular, stride-based prefetching may be used to prefetch texture data for use in anisotropic filtering, where the value of the stride, or difference between successive accesses, is based upon a distance in a memory address space between sample points taken along the line of anisotropy used in an anisotropic filtering algorithm.
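
    The stride computation described in the abstract can be sketched as follows. This is a minimal illustration in Python, not the patented circuit: the row-major texture layout, the 4-byte texel size, the integer step between sample points, and the names `texel_address` and `prefetch_addresses` are all assumptions made for the example.

```python
def texel_address(base, u, v, width, texel_bytes=4):
    """Byte address of texel (u, v), assuming a row-major texture layout."""
    return base + (v * width + u) * texel_bytes


def prefetch_addresses(base, start, step, count, width, texel_bytes=4):
    """Return (stride, addresses) for `count` samples along the line of anisotropy.

    `start` is the first sample point and `step` the (du, dv) offset between
    successive sample points, both in texel coordinates (assumed integers).
    """
    u0, v0 = start
    du, dv = step
    first = texel_address(base, u0, v0, width, texel_bytes)
    second = texel_address(base, u0 + du, v0 + dv, width, texel_bytes)
    # The stride is the address-space distance between two successive samples.
    stride = second - first
    # A stride prefetcher can then generate every later address by addition.
    return stride, [first + i * stride for i in range(count)]


stride, addrs = prefetch_addresses(
    base=0x1000, start=(8, 2), step=(3, 1), count=4, width=256
)
print(stride, [hex(a) for a in addrs])
```

    Because the sample points are evenly spaced along the line, one subtraction yields a constant stride, which is what makes a simple stride-based prefetcher applicable in this sketch.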


    Early exit processing of iterative refinement algorithm using register dependency disable and programmable early exit condition
    42.
    Granted invention patent
    Status: Lapsed

    Publication number: US07913066B2

    Publication date: 2011-03-22

    Application number: US12045243

    Filing date: 2008-03-10

    Abstract: A programmable “early exit” of an iterative refinement algorithm is implemented by effectively disabling read-after-write dependency stalls of newer instructions, as well as disabling the register write enable of these instructions, for the remainder of the algorithm. In addition, programmable logic is provided to enable a custom early exit condition to be specified for the iterative refinement algorithm so that the underlying hardware can be configured for optimal execution of particular iterative refinement algorithms. By doing so, the latency of the algorithm is reduced and the performance is increased without the complexity and potential poor performance of compare and branch instructions that might otherwise be required.
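
    As a rough software analogy of the early exit idea, the sketch below runs a fixed-length Newton-Raphson sequence for a reciprocal and stops writing the result register once an exit condition is met. The choice of algorithm, the function name `refine_reciprocal`, the default tolerance, and the `exit_condition` callback are assumptions for illustration; the Python `if` merely models the hardware write-enable gating.

```python
def refine_reciprocal(a, x0, steps=8, exit_condition=None):
    """Approximate 1/a with a fixed sequence of Newton-Raphson steps.

    `exit_condition(old, new)` plays the role of the programmable early exit
    condition; the default (a small absolute change) is an assumption.
    """
    if exit_condition is None:
        exit_condition = lambda old, new: abs(new - old) <= 1e-12
    x = x0
    write_enable = True   # models the register write enable
    writes = 0
    for _ in range(steps):                # every step is still "issued"
        candidate = x * (2.0 - a * x)     # one Newton-Raphson refinement
        if write_enable:                  # gated write, no branch in hardware
            done = exit_condition(x, candidate)
            x = candidate
            writes += 1
            write_enable = not done       # later steps no longer update x
    return x, writes


approx, writes_used = refine_reciprocal(4.0, 0.2)
print(approx, writes_used)
```

    Once the condition fires, the remaining steps leave the result untouched, which is the behavior the abstract attributes to disabling the write enable for the rest of the algorithm.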


    Load misaligned vector with permute and mask insert
    43.
    Granted invention patent
    Status: Lapsed

    Publication number: US07783860B2

    Publication date: 2010-08-24

    Application number: US11830920

    Filing date: 2007-07-31

    Abstract: Embodiments of the invention provide logic within the store data path between a processor and a memory array. The logic may be configured to misalign vector data as it is stored to memory. By misaligning vector data as it is stored to memory, memory bandwidth may be maximized while processing bandwidth required to store vector data misaligned is minimized. Furthermore, embodiments of the invention provide logic within the load data path which allows vector data which is stored misaligned to be aligned as it is loaded into a vector register. By aligning misaligned vector data as it is loaded into a vector register, memory bandwidth may be maximized while processing bandwidth required to align misaligned vector data may be minimized.
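
    The load side can be illustrated with a small model. The abstract does not spell out the mechanism, so the following is only one plausible reading of "permute and mask insert": fetch the two aligned 16-byte lines that straddle the vector, rotate each, and merge them under a byte mask. The line size, the function name `load_misaligned`, and the rotate-then-merge order are assumptions.

```python
LINE = 16  # assumed width of an aligned memory access, in bytes


def load_misaligned(memory, addr):
    """Return the 16 bytes starting at `addr`, which may be misaligned."""
    offset = addr % LINE
    lo_base = addr - offset
    lo = memory[lo_base:lo_base + LINE]
    if offset == 0:
        return bytes(lo)                      # already aligned
    hi = memory[lo_base + LINE:lo_base + 2 * LINE]
    # Permute: rotate both aligned lines left by the misalignment offset.
    lo_rot = lo[offset:] + lo[:offset]
    hi_rot = hi[offset:] + hi[:offset]
    # Mask insert: the first LINE - offset bytes come from the low line,
    # the remaining bytes are inserted from the high line.
    mask = [i < LINE - offset for i in range(LINE)]
    return bytes(l if m else h for l, h, m in zip(lo_rot, hi_rot, mask))


memory = bytes(range(64))
print(list(load_misaligned(memory, 5)))
```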


    Instruction Target History Based Register Address Indexing
    44.
    Invention application
    Status: Lapsed

    Publication number: US20100125719A1

    Publication date: 2010-05-20

    Application number: US12274560

    Filing date: 2008-11-20

    CPC classification number: G06F9/30098 G06F9/3016 G06F9/3832

    Abstract: A circuit arrangement and method support instruction target history based register address indexing, whereby register addresses to be used by an instruction are decoded using a target history table of previous target register addresses, and an index into the target history table supplied by an index value in the instruction. An instruction may include at least one index value that identifies a previously used register address. During execution of the instruction, the index is retrieved from the instruction, and then a register address is retrieved from the target history table using the index.
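
    A target history table of this kind can be modeled in a few lines. The sketch below is an illustration under assumptions: the table depth of 4, the convention that index 0 is the most recent target, and the names `TargetHistoryTable`, `record`, and `lookup` do not come from the source.

```python
from collections import deque


class TargetHistoryTable:
    """Keeps the most recent target register addresses (index 0 = newest)."""

    def __init__(self, depth=4):
        # A bounded deque drops the oldest entry once the table is full.
        self.entries = deque(maxlen=depth)

    def record(self, register_address):
        """Called when an instruction writes a target register."""
        self.entries.appendleft(register_address)

    def lookup(self, index):
        """Decode a small index from an instruction into a register address."""
        return self.entries[index]


tht = TargetHistoryTable(depth=4)
tht.record(17)   # an earlier instruction targeted r17
tht.record(42)   # the most recent instruction targeted r42
print(tht.lookup(0), tht.lookup(1))
```

    In this model an instruction only needs enough bits to index the table, rather than a full register address, which appears to be the point of the scheme.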


    Multi-Execution Unit Processing Unit with Instruction Blocking Sequencer Logic
    45.
    Invention application
    Status: Lapsed

    Publication number: US20100100712A1

    Publication date: 2010-04-22

    Application number: US12252541

    Filing date: 2008-10-16

    CPC classification number: G06F9/3885 G06F9/22 G06F9/3009 G06F9/3851 G06F9/3867

    Abstract: A processing unit includes multiple execution units and sequencer logic that is disposed downstream of instruction buffer logic, and that is responsive to a sequencer instruction present in an instruction stream. In response to such an instruction, the sequencer logic issues a plurality of instructions associated with a long latency operation to one execution unit, while blocking instructions from the instruction buffer logic from being issued to that execution unit. In addition, the blocking of instructions from being issued to the execution unit does not affect the issuance of instructions to any other execution unit, and as such, other instructions from the instruction buffer logic are still capable of being issued to and executed by other execution units even while the sequencer logic is issuing the plurality of instructions associated with the long latency operation.
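
    The blocking behavior can be pictured with a toy cycle-by-cycle model. It is heavily simplified and rests on assumptions: each execution unit has its own in-order queue standing in for the instruction buffer logic, one instruction issues per unit per cycle, and the names `simulate`, `xu0`, `xu1`, and `seq_div` are invented for the example.

```python
from collections import deque


def simulate(queues, expansions):
    """Return a per-cycle trace of which instruction each unit receives.

    `queues` maps a unit name to its instructions in program order.
    `expansions` maps a sequencer instruction to the instructions that the
    sequencer logic issues for the long latency operation.
    """
    queues = {unit: deque(q) for unit, q in queues.items()}
    sequencer = {unit: deque() for unit in queues}
    trace = []
    while any(queues.values()) or any(sequencer.values()):
        cycle = {}
        for unit in queues:
            head_is_seq = queues[unit] and queues[unit][0] in expansions
            if not sequencer[unit] and head_is_seq:
                # A sequencer instruction hands control to the sequencer.
                sequencer[unit].extend(expansions[queues[unit].popleft()])
            if sequencer[unit]:
                # The sequencer issues; buffer instructions for this unit
                # stay blocked, other units are unaffected.
                cycle[unit] = sequencer[unit].popleft()
            elif queues[unit]:
                cycle[unit] = queues[unit].popleft()
        trace.append(cycle)
    return trace


trace = simulate(
    {"xu0": ["seq_div", "add"], "xu1": ["mul", "sub", "or"]},
    {"seq_div": ["div.1", "div.2", "div.3"]},
)
for cycle in trace:
    print(cycle)
```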


    Execution Unit With Inline Pseudorandom Number Generator
    46.
    Invention application
    Status: Lapsed

    Publication number: US20090300335A1

    Publication date: 2009-12-03

    Application number: US12132115

    Filing date: 2008-06-03

    CPC classification number: G06F9/3851 G06F9/30014 G06F9/30181

    Abstract: A circuit arrangement and method couple a hardware-based pseudorandom number generator (PRNG) to an execution unit in such a manner that pseudorandom numbers generated by the PRNG may be selectively output to the execution unit for use as an operand during the execution of instructions by the execution unit. A PRNG may be coupled to an input of an operand multiplexer that outputs to an operand input of an execution unit so that operands provided by instructions supplied to the execution unit are selectively overridden with pseudorandom numbers generated by the PRNG. Furthermore, overridden operands provided by instructions supplied to the execution unit may be used as seed values for the PRNG. In many instances, an instruction executed by an execution unit may be able to perform an arithmetic operation using both an operand specified by the instruction and a pseudorandom number generated by the PRNG during the execution of the instruction, so that the generation of the pseudorandom number and the performance of the arithmetic operation occur during a single pass of an execution unit.
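
    The operand override can be sketched in software as follows. The abstract does not name a PRNG design, so the 16-bit Galois LFSR used here (tap mask 0xB400) is a stand-in chosen for brevity, and the names `Lfsr16` and `execute_add` together with the `use_prng` and `reseed` flags are assumptions.

```python
class Lfsr16:
    """16-bit Galois LFSR used as a stand-in hardware PRNG (assumed design)."""

    def __init__(self, seed=0xACE1):
        self.state = (seed & 0xFFFF) or 1   # the all-zero state is invalid

    def next(self):
        lsb = self.state & 1
        self.state >>= 1
        if lsb:
            self.state ^= 0xB400            # feedback taps
        return self.state


def execute_add(a, b, prng, use_prng=False, reseed=False):
    """Add two operands; optionally override operand `b` with a PRNG value.

    When `reseed` is set, the overridden operand seeds the PRNG first.
    """
    if use_prng:
        if reseed:
            prng.state = (b & 0xFFFF) or 1  # overridden operand as seed
        b = prng.next()                     # operand multiplexer picks PRNG
    return a + b                            # arithmetic in the same pass


prng = Lfsr16(0xACE1)
print(execute_add(10, 999, prng))                 # ordinary add
print(execute_add(10, 999, prng, use_prng=True))  # operand b overridden
```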


    Execution Unit with Data Dependent Conditional Write Instructions
    47.
    Invention application
    Status: Active

    Publication number: US20090240920A1

    Publication date: 2009-09-24

    Application number: US12050721

    Filing date: 2008-03-18

    CPC classification number: G06F9/30072 G06F9/30043 G06F9/3851 G06F9/3885

    Abstract: An execution unit supports data dependent conditional write instructions that write data to a target only when a particular condition is met. In one implementation, a data dependent conditional write instruction identifies a condition as well as data to be tested against that condition. The data is tested against that condition, and the result of the test is used to selectively enable or disable a write to a target associated with the data dependent conditional write instruction. Then, a write is attempted while the write to the target is enabled or disabled such that the write will update the contents of the target only when the write is selectively enabled as a result of the test. By doing so, dependencies are typically avoided, as is use of an architected condition register that might otherwise introduce branch prediction mispredict penalties, enabling improved performance with z-buffer test and similar types of algorithms.
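
    A z-buffer update is the example the abstract itself mentions, so the sketch below models it. The function name `conditional_write`, the set of condition codes, and the list-based target are assumptions; the select expression stands in for a hardware write enable rather than a branch.

```python
import operator

# Assumed encoding of the condition named by the instruction.
CONDITIONS = {
    "lt": operator.lt,
    "le": operator.le,
    "gt": operator.gt,
    "ge": operator.ge,
}


def conditional_write(target, index, value, tested, reference, condition="lt"):
    """Write `value` to target[index] only if `tested <cond> reference` holds."""
    write_enable = CONDITIONS[condition](tested, reference)
    # The write is always attempted; the enable decides whether it lands.
    target[index] = value if write_enable else target[index]
    return write_enable


zbuf = [0.5, 0.5]
# A fragment at depth 0.3 is nearer than the stored 0.5, so the write lands.
conditional_write(zbuf, 0, 0.3, tested=0.3, reference=zbuf[0])
# A fragment at depth 0.9 is farther, so the stored depth is kept.
conditional_write(zbuf, 1, 0.9, tested=0.9, reference=zbuf[1])
print(zbuf)
```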


    Processing Unit Incorporating Vectorizable Execution Unit
    48.
    Invention application
    Status: Active

    Publication number: US20090150647A1

    Publication date: 2009-06-11

    Application number: US11952193

    Filing date: 2007-12-07

    Abstract: A vectorizable execution unit is capable of being operated in a plurality of modes, with the processing lanes in the vectorizable execution unit grouped into different combinations of logical execution units in different modes. By doing so, processing lanes can be selectively grouped together to operate as different types of vector execution units and/or scalar execution units, and if desired, dynamically switched during runtime to process various types of instruction streams in a manner that is best suited for each type of instruction stream. As a consequence, a single vectorizable execution unit may be configurable, e.g., via software control, to operate either as a vector execution unit or as a plurality of scalar execution units.
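
    The mode-dependent lane grouping can be pictured with a small model. The lane count of four, the two mode names, and the function `issue` are assumptions for illustration; the abstract allows other groupings as well.

```python
import operator

# Assumed lane groupings: each tuple is one logical execution unit.
MODES = {
    "vector": [(0, 1, 2, 3)],            # one 4-wide vector unit
    "scalar": [(0,), (1,), (2,), (3,)],  # four independent scalar units
}


def issue(mode, instructions):
    """Execute one instruction per logical unit and return the 4 lane results.

    Each instruction is (op, a, b) with operand lists as wide as its unit.
    """
    groups = MODES[mode]
    if len(instructions) != len(groups):
        raise ValueError("one instruction per logical execution unit expected")
    lanes = [None] * 4
    for group, (op, a, b) in zip(groups, instructions):
        for lane, x, y in zip(group, a, b):
            lanes[lane] = op(x, y)       # the same physical lane does the work
    return lanes


print(issue("vector", [(operator.add, [1, 2, 3, 4], [10, 20, 30, 40])]))
print(issue("scalar", [
    (operator.add, [1], [2]),
    (operator.mul, [3], [4]),
    (operator.sub, [9], [5]),
    (operator.add, [0], [7]),
]))
```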


Scalar Precision Float Implementation on the "W" Lane of Vector Unit
    49.
    Invention application
    Status: Lapsed

    Publication number: US20090106527A1

    Publication date: 2009-04-23

    Application number: US11877205

    Filing date: 2007-10-23

    Abstract: Embodiments of the invention are generally related to image processing, and more specifically to vector units for supporting image processing. A combined vector/scalar unit is provided wherein one or more processing lanes of the vector unit are used for performing scalar operations. An integrated register file is also provided for storing vector and scalar data. Therefore, the transfer of data to memory to exchange data between independent vector and scalar units is obviated and a significant amount of chip area is saved.
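
    A minimal model of the shared register file is sketched below. It assumes four-word registers with the scalar value held in the last ("W") word, as the title suggests; the names `RegisterFile`, `vector_add`, and `scalar_mul` are hypothetical.

```python
W = 3  # assumed position of the "W" lane within a 4-word register


class RegisterFile:
    """Integrated register file: each register holds [x, y, z, w]."""

    def __init__(self, count=8):
        self.regs = [[0.0] * 4 for _ in range(count)]


def vector_add(rf, dst, a, b):
    """Vector operation: all four lanes take part."""
    rf.regs[dst] = [x + y for x, y in zip(rf.regs[a], rf.regs[b])]


def scalar_mul(rf, dst, a, b):
    """Scalar operation: only the W lane computes and writes."""
    rf.regs[dst][W] = rf.regs[a][W] * rf.regs[b][W]


rf = RegisterFile()
rf.regs[1] = [1.0, 2.0, 3.0, 4.0]
rf.regs[2] = [10.0, 20.0, 30.0, 5.0]
scalar_mul(rf, 3, 1, 2)   # scalar result lands in the W word of r3
vector_add(rf, 4, 3, 1)   # a vector op reads r3 directly, no memory round trip
print(rf.regs[3], rf.regs[4])
```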


Single Precision Vector Dot Product with "Word" Vector Write Mask
    50.
    Invention application
    Status: Lapsed

    Publication number: US20080114826A1

    Publication date: 2008-05-15

    Application number: US11554774

    Filing date: 2006-10-31

    CPC classification number: G06F17/16

    Abstract: The present invention is generally related to the field of image processing, and more specifically to an instruction set for processing images. Vector processing may involve performing a plurality of dot product operations to generate operands for a new vector. The dot product operations may require the issue of a plurality of permute instructions to arrange the vector operands in desired locations of a target register. Embodiments of the invention provide a dot product instruction wherein a mask field may be used to specify a particular location of a target register in which to transfer data, thereby avoiding the need for permute instructions for arranging data and reducing both dependencies between instructions and the usage of temporary registers.
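
    The effect of the mask field can be shown with a matrix-vector product built from four dot products that all write into one target register. The 4-bit mask encoding (most significant bit selects word 0) and the function name `dot_product_masked` are assumptions for this sketch.

```python
def dot_product_masked(target, a, b, mask):
    """Return `target` with the dot product of a and b written to masked words.

    `mask` is a 4-bit field; bit 3 selects word 0 and bit 0 selects word 3
    (an assumed encoding). Unselected words keep their old contents.
    """
    result = sum(x * y for x, y in zip(a, b))
    return [
        result if (mask >> (3 - i)) & 1 else old
        for i, old in enumerate(target)
    ]


matrix = [
    [1, 0, 0, 0],
    [0, 2, 0, 0],
    [0, 0, 3, 0],
    [1, 1, 1, 1],
]
vector = [1, 2, 3, 4]

target = [0, 0, 0, 0]
for row_index, row in enumerate(matrix):
    # Each dot product lands directly in its own word, so no permute
    # instruction or temporary register is needed to assemble the result.
    target = dot_product_masked(target, row, vector, 0b1000 >> row_index)
print(target)
```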

