ANISOTROPIC TEXTURE FILTERING WITH TEXTURE DATA PREFETCHING
    11.
    Type: Patent application
    Legal status: Active

    Publication number: US20120169755A1

    Publication date: 2012-07-05

    Application number: US13421169

    Filing date: 2012-03-15

    CPC classification number: G06T15/04 G06T2200/12

    Abstract: A circuit arrangement and method utilize texture data prefetching to prefetch texture data used by an anisotropic filtering algorithm. In particular, stride-based prefetching may be used to prefetch texture data for use in anisotropic filtering, where the value of the stride, or difference between successive accesses, is based upon a distance in a memory address space between sample points taken along the line of anisotropy used in an anisotropic filtering algorithm.
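
    The stride described in this abstract can be illustrated with a minimal sketch. It assumes a row-major texture layout, integer texel steps along the line of anisotropy, and 4 bytes per texel; the function names and parameters are hypothetical and do not come from the patent.

```python
def sample_addresses(base_addr, u0, v0, du, dv, n_samples, tex_width, bytes_per_texel=4):
    """Addresses of the texels sampled along the line of anisotropy.

    Assumes a row-major (linear) texture layout; tiled layouts would differ.
    """
    addrs = []
    for i in range(n_samples):
        u = u0 + i * du
        v = v0 + i * dv
        addrs.append(base_addr + (v * tex_width + u) * bytes_per_texel)
    return addrs


def prefetch_stride(du, dv, tex_width, bytes_per_texel=4):
    """Stride: address-space distance between successive sample points."""
    return (dv * tex_width + du) * bytes_per_texel


def stride_prefetch(first_addr, stride, count):
    """Addresses a stride-based prefetcher would request ahead of the filter."""
    return [first_addr + i * stride for i in range(count)]


if __name__ == "__main__":
    stride = prefetch_stride(du=2, dv=1, tex_width=256)
    print(stride, stride_prefetch(0x1000, stride, 4))
```

    Under these assumptions the prefetched addresses coincide with the addresses the filter later reads, which is the property a stride prefetcher relies on.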


    Method and apparatus for generating trigonometric results
    12.
    Type: Granted patent
    Legal status: Lapsed

    Publication number: US08090756B2

    Publication date: 2012-01-03

    Application number: US11668040

    Filing date: 2007-01-29

    CPC classification number: G06F7/548

    Abstract: A method, computer-readable medium, and apparatus for generating a trigonometric value. The method includes receiving a request to calculate a trigonometric value for an angle value and calculating a fractional value from the angle value. The fractional value corresponds to one of a first quadrant value, a second quadrant value, a third quadrant value, and a fourth quadrant value. The method also includes using the fractional value to determine whether to perform at least one of inverting the fractional value and negating the trigonometric value. The method further includes generating the trigonometric value from the fractional value by adding at least a portion of the fractional value with at least one of a shifted fractional value produced by shifting the portion of the fractional value and a constant value and providing the trigonometric value in response to the request.
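
    The quadrant handling in this abstract can be sketched as follows. The fixed-point width, the use of `ONE - frac` as the "inversion", and the function names are assumptions. The shift-and-add step that produces the first-quadrant value is not reproduced; the library sine stands in for it.

```python
import math

FRAC_BITS = 16          # hypothetical fixed-point width
ONE = 1 << FRAC_BITS    # one quarter turn in fixed point


def first_quadrant_sine(frac):
    """Stand-in for the shift-and-add datapath (not reproduced here)."""
    return math.sin((frac / ONE) * (math.pi / 2))


def sine_by_quadrant(turns_fixed):
    """Sine of an angle given as a fixed-point fraction of a full turn.

    The top two bits select the quadrant; the low FRAC_BITS bits are the
    fractional position inside that quadrant.
    """
    turns_fixed &= (ONE << 2) - 1
    quadrant = turns_fixed >> FRAC_BITS
    frac = turns_fixed & (ONE - 1)
    invert = quadrant in (1, 3)   # second/fourth quadrant: mirror the fraction
    negate = quadrant in (2, 3)   # third/fourth quadrant: negate the result
    if invert:
        frac = ONE - frac
    value = first_quadrant_sine(frac)
    return -value if negate else value


if __name__ == "__main__":
    print(sine_by_quadrant(ONE))  # a quarter turn
```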


    Early exit processing of iterative refinement algorithm using register dependency disable and programmable early exit condition
    13.
    Type: Granted patent
    Legal status: Lapsed

    Publication number: US07913066B2

    Publication date: 2011-03-22

    Application number: US12045243

    Filing date: 2008-03-10

    Abstract: A programmable “early exit” of an iterative refinement algorithm is implemented by effectively disabling read-after-write dependency stalls of newer instructions, as well as disabling the register write enable of these instructions, for the remainder of the algorithm. In addition, programmable logic is provided to enable a custom early exit condition to be specified for the iterative refinement algorithm so that the underlying hardware can be configured for optimal execution of particular iterative refinement algorithms. By doing so, the latency of the algorithm is reduced and the performance is increased without the complexity and potential poor performance of compare and branch instructions that might otherwise be required.
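
    A software analogy may help picture the early exit. The sketch below runs a fixed-length Newton-Raphson reciprocal refinement and, once a configurable exit condition holds, clears a write-enable flag so the remaining iterations no longer update the result. The choice of algorithm, the tolerance, and all names are assumptions; the hardware mechanism itself is not modeled.

```python
def refine_reciprocal(a, x0, max_iters=8, tol=1e-12):
    """Approximate 1/a with a fixed-length refinement sequence.

    Returns the estimate and the number of register writes performed.
    """
    x = x0
    write_enable = True
    writes = 0
    for _ in range(max_iters):            # fixed sequence, no branch out of the loop
        candidate = x * (2.0 - a * x)     # one Newton-Raphson refinement step
        if write_enable:
            converged = abs(candidate - x) <= tol   # the programmable exit condition
            x = candidate
            writes += 1
            if converged:
                write_enable = False      # later steps leave the result untouched
    return x, writes


if __name__ == "__main__":
    print(refine_reciprocal(3.0, 0.3))
```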


    Instruction Target History Based Register Address Indexing
    14.
    Type: Patent application
    Legal status: Lapsed

    Publication number: US20100125719A1

    Publication date: 2010-05-20

    Application number: US12274560

    Filing date: 2008-11-20

    CPC classification number: G06F9/30098 G06F9/3016 G06F9/3832

    Abstract: A circuit arrangement and method support instruction target history based register address indexing, whereby register addresses to be used by an instruction are decoded using a target history table of previous target register addresses, and an index into the target history table supplied by an index value in the instruction. An instruction may include at least one index value that identifies a previously used register address. During execution of the instruction, the index is retrieved from the instruction, and then a register address is retrieved from the target history table using the index.
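
    The lookup described in this abstract can be sketched with a small model. The instruction format (dictionaries with `op`, `target`, `imm`, and `src_idx` fields), the table depth, and the most-recent-first ordering are all hypothetical choices made for illustration.

```python
from collections import deque


class TargetHistoryTable:
    """Remembers recent target register addresses; index 0 is the newest."""

    def __init__(self, depth=4):
        self.entries = deque(maxlen=depth)

    def record(self, reg_addr):
        self.entries.appendleft(reg_addr)

    def lookup(self, index):
        return self.entries[index]


def execute(program, regs, history):
    for instr in program:
        if instr["op"] == "li":
            result = instr["imm"]
        elif instr["op"] == "add":
            # Source registers are named by history index, not by full address.
            a, b = (history.lookup(i) for i in instr["src_idx"])
            result = regs[a] + regs[b]
        else:
            raise ValueError(instr["op"])
        regs[instr["target"]] = result
        history.record(instr["target"])
    return regs


if __name__ == "__main__":
    program = [
        {"op": "li", "target": 3, "imm": 7},
        {"op": "li", "target": 9, "imm": 5},
        {"op": "add", "target": 12, "src_idx": (0, 1)},  # r9 + r3
        {"op": "add", "target": 1, "src_idx": (0, 2)},   # r12 + r3
    ]
    print(execute(program, [0] * 16, TargetHistoryTable()))
```

    A short index needs fewer instruction bits than a full register address, which is the trade this encoding makes.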


    Multi-Execution Unit Processing Unit with Instruction Blocking Sequencer Logic
    15.
    Type: Patent application
    Legal status: Lapsed

    Publication number: US20100100712A1

    Publication date: 2010-04-22

    Application number: US12252541

    Filing date: 2008-10-16

    CPC classification number: G06F9/3885 G06F9/22 G06F9/3009 G06F9/3851 G06F9/3867

    Abstract: A processing unit includes multiple execution units and sequencer logic that is disposed downstream of instruction buffer logic, and that is responsive to a sequencer instruction present in an instruction stream. In response to such an instruction, the sequencer logic issues a plurality of instructions associated with a long latency operation to one execution unit, while blocking instructions from the instruction buffer logic from being issued to that execution unit. In addition, the blocking of instructions from being issued to the execution unit does not affect the issuance of instructions to any other execution unit, and as such, other instructions from the instruction buffer logic are still capable of being issued to and executed by other execution units even while the sequencer logic is issuing the plurality of instructions associated with the long latency operation.
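
    The blocking behavior can be pictured with a toy cycle-by-cycle model. It assumes one issue slot per execution unit per cycle and a per-unit view of the instruction buffer; the instruction names and the `sequences` mapping are hypothetical.

```python
def simulate(queues, sequences):
    """Return a per-cycle trace of what each execution unit issues.

    queues:    {unit: [instruction names]} as seen from the instruction buffer
    sequences: {sequencer instruction: [instructions it expands into]}
    """
    pending = {unit: list(q) for unit, q in queues.items()}
    owed = {unit: [] for unit in queues}   # instructions the sequencer still has to issue
    trace = []
    while any(pending.values()) or any(owed.values()):
        cycle = {}
        for unit in sorted(pending):
            head_is_sequencer = bool(pending[unit]) and pending[unit][0] in sequences
            if not owed[unit] and head_is_sequencer:
                owed[unit] = list(sequences[pending[unit].pop(0)])
            if owed[unit]:
                # Sequencer owns this unit: buffer instructions for it are blocked.
                cycle[unit] = owed[unit].pop(0)
            elif pending[unit]:
                cycle[unit] = pending[unit].pop(0)
        trace.append(cycle)
    return trace


if __name__ == "__main__":
    queues = {0: ["seq_dot", "add0"], 1: ["mul1", "sub1", "xor1"]}
    for cycle in simulate(queues, {"seq_dot": ["d1", "d2", "d3"]}):
        print(cycle)
```

    In the example, unit 1 keeps issuing from the buffer while unit 0 is occupied by the expanded sequence, and `add0` issues only after the sequence completes.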


    Execution Unit With Inline Pseudorandom Number Generator
    16.
    Type: Patent application
    Legal status: Lapsed

    Publication number: US20090300335A1

    Publication date: 2009-12-03

    Application number: US12132115

    Filing date: 2008-06-03

    CPC classification number: G06F9/3851 G06F9/30014 G06F9/30181

    Abstract: A circuit arrangement and method couple a hardware-based pseudorandom number generator (PRNG) to an execution unit in such a manner that pseudorandom numbers generated by the PRNG may be selectively output to the execution unit for use as an operand during the execution of instructions by the execution unit. A PRNG may be coupled to an input of an operand multiplexer that outputs to an operand input of an execution unit so that operands provided by instructions supplied to the execution unit are selectively overridden with pseudorandom numbers generated by the PRNG. Furthermore, overridden operands provided by instructions supplied to the execution unit may be used as seed values for the PRNG. In many instances, an instruction executed by an execution unit may be able to perform an arithmetic operation using both an operand specified by the instruction and a pseudorandom number generated by the PRNG during the execution of the instruction, so that the generation of the pseudorandom number and the performance of the arithmetic operation occur during a single pass of an execution unit.
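
    The operand override can be sketched in software. The 16-bit Fibonacci LFSR used here is only a stand-in generator, and the `use_prng` and `reseed` controls are hypothetical names for the multiplexer select and the seeding path.

```python
class Lfsr16:
    """16-bit Fibonacci LFSR (taps 16, 14, 13, 11), used as a stand-in PRNG."""

    def __init__(self, seed=0xACE1):
        self.state = (seed & 0xFFFF) or 1   # the all-zero state would lock up

    def next(self):
        s = self.state
        bit = (s ^ (s >> 2) ^ (s >> 3) ^ (s >> 5)) & 1
        self.state = (s >> 1) | (bit << 15)
        return self.state


def execute_add(a, b, prng, use_prng=False, reseed=False):
    """One pass through the unit, optionally replacing operand b with a PRNG value."""
    if reseed:
        prng.state = (b & 0xFFFF) or 1      # the overridden operand doubles as the seed
    operand_b = prng.next() if use_prng else b   # operand multiplexer
    return (a + operand_b) & 0xFFFF


if __name__ == "__main__":
    prng = Lfsr16()
    print(execute_add(2, 3, prng))                  # ordinary add
    print(execute_add(5, 0, prng, use_prng=True))   # add with a pseudorandom operand
```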


    Structural Power Reduction in Multithreaded Processor
    17.
    Type: Patent application
    Legal status: Lapsed

    Publication number: US20090293061A1

    Publication date: 2009-11-26

    Application number: US12125278

    Filing date: 2008-05-22

    CPC classification number: G06F9/5044 G06F9/3851 G06F9/5094 Y02D10/22

    Abstract: A circuit arrangement and method utilize a plurality of execution units having different power and performance characteristics and capabilities within a multithreaded processor core, and selectively route instructions having different performance requirements to different execution units based upon those performance requirements. As such, instructions that have high performance requirements, such as instructions associated with primary tasks or time sensitive tasks, can be routed to a higher performance execution unit to maximize performance when executing those instructions, while instructions that have low performance requirements, such as instructions associated with background tasks or non-time sensitive tasks, can be routed to a reduced power execution unit to reduce the power consumption (and associated heat generation) associated with executing those instructions.
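
    The routing decision can be sketched as a simple dispatcher. The unit names, the per-instruction cycle and energy figures, and the `time_sensitive` flag are invented for illustration only.

```python
# Hypothetical per-instruction cost of each execution unit (arbitrary units).
UNITS = {
    "high_performance": {"cycles": 1, "energy": 4},
    "reduced_power": {"cycles": 3, "energy": 1},
}


def route(instruction):
    """Pick an execution unit from the instruction's performance requirement."""
    return "high_performance" if instruction["time_sensitive"] else "reduced_power"


def dispatch(instructions):
    """Count instructions and energy spent per execution unit."""
    totals = {name: {"count": 0, "energy": 0} for name in UNITS}
    for instruction in instructions:
        unit = route(instruction)
        totals[unit]["count"] += 1
        totals[unit]["energy"] += UNITS[unit]["energy"]
    return totals


if __name__ == "__main__":
    stream = [
        {"name": "render", "time_sensitive": True},
        {"name": "log", "time_sensitive": False},
        {"name": "gc", "time_sensitive": False},
    ]
    print(dispatch(stream))
```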


    Dynamic Merging of Pipeline Stages in an Execution Pipeline to Reduce Power Consumption
    18.
    Type: Patent application
    Legal status: Active

    Publication number: US20090292907A1

    Publication date: 2009-11-26

    Application number: US12125135

    Filing date: 2008-05-22

    Abstract: A pipelined execution unit incorporates one or more low power modes that reduce power consumption by dynamically merging pipeline stages in an execution pipeline together with one another. In particular, the execution logic in successive pipeline stages in an execution pipeline may be dynamically merged together by setting one or more latches that are intermediate to such pipeline stages to a transparent state such that the output of the pipeline stage preceding such latches is passed to the subsequent pipeline stage during the same clock cycle so that both such pipeline stages effectively perform steps for the same instruction during each clock cycle. Then, with the selected pipeline stages merged, the power consumption of the execution pipeline can be reduced (e.g., by reducing the clock frequency and/or operating voltage of the execution pipeline), often with minimal adverse impact on performance.
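
    The effect of transparent latches on stage count and clock period can be sketched numerically. The model assumes each stage has a fixed logic delay and that latch i sits between stage i and stage i+1; the delay values and function names are hypothetical.

```python
def merge_stages(stage_delays, transparent_latches):
    """Combine stages whose separating latch is set transparent.

    stage_delays:        logic delay of each pipeline stage (e.g. in ns)
    transparent_latches: indices of latches set transparent; latch i sits
                         between stage i and stage i + 1
    """
    merged = [stage_delays[0]]
    for i in range(1, len(stage_delays)):
        if (i - 1) in transparent_latches:
            merged[-1] += stage_delays[i]   # data flows through in the same cycle
        else:
            merged.append(stage_delays[i])
    return merged


def min_clock_period(stage_delays):
    """The slowest stage bounds the clock period."""
    return max(stage_delays)


if __name__ == "__main__":
    merged = merge_stages([1, 1, 1, 1], {0, 2})
    print(merged, min_clock_period(merged))
```

    In this example four 1 ns stages become two 2 ns stages, so the clock can run at half the frequency while an instruction still takes about 4 ns to traverse the pipeline.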


    Execution Unit with Data Dependent Conditional Write Instructions
    19.
    Type: Patent application
    Legal status: Active

    Publication number: US20090240920A1

    Publication date: 2009-09-24

    Application number: US12050721

    Filing date: 2008-03-18

    CPC classification number: G06F9/30072 G06F9/30043 G06F9/3851 G06F9/3885

    Abstract: An execution unit supports data dependent conditional write instructions that write data to a target only when a particular condition is met. In one implementation, a data dependent conditional write instruction identifies a condition as well as data to be tested against that condition. The data is tested against that condition, and the result of the test is used to selectively enable or disable a write to a target associated with the data dependent conditional write instruction. Then, a write is attempted while the write to the target is enabled or disabled such that the write will update the contents of the target only when the write is selectively enabled as a result of the test. By doing so, dependencies are typically avoided, as is use of an architected condition register that might otherwise introduce branch prediction mispredict penalties, enabling improved performance with z-buffer test and similar types of algorithms.
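
    A z-buffer test is the use case the abstract names, and it can be sketched as a write that is always attempted but only takes effect when the test enables it. The condition table and the function signature are assumptions made for illustration.

```python
import operator

# Hypothetical set of conditions an instruction could name.
CONDITIONS = {"lt": operator.lt, "le": operator.le, "gt": operator.gt}


def zbuffer_write(depth, color, i, z, rgb, cond="lt"):
    """Attempt a write; it only lands if z passes the test against depth[i]."""
    write_enable = CONDITIONS[cond](z, depth[i])
    # The write is always attempted; write_enable selects old or new contents,
    # so no separate compare-and-branch sequence is needed.
    depth[i] = z if write_enable else depth[i]
    color[i] = rgb if write_enable else color[i]
    return write_enable


if __name__ == "__main__":
    depth, color = [1.0], [(0, 0, 0)]
    print(zbuffer_write(depth, color, 0, 0.5, (255, 0, 0)))  # closer fragment
    print(zbuffer_write(depth, color, 0, 0.8, (0, 0, 255)))  # farther fragment
    print(depth, color)
```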


    Processing Unit Incorporating Vectorizable Execution Unit
    20.
    Type: Patent application
    Legal status: Active

    Publication number: US20090150647A1

    Publication date: 2009-06-11

    Application number: US11952193

    Filing date: 2007-12-07

    Abstract: A vectorizable execution unit is capable of being operated in a plurality of modes, with the processing lanes in the vectorizable execution unit grouped into different combinations of logical execution units in different modes. By doing so, processing lanes can be selectively grouped together to operate as different types of vector execution units and/or scalar execution units, and if desired, dynamically switched during runtime to process various types of instruction streams in a manner that is best suited for each type of instruction stream. As a consequence, a single vectorizable execution unit may be configurable, e.g., via software control, to operate either as a vector execution unit or as a plurality of scalar execution units.
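
    The two groupings can be sketched with a four-lane model. The mode names, the instruction tuples, and the lane count are hypothetical; the sketch only shows the same lanes serving either one vector instruction or several independent scalar instructions.

```python
import operator

OPS = {"add": operator.add, "mul": operator.mul}


def issue(mode, instructions, lanes=4):
    """Run one issue slot of a four-lane unit in the given mode."""
    if mode == "vector":
        # One instruction drives every lane with element i of each operand.
        (op, a, b), = instructions
        return [OPS[op](a[i], b[i]) for i in range(lanes)]
    if mode == "scalar":
        # Each lane acts as its own logical execution unit.
        return [OPS[op](a, b) for op, a, b in instructions[:lanes]]
    raise ValueError(mode)


if __name__ == "__main__":
    print(issue("vector", [("add", [1, 2, 3, 4], [10, 20, 30, 40])]))
    print(issue("scalar", [("add", 1, 2), ("mul", 3, 4), ("add", 5, 5), ("mul", 2, 2)]))
```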

