Dual Independent and Shared Resource Vector Execution Units With Shared Register File
    21.
    Type: Invention patent application
    Status: Under examination (published)

    Publication number: US20080079712A1

    Publication date: 2008-04-03

    Application number: US11536146

    Application date: 2006-09-28

    CPC classification number: G06T1/20 G06F15/8092 G06T15/005

    Abstract: The present invention is generally related to the field of image processing, and more specifically to vector units for supporting image processing. A dual vector unit implementation is described wherein two vector units are configured to receive data from a common register file. The vector units may independently and simultaneously process instructions. Furthermore, the vector units may be adapted to perform scalar operations, thereby integrating the vector and scalar processing. The vector units may also be configured to share resources to perform an operation, for example, a cross product operation.
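
    Illustrative sketch: the abstract does not say how the two vector units divide a cross product between them. The Python below shows one plausible split, in which each unit computes three of the six products and the results are then subtracted. The function name and the `unit0`/`unit1` partition are assumptions made for illustration, not details from the application.

```python
def cross_product_shared(a, b):
    """Cross product of two 3-vectors, split across two hypothetical units."""
    ax, ay, az = a
    bx, by, bz = b
    # Hypothetical unit 0: the three products that enter with a plus sign.
    unit0 = (ay * bz, az * bx, ax * by)
    # Hypothetical unit 1: the three products that enter with a minus sign.
    unit1 = (az * by, ax * bz, ay * bx)
    # Combine the two partial results lane by lane.
    return tuple(p - n for p, n in zip(unit0, unit1))
```

    For example, `cross_product_shared((1, 0, 0), (0, 1, 0))` yields the unit z vector.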

    Method and apparatus for implementing power of two floating point estimation
    22.
    Type: Granted invention patent
    Status: Lapsed

    Publication number: US07143126B2

    Publication date: 2006-11-28

    Application number: US10607359

    Application date: 2003-06-26

    CPC classification number: G06F7/556 G06F7/483

    Abstract: A method and apparatus are provided for implementing a power of two estimation function in a general purpose floating-point processor. A floating point number is stored within a memory. The floating point number includes a sign bit, a plurality of exponent bits, and a mantissa having an implied bit and a plurality of fraction bits. In response to a floating-point instruction, the mantissa is partitioned into an integer part and a fraction part, based on the exponent bits. A floating-point result is provided by assigning the integer part of the floating point number as an unbiased exponent of the floating-point result, and by utilizing combinational logic hardware for converting the fraction part of the floating point number to a fraction part of the floating point result.
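
    Illustrative sketch: the partitioning described in the abstract can be modelled in software. The input is split into an integer part, which becomes the unbiased exponent of the result, and a fraction part, which is mapped to the result's fraction. The patent's combinational logic is not described in the abstract, so the linear mapping `1 + f` used here is an assumed stand-in, and `exp2_estimate` is a hypothetical name.

```python
import math


def exp2_estimate(x: float) -> float:
    """Estimate 2**x by splitting x into integer and fraction parts."""
    integer_part = math.floor(x)      # becomes the unbiased exponent
    fraction_part = x - integer_part  # always in [0, 1)
    # Assumption: stand-in for the patent's combinational logic.
    # 2**f is approximated by 1 + f, which is exact at f = 0.
    mantissa = 1.0 + fraction_part
    # ldexp(m, e) computes m * 2**e, i.e. it sets the exponent directly.
    return math.ldexp(mantissa, integer_part)
```

    With this stand-in the estimate is exact for integer inputs and within roughly 0.09 of the true value for inputs between 0 and 1.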

    EXECUTION UNIT WITH INLINE PSEUDORANDOM NUMBER GENERATOR
    23.
    Type: Invention patent application
    Status: Under examination (published)

    Publication number: US20120303691A1

    Publication date: 2012-11-29

    Application number: US13556464

    Application date: 2012-07-24

    CPC classification number: G06F9/3851 G06F9/30014 G06F9/30181

    Abstract: A circuit arrangement and method couple a hardware-based pseudorandom number generator (PRNG) to an execution unit in such a manner that pseudorandom numbers generated by the PRNG may be selectively output to the execution unit for use as an operand during the execution of instructions by the execution unit. A PRNG may be coupled to an input of an operand multiplexer that outputs to an operand input of an execution unit so that operands provided by instructions supplied to the execution unit are selectively overridden with pseudorandom numbers generated by the PRNG. Furthermore, overridden operands provided by instructions supplied to the execution unit may be used as seed values for the PRNG.
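
    Illustrative sketch: a software model of the operand multiplexer described above. When the override is selected, the instruction's operand is replaced by a PRNG output and may optionally be used as the seed first. The 16-bit LFSR, the `execute_add` function, and the `use_prng`/`reseed` flags are assumptions for illustration; the abstract does not specify the generator or the control encoding.

```python
class Lfsr16:
    """16-bit Fibonacci LFSR (taps 16, 14, 13, 11); an assumed stand-in PRNG."""

    def __init__(self, seed: int = 0xACE1):
        self.reseed(seed)

    def reseed(self, seed: int) -> None:
        # An all-zero state would lock the LFSR, so fall back to 1.
        self.state = (seed & 0xFFFF) or 1

    def next(self) -> int:
        s = self.state
        bit = (s ^ (s >> 2) ^ (s >> 3) ^ (s >> 5)) & 1
        self.state = (s >> 1) | (bit << 15)
        return self.state


def execute_add(operand_a, operand_b, prng, use_prng=False, reseed=False):
    """16-bit add whose second operand can be overridden by the PRNG."""
    if use_prng:
        if reseed:
            # The overridden operand doubles as the seed value.
            prng.reseed(operand_b)
        # Operand multiplexer: select the PRNG output instead of operand_b.
        operand_b = prng.next()
    return (operand_a + operand_b) & 0xFFFF
```

    Instructions that do not select the override leave the generator state untouched, so ordinary arithmetic is unaffected.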

    Dynamic merging of pipeline stages in an execution pipeline to reduce power consumption
    25.
    Type: Granted invention patent
    Status: In force

    Publication number: US08291201B2

    Publication date: 2012-10-16

    Application number: US12125135

    Application date: 2008-05-22

    Abstract: A pipelined execution unit incorporates one or more low power modes that reduce power consumption by dynamically merging pipeline stages in an execution pipeline together with one another. In particular, the execution logic in successive pipeline stages in an execution pipeline may be dynamically merged together by setting one or more latches that are intermediate to such pipeline stages to a transparent state such that the output of the pipeline stage preceding such latches is passed to the subsequent pipeline stage during the same clock cycle so that both such pipeline stages effectively perform steps for the same instruction during each clock cycle. Then, with the selected pipeline stages merged, the power consumption of the execution pipeline can be reduced (e.g., by reducing the clock frequency and/or operating voltage of the execution pipeline), often with minimal adverse impact on performance.
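
    Illustrative sketch: a toy latency model of stage merging. Each latch between two stages is either opaque (the value waits for the next clock edge) or transparent (the value flows straight into the next stage in the same cycle). The function name `run_pipeline` and the list-of-booleans latch encoding are assumptions; the model ignores the longer clock period that merged stages require in real hardware.

```python
def run_pipeline(stages, transparent, value):
    """Run one instruction through the stages; return (result, cycles).

    stages      -- list of callables, one per pipeline stage
    transparent -- transparent[i] is True if the latch between stage i
                   and stage i + 1 is set to its transparent state
    """
    cycles = 1  # the first stage always occupies one clock cycle
    for i, stage in enumerate(stages):
        value = stage(value)
        is_last = i == len(stages) - 1
        # An opaque latch holds the value until the next clock edge.
        if not is_last and not transparent[i]:
            cycles += 1
    return value, cycles
```

    With four stages and every latch opaque the instruction takes four cycles; making the first and third latches transparent merges the stages pairwise and gives the same result in two (slower) cycles.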

    Area efficient transcendental estimate algorithm
    26.
    Type: Granted invention patent
    Status: Lapsed

    Publication number: US08275821B2

    Publication date: 2012-09-25

    Application number: US11851658

    Application date: 2007-09-07

    CPC classification number: G06F7/548

    Abstract: A method, computer-readable medium, and an apparatus for generating a transcendental value. The method includes receiving an input containing an input value and an opcode and determining whether the opcode corresponds to a trigonometric operation or a power-of-two operation. The method also includes calculating a fractional value and an integer value from the input value, generating the transcendental value based on the fractional value by adding at least a portion of the fractional value with at least one of a shifted fractional value produced by shifting the portion of the fractional value and a constant value, and providing the transcendental value in response to the request. In this fashion, the same circuit area may be used to carry out both trigonometric and power-of-two calculations, leading to greater circuit area savings and performance advantages while not sacrificing significant accuracy.

    Tree Insertion Depth Adjustment Based on View Frustrum and Distance Culling
    27.
    Type: Invention patent application
    Status: In force

    Publication number: US20120236001A1

    Publication date: 2012-09-20

    Application number: US13476876

    Application date: 2012-05-21

    CPC classification number: G06T15/06 G06T17/005

    Abstract: A computer-implemented method includes initializing a driver associated with an input/output adapter in response to receiving an initialize driver request from a client application. The computer-implemented method includes initializing the input/output adapter to enable adapter capabilities of the input/output adapter to be determined. The computer-implemented method also includes determining the adapter capabilities of the input/output adapter. The computer-implemented method further includes determining slot capabilities of a slot associated with the input/output adapter. The computer-implemented method also includes setting configurable capabilities of the input/output adapter based on the adapter capabilities and the slot capabilities.

    Structural power reduction in multithreaded processor
    28.
    Type: Granted invention patent
    Status: Lapsed

    Publication number: US08140830B2

    Publication date: 2012-03-20

    Application number: US12125278

    Application date: 2008-05-22

    CPC classification number: G06F9/5044 G06F9/3851 G06F9/5094 Y02D10/22

    Abstract: A circuit arrangement and method utilize a plurality of execution units having different power and performance characteristics and capabilities within a multithreaded processor core, and selectively route instructions having different performance requirements to different execution units based upon those performance requirements. As such, instructions that have high performance requirements, such as instructions associated with primary tasks or time sensitive tasks, can be routed to a higher performance execution unit to maximize performance when executing those instructions, while instructions that have low performance requirements, such as instructions associated with background tasks or non-time sensitive tasks, can be routed to a reduced power execution unit to reduce the power consumption (and associated heat generation) associated with executing those instructions.
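
    Illustrative sketch: the routing decision described above, reduced to a single flag per instruction. The `Instruction` type, the `time_sensitive` field, and the two unit names are hypothetical; the abstract does not say how performance requirements are encoded or detected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instruction:
    opcode: str
    time_sensitive: bool  # assumed encoding of the performance requirement


def select_unit(instr: Instruction) -> str:
    """Pick an execution unit from the instruction's requirement."""
    return "high_performance" if instr.time_sensitive else "reduced_power"


def dispatch(instructions):
    """Route each instruction to the queue of the selected unit."""
    queues = {"high_performance": [], "reduced_power": []}
    for instr in instructions:
        queues[select_unit(instr)].append(instr.opcode)
    return queues
```

    Background work such as a prefetch would land in the reduced-power queue, while instructions from a primary task go to the higher-performance unit.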

    Simultaneous multi-thread instructions issue to execution units while substitute injecting sequence of instructions for long latency sequencer instruction via multiplexer
    29.
    Type: Granted invention patent
    Status: Lapsed

    Publication number: US07941644B2

    Publication date: 2011-05-10

    Application number: US12252541

    Application date: 2008-10-16

    CPC classification number: G06F9/3885 G06F9/22 G06F9/3009 G06F9/3851 G06F9/3867

    Abstract: A processing unit includes multiple execution units and sequencer logic that is disposed downstream of instruction buffer logic, and that is responsive to a sequencer instruction present in an instruction stream. In response to such an instruction, the sequencer logic issues a plurality of instructions associated with a long latency operation to one execution unit, while blocking instructions from the instruction buffer logic from being issued to that execution unit. In addition, the blocking of instructions from being issued to the execution unit does not affect the issuance of instructions to any other execution unit, and as such, other instructions from the instruction buffer logic are still capable of being issued to and executed by other execution units even while the sequencer logic is issuing the plurality of instructions associated with the long latency operation.

    Processing unit incorporating special purpose register for use with instruction-based persistent vector multiplexer control
    30.
    Type: Granted invention patent
    Status: Lapsed

    Publication number: US07904700B2

    Publication date: 2011-03-08

    Application number: US12045222

    Application date: 2008-03-10

    CPC classification number: G06F9/30032 G06F9/30036 G06F9/30109 G06F9/30123

    Abstract: A software-accessible special purpose register is architected into a processing unit in order to implement persistent vector multiplexer control of a vector-based execution unit. A persistent swizzle instruction is defined in an instruction set for the vector-based execution unit and is used to cause state information to be stored in the special purpose register such that the operand vectors processed by subsequent vector instructions executed by the vector-based execution unit will be selectively shuffled using the persisted state information. As a result, when multiple vector instructions require a common custom word ordering for one or more operand vectors, a single persistent swizzle instruction may be used to select the desired custom word ordering for all of the vector instructions.
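
    Illustrative sketch: a software model of persistent swizzle state. One call stores a word ordering in a register, and every later vector operation shuffles its operand with that ordering until the state is changed. The class and method names, the four-word vector width, and the choice to swizzle only the second source operand are assumptions for illustration.

```python
class VectorUnit:
    """Toy 4-word vector unit with a persistent swizzle register."""

    def __init__(self):
        self.swizzle_spr = (0, 1, 2, 3)  # identity ordering by default

    def persistent_swizzle(self, order):
        """Store a word ordering that later vector instructions reuse."""
        order = tuple(order)
        if len(order) != 4 or any(i not in range(4) for i in order):
            raise ValueError("order must be four indices in 0..3")
        self.swizzle_spr = order

    def _shuffle(self, vec):
        # Word i of the result comes from word swizzle_spr[i] of the input.
        return tuple(vec[i] for i in self.swizzle_spr)

    def vadd(self, a, b):
        return tuple(x + y for x, y in zip(a, self._shuffle(b)))

    def vmul(self, a, b):
        return tuple(x * y for x, y in zip(a, self._shuffle(b)))
```

    After `persistent_swizzle((3, 2, 1, 0))`, both `vadd` and `vmul` see their second operand reversed without any per-instruction shuffle.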
