Implied storage operation decode using redundant target address detection
    1.
    Granted invention patent (status: in force)

    Publication No.: US08255674B2

    Publication date: 2012-08-28

    Application No.: US12360975

    Filing date: 2009-01-28

    IPC classification: G06F9/312

    Abstract: A logic arrangement and method support implied storage operation decode using redundant target address detection: the target addresses of previous instructions are compared with the target address of the current instruction, and if they are equal and the target addresses of the previous instructions have not been used as sources, the current instruction is decoded as a store instruction. This allows a redundant operation in an instruction set architecture to be redefined as a store instruction, freeing up opcodes normally used for store instructions for use by other instructions.
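
    The decode rule in this abstract can be modeled in software as a rough Python sketch. The `Instr` record, the `decode_as_store` name, and the reading of "not used as sources" as "not read by any later instruction in the window, including the current one" are assumptions made for illustration, not details taken from the patent.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: one target register, any number of sources."""
    target: int
    sources: Tuple[int, ...] = ()


def decode_as_store(current: Instr, previous: List[Instr]) -> bool:
    """Return True if `current` rewrites a target that nothing has read since it was written."""
    for i, prev in enumerate(previous):
        if prev.target != current.target:
            continue
        # Instructions after `prev`, plus the current one, may have consumed the target.
        later = previous[i + 1:] + [current]
        if not any(prev.target in ins.sources for ins in later):
            return True  # redundant target: treat as an implied store
    return False
```

    In this model, a second write to register 5 with no intervening read of register 5 would be decoded as a store, while any intervening read keeps the normal decode.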

    Anisotropic texture filtering with texture data prefetching
    2.
    Granted invention patent (status: in force)

    Publication No.: US08217953B2

    Publication date: 2012-07-10

    Application No.: US12110045

    Filing date: 2008-04-25

    IPC classification: G09G5/00

    CPC classification: G06T15/04 G06T2200/12

    Abstract: A circuit arrangement and method utilize texture data prefetching to prefetch texture data used by an anisotropic filtering algorithm. In particular, stride-based prefetching may be used to prefetch texture data for use in anisotropic filtering, where the value of the stride, or difference between successive accesses, is based upon a distance in a memory address space between sample points taken along the line of anisotropy used in an anisotropic filtering algorithm.
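
    To make the stride computation concrete, here is a minimal Python sketch. It assumes a row-major texture with a fixed texel size and sample points that advance by a constant integer texel step `(dx, dy)` along the line of anisotropy; the function names and the memory layout are illustrative assumptions, since the abstract does not specify them.

```python
from typing import List, Tuple


def texel_address(base: int, width: int, bytes_per_texel: int, x: int, y: int) -> int:
    """Address of texel (x, y) in an assumed row-major layout."""
    return base + (y * width + x) * bytes_per_texel


def anisotropic_stride(width: int, bytes_per_texel: int, step: Tuple[int, int]) -> int:
    """Address distance between two successive sample points along the line of anisotropy."""
    dx, dy = step
    return (dy * width + dx) * bytes_per_texel


def prefetch_addresses(base: int, width: int, bytes_per_texel: int,
                       start: Tuple[int, int], step: Tuple[int, int],
                       depth: int) -> List[int]:
    """Addresses of the next `depth` samples, predicted from the first access plus the stride."""
    stride = anisotropic_stride(width, bytes_per_texel, step)
    first = texel_address(base, width, bytes_per_texel, start[0], start[1])
    return [first + k * stride for k in range(1, depth + 1)]
```

    Under these assumptions the predicted addresses coincide with the addresses of the later sample points, so a prefetcher seeded with the stride could request them before the filter reads them.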

    Processing unit incorporating multirate execution unit
    3.
    Granted invention patent (status: lapsed)

    Publication No.: US07945764B2

    Publication date: 2011-05-17

    Application No.: US11972746

    Filing date: 2008-01-11

    IPC classification: G06F9/30

    Abstract: A multirate execution unit is capable of being operated in a plurality of modes, with the execution unit being capable of being clocked at multiple different rates relative to a multithreaded issue unit such that, in applications where maximum performance is desired, the execution unit can be clocked at a rate that is faster than the clock rate for the multithreaded issue unit, and in applications where a lower power profile is desired, the execution unit can be throttled back to a slower rate to reduce the power consumption of the execution unit. When the execution unit is clocked at a faster rate than the multithreaded issue unit, the issue unit is permitted, in order to increase overall instruction throughput, to issue more instructions per cycle than when the execution unit is throttled to the slower rate.

    Dual independent and shared resource vector execution units with shared register file
    4.
    Granted invention patent (status: in force)

    Publication No.: US07926009B2

    Publication date: 2011-04-12

    Application No.: US11924980

    Filing date: 2007-10-26

    IPC classification: G06F17/50

    Abstract: The present invention is generally related to integrated circuit devices, and more particularly, to methods, systems and design structures for the field of image processing, and more specifically to vector units for supporting image processing. A dual vector unit implementation is described wherein two vector units are configured to receive data from a common register file. The vector units may independently and simultaneously process instructions. Furthermore, the vector units may be adapted to perform scalar operations, thereby integrating the vector and scalar processing. The vector units may also be configured to share resources to perform an operation, for example, a cross product operation.

    Processing Unit Incorporating Instruction-Based Persistent Vector Multiplexer Control
    5.
    Invention patent application (status: lapsed)

    Publication No.: US20090228681A1

    Publication date: 2009-09-10

    Application No.: US12045221

    Filing date: 2008-03-10

    IPC classification: G06F9/30 G06F15/76

    Abstract: Persistent vector multiplexer control is used in a vector-based execution unit to control the shuffling of words in operand vectors processed by the execution unit. In addition, a persistent swizzle instruction is defined in an instruction set for the vector-based execution unit and is used to cause state information to be persisted such that the operand vectors processed by subsequent vector instructions executed by the vector-based execution unit will be selectively shuffled using the persisted state information. As a result, when multiple vector instructions require a common custom word ordering for one or more operand vectors, a single persistent swizzle instruction may be used to select the desired custom word ordering for all of the vector instructions.
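
    The idea of a swizzle that persists across instructions can be illustrated with a small Python model. The `VectorUnit` class, its method names, the four-word vector width, and the choice to shuffle only the second operand are all assumptions made for this sketch; the abstract does not define them.

```python
from typing import Sequence, Tuple

Vec4 = Tuple[float, float, float, float]


class VectorUnit:
    """Toy 4-word vector unit with a persisted word ordering for operand B."""

    def __init__(self) -> None:
        self.swizzle: Tuple[int, ...] = (0, 1, 2, 3)  # identity ordering

    def persistent_swizzle(self, order: Sequence[int]) -> None:
        """Model of the persistent swizzle instruction: store the ordering as state."""
        if len(order) != 4 or any(i not in (0, 1, 2, 3) for i in order):
            raise ValueError("order must hold four word indices in 0..3")
        self.swizzle = tuple(order)

    def _shuffle(self, v: Vec4) -> Vec4:
        return tuple(v[i] for i in self.swizzle)  # type: ignore[return-value]

    def vadd(self, a: Vec4, b: Vec4) -> Vec4:
        b = self._shuffle(b)  # persisted state is applied, no per-instruction swizzle
        return tuple(x + y for x, y in zip(a, b))  # type: ignore[return-value]

    def vmul(self, a: Vec4, b: Vec4) -> Vec4:
        b = self._shuffle(b)
        return tuple(x * y for x, y in zip(a, b))  # type: ignore[return-value]
```

    One call to `persistent_swizzle((3, 2, 1, 0))` would then reverse operand B for every later `vadd` and `vmul` until the ordering is changed again.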

    Method and Apparatus for Implementing a Multiple Operand Vector Floating Point Summation to Scalar Function
    6.
    Invention patent application (status: lapsed)

    Publication No.: US20090049113A1

    Publication date: 2009-02-19

    Application No.: US11840277

    Filing date: 2007-08-17

    IPC classification: G06F7/38

    Abstract: Embodiments of the invention provide methods and apparatus for executing a multiple operand instruction. Executing the multiple operand instruction comprises computing an arithmetic result of a pair of operands in each processing lane of a vector unit. The arithmetic results generated in each processing lane of the vector unit may be transferred to a dot product unit. The dot product unit may compute an arithmetic result using the arithmetic result computed by each processing lane of the vector unit to generate an arithmetic result of more than two operands.
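
    The two-stage data flow can be sketched in Python as follows. Taking addition as the per-lane operation follows the title ("summation to scalar"); the function names and the sequential reduction order are assumptions for illustration only, and real hardware may order the floating-point additions differently.

```python
from typing import List, Sequence


def lane_results(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Stage 1: each processing lane adds its own pair of operands."""
    if len(a) != len(b):
        raise ValueError("operand vectors must have the same number of lanes")
    return [x + y for x, y in zip(a, b)]


def reduce_lanes(lanes: Sequence[float]) -> float:
    """Stage 2: a stand-in for the dot product unit's adder, summing the lane results."""
    total = 0.0
    for value in lanes:
        total += value
    return total


def vector_sum_to_scalar(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum all operands of both vectors into one scalar."""
    return reduce_lanes(lane_results(a, b))
```

    With four lanes, a single call sums eight operands, which is the "more than two operands" result the abstract refers to.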

    LOW POWER DMA LABELING
    7.
    Invention patent application (status: in force)

    Publication No.: US20160180493A1

    Publication date: 2016-06-23

    Application No.: US14574093

    Filing date: 2014-12-17

    IPC classification: G06T1/60 H04N13/00

    Abstract: Methods for preprocessing pixel data using a Direct Memory Access (DMA) engine during a data transfer of the pixel data from a first memory (e.g., a DRAM) to a second memory (e.g., a local cache) are described. The pixel data may derive from an image capturing device (e.g., a color camera or a depth camera) in which individual pixel values are not a multiple of eight bits. In some embodiments, the DMA engine may perform a variety of image processing operations on the pixel data prior to the pixel data being written into the second memory. In one example, the DMA engine may be configured to identify and label one or more pixels as being within a particular range of pixel values and/or the DMA engine may be configured to label pixels as belonging to one or more pixel groups based on their pixel values.
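
    A software analogue of labeling during transfer might look like the Python sketch below. It assumes pixels are packed most-significant-bit first with any leftover bits ignored, and it attaches a single in-range flag per pixel; the packing order, the function names, and the output format are assumptions, not details from the application.

```python
from typing import List, Tuple


def unpack_pixels(data: bytes, bits: int) -> List[int]:
    """Unpack fixed-width pixel values (e.g. 10-bit) from an MSB-first packed byte stream."""
    if bits <= 0:
        raise ValueError("bits must be positive")
    total_bits = len(data) * 8
    value = int.from_bytes(data, "big")
    mask = (1 << bits) - 1
    pixels = []
    for i in range(total_bits // bits):
        shift = total_bits - (i + 1) * bits
        pixels.append((value >> shift) & mask)
    return pixels


def transfer_with_labels(data: bytes, bits: int, lo: int, hi: int) -> List[Tuple[int, bool]]:
    """Model of the copy: each pixel reaches the destination with an in-range label."""
    return [(p, lo <= p <= hi) for p in unpack_pixels(data, bits)]
```

    For example, two 10-bit pixels packed into three bytes would arrive as two `(value, label)` pairs, so later stages need not rescan the buffer to find pixels of interest.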

    Execution unit with inline pseudorandom number generator
    8.
    Granted invention patent (status: in force)

    Publication No.: US09021004B2

    Publication date: 2015-04-28

    Application No.: US13556464

    Filing date: 2012-07-24

    IPC classification: G06F7/48 G06F9/38 G06F9/30

    Abstract: A circuit arrangement and method couple a hardware-based pseudorandom number generator (PRNG) to an execution unit in such a manner that pseudorandom numbers generated by the PRNG may be selectively output to the execution unit for use as an operand during the execution of instructions by the execution unit. A PRNG may be coupled to an input of an operand multiplexer that outputs to an operand input of an execution unit so that operands provided by instructions supplied to the execution unit are selectively overridden with pseudorandom numbers generated by the PRNG. Furthermore, overridden operands provided by instructions supplied to the execution unit may be used as seed values for the PRNG.
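
    The operand multiplexer can be modeled with a short Python sketch. The 16-bit Fibonacci LFSR used here is only a stand-in generator chosen for brevity; the abstract does not say which PRNG the hardware uses, and the `Lfsr16` and `select_operand` names and the `reseed` flag are likewise assumptions.

```python
class Lfsr16:
    """16-bit Fibonacci LFSR (taps 16, 14, 13, 11), used only as a stand-in PRNG."""

    def __init__(self, seed: int = 0xACE1) -> None:
        self.state = 0
        self.reseed(seed)

    def reseed(self, seed: int) -> None:
        # An all-zero state would lock up the LFSR, so map it to 1.
        self.state = (seed & 0xFFFF) or 1

    def next(self) -> int:
        s = self.state
        bit = (s ^ (s >> 2) ^ (s >> 3) ^ (s >> 5)) & 1
        self.state = (s >> 1) | (bit << 15)
        return self.state


def select_operand(instr_operand: int, override: bool, prng: Lfsr16,
                   reseed: bool = False) -> int:
    """Operand mux: pass the instruction's operand through, or replace it with a PRNG value."""
    if not override:
        return instr_operand
    if reseed:
        # The overridden operand is otherwise unused, so it can carry a seed.
        prng.reseed(instr_operand)
    return prng.next()
```

    The point of the design is that an instruction consumes a pseudorandom operand through the same operand path as any other value, with no separate instruction needed to fetch it.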

    Implementing a floating point weighted average function
    9.
    Granted invention patent (status: in force)

    Publication No.: US08443027B2

    Publication date: 2013-05-14

    Application No.: US11861518

    Filing date: 2007-09-26

    IPC classification: G06F7/38

    CPC classification: G06F7/483

    Abstract: A method, computer-readable medium, and an apparatus for implementing a floating point weighted average function. The method includes receiving an input containing 2^N input values, 2^N weights, and an opcode, where N is a positive integer number and each of the input values corresponds to one of the weights. Furthermore, the method also includes using an existing dot product circuit function to generate 2^N addends by multiplying each of the input values with the corresponding weight. In addition, the method includes generating a sum value by adding the 2^N addends, where the sum value includes an exponent value, and generating the weighted average value based on the sum value by decreasing the exponent value by N. In this fashion, the same circuit area may be used to carry out both dot product and weighted average calculations, leading to greater circuit area savings and performance advantages.
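
    The exponent trick can be shown in a few lines of Python. This sketch assumes the number of inputs is a power of two, 2^N, so that dividing the sum by the count is the same as decreasing its binary exponent by N; `math.ldexp` plays the role of the exponent adjustment, and the function name is hypothetical.

```python
import math
from typing import Sequence


def weighted_average_pow2(values: Sequence[float], weights: Sequence[float], n: int) -> float:
    """Weighted average of 2**n values: dot product, then subtract n from the exponent."""
    count = 2 ** n
    if len(values) != count or len(weights) != count:
        raise ValueError("expected 2**n values and 2**n weights")
    # Dot product stage: one addend per value/weight pair, then a sum.
    addends = [v * w for v, w in zip(values, weights)]
    total = sum(addends)
    # ldexp(total, -n) == total * 2**-n, i.e. the exponent decreased by n.
    return math.ldexp(total, -n)
```

    Because the division is folded into an exponent adjustment, no divider is needed, which is consistent with the abstract's point about reusing the dot product circuit.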

    Execution unit with data dependent conditional write instructions
    10.
    Granted invention patent (status: in force)

    Publication No.: US08356162B2

    Publication date: 2013-01-15

    Application No.: US12050721

    Filing date: 2008-03-18

    IPC classification: G06F7/38 G06F9/00 G06F9/44

    Abstract: An execution unit supports data dependent conditional write instructions that write data to a target only when a particular condition is met. In one implementation, a data dependent conditional write instruction identifies a condition as well as data to be tested against that condition. The data is tested against that condition, and the result of the test is used to selectively enable or disable a write to a target associated with the data dependent conditional write instruction. Then, a write is attempted while the write to the target is enabled or disabled such that the write will update the contents of the target only when the write is selectively enabled as a result of the test. By doing so, dependencies are typically avoided, as is use of an architected condition register that might otherwise introduce branch misprediction penalties, enabling improved performance with z-buffer test and similar types of algorithms.
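
    A z-buffer test is the example the abstract names, so the following Python sketch models it. The write is always attempted and a write-enable flag decides whether the target changes; the function names, the "smaller depth wins" convention, and the buffer layout are assumptions for illustration.

```python
from typing import List


def write_if(target: List[float], index: int, value: float, enable: bool) -> None:
    """Always attempt the write; the enable flag selects the new or the old contents."""
    target[index] = value if enable else target[index]


def draw_fragment(zbuf: List[float], colorbuf: List[float],
                  index: int, depth: float, color: float) -> bool:
    """Z-buffer test: the depth comparison gates both writes, with no condition register."""
    passed = depth < zbuf[index]  # data tested against the condition
    write_if(colorbuf, index, color, passed)
    write_if(zbuf, index, depth, passed)
    return passed
```

    Python still evaluates a conditional expression internally, so this only mirrors the data flow; the hardware benefit described in the abstract comes from gating the write instead of branching.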
