Processing unit incorporating multirate execution unit
    31.
    Granted patent
    Processing unit incorporating multirate execution unit (Lapsed)

    Publication No.: US07945764B2

    Publication date: 2011-05-17

    Application No.: US11972746

    Filing date: 2008-01-11

    Abstract: A multirate execution unit is capable of being operated in a plurality of modes, with the execution unit being capable of being clocked at multiple different rates relative to a multithreaded issue unit such that, in applications where maximum performance is desired, the execution unit can be clocked at a rate that is faster than the clock rate for the multithreaded issue unit, and in applications where a lower power profile is desired, the execution unit can be throttled back to a slower rate to reduce the power consumption of the execution unit. When the execution unit is clocked at a faster rate than the multithreaded issue unit, the issue unit is permitted, in order to increase overall instruction throughput, to issue more instructions per cycle than when the execution unit is throttled to the slower rate.
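
The relationship between the two clock rates and the issue width can be illustrated with a toy model. This is only a sketch: the function name `issue_width` and the rule "one issued instruction per execution-unit cycle that fits in one issue-unit cycle" are assumptions made for illustration, not details taken from the abstract.

```python
def issue_width(exec_hz: int, issue_hz: int) -> int:
    """Instructions the issue unit may issue per issue-unit cycle (toy model)."""
    if exec_hz <= 0 or issue_hz <= 0:
        raise ValueError("clock rates must be positive")
    # Assumption: one instruction per execution-unit cycle that fits inside
    # one issue-unit cycle, and never fewer than one.
    return max(1, exec_hz // issue_hz)


# Performance mode: execution unit clocked at twice the issue-unit rate.
performance_mode = issue_width(exec_hz=2_000_000_000, issue_hz=1_000_000_000)
# Low-power mode: execution unit throttled back to the issue-unit rate.
low_power_mode = issue_width(exec_hz=1_000_000_000, issue_hz=1_000_000_000)
```

In this model the faster mode permits two instructions per issue cycle and the throttled mode permits one, which mirrors the throughput trade-off the abstract describes.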

    Dual independent and shared resource vector execution units with shared register file
    32.
    Granted patent
    Dual independent and shared resource vector execution units with shared register file (In force)

    Publication No.: US07926009B2

    Publication date: 2011-04-12

    Application No.: US11924980

    Filing date: 2007-10-26

    CPC classification number: G06T1/20 G06F15/8092 G06T15/005

    Abstract: The present invention is generally related to integrated circuit devices, and more particularly, to methods, systems and design structures for the field of image processing, and more specifically to vector units for supporting image processing. A dual vector unit implementation is described wherein two vector units are configured to receive data from a common register file. The vector units may independently and simultaneously process instructions. Furthermore, the vector units may be adapted to perform scalar operations thereby integrating the vector and scalar processing. The vector units may also be configured to share resources to perform an operation, for example, a cross product operation.
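
One way to picture two vector units cooperating on a cross product is to let each unit compute one set of lane-wise products and then subtract the two partial results. The sketch below is a plain-Python illustration under that assumption; the function names and the particular split of work are hypothetical and are not taken from the patent.

```python
def lane_multiply(x, y):
    """Lane-wise multiply, standing in for one vector unit's multipliers."""
    return [p * q for p, q in zip(x, y)]


def cross_product_two_units(a, b):
    """Cross product of 3-element vectors a and b using two 'units'."""
    # Unit 0 computes a.yzx * b.zxy.
    unit0 = lane_multiply([a[1], a[2], a[0]], [b[2], b[0], b[1]])
    # Unit 1 computes a.zxy * b.yzx.
    unit1 = lane_multiply([a[2], a[0], a[1]], [b[1], b[2], b[0]])
    # The shared step subtracts the partial products lane by lane.
    return [p - q for p, q in zip(unit0, unit1)]


result = cross_product_two_units([1, 0, 0], [0, 1, 0])
```

Here `result` is the z unit vector `[0, 0, 1]`, the cross product of the x and y unit vectors.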

    Reallocation of spatial index traversal between processing elements in response to changes in ray tracing graphics workload
    33.
    Granted patent
    Reallocation of spatial index traversal between processing elements in response to changes in ray tracing graphics workload (Lapsed)

    Publication No.: US07737974B2

    Publication date: 2010-06-15

    Application No.: US11535573

    Filing date: 2006-09-27

    CPC classification number: G06T15/06 G06T17/005

    Abstract: Embodiments of the invention provide methods and apparatus for reallocating workload related to traversal of a ray through a spatial index. In a first operating state a workload manager may be experiencing a first or a normal workload. In the first operating state the workload manager may be responsible for traversing the entire spatial index and a vector throughput engine may be responsible for performing ray-primitive intersection tests. In an increased workload state the workload manager may experience an increased workload. In response to the increased workload the image processing system may partition the spatial index such that the workload manager may be responsible for traversing a first portion of the spatial index and the vector throughput engine may be responsible for traversing a second portion of the spatial index and for performing ray-primitive intersection tests.
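
The two operating states can be sketched as a function that decides which processing element traverses a given node of the spatial index. The abstract does not say how the index is partitioned, so the depth-based split, the `split_depth` parameter, and the string labels below are assumptions chosen only to make the idea concrete.

```python
WORKLOAD_MANAGER = "workload_manager"
VECTOR_THROUGHPUT_ENGINE = "vector_throughput_engine"


def traversal_owner(node_depth: int, increased_workload: bool, split_depth: int) -> str:
    """Return the processing element that traverses a node at node_depth."""
    if not increased_workload:
        # Normal state: the workload manager traverses the entire index.
        return WORKLOAD_MANAGER
    # Increased-workload state (assumed partition): the workload manager keeps
    # the nodes above split_depth and hands the rest to the vector engine.
    if node_depth < split_depth:
        return WORKLOAD_MANAGER
    return VECTOR_THROUGHPUT_ENGINE


normal = [traversal_owner(d, False, split_depth=2) for d in range(4)]
loaded = [traversal_owner(d, True, split_depth=2) for d in range(4)]
```

In both states the vector throughput engine would still perform the ray-primitive intersection tests; only the traversal responsibility moves.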

    Processing Unit Incorporating Instruction-Based Persistent Vector Multiplexer Control
    34.
    Patent application
    Processing Unit Incorporating Instruction-Based Persistent Vector Multiplexer Control (Lapsed)

    Publication No.: US20090228681A1

    Publication date: 2009-09-10

    Application No.: US12045221

    Filing date: 2008-03-10

    CPC classification number: G06F9/30032 G06F9/30036 G06F9/30109 G06F9/30123

    Abstract: Persistent vector multiplexer control is used in a vector-based execution unit to control the shuffling of words in operand vectors processed by the execution unit. In addition, a persistent swizzle instruction is defined in an instruction set for the vector-based execution unit and is used to cause state information to be persisted such that the operand vectors processed by subsequent vector instructions executed by the vector-based execution unit will be selectively shuffled using the persisted state information. As a result, when multiple vector instructions require a common custom word ordering for one or more operand vectors, a single persistent swizzle instruction may be used to select the desired custom word ordering for all of the vector instructions.
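
The effect of a persistent swizzle can be sketched with a small software model in which one call stores a word ordering and every later vector operation applies it to its first operand. The class `VectorUnit`, its method names, and the choice to swizzle only the first operand are illustrative assumptions, not the instruction set defined in the application.

```python
class VectorUnit:
    """Toy 4-word vector unit with persisted swizzle state."""

    def __init__(self):
        # Identity ordering: word i of the operand feeds lane i.
        self.swizzle = (0, 1, 2, 3)

    def persistent_swizzle(self, order):
        """Persist a word ordering for all subsequent vector instructions."""
        if len(order) != 4 or any(i not in range(4) for i in order):
            raise ValueError("order must contain four word indices in 0..3")
        self.swizzle = tuple(order)

    def _shuffle(self, v):
        # Lane i receives word swizzle[i] of the operand.
        return [v[i] for i in self.swizzle]

    def vadd(self, a, b):
        return [x + y for x, y in zip(self._shuffle(a), b)]

    def vmul(self, a, b):
        return [x * y for x, y in zip(self._shuffle(a), b)]


vu = VectorUnit()
vu.persistent_swizzle((1, 2, 0, 3))       # set once
added = vu.vadd([1, 2, 3, 4], [10, 10, 10, 10])
scaled = vu.vmul([1, 2, 3, 4], [1, 1, 1, 1])
```

Both `vadd` and `vmul` see the first operand as `[2, 3, 1, 4]` without any per-instruction shuffle, which is the saving the abstract attributes to the single persistent swizzle instruction.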

    Method and Apparatus for Implementing a Multiple Operand Vector Floating Point Summation to Scalar Function
    35.
    Patent application
    Method and Apparatus for Implementing a Multiple Operand Vector Floating Point Summation to Scalar Function (Lapsed)

    Publication No.: US20090049113A1

    Publication date: 2009-02-19

    Application No.: US11840277

    Filing date: 2007-08-17

    Abstract: Embodiments of the invention provide methods and apparatus for executing a multiple operand instruction. Executing the multiple operand instruction comprises computing an arithmetic result of a pair of operands in each processing lane of a vector unit. The arithmetic results generated in each processing lane of the vector unit may be transferred to a dot product unit. The dot product unit may compute an arithmetic result using the arithmetic result computed by each processing lane of the vector unit to generate an arithmetic result of more than two operands.
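
The two-stage data flow can be sketched as follows: each lane combines one pair of operands, and a second stage (standing in for the dot product unit) combines the lane results into one scalar. Addition is assumed as the arithmetic operation in both stages, and the function name is hypothetical.

```python
def vector_sum_to_scalar(a, b):
    """Sum all words of a and b in two stages, returning one scalar."""
    # Stage 1: each processing lane adds its pair of operands.
    lane_results = [x + y for x, y in zip(a, b)]
    # Stage 2: the lane results are reduced to a single value, modeling the
    # adder of the dot product unit.
    return sum(lane_results)


total = vector_sum_to_scalar([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0])
```

With four lanes this sums eight operands in what the abstract describes as a single multiple operand instruction.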

Single Precision Vector Permute Immediate with "Word" Vector Write Mask
    36.
    Patent application
    Single Precision Vector Permute Immediate with "Word" Vector Write Mask (In force)

    Publication No.: US20080114824A1

    Publication date: 2008-05-15

    Application No.: US11554794

    Filing date: 2006-10-31

    CPC classification number: G06T1/60 G06F9/30032 G06F9/30036

    Abstract: The present invention is generally related to the field of image processing, and more specifically to an instruction set for processing images. Vector processing may involve performing a plurality of permute operations to arrange vector operands in desired locations of a register prior to performing a vector operation, for example, a cross product. The permute instructions may be dependent on one another and may require the use of temporary registers. Embodiments of the invention provide a permute instruction wherein a mask field may be used to specify a particular location of a target register in which to transfer data, thereby reducing the number of instructions for arranging data, the dependencies between instructions, and the usage of temporary registers.
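
A masked permute can be modeled as a word selection followed by a per-word write enable, so that unmasked words of the target register keep their old contents. The function `permute_masked` and its `select` and `mask` parameters are illustrative assumptions; the actual instruction encoding is not given in the abstract.

```python
def permute_masked(target, source, select, mask):
    """Write source[select[i]] into target[i] only where mask[i] is set."""
    if not (len(target) == len(select) == len(mask)):
        raise ValueError("target, select and mask must have the same length")
    return [
        source[select[i]] if mask[i] else target[i]
        for i in range(len(target))
    ]


target = [9.0, 9.0, 9.0, 9.0]
source = [1.0, 2.0, 3.0, 4.0]
# Move source word 2 into target word 0 and source word 0 into target word 3;
# words 1 and 2 of the target are left untouched.
updated = permute_masked(target, source, select=(2, 0, 0, 0), mask=(True, False, False, True))
```

Because untouched words survive, several such permutes can build up one register in place without a temporary register for each intermediate arrangement.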

    Single precision vector permute immediate with “word” vector write mask
    38.
    Granted patent
    Single precision vector permute immediate with “word” vector write mask (In force)

    Publication No.: US09495724B2

    Publication date: 2016-11-15

    Application No.: US11554794

    Filing date: 2006-10-31

    CPC classification number: G06T1/60 G06F9/30032 G06F9/30036

    Abstract: The present invention is generally related to the field of image processing, and more specifically to an instruction set for processing images. Vector processing may involve performing a plurality of permute operations to arrange vector operands in desired locations of a register prior to performing a vector operation, for example, a cross product. The permute instructions may be dependent on one another and may require the use of temporary registers. Embodiments of the invention provide a permute instruction wherein a mask field may be used to specify a particular location of a target register in which to transfer data, thereby reducing the number of instructions for arranging data, the dependencies between instructions, and the usage of temporary registers.

    LOW POWER DMA SNOOP AND SKIP
    39.
    Patent application
    LOW POWER DMA SNOOP AND SKIP (Under examination, published)

    Publication No.: US20160180494A1

    Publication date: 2016-06-23

    Application No.: US14574100

    Filing date: 2014-12-17

    Abstract: Methods for preprocessing pixel data using a Direct Memory Access (DMA) engine during a data transfer of the pixel data from a first memory (e.g., a DRAM) to a second memory (e.g., an SRAM) are described. The pixel data may derive from a color camera or a depth camera in which individual pixel values are not a multiple of eight bits. In some cases, the DMA engine may perform a variety of image processing operations on the pixel data prior to the pixel data being written into the second memory. In one embodiment, the DMA engine may be configured to determine whether one or more pixels corresponding with the pixel data may be invalidated or skipped based on a minimum pixel value threshold and a maximum pixel value threshold and to embed pixel skipping information within unused bits of the pixel data.
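
The skip-marking step can be sketched for pixels that occupy only part of a storage word. The layout below (a 10-bit value in a 16-bit word, with bit 15 used as the skip flag) and the names `VALUE_MASK`, `SKIP_BIT`, and `tag_pixels` are assumptions for illustration; the abstract only states that skip information is embedded in unused bits.

```python
VALUE_BITS = 10                      # assumed pixel width (not a multiple of 8)
VALUE_MASK = (1 << VALUE_BITS) - 1   # low 10 bits hold the pixel value
SKIP_BIT = 1 << 15                   # assumed unused bit reused as a skip flag


def tag_pixels(words, min_value, max_value):
    """Return copies of the 16-bit words with the skip flag set where needed."""
    tagged = []
    for word in words:
        value = word & VALUE_MASK
        if value < min_value or value > max_value:
            # Outside the thresholds: mark the pixel so later stages skip it.
            tagged.append(value | SKIP_BIT)
        else:
            tagged.append(value)
    return tagged


tagged = tag_pixels([5, 100, 1000], min_value=10, max_value=900)
skipped = [bool(w & SKIP_BIT) for w in tagged]
```

A consumer reading from the second memory can then test one bit per pixel instead of repeating both threshold comparisons.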

    Single precision vector dot product with “word” vector write mask
    40.
    Granted patent
    Single precision vector dot product with “word” vector write mask (Lapsed)

    Publication No.: US08332452B2

    Publication date: 2012-12-11

    Application No.: US11554774

    Filing date: 2006-10-31

    CPC classification number: G06F17/16

    Abstract: The present invention is generally related to the field of image processing, and more specifically to an instruction set for processing images. Vector processing may involve performing a plurality of dot product operations to generate operands for a new vector. The dot product operations may require the issue of a plurality of permute instructions to arrange the vector operands in desired locations of a target register. Embodiments of the invention provide a dot product instruction wherein a mask field may be used to specify a particular location of a target register in which to transfer data, thereby avoiding the need for permute instructions for arranging data and reducing dependencies between instructions and the usage of temporary registers.
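
A dot product with a word write mask can be modeled as computing the scalar once and writing it only into the selected words of the target register. The function `dot_masked` and the boolean-tuple form of the mask are hypothetical choices made for this sketch.

```python
def dot_masked(target, a, b, mask):
    """Write dot(a, b) into each word of target whose mask entry is set."""
    if len(a) != len(b) or len(target) != len(mask):
        raise ValueError("operand or mask length mismatch")
    scalar = sum(x * y for x, y in zip(a, b))
    return [scalar if m else t for t, m in zip(target, mask)]


# Build a new vector from two dot products without any permute step:
vec = [0, 0, 0, 0]
vec = dot_masked(vec, [1, 2, 3, 0], [4, 5, 6, 0], mask=(True, False, False, False))
vec = dot_masked(vec, [1, 0, 0, 0], [7, 0, 0, 0], mask=(False, True, False, False))
```

Each result lands directly in its destination word, so no separate permute instruction or temporary register is needed to assemble the new vector.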
