DMA-based acceleration of command push buffer between host and target devices
    1.
    Granted patent (status: lapsed)

    Publication No.: US08719455B2

    Publication date: 2014-05-06

    Application No.: US12824674

    Application date: 2010-06-28

    IPC classification: G06F3/00 G06F13/28

    CPC classification: G06F13/28

    Abstract: Direct Memory Access (DMA) is used in connection with passing commands between a host device and a target device coupled via a push buffer. Commands passed to a push buffer by a host device may be accumulated by the host device prior to forwarding the commands to the push buffer, such that DMA may be used to collectively pass a block of commands to the push buffer. In addition, a host device may utilize DMA to pass command parameters for commands to a command buffer that is accessible by the target device but is separate from the push buffer, with the commands that are passed to the push buffer including pointers to the associated command parameters in the command buffer.
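The batching scheme in this abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the patented implementation: the names `Command`, `Target`, `Host`, `batch_size`, and the `dma_write_*` methods are hypothetical, and a "DMA transfer" is modelled as a single bulk copy that increments a counter.

```python
from dataclasses import dataclass, field


@dataclass
class Command:
    opcode: int
    param_offset: int  # pointer into the separate command buffer
    param_length: int


@dataclass
class Target:
    push_buffer: list = field(default_factory=list)
    command_buffer: bytearray = field(default_factory=bytearray)
    dma_transfers: int = 0  # counts simulated DMA operations

    def dma_write_params(self, data: bytes) -> None:
        # One simulated DMA transfer appends a block of parameters.
        self.command_buffer += data
        self.dma_transfers += 1

    def dma_write_commands(self, block: list) -> None:
        # One simulated DMA transfer appends a block of commands.
        self.push_buffer.extend(block)
        self.dma_transfers += 1

    def read_params(self, cmd: Command) -> bytes:
        start = cmd.param_offset
        return bytes(self.command_buffer[start:start + cmd.param_length])


class Host:
    def __init__(self, target: Target, batch_size: int = 4):
        self.target = target
        self.batch_size = batch_size  # assumed flush threshold
        self.pending = []
        self.pending_params = bytearray()

    def submit(self, opcode: int, params: bytes) -> None:
        # Accumulate on the host instead of writing each command individually.
        offset = len(self.target.command_buffer) + len(self.pending_params)
        self.pending_params += params
        self.pending.append(Command(opcode, offset, len(params)))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if not self.pending:
            return
        # Parameters go out first so every pointer is valid when commands land.
        self.target.dma_write_params(bytes(self.pending_params))
        self.target.dma_write_commands(self.pending)
        self.pending = []
        self.pending_params = bytearray()
```

With `batch_size=4`, five submitted commands cost four simulated transfers (two per flush) rather than one or more per command, and the target resolves each command's parameters through `param_offset`.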

    Parallelized streaming accelerated data structure generation
    2.
    Granted patent (status: lapsed)

    Publication No.: US08692825B2

    Publication date: 2014-04-08

    Application No.: US12822427

    Application date: 2010-06-24

    IPC classification: G09G5/00

    Abstract: A method includes receiving at a master processing element primitive data that includes properties of a primitive. The method includes partially traversing a spatial data structure that represents a three-dimensional image to identify an internal node of the spatial data structure. The internal node represents a portion of the three-dimensional image. The method also includes selecting a slave processing element from a plurality of slave processing elements. The selected processing element is associated with the internal node. The method further includes sending the primitive data to the selected slave processing element to traverse a portion of the spatial data structure to identify a leaf node of the spatial data structure.
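The master/slave split described above can be sketched as follows. The one-dimensional binary tree, the `slave_id` field, and the function names are assumptions made for illustration; the abstract does not specify the tree type or how slaves are assigned to internal nodes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    split: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    slave_id: Optional[int] = None  # set on internal nodes owned by a slave
    name: str = ""

    @property
    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def descend(node: Node, x: float, stop) -> Node:
    # Walk down the tree by comparing x with each split until stop(node) holds.
    while not stop(node):
        node = node.left if x < node.split else node.right
    return node


def master_route(root: Node, x: float):
    # Master: partial traversal that ends at the first slave-owned node.
    node = descend(root, x, lambda n: n.slave_id is not None or n.is_leaf)
    return node.slave_id, node


def slave_finish(subtree_root: Node, x: float) -> Node:
    # Slave: finishes the traversal inside its own subtree down to a leaf.
    return descend(subtree_root, x, lambda n: n.is_leaf)


def build_example_tree() -> Node:
    left = Node(split=0.25, left=Node(name="A"), right=Node(name="B"), slave_id=0)
    right = Node(split=0.75, left=Node(name="C"), right=Node(name="D"), slave_id=1)
    return Node(split=0.5, left=left, right=right)
```

In this toy tree, a primitive at `x = 0.6` is routed by the master to slave 1, which then locates leaf `C`.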

    VECTOR REGISTER FILE CACHING OF CONTEXT DATA STRUCTURE FOR MAINTAINING STATE DATA IN A MULTITHREADED IMAGE PROCESSING PIPELINE
    4.
    Patent application (status: in force)

    Publication No.: US20130044117A1

    Publication date: 2013-02-21

    Application No.: US13212418

    Application date: 2011-08-18

    IPC classification: G06T1/20 G06F9/02 G06F15/76

    Abstract: Frequently accessed state data used in a multithreaded graphics processing architecture is cached within a vector register file of a processing unit to optimize accesses to the state data and minimize memory bus utilization associated therewith. A processing unit may include a fixed point execution unit as well as a vector floating point execution unit, and a vector register file utilized by the vector floating point execution unit may be used to cache state data used by the fixed point execution unit and transferred as needed into the general purpose registers accessible by the fixed point execution unit, thereby reducing the need to repeatedly retrieve and write back the state data from and to an L1 or lower level cache accessed by the fixed point execution unit.
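A rough software model of the caching idea follows. It is a sketch only: the four-word vector width, the `ProcessingUnit` class, and the access counter are assumptions, and real hardware would use register-to-register move instructions rather than Python lists.

```python
WORDS_PER_VECTOR = 4  # assumed: each vector register holds four words


class ProcessingUnit:
    def __init__(self, memory: list):
        self.memory = memory      # stands in for the L1 or lower level cache
        self.memory_accesses = 0  # counts vector-sized loads and stores
        self.vector_regs = {}     # register index -> list of words
        self.gprs = {}            # general purpose registers by name

    def load_context(self, base: int, n_words: int) -> None:
        # Bring the state data into the vector register file once.
        for i in range(0, n_words, WORDS_PER_VECTOR):
            chunk = self.memory[base + i:base + i + WORDS_PER_VECTOR]
            self.vector_regs[i // WORDS_PER_VECTOR] = chunk
            self.memory_accesses += 1

    def read_state(self, word_index: int, gpr: str) -> int:
        # Move one word from a vector register lane into a GPR; no memory access.
        vreg, lane = divmod(word_index, WORDS_PER_VECTOR)
        self.gprs[gpr] = self.vector_regs[vreg][lane]
        return self.gprs[gpr]

    def write_state(self, word_index: int, gpr: str) -> None:
        # Move a GPR value back into its vector register lane; no memory access.
        vreg, lane = divmod(word_index, WORDS_PER_VECTOR)
        self.vector_regs[vreg][lane] = self.gprs[gpr]

    def store_context(self, base: int) -> None:
        # Write the cached state back to memory once, when it is no longer needed.
        for vreg, words in sorted(self.vector_regs.items()):
            start = base + vreg * WORDS_PER_VECTOR
            self.memory[start:start + len(words)] = words
            self.memory_accesses += 1
```

Between `load_context` and `store_context`, any number of `read_state` and `write_state` calls leave `memory_accesses` unchanged, which is the saving the abstract describes.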

    REUSE OF STATIC IMAGE DATA FROM PRIOR IMAGE FRAMES TO REDUCE RASTERIZATION REQUIREMENTS
    5.
    Patent application (status: lapsed)

    Publication No.: US20120176364A1

    Publication date: 2012-07-12

    Application No.: US12985607

    Application date: 2011-01-06

    IPC classification: G06T15/00

    Abstract: An apparatus, program product and method reuse static image data generated during rasterization of static geometry to reduce the processing overhead associated with rasterizing subsequent image frames. In particular, static image data generated in one frame may be reused in a subsequent image frame such that the subsequent image frame is generated without having to re-rasterize the static geometry from the scene, i.e., with only the dynamic geometry rasterized. The resulting image frame includes dynamic image data generated as a result of rasterizing the dynamic geometry during that image frame, and static image data generated as a result of rasterizing the static geometry during a prior image frame.
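The reuse described above amounts to caching the rasterized static layer and compositing only dynamic geometry each frame. A minimal sketch, assuming geometry is already reduced to `(x, y, color)` pixels and that dynamic pixels simply overwrite static ones (no depth test); `FrameRenderer` and its members are hypothetical names:

```python
class FrameRenderer:
    def __init__(self, width: int, height: int, background: int = 0):
        self.width = width
        self.height = height
        self.background = background
        self.static_layer = None  # cached raster of the static geometry
        self.static_passes = 0    # how many times static geometry was rasterized

    def _rasterize(self, geometry, target) -> None:
        # Toy rasterizer: writes each (x, y, color) pixel into the target rows.
        for x, y, color in geometry:
            target[y][x] = color

    def render(self, static_geometry, dynamic_geometry):
        if self.static_layer is None:
            # First frame (or after invalidation): rasterize static geometry once.
            self.static_layer = [
                [self.background] * self.width for _ in range(self.height)
            ]
            self._rasterize(static_geometry, self.static_layer)
            self.static_passes += 1
        # Every frame: start from a copy of the cached static image data,
        # then rasterize only the dynamic geometry on top of it.
        frame = [row[:] for row in self.static_layer]
        self._rasterize(dynamic_geometry, frame)
        return frame

    def invalidate_static(self) -> None:
        # Call when the static geometry or the camera changes.
        self.static_layer = None
```

Rendering two consecutive frames with the same static geometry rasterizes that geometry once; only `invalidate_static()` forces a second static pass.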

    Performance Event Triggering Through Direct Interthread Communication On a Network On Chip
    6.
    Patent application (status: lapsed)

    Publication No.: US20100269123A1

    Publication date: 2010-10-21

    Application No.: US12427090

    Application date: 2009-04-21

    IPC classification: G06F9/54

    CPC classification: H04L43/0817

    Abstract: Performance event triggering through direct interthread communication (‘DITC’) on a network on chip (‘NOC’), the NOC including integrated processor (‘IP’) blocks, routers, memory communications controllers, and network interface controllers, with each IP block adapted to a router through a memory communications controller and a network interface controller, where each memory communications controller controls communications between an IP block and memory, and each network interface controller controls inter-IP block communications through routers, including enabling performance event monitoring in a selected set of IP blocks distributed throughout the NOC, each IP block within the selected set of IP blocks having one or more event counters; collecting performance results from the one or more event counters; and returning performance results from the one or more event counters to a destination repository, the returning being initiated by a triggering event occurring within the NOC.
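The enable/collect/return sequence can be modelled in a few lines. This sketch abstracts away the routers, controllers, and the DITC transport entirely (a trigger is just a method call here); the `NOC` and `IPBlock` classes and the event names are illustrative assumptions.

```python
class IPBlock:
    def __init__(self, name: str):
        self.name = name
        self.monitoring = False
        self.counters = {}  # event name -> count

    def record(self, event: str) -> None:
        # Events are counted only while monitoring is enabled for this block.
        if self.monitoring:
            self.counters[event] = self.counters.get(event, 0) + 1


class NOC:
    def __init__(self, block_names):
        self.blocks = {name: IPBlock(name) for name in block_names}
        self.repository = []  # stands in for the destination repository

    def enable_monitoring(self, selected) -> None:
        for name in selected:
            self.blocks[name].monitoring = True

    def trigger(self, reason: str) -> dict:
        # A triggering event collects the counters of every monitored block
        # and returns them to the repository.
        results = {
            block.name: dict(block.counters)
            for block in self.blocks.values()
            if block.monitoring
        }
        self.repository.append((reason, results))
        return results
```

Blocks outside the selected set never contribute counters, so a trigger returns results only for the monitored subset.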

    Instruction buffer bypass of target instruction in response to partial flush
    7.
    Granted patent (status: in force)

    Publication No.: US09354887B2

    Publication date: 2016-05-31

    Application No.: US12824812

    Application date: 2010-06-28

    IPC classification: G06F9/38 G06F12/08

    Abstract: A circuit arrangement and method selectively bypass an instruction buffer for selected instructions so that bypassed instructions can be dispatched without having to first pass through the instruction buffer. Thus, for example, in the case that an instruction buffer is partially or completely flushed as a result of an instruction redirect (e.g., due to a branch mispredict), instructions can be forwarded to subsequent stages in an instruction unit and/or to one or more execution units without the latency associated with passing through the instruction buffer.
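A toy cycle model shows the effect of the bypass after a redirect. The assumptions are mine, not the patent's: a single thread, one dispatch slot per cycle, a complete (not partial) flush on redirect, and a bypass whenever the buffer is empty and the slot is free. `InstructionUnit` and its methods are hypothetical names.

```python
from collections import deque


class InstructionUnit:
    def __init__(self):
        self.buffer = deque()
        self.dispatched = []    # (instruction, "bypass" or "buffer")
        self.slot_used = False  # dispatch stage already fed this cycle?

    def _dispatch(self, instr: str, path: str) -> None:
        self.dispatched.append((instr, path))
        self.slot_used = True

    def fetch(self, instr: str) -> None:
        # Bypass only when nothing older is waiting and the slot is free.
        if not self.buffer and not self.slot_used:
            self._dispatch(instr, "bypass")
        else:
            self.buffer.append(instr)

    def end_cycle(self) -> None:
        # If nothing was dispatched this cycle, drain one buffered instruction.
        if not self.slot_used and self.buffer:
            self._dispatch(self.buffer.popleft(), "buffer")
        self.slot_used = False

    def redirect(self, target_instr: str) -> None:
        # Branch mispredict: flush buffered instructions, then fetch the target.
        self.buffer.clear()
        self.fetch(target_instr)
```

After a redirect the buffer is empty, so the target instruction reaches dispatch directly instead of first passing through the buffer.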

    Recovering data from a plurality of packets
    8.
    Granted patent (status: lapsed)

    Publication No.: US08363669B2

    Publication date: 2013-01-29

    Application No.: US12823689

    Application date: 2010-06-25

    IPC classification: H04L12/56

    CPC classification: H04L49/9047

    Abstract: A method includes receiving a plurality of packets at an integrated processor block of a network on a chip device. The plurality of packets includes a first packet that includes an indication of a start of data associated with a pixel shader application. The method includes recovering the data from the plurality of packets. The method also includes storing the recovered data in a dedicated packet collection memory within the network on the chip device. The method further includes retaining the data stored in the dedicated packet collection memory during an interruption event. Upon completion of the interruption event, the method includes copying packets stored in the dedicated packet collection memory prior to the interruption event to an inbox of the network on the chip device for processing.
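The receive, retain, and copy-back steps can be sketched as a small state machine. The packet layout (a dict with `start` and `payload` keys) and the `PacketCollector` class are assumptions for illustration; the abstract does not define a packet format.

```python
class PacketCollector:
    def __init__(self):
        self.collection_memory = []  # dedicated packet collection memory
        self.inbox = []
        self.collecting = False
        self._stored_before_interrupt = 0

    def receive(self, packet: dict) -> None:
        # A packet flagged as the start of data begins a new collection.
        if packet["start"]:
            self.collection_memory.clear()
            self.collecting = True
        if self.collecting:
            self.collection_memory.append(packet["payload"])

    def recovered_data(self) -> bytes:
        # Recover the data by concatenating the stored payloads in order.
        return b"".join(self.collection_memory)

    def interrupt(self) -> None:
        # The collection memory is retained; note how much was already stored.
        self._stored_before_interrupt = len(self.collection_memory)

    def resume(self) -> None:
        # After the interruption, copy the previously stored packets to the inbox.
        stored = self.collection_memory[: self._stored_before_interrupt]
        self.inbox.extend(stored)
```

Packets that arrive before any start indication are ignored in this model, and nothing reaches the inbox until `resume()` is called.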

    Vector register file caching of context data structure for maintaining state data in a multithreaded image processing pipeline
    9.
    Granted patent (status: in force)

    Publication No.: US08836709B2

    Publication date: 2014-09-16

    Application No.: US13212418

    Application date: 2011-08-18

    Abstract: Frequently accessed state data used in a multithreaded graphics processing architecture is cached within a vector register file of a processing unit to optimize accesses to the state data and minimize memory bus utilization associated therewith. A processing unit may include a fixed point execution unit as well as a vector floating point execution unit, and a vector register file utilized by the vector floating point execution unit may be used to cache state data used by the fixed point execution unit and transferred as needed into the general purpose registers accessible by the fixed point execution unit, thereby reducing the need to repeatedly retrieve and write back the state data from and to an L1 or lower level cache accessed by the fixed point execution unit.
