摘要:
Direct Memory Access (DMA) is used in connection with passing commands between a host device and a target device coupled via a push buffer. Commands passed to a push buffer by a host device may be accumulated by the host device prior to forwarding the commands to the push buffer, such that DMA may be used to collectively pass a block of commands to the push buffer. In addition, a host device may utilize DMA to pass command parameters for commands to a command buffer that is accessible by the target device but is separate from the push buffer, with the commands that are passed to the push buffer including pointers to the associated command parameters in the command buffer.
摘要:
A method includes receiving at a master processing element primitive data that includes properties of a primitive. The method includes partially traversing a spatial data structure that represents a three-dimensional image to identify an internal node of the spatial data structure. The internal node represents a portion of the three-dimensional image. The method also includes selecting a slave processing element from a plurality of slave processing elements. The selected processing element is associated with the internal node. The method further includes sending the primitive data to the selected slave processing element to traverse a portion of the spatial data structure to identify a leaf node of the spatial data structure.
摘要:
A circuit arrangement and method make state changes to shared state data in a highly multithreaded environment by propagating or streaming the changes to multiple parallel hardware threads of execution in the multithreaded environment using an on-chip communications network and without attempting to access any copy of the shared state data in a shared memory to which the parallel threads of execution are also coupled.
摘要:
Frequently accessed state data used in a multithreaded graphics processing architecture is cached within a vector register file of a processing unit to optimize accesses to the state data and minimize memory bus utilization associated therewith. A processing unit may include a fixed point execution unit as well as a vector floating point execution unit, and a vector register file utilized by the vector floating point execution unit may be used to cache state data used by the fixed point execution unit and transferred as needed into the general purpose registers accessible by the fixed point execution unit, thereby reducing the need to repeatedly retrieve and write back the state data from and to an L1 or lower level cache accessed by the fixed point execution unit.
摘要:
An apparatus, program product and method reuse static image data generated during rasterization of static geometry to reduce the processing overhead associated with rasterizing subsequent image frames. In particular, static image data generated one frame may be reused in a subsequent image frame such that the subsequent image frame is generated without having to re-rasterize the static geometry from the scene, i.e., with only the dynamic geometry rasterized. The resulting image frame includes dynamic image data generated as a result of rasterizing the dynamic geometry during that image frame, and static image data generated as a result of rasterizing the static image data during a prior image frame.
摘要:
Performance event triggering through direct interthread communication (‘DITC’) on a network on chip (‘NOC’), the NOC including integrated processor (‘IP’) blocks, routers, memory communications controllers, and network interface controllers, with each IP block adapted to a router through a memory communications controller and a network interface controller, where each memory communications controller controlling communications between an IP block and memory, and each network interface controller controlling inter-IP block communications through routers, including enabling performance event monitoring in a selected set of IP blocks distributed throughout the NOC, each IP block within the selected set of IP blocks having one or more event counters; collecting performance results from the one or more event counters; and returning performance results from the one or more event counters to a destination repository, the returning being initiated by a triggering event occurring within the NOC.
摘要:
A circuit arrangement and method selectively bypass an instruction buffer for selected instructions so that bypassed instructions can be dispatched without having to first pass through the instruction buffer. Thus, for example, in the case that an instruction buffer is partially or completely flushed as a result of an instruction redirect (e.g., due to a branch mispredict), instructions can be forwarded to subsequent stages in an instruction unit and/or to one or more execution units without the latency associated with passing through the instruction buffer.
摘要:
A method includes receiving a plurality of packets at an integrated processor block of a network on a chip device. The plurality of packets includes a first packet that includes an indication of a start of data associated with a pixel shader application. The method includes recovering the data from the plurality of packets. The method also includes storing the recovered data in a dedicated packet collection memory within the network on the chip device. The method further includes retaining the data stored in the dedicated packet collection memory during an interruption event. Upon completion of the interruption event, the method includes copying packets stored in the dedicated packet collection memory prior to the interruption event to an inbox of the network on the chip device for processing.
摘要:
Frequently accessed state data used in a multithreaded graphics processing architecture is cached within a vector register file of a processing unit to optimize accesses to the state data and minimize memory bus utilization associated therewith. A processing unit may include a fixed point execution unit as well as a vector floating point execution unit, and a vector register file utilized by the vector floating point execution unit may be used to cache state data used by the fixed point execution unit and transferred as needed into the general purpose registers accessible by the fixed point execution unit, thereby reducing the need to repeatedly retrieve and write back the state data from and to an L1 or lower level cache accessed by the fixed point execution unit.
摘要:
A multithreaded rendering software pipeline architecture dynamically reallocates regions of an image space to raster threads based upon performance data collected by the raster threads. The reallocation of the regions typically includes resizing the regions assigned to particular raster threads and/or reassigning regions to different raster threads to better balance the relative workloads of the raster threads.