摘要:
Direct Memory Access (DMA) is used in connection with passing commands between a host device and a target device coupled via a push buffer. Commands passed to a push buffer by a host device may be accumulated by the host device prior to forwarding the commands to the push buffer, such that DMA may be used to collectively pass a block of commands to the push buffer. In addition, a host device may utilize DMA to pass command parameters for commands to a command buffer that is accessible by the target device but is separate from the push buffer, with the commands that are passed to the push buffer including pointers to the associated command parameters in the command buffer.
摘要:
A method includes receiving at a master processing element primitive data that includes properties of a primitive. The method includes partially traversing a spatial data structure that represents a three-dimensional image to identify an internal node of the spatial data structure. The internal node represents a portion of the three-dimensional image. The method also includes selecting a slave processing element from a plurality of slave processing elements. The selected processing element is associated with the internal node. The method further includes sending the primitive data to the selected slave processing element to traverse a portion of the spatial data structure to identify a leaf node of the spatial data structure.
摘要:
A circuit arrangement and method make state changes to shared state data in a highly multithreaded environment by propagating or streaming the changes to multiple parallel hardware threads of execution in the multithreaded environment using an on-chip communications network and without attempting to access any copy of the shared state data in a shared memory to which the parallel threads of execution are also coupled.
摘要:
Frequently accessed state data used in a multithreaded graphics processing architecture is cached within a vector register file of a processing unit to optimize accesses to the state data and minimize memory bus utilization associated therewith. A processing unit may include a fixed point execution unit as well as a vector floating point execution unit, and a vector register file utilized by the vector floating point execution unit may be used to cache state data used by the fixed point execution unit and transferred as needed into the general purpose registers accessible by the fixed point execution unit, thereby reducing the need to repeatedly retrieve and write back the state data from and to an L1 or lower level cache accessed by the fixed point execution unit.
摘要:
An apparatus, program product and method reuse static image data generated during rasterization of static geometry to reduce the processing overhead associated with rasterizing subsequent image frames. In particular, static image data generated one frame may be reused in a subsequent image frame such that the subsequent image frame is generated without having to re-rasterize the static geometry from the scene, i.e., with only the dynamic geometry rasterized. The resulting image frame includes dynamic image data generated as a result of rasterizing the dynamic geometry during that image frame, and static image data generated as a result of rasterizing the static image data during a prior image frame.
摘要:
Performance event triggering through direct interthread communication (‘DITC’) on a network on chip (‘NOC’), the NOC including integrated processor (‘IP’) blocks, routers, memory communications controllers, and network interface controllers, with each IP block adapted to a router through a memory communications controller and a network interface controller, where each memory communications controller controlling communications between an IP block and memory, and each network interface controller controlling inter-IP block communications through routers, including enabling performance event monitoring in a selected set of IP blocks distributed throughout the NOC, each IP block within the selected set of IP blocks having one or more event counters; collecting performance results from the one or more event counters; and returning performance results from the one or more event counters to a destination repository, the returning being initiated by a triggering event occurring within the NOC.
摘要:
Frequently accessed state data used in a multithreaded graphics processing architecture is cached within a vector register file of a processing unit to optimize accesses to the state data and minimize memory bus utilization associated therewith. A processing unit may include a fixed point execution unit as well as a vector floating point execution unit, and a vector register file utilized by the vector floating point execution unit may be used to cache state data used by the fixed point execution unit and transferred as needed into the general purpose registers accessible by the fixed point execution unit, thereby reducing the need to repeatedly retrieve and write back the state data from and to an L1 or lower level cache accessed by the fixed point execution unit.
摘要:
A multithreaded rendering software pipeline architecture dynamically reallocates regions of an image space to raster threads based upon performance data collected by the raster threads. The reallocation of the regions typically includes resizing the regions assigned to particular raster threads and/or reassigning regions to different raster threads to better balance the relative workloads of the raster threads.
摘要:
A circuit arrangement and method implement impulse propagation in a multithreaded physics engine by assigning ownership of objects in a scene to individual threads and propagating impulses between objects that are in contact with one another by passing inter-thread impulse messages between the threads that own the contacting objects, while locally propagating impulses through objects using the threads to which such objects are assigned.
摘要:
A circuit arrangement and method utilize predictive load balancing to allocate the workload among hardware threads in a multithreaded physics engine. The predictive load balancing is based at least in part upon the detection of predicted future collisions between objects in a scene, such that the reallocation of respective loads of a plurality of hardware threads may be initiated prior to detection of the actual collisions, thereby increasing the likelihood that hardware threads will be optimally allocated when the actual collisions occur.