Abstract:
Attributes of graphics objects are processed in a plurality of graphics processing pipelines. A streaming multiprocessor (SM) retrieves a first set of parameters associated with a set of graphics objects from a first set of buffers. The SM performs a first set of operations on the first set of parameters according to a first phase of processing to produce a second set of parameters stored in a second set of buffers. The SM performs a second set of operations on the second set of parameters according to a second phase of processing to produce a third set of parameters stored in a third set of buffers. One advantage of the disclosed techniques is that work is redistributed from a first phase to a second phase of graphics processing without having to copy the attributes to and retrieve the attributes from the cache or system memory, resulting in reduced power consumption.
Abstract:
One embodiment of the present invention includes approaches for processing graphics primitives associated with cache tiles when rendering an image. A set of graphics primitives associated with a first render target configuration is received from a first portion of a graphics processing pipeline, and the set of graphics primitives is stored in a memory. A condition is detected indicating that the set of graphics primitives is ready for processing, and a cache tile is selected that intersects at least one graphics primitive in the set of graphics primitives. At least one graphics primitive in the set of graphics primitives that intersects the cache tile is transmitted to a second portion of the graphics processing pipeline for processing. One advantage of the disclosed embodiments is that graphics primitives and associated data are more likely to remain stored on-chip during cache tile rendering, thereby reducing power consumption and improving rendering performance.
Abstract:
One embodiment of the present invention includes a graphics subsystem that includes a tiling unit, a crossbar unit, and a screen-space pipeline. The crossbar unit is configured to transmit primitives interleaved with state change commands to the tiling unit. The tiling unit is configured to record an initial state associated with the primitives and to transmit to the screen-space pipeline one or more of the primitives that overlap a first cache tile. The tiling unit is further configured to transmit the initial state to the screen-space pipeline and to transmit to the screen-space pipeline one or more of the primitives that overlap a second cache tile. The tiling unit includes a state filter block configured to determine that a first state change in the state change commands is followed by a second state change, without an intervening primitive, and to forgo transmitting the first state change in response.
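The state filter behavior described in this abstract can be illustrated with a minimal sketch. The stream encoding (tuples tagged "state" or "prim"), the function name filter_state_changes, and the assumption that every state change overrides the same piece of state are all hypothetical and are not taken from the abstract.

```python
def filter_state_changes(stream):
    """Drop any state change that is directly followed by another state change.

    Assumption (not from the source): each state change overrides the same
    piece of state, so a change with no primitive after it has no effect.
    """
    out = []
    for item in stream:
        kind = item[0]
        # The previously kept item is a state change and no primitive came
        # after it, so the new state change supersedes it.
        if kind == "state" and out and out[-1][0] == "state":
            out.pop()
        out.append(item)
    return out


stream = [("state", "A"), ("state", "B"), ("prim", "tri0"),
          ("state", "C"), ("prim", "tri1")]
filtered = filter_state_changes(stream)
print(filtered)
```

In this example the change to "A" is dropped because "B" follows it with no primitive in between, while "B" and "C" are kept because each is followed by a primitive.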
Abstract:
Techniques are disclosed for performing memory access operations. A texture unit receives a memory access operation that includes a tuple associated with a first view in a plurality of views. The texture unit retrieves a first hash value associated with a first texture header in a plurality of texture headers, where the first texture header is related to the first view. The texture unit retrieves a second hash value associated with a second texture header in the plurality of texture headers, where the second texture header is related to a second view. The texture unit determines whether the first view is potentially aliased with the second view, based on the first and second hash values. If so, then the texture unit invalidates a cache entry in a cache memory associated with the second texture header. Otherwise, the texture unit maintains the cache entry.
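The aliasing check can be sketched roughly as follows. The abstract does not say how the two hash values are compared, so this sketch assumes that equal hashes mean the views may alias. The names TextureHeader, potentially_aliased, and on_access, and the dictionary used as a cache, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextureHeader:
    view_id: int
    hash_value: int  # hypothetical: a hash derived from the memory the view covers


def potentially_aliased(first, second):
    # Assumption: equal hashes mean the two views may refer to the same memory.
    return first.hash_value == second.hash_value


def on_access(cache, first, second):
    """Invalidate the second view's cache entry if it may alias the first view."""
    if potentially_aliased(first, second):
        cache.pop(second.view_id, None)  # invalidate the cache entry
        return True
    return False  # otherwise the cache entry is maintained


cache = {1: "texels for view 1", 2: "texels for view 2"}
view0 = TextureHeader(view_id=0, hash_value=0xBEEF)
view1 = TextureHeader(view_id=1, hash_value=0xBEEF)
view2 = TextureHeader(view_id=2, hash_value=0x1234)

on_access(cache, view0, view1)  # hashes match, so view 1's entry is invalidated
on_access(cache, view0, view2)  # hashes differ, so view 2's entry is kept
print(sorted(cache))  # [2]
```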
Abstract:
One embodiment of the present invention sets forth a technique for mid-primitive execution preemption. When preemption is initiated, no new instructions are issued, in-flight instructions progress to an execution unit boundary, and the execution state is unloaded from the processing pipeline. The execution units within the processing pipeline, including the coarse rasterization unit, complete execution of in-flight instructions and become idle. However, rasterization of a triangle may be preempted at a coarse raster region boundary. The amount of context state to be stored is reduced because the execution units are idle. Preempting at the mid-primitive level during rasterization reduces the time from when preemption is initiated to when another process can execute because the entire triangle is not rasterized.
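Preemption at a coarse raster region boundary might look like the sketch below. It models a triangle as a list of coarse raster regions and checks for a preemption request only between regions. The function rasterize_triangle, its parameters, and the region labels are hypothetical; actual hardware saves more context than a single resume index.

```python
def rasterize_triangle(coarse_regions, start=0, preempt_requested=lambda: False):
    """Process coarse raster regions in order, stopping at a region boundary
    when preemption is requested.

    Returns (processed_regions, resume_index); resume_index is None when the
    whole triangle has been rasterized.
    """
    processed = []
    for index in range(start, len(coarse_regions)):
        # Preemption is only honored between regions, never inside one.
        if preempt_requested():
            return processed, index
        processed.append(coarse_regions[index])
    return processed, None


regions = ["R0", "R1", "R2", "R3", "R4"]
requests = iter([False, False, True])  # preemption arrives before region R2

first_pass, resume_at = rasterize_triangle(
    regions, preempt_requested=lambda: next(requests))
second_pass, finished = rasterize_triangle(regions, start=resume_at)
print(first_pass, resume_at, second_pass, finished)
```

The first call stops after two regions and reports where to resume, so another process could run before the remaining regions are rasterized.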
Abstract:
One embodiment of the present invention includes a method for generating accumulated bounding boxes for graphics primitives. The method includes generating a first bounding box associated with a first graphics primitive. The method further includes, for each graphics primitive included in a first set of one or more additional graphics primitives, determining that the graphics primitive is within a threshold distance of the first bounding box, and adding the graphics primitive to the first bounding box. The method further includes determining not to add a second graphics primitive to the first bounding box. The method further includes generating a second bounding box associated with the second graphics primitive. Finally, the method includes transmitting the first bounding box to a tiling unit via a crossbar. One advantage of the disclosed embodiments is that multiple bounding boxes are combined to generate an accumulated bounding box that is then transferred across the crossbar.
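The accumulation step can be sketched as follows. The abstract does not define how the threshold distance is measured, so this sketch assumes it is the largest per-axis gap between the accumulated box and the bounding box of the new primitive. Boxes are (xmin, ymin, xmax, ymax) tuples, and all function names are hypothetical.

```python
def bbox_of(prim):
    """Bounding box (xmin, ymin, xmax, ymax) of a primitive given as (x, y) vertices."""
    xs = [x for x, _ in prim]
    ys = [y for _, y in prim]
    return (min(xs), min(ys), max(xs), max(ys))


def gap(a, b):
    """Largest per-axis separation between two boxes; 0 if they touch or overlap."""
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return max(dx, dy)


def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def accumulate(prims, threshold):
    """Return the list of accumulated boxes that would be sent across the crossbar."""
    sent = []
    current = None
    for prim in prims:
        box = bbox_of(prim)
        if current is None:
            current = box
        elif gap(current, box) <= threshold:
            current = union(current, box)  # add the primitive to the current box
        else:
            sent.append(current)           # transmit the finished box
            current = box                  # start a new box for this primitive
    if current is not None:
        sent.append(current)
    return sent


prims = [
    [(0, 0), (2, 0), (0, 2)],
    [(3, 0), (5, 0), (3, 2)],
    [(100, 100), (102, 100), (100, 102)],
]
boxes = accumulate(prims, threshold=2)
print(boxes)  # [(0, 0, 5, 2), (100, 100, 102, 102)]
```

Three primitives produce two boxes here, which is the point of accumulating: fewer bounding boxes cross the crossbar than there are primitives.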
Abstract:
One embodiment of the present invention includes a technique for processing graphics primitives in a tile-based architecture. The technique includes storing, in a buffer, a first plurality of graphics primitives and a first plurality of state bundles received from a world-space pipeline. The technique further includes determining, based on a first condition, that the first plurality of graphics primitives should be replayed from the buffer, and, in response, replaying the first plurality of graphics primitives against a first tile included in a first plurality of tiles. Replaying the first plurality of graphics primitives includes comparing each graphics primitive against the first tile to determine whether the graphics primitive intersects the first tile, determining that one or more graphics primitives intersect the first tile, and transmitting the one or more graphics primitives and one or more associated state bundles to a screen-space pipeline for processing.
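A replay of buffered primitives against one tile can be sketched as below. The buffer layout, the rule that a primitive's associated state bundle is the most recent one preceding it, and the assumption that each bundle is a complete state snapshot are all hypothetical simplifications. Boxes and tiles are (xmin, ymin, xmax, ymax) tuples.

```python
def intersects(box, tile):
    """True if two axis-aligned rectangles overlap with non-zero area."""
    return (box[0] < tile[2] and tile[0] < box[2] and
            box[1] < tile[3] and tile[1] < box[3])


def replay(buffer, tile):
    """Return what would be transmitted to the screen-space pipeline for a tile."""
    out = []
    pending_state = None  # latest state bundle read from the buffer
    sent_state = None     # latest state bundle already transmitted for this tile
    for kind, payload in buffer:
        if kind == "state":
            pending_state = payload
            continue
        name, box = payload
        if intersects(box, tile):
            # Transmit the associated state only if it has not been sent yet.
            if pending_state is not None and pending_state != sent_state:
                out.append(("state", pending_state))
                sent_state = pending_state
            out.append(("prim", name))
    return out


buffer = [
    ("state", "S0"),
    ("prim", ("a", (0, 0, 10, 10))),
    ("state", "S1"),
    ("prim", ("b", (20, 0, 30, 10))),
    ("prim", ("c", (5, 5, 25, 8))),
]
tile0 = (0, 0, 16, 16)
tile1 = (16, 0, 32, 16)
print(replay(buffer, tile0))
print(replay(buffer, tile1))
```

Primitive "c" spans both tiles, so it is transmitted once per tile, while "a" and "b" each go to a single tile.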
Abstract:
One embodiment of the present invention sets forth a technique for managing buffer table entries in a tile-based architecture. The technique includes binding a plurality of shader registers to a buffer table entry. The technique further includes processing at least one tile by reading a buffer table index stored in one of the shader registers to access the buffer table entry, reading a buffer address stored in the buffer table entry, accessing data associated with the buffer address, and unbinding that shader register from the buffer table entry. The technique further includes determining that none of the shader registers is still bound to the buffer table entry and, in response, causing a release packet to be inserted into an instruction stream. The technique further includes determining that a last tile has been processed and, in response, transmitting the release packet to cause the buffer table entry to be released.
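The binding bookkeeping can be sketched as a simple reference-tracking structure. The BufferTable class, its method names, and the list used as an instruction stream are hypothetical; the sketch only shows that the release packet is queued once the last register is unbound and that the entry is freed when that packet is later acted on.

```python
class BufferTable:
    """Hypothetical model of a buffer table with per-entry register bindings."""

    def __init__(self):
        self.entries = {}   # buffer table index -> buffer address
        self.bindings = {}  # buffer table index -> set of bound shader registers

    def bind(self, index, address, registers):
        self.entries[index] = address
        self.bindings[index] = set(registers)

    def read(self, index):
        return self.entries[index]

    def unbind(self, index, register, stream):
        self.bindings[index].discard(register)
        # No register is bound any more: insert a release packet into the stream.
        if not self.bindings[index]:
            stream.append(("release", index))

    def release(self, index):
        del self.entries[index]
        del self.bindings[index]


table = BufferTable()
stream = []
table.bind(3, 0x1000, ["r0", "r1"])

address = table.read(3)        # a tile reads the buffer address through the entry
table.unbind(3, "r0", stream)  # r1 is still bound, so nothing is queued
table.unbind(3, "r1", stream)  # last binding removed, release packet queued

# After the last tile has been processed, the queued packet is transmitted.
for kind, index in stream:
    if kind == "release":
        table.release(index)
print(stream, table.entries)
```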
Abstract:
One embodiment of the present invention sets forth a technique for managing graphics processing resources in a tile-based architecture. The technique includes storing a release packet associated with a graphics processing resource in a buffer and initiating a replay of graphics primitives stored in the buffer and associated with the graphics processing resource. The technique further includes, for each tile included in a plurality of tiles and processed during the replay, reading the release packet and determining whether the tile is a last tile processed during the replay. The technique further includes determining not to transmit the release packet to a screen-space pipeline and continuing to read graphics data stored in the buffer if the tile is not the last tile to be processed during the replay, or transmitting the release packet to the screen-space pipeline if the tile is the last tile to be processed during the replay.
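The last-tile rule for release packets can be sketched as follows. The buffer layout and the function replay_all_tiles are hypothetical, and the sketch omits the per-tile intersection test so that only the release handling is visible.

```python
def replay_all_tiles(buffer, tiles):
    """Replay the buffer once per tile and return everything transmitted.

    A release packet is skipped for every tile except the last one, so the
    resource stays available until the replay is complete.
    """
    transmitted = []
    for position, tile in enumerate(tiles):
        is_last_tile = position == len(tiles) - 1
        for kind, payload in buffer:
            if kind == "release" and not is_last_tile:
                continue  # do not transmit; keep reading the buffer
            transmitted.append((tile, kind, payload))
    return transmitted


buffer = [("prim", "p0"), ("release", "res7"), ("prim", "p1")]
result = replay_all_tiles(buffer, ["T0", "T1"])
for item in result:
    print(item)
```

Only the replay for "T1" transmits the release packet for "res7"; the replay for "T0" passes over it and continues with "p1".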
Abstract:
One embodiment of the present invention includes a technique for processing graphics primitives in a tile-based architecture. The technique includes storing, in a buffer, a first plurality of graphics primitives and a first plurality of state bundles received from a world-space pipeline, and transmitting the first plurality of graphics primitives to a screen-space pipeline for processing while a tiling function is enabled. The technique further includes storing, in the buffer, a second plurality of graphics primitives and a second plurality of state bundles received from the world-space pipeline. The technique further includes determining, based on a first condition, that the tiling function should be disabled and that the second plurality of graphics primitives should be flushed from the buffer, and transmitting the second plurality of graphics primitives to the screen-space pipeline for processing while the tiling function is disabled.