Abstract:
One embodiment of the present invention includes a graphics subsystem that includes a tiling unit, a crossbar unit, and a screen-space pipeline. The crossbar unit is configured to transmit primitives interleaved with state change commands to the tiling unit. The tiling unit is configured to record an initial state associated with the primitives and to transmit to the screen-space pipeline one or more primitives in the primitives that overlap a first cache tile. The tiling unit is further configured to transmit the initial state to the screen-space pipeline and to transmit to the screen-space pipeline one or more primitives in the primitives that overlap a second cache tile. The tiling unit includes a state filter block configured to determine that a first state change in the state change commands is followed by a second state change, without an intervening primitive, and to forego transmitting the first state change in response.
Abstract:
One embodiment of the present invention sets forth a technique for managing graphics processing resources in a tile-based architecture. The technique includes storing a release packet associated with a graphics processing resource in a buffer and initiating a replay of graphics primitives stored in the buffer and associated with the graphics processing resource. The technique further includes, for each tile included in a plurality of tiles and processed during the replay, reading the release packet and determining whether the tile is a last tile processed during the replay. The technique further includes determining not to transmit the release packet to a screen-space pipeline and continuing to read graphics data stored in the buffer if the tile is not the last tile to be processed during the replay, or transmitting the release packet to the screen-space pipeline if the tile is the last tile to be processed during the replay.
Abstract:
One embodiment of the present invention sets forth a technique for managing buffer entries in a tile-based architecture. The technique includes receiving a first plurality of graphics primitives and a first buffer address at which attributes associated with the first plurality of graphics primitives are stored. The technique further includes, for each tile included in a plurality of tiles, transmitting the first plurality of graphics primitives and the first buffer address to a screen space pipeline and receiving an acknowledgement from the screen space pipeline indicating that processing the first plurality of graphics primitives has completed. The technique further includes determining that processing the first plurality of graphics primitives has completed for a last tile included in the plurality of tiles and that the acknowledgement has been received for each tile included in the plurality of tiles, and, in response, releasing a buffer entry associated with the first buffer address.
Abstract:
One embodiment of the present invention includes a technique for processing graphics primitives in a tile-based architecture. The technique includes storing, in a buffer, a first plurality of graphics primitives and a first plurality of state bundles received from the world-space pipeline. The technique further includes determining, based on a first condition, that the first plurality of graphics primitives should be replayed from the buffer, and, in response, replaying the first plurality of graphics primitives against a first tile included in a first plurality of tiles. Replaying the first plurality of graphics primitives includes comparing each graphics primitive against the first tile to determine whether the graphics primitive intersects the first tile, determining that one or more graphics primitives intersects the first tile, and transmitting the one or more graphics primitives and one or more associated state bundles to a screen-space pipeline for processing.
Abstract:
One embodiment of the present invention sets forth a technique for managing buffer table entries in a tile-based architecture. The technique includes binding a plurality of shader registers to a buffer table entry. The technique further includes processing at least one tile by reading a buffer table index stored in the shader register to access the buffer table entry, reading a buffer address stored in the buffer table entry, accessing data associated with the buffer address, and unbinding the shader register from the buffer table entry. The technique further includes determining that none of the shader registers is still bound to the buffer table entry and, in response, causing a release packet to be inserted into an instruction stream. The technique further includes determining that a last tile has been processed and, in response, transmitting the release packet to cause the buffer table entry to be released.
Abstract:
One embodiment of the present invention includes a technique for processing graphics primitives in a tile-based architecture. The technique includes storing, in a buffer, a first plurality of graphics primitives and a first plurality of state bundles received from a world-space pipeline, and transmitting the first plurality of graphics primitives to a screen-space pipeline for processing while a tiling function is enabled. The technique further includes storing, in the buffer, a second plurality of graphics primitives and a second plurality of state bundles received from the world-space pipeline. The technique further includes determining, based on a first condition, that the tiling function should be disabled and that the second plurality of graphics primitives should be flushed from the buffer, and transmitting the second plurality of graphics primitives to the screen-space pipeline for processing while the tiling function is disabled.
Abstract:
One embodiment of the present invention sets forth a graphics subsystem configured to implement distributed cache tiling. The graphics subsystem includes one or more world-space pipelines, one or more screen-space pipelines, one or more tiling units, and a crossbar unit. Each world-space pipeline is implemented in a different processing entity and is coupled to a different tiling unit. Each screen-space pipeline is implemented in a different processing entity and is coupled to the crossbar unit. The tiling units are configured to receive primitives from the world-space pipelines, generate cache tile batches based on the primitives, and transmit the primitives to the screen-space pipelines. One advantage of the disclosed approach is that primitives are processed in application-programming-interface order in a highly parallel tiling architecture. Another advantage is that primitives are processed in cache tile order, which reduces memory bandwidth consumption and improves cache memory utilization.
Abstract:
One embodiment of the present invention sets forth a graphics subsystem configured to implement distributed tiled caching. The graphics subsystem includes one or more world-space pipelines, one or more screen-space pipelines, one or more tiling units, and a crossbar unit. Each world-space pipeline is implemented in a different processing entity and is coupled to a different tiling unit. Each screen-space pipeline is implemented in a different processing entity and is coupled to the crossbar unit. The tiling units are configured to receive primitives from the world-space pipelines, generate cache tile batches based on the primitives, and transmit the primitives to the screen-space pipelines. One advantage of the disclosed approach is that primitives are processed in application-programming-interface order in a highly parallel tiling architecture. Another advantage is that primitives are processed in cache tile order, which reduces memory bandwidth consumption and improves cache memory utilization.