Abstract:
An apparatus and method are described for culling commands in a tile-based renderer. For example, one embodiment of an apparatus comprises: a command buffer to store a plurality of commands to be executed by a render pipeline to render a plurality of tiles; visibility analysis circuitry to determine per-tile visibility information for each of the plurality of tiles and to store the visibility information for a first tile in a first storage, the visibility information specifying either that all of the commands associated with rendering the first tile can be skipped or identifying individual commands associated with rendering the first tile that can be skipped; and a render pipeline to read the visibility information from the first storage to determine whether to execute or skip one or more of the commands from the command buffer to render the first tile.
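As an illustration of the per-tile visibility record described above, here is a minimal Python sketch. It assumes a visibility record is either a "skip all" flag or a set of command indices to skip, and that the "first storage" is a dictionary keyed by tile ID. The names `TileVisibility`, `commands_to_execute`, and `render_tiles` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class TileVisibility:
    """Hypothetical per-tile record: either skip every command for the tile,
    or skip only the commands whose indices are listed."""
    skip_all: bool = False
    skip_indices: Set[int] = field(default_factory=set)


def commands_to_execute(commands: List[str], vis: TileVisibility) -> List[str]:
    """Return the commands the render pipeline would actually run for one tile."""
    if vis.skip_all:
        return []  # whole tile culled: nothing from the command buffer runs
    return [cmd for i, cmd in enumerate(commands) if i not in vis.skip_indices]


def render_tiles(commands: List[str], num_tiles: int,
                 visibility_store: Dict[int, TileVisibility]) -> Dict[int, List[str]]:
    """Walk every tile and read its visibility record from the store.
    A tile with no record falls back to executing every command."""
    return {
        tile: commands_to_execute(commands, visibility_store.get(tile, TileVisibility()))
        for tile in range(num_tiles)
    }
```

With the commands `["clear", "draw_a", "draw_b"]`, a tile marked `skip_all=True` executes nothing, and a tile with `skip_indices={1}` executes only `clear` and `draw_b`.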
Abstract:
For a given texture address, a multi-mode texture sampler fetches and reduces texture data with a multi-mode filter accumulator suitable for providing a weighted average over a variety of filter footprints. A multi-mode texture sampler is configurable both to provide a wide variety of footprints and to allow a filter footprint significantly wider than the bilinear (2×2 texel) footprint. In embodiments, filter coefficients specifying a weighting for each texel in a flexible footprint are cached from coefficient tables stored in memory. Techniques and systems are provided for the dynamic allocation, updating, and handling of weighting coefficient tables as resources independent of sampler state.
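The following sketch shows how a filter accumulator can produce a weighted average over a footprint whose size and weights come from a cached coefficient table. Several simplifications are assumptions: the texture address is an integer top-left texel rather than a fractional coordinate, edges are handled by clamping, and the result is normalized by the sum of the weights. `CoefficientCache` and `sample` are hypothetical names. A 2×2 table gives a bilinear-sized footprint, and larger tables give wider footprints.

```python
from typing import Dict, List

Texture = List[List[float]]
Table = List[List[float]]


class CoefficientCache:
    """Hypothetical cache of weighting-coefficient tables keyed by table ID.
    The tables live in 'memory' (a dict here), independent of sampler state."""

    def __init__(self, memory: Dict[int, Table]):
        self._memory = memory
        self._cache: Dict[int, Table] = {}

    def get(self, table_id: int) -> Table:
        if table_id not in self._cache:  # miss: fetch the table from memory
            self._cache[table_id] = self._memory[table_id]
        return self._cache[table_id]


def sample(tex: Texture, u: int, v: int, weights: Table) -> float:
    """Weighted average over a footprint whose top-left texel is (u, v).
    The footprint size is taken from the weight table; coordinates outside
    the texture are clamped to the nearest edge texel."""
    h, w = len(tex), len(tex[0])
    acc = 0.0
    total = 0.0
    for dy, row in enumerate(weights):
        for dx, wt in enumerate(row):
            y = min(max(v + dy, 0), h - 1)
            x = min(max(u + dx, 0), w - 1)
            acc += wt * tex[y][x]  # accumulate weighted texel
            total += wt
    return acc / total if total else 0.0
```

Because the tables are looked up by ID instead of being stored in the sampler, a new footprint only needs a new table in memory and leaves the sampler configuration unchanged.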
Abstract:
An apparatus and method are described for position-based rendering in multi-die/multi-GPU graphics processing. For example, one embodiment of a method comprises: distributing a plurality of graphics draws to a plurality of graphics processors; performing position-only shading using vertex data associated with tiles of a first draw on a first graphics processor, the first graphics processor responsively generating visibility data for each of the tiles; distributing subsets of the visibility data associated with different subsets of the tiles to different graphics processors; using the visibility data to limit the geometry work performed on each tile by each graphics processor, each graphics processor responsively generating rendered tiles; and combining the rendered tiles to generate a complete image frame.
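The flow above can be modeled as a toy in Python. The sketch rests on several assumptions. A single position-only pass bins triangles into tiles with a conservative bounding-box test. Tiles are handed to GPUs round-robin. "Rendering" a tile only records which triangles it would shade. `TILE`, `posh_visibility`, `distribute`, `render_subset`, and `render_frame` are hypothetical names, and none of this is the patent's actual binning or scheduling scheme.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Tri = Tuple[Point, Point, Point]
TileId = Tuple[int, int]

TILE = 4  # hypothetical tile size in pixels


def posh_visibility(tris: List[Tri], tiles_x: int, tiles_y: int) -> Dict[TileId, List[int]]:
    """Position-only pass: for each tile, list the triangles whose screen-space
    bounding box overlaps it (a conservative stand-in for real binning)."""
    vis: Dict[TileId, List[int]] = {(tx, ty): [] for tx in range(tiles_x) for ty in range(tiles_y)}
    for i, tri in enumerate(tris):
        xs = [p[0] for p in tri]
        ys = [p[1] for p in tri]
        x0 = max(math.floor(min(xs)) // TILE, 0)
        x1 = min(math.floor(max(xs)) // TILE, tiles_x - 1)
        y0 = max(math.floor(min(ys)) // TILE, 0)
        y1 = min(math.floor(max(ys)) // TILE, tiles_y - 1)
        for tx in range(x0, x1 + 1):
            for ty in range(y0, y1 + 1):
                vis[(tx, ty)].append(i)
    return vis


def distribute(vis: Dict[TileId, List[int]], num_gpus: int) -> List[Dict[TileId, List[int]]]:
    """Give each GPU a disjoint subset of tiles (round-robin) with their visibility data."""
    subsets: List[Dict[TileId, List[int]]] = [{} for _ in range(num_gpus)]
    for k, tile in enumerate(sorted(vis)):
        subsets[k % num_gpus][tile] = vis[tile]
    return subsets


def render_subset(subset: Dict[TileId, List[int]]) -> Dict[TileId, List[int]]:
    """Stand-in for full geometry work: each tile shades only its visible triangles."""
    return {tile: list(ids) for tile, ids in subset.items()}


def render_frame(tris: List[Tri], tiles_x: int, tiles_y: int, num_gpus: int) -> Dict[TileId, List[int]]:
    vis = posh_visibility(tris, tiles_x, tiles_y)
    frame: Dict[TileId, List[int]] = {}
    for subset in distribute(vis, num_gpus):
        frame.update(render_subset(subset))  # combine rendered tiles into one frame
    return frame
```

Because each GPU only sees its own tiles' visibility lists, a triangle that touches no tile in a GPU's subset causes no geometry work on that GPU.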
Abstract:
An embodiment of a conditional shader apparatus may include a conditional pixel shader to determine whether one or more pixels meet a shader condition, and a pixel regrouper communicatively coupled to the conditional pixel shader to regroup pixels based on whether the one or more pixels are determined to meet the shader condition. Another embodiment of a conditional shader apparatus may include a thread analyzer to determine whether a set of threads meets a thread condition, and a conditional kernel loader communicatively coupled to the thread analyzer to load an appropriate kernel from a set of two or more kernels based on whether the set of threads is determined to meet the thread condition. Other embodiments are disclosed and claimed.
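Both embodiments can be illustrated with a short Python sketch. It assumes that regrouping means partitioning pixels by the condition and then packing each partition into SIMD-width groups, so that every group follows a single path. It also assumes the kernel loader picks a "specialized" kernel only when every thread meets the condition and otherwise falls back to a "general" one. `chunk`, `regroup`, `load_kernel`, and the kernel keys are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chunk(items: Sequence[T], width: int) -> List[List[T]]:
    """Split items into groups of at most `width` (one group per SIMD dispatch)."""
    return [list(items[i:i + width]) for i in range(0, len(items), width)]


def regroup(pixels: Sequence[T], condition: Callable[[T], bool],
            simd_width: int) -> Tuple[List[List[T]], List[List[T]]]:
    """Pixel regrouper: partition pixels by the shader condition, then pack each
    partition separately so no group mixes pixels that take different paths."""
    meet = [p for p in pixels if condition(p)]
    miss = [p for p in pixels if not condition(p)]
    return chunk(meet, simd_width), chunk(miss, simd_width)


def load_kernel(threads: Sequence[T], thread_condition: Callable[[T], bool],
                kernels: Dict[str, str]) -> str:
    """Conditional kernel loader: use the specialized kernel only if every
    thread in the set meets the condition; otherwise use the general kernel."""
    key = "specialized" if all(thread_condition(t) for t in threads) else "general"
    return kernels[key]
```

For example, regrouping the pixels 1 to 6 on "is even" with a SIMD width of 2 gives `[[2, 4], [6]]` and `[[1, 3], [5]]`, so no group contains both even and odd pixels.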
Abstract:
Various embodiments are presented herein that may reduce the workload of a GPU tasked with delivering, to a display, frames of video data generated by a 3D application executing within a system or computing platform. 3D applications executing within the system may generate new frames of video content at a specified frame rate, measured in frames per second (FPS). These frames are then delivered for display to a display communicatively coupled with the system. Every display has a refresh rate specified in cycles per second, or Hertz (Hz). Vertical synchronization (VSYNC) is a setting that synchronizes an application's FPS with the display's refresh rate. Forcing VSYNC on for the application while the system is operating on battery power may reduce the GPU workload when the FPS exceeds the refresh rate, resulting in longer battery life.
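The policy can be sketched in a few lines of Python. The function names and the `user_vsync` parameter are hypothetical. The model is also simplified: with VSYNC on, the rendered rate is capped at the refresh rate, and the quantization that double-buffered VSYNC can cause below the refresh rate is ignored. For example, an application running at 120 FPS on a 60 Hz display while on battery would render 60 frames per second instead of 120, so the GPU renders half as many frames.

```python
def vsync_enabled(app_fps: float, refresh_hz: float,
                  on_battery: bool, user_vsync: bool) -> bool:
    """VSYNC is on if the user enabled it, or it is forced because the system
    is on battery and the application outpaces the display."""
    return user_vsync or (on_battery and app_fps > refresh_hz)


def frames_rendered_per_second(app_fps: float, refresh_hz: float,
                               on_battery: bool, user_vsync: bool) -> float:
    """Frames the GPU renders per second under the policy above.
    Simplification: VSYNC caps the rate at the refresh rate and nothing else."""
    if vsync_enabled(app_fps, refresh_hz, on_battery, user_vsync):
        return min(app_fps, refresh_hz)
    return app_fps
```

Forcing VSYNC only when FPS exceeds the refresh rate keeps the policy away from the case where VSYNC could lower the frame rate below what the display can already show.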
Abstract:
Various embodiments are presented herein that may allow an application direct access to graphics processing unit (GPU) memory. An apparatus and a computer-implemented method may include accessing allocated GPU memory of a second resource via a link from a first resource. The allocated GPU memory may be mapped into one or more page tables of a central processing unit (CPU). A virtual address of the GPU memory, taken from the one or more CPU page tables, may be sent to the application.
Abstract:
In some embodiments, scheduling and managing workload submission to a position-only shading (POSH) pipe makes it possible to exploit parallelism with minimal impact on the software scheduler.
Abstract:
Various embodiments enable loop processing in the command processing block of graphics hardware. Such hardware may include a processor that includes a command buffer and a graphics command parser. The graphics command parser may load graphics commands from the command buffer, parse a first graphics command, store a loop count value associated with the first graphics command, parse a second graphics command, and store a loop wrap address based on the second graphics command. The graphics command parser may then execute a command sequence identified by the second graphics command, parse a third graphics command that identifies the end of the command sequence, set a new loop count value, and iteratively execute the command sequence using the loop wrap address based on the new loop count value.
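The parser's loop handling can be modeled as a small Python interpreter over a command buffer. The command encoding is hypothetical. `SET_LOOP_COUNT` plays the first command, `LOOP_START` the second (its successor becomes the loop wrap address), and `LOOP_END` the third. The assumed "new loop count value" is the count decremented by one, so the sequence runs exactly `n` times for a count of `n >= 1`. Any other command is treated as an `("EXEC", name)` entry and recorded as executed.

```python
from typing import List, Optional, Tuple

Command = Tuple  # hypothetical encoding: (opcode, *operands)


def run(buffer: List[Command]) -> List[str]:
    """Interpret a command buffer, returning the names of executed commands."""
    executed: List[str] = []
    loop_count = 0
    wrap_addr: Optional[int] = None
    pc = 0
    while pc < len(buffer):
        op = buffer[pc]
        if op[0] == "SET_LOOP_COUNT":    # first command: store the loop count
            loop_count = op[1]
        elif op[0] == "LOOP_START":      # second command: store the wrap address
            wrap_addr = pc + 1           # first command of the sequence
        elif op[0] == "LOOP_END":        # third command: end of the sequence
            if wrap_addr is None:
                raise ValueError("LOOP_END without LOOP_START")
            loop_count -= 1              # new loop count value
            if loop_count > 0:
                pc = wrap_addr           # jump back and run the sequence again
                continue
        else:                            # ordinary command, e.g. ("EXEC", "draw")
            executed.append(op[1])
        pc += 1
    return executed
```

For the buffer `[("SET_LOOP_COUNT", 3), ("LOOP_START",), ("EXEC", "a"), ("EXEC", "b"), ("LOOP_END",), ("EXEC", "c")]`, the commands run in the order `a, b, a, b, a, b, c`, so the driver writes the loop body once instead of three times.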