Abstract:
A processing apparatus is described. The apparatus includes a graphics processing unit (GPU) that includes a thread dispatcher to assign a priority class to each of a plurality of processing threads prior to dispatching the threads, a plurality of execution units to process the threads, a shared resource coupled to each of the plurality of execution units, and an arbitration unit to grant access to the shared resource to a first of the plurality of execution units based on the priority class of a thread being executed at the first execution unit.
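The arbitration step can be illustrated with a minimal Python sketch. It is not the claimed hardware design: the thread kinds, the numeric priority classes, the `ExecutionUnit` record, and the tie-break rule (lowest unit id wins) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExecutionUnit:
    eu_id: int
    # Priority class of the thread currently running here; None if idle.
    thread_priority: Optional[int] = None
    # True when this unit is asking for the shared resource.
    requesting: bool = False


def assign_priority(thread_kind: str) -> int:
    """Dispatcher side: tag a thread with a priority class before dispatch.

    The kinds and values are hypothetical; a larger number means higher priority.
    """
    classes = {"compute": 0, "vertex": 1, "pixel": 2}
    return classes.get(thread_kind, 0)


def arbitrate(units: List[ExecutionUnit]) -> Optional[int]:
    """Arbiter side: grant the shared resource to one requesting unit.

    The winner is the requester whose running thread has the highest
    priority class; ties go to the lowest unit id (an assumed rule).
    Returns the winning unit id, or None if nobody is requesting.
    """
    candidates = [
        u for u in units if u.requesting and u.thread_priority is not None
    ]
    if not candidates:
        return None
    winner = max(candidates, key=lambda u: (u.thread_priority, -u.eu_id))
    return winner.eu_id
```

For example, if unit 0 runs a "compute" thread and unit 1 runs a "pixel" thread and both request the resource, `arbitrate` grants it to unit 1 under this assumed ordering.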
Abstract:
In position-only shading, two geometry pipes exist: a trimmed-down version called the Cull Pipe and a full version called the Replay Pipe. The Cull Pipe executes the position shaders in parallel with the main application, but typically generates the critical results much faster because it fetches and shades only the position attribute of the vertices and avoids rasterization and the rendering of pixels for the frame buffer. Furthermore, the Cull Pipe uses these critical results to compute visibility information for all the triangles, whether or not they are culled. The Replay Pipe, in turn, consumes the visibility information to skip the culled triangles and shades only the visible triangles, which are finally passed to the rasterization phase. Together, the two pipes can hide the long cull runs of discarded triangles and can complete the work faster in some embodiments.
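The division of labor between the two pipes can be sketched in Python as follows. This is a simplified illustration, not the described hardware: it assumes 2D screen-space positions, uses a back-face and zero-area test as the only culling criterion, and the function names and the `shade_vertex` callback are hypothetical.

```python
def signed_area(p0, p1, p2):
    """Signed area of a 2D triangle; positive for counter-clockwise winding."""
    return 0.5 * (
        (p1[0] - p0[0]) * (p2[1] - p0[1]) - (p2[0] - p0[0]) * (p1[1] - p0[1])
    )


def cull_pipe(positions, triangles):
    """Touch only vertex positions and emit one visibility bit per triangle.

    Assumption: counter-clockwise triangles face the viewer, so triangles
    with non-positive area (back-facing or degenerate) are marked culled.
    """
    visibility = []
    for i0, i1, i2 in triangles:
        area = signed_area(positions[i0], positions[i1], positions[i2])
        visibility.append(area > 0)
    return visibility


def replay_pipe(vertices, triangles, visibility, shade_vertex):
    """Consume the visibility bits and fully shade only visible triangles."""
    shaded = []
    for tri, visible in zip(triangles, visibility):
        if not visible:
            continue  # skip culled triangles without shading them
        shaded.append(tuple(shade_vertex(vertices[i]) for i in tri))
    return shaded
```

In this sketch the Replay Pipe never invokes `shade_vertex` for a triangle the Cull Pipe marked as culled, which is where the savings come from.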
Abstract:
In one embodiment, efficiency of a pixel merge unit of a graphics pipeline is increased by identifying a silhouette edge of an input primitive and bypassing the pixel merge unit for fragments associated with the silhouette edge. Identifying partially covered fragments along the silhouette edge and preventing those fragments from entering the pixel merge unit allows fragments already in the pixel merge unit to remain there longer before being evicted. The additional residency gives fragments more time to wait for neighboring fragments to arrive, which, in turn, increases the merge rate for fragments that are eligible to be merged.
Abstract:
A per-tile test in the 5D rasterizer outputs intervals for both lens parameters, (u,v), and for time, t, as well as for depth z. These intervals are conservative bounds for the current tile for 1) the visible lens region, 2) the time the triangle overlaps the tile, and 3) the depth range for the triangle inside the tile.
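One way to picture the output of the per-tile test is as a small record of four intervals. The Python sketch below is illustrative only: the `TileBounds` type, the inclusive interval convention, and the two helper functions are assumptions, and the depth check further assumes that smaller z is closer to the viewer.

```python
from dataclasses import dataclass
from typing import Tuple

Interval = Tuple[float, float]  # (low, high), treated as inclusive


@dataclass
class TileBounds:
    u: Interval  # conservative lens bounds, horizontal
    v: Interval  # conservative lens bounds, vertical
    t: Interval  # time span in which the triangle overlaps the tile
    z: Interval  # depth range of the triangle inside the tile


def contains(interval: Interval, x: float) -> bool:
    low, high = interval
    return low <= x <= high


def sample_needs_inside_test(bounds: TileBounds, u: float, v: float, t: float) -> bool:
    """A sample outside any conservative interval cannot hit the triangle."""
    return contains(bounds.u, u) and contains(bounds.v, v) and contains(bounds.t, t)


def tile_is_occluded(bounds: TileBounds, tile_z_max: float) -> bool:
    """Hypothetical depth cull: the triangle's nearest depth lies behind
    the farthest depth already stored for the tile."""
    return bounds.z[0] > tile_z_max
```

Because the intervals are conservative, a sample that passes `sample_needs_inside_test` may still miss the triangle; the bounds only rule samples out, they never confirm a hit.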
Abstract:
Techniques related to graphics rendering, including techniques for compression and/or decompression of graphics data by use of pixel region bit values, are described.
Abstract:
Techniques are described that can delay or even prevent the use of memory to store triangles associated with tiles, as well as the use of processing resources for vertex shading and binning triangles. The techniques can also provide better load balancing among a set of cores, and hence better performance. A bounding volume is generated to represent a geometry group. Culling takes place to determine whether a geometry group is to have triangles rendered. Vertex shading and association of triangles with tiles can be performed across multiple cores in parallel. Processing resources are allocated to rasterizing tiles whose triangles have been vertex shaded and binned in preference to tiles whose triangles have yet to be vertex shaded and binned. Rasterization of triangles of different tiles can be performed by multiple cores in parallel.
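The group-level culling step can be sketched in Python as follows. The abstract does not say what kind of bounding volume or culling test is used, so this sketch assumes an axis-aligned bounding box tested against a set of clip planes; the plane representation `(nx, ny, nz, d)` with the inside defined by `n·p + d >= 0` is likewise an assumption.

```python
def bounding_box(points):
    """Axis-aligned bounding box of a geometry group's vertex positions."""
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def group_is_culled(box, planes):
    """Return True if the box lies entirely outside at least one plane.

    For each plane, take the box corner farthest along the plane normal.
    If even that corner is outside, no point of the box can be inside,
    so the whole group can skip vertex shading and binning.
    """
    lo, hi = box
    for nx, ny, nz, d in planes:
        px = hi[0] if nx >= 0 else lo[0]
        py = hi[1] if ny >= 0 else lo[1]
        pz = hi[2] if nz >= 0 else lo[2]
        if nx * px + ny * py + nz * pz + d < 0:
            return True
    return False
```

The test is conservative: a group that is not culled here may still turn out to contribute no visible triangles, but a culled group is guaranteed to be outside the tested planes.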
Abstract:
Techniques related to graphics rendering, including techniques for improved multi-sampling anti-aliasing compression by use of unreachable bit combinations, are described.
Abstract:
First, the colors are partitioned within a tile into distinct groups, such that the variation of color within each group is lowered. Second, each group can be encoded in an efficient manner. The algorithm described herein may give a higher compression ratio than previous algorithms, and therefore may further reduce memory bandwidth at a very low increase in computational cost in some embodiments. The algorithm may be added to a system with existing buffer compression algorithms, handling additional tiles that the existing algorithm fails to compress, thereby increasing the overall compression rate.
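The two steps, partitioning and per-group encoding, can be illustrated with a small Python sketch. The abstract does not specify how groups are formed or encoded, so everything here is an assumption: the tile is split into at most two groups at the midpoint of the widest color channel, and each group is stored as a base color plus per-pixel deltas.

```python
def partition(colors):
    """Label each color 0 or 1 by splitting the widest channel at its midpoint."""
    ranges = [
        max(c[ch] for c in colors) - min(c[ch] for c in colors) for ch in range(3)
    ]
    ch = ranges.index(max(ranges))
    threshold = min(c[ch] for c in colors) + ranges[ch] / 2
    return [0 if c[ch] <= threshold else 1 for c in colors]


def encode_group(colors):
    """Encode a group as (base color, per-pixel deltas, bits per delta channel)."""
    base = tuple(min(c[ch] for c in colors) for ch in range(3))
    deltas = [tuple(c[ch] - base[ch] for ch in range(3)) for c in colors]
    bits = max(d.bit_length() for delta in deltas for d in delta)
    return base, deltas, bits


def encode_tile(colors):
    """Partition the tile, then encode each non-empty group separately."""
    labels = partition(colors)
    groups = {}
    for g in (0, 1):
        members = [c for c, label in zip(colors, labels) if label == g]
        if members:
            groups[g] = encode_group(members)
    return labels, groups


def decode_tile(labels, groups):
    """Rebuild the original colors from the labels and encoded groups."""
    cursor = {g: 0 for g in groups}
    out = []
    for label in labels:
        base, deltas, _ = groups[label]
        delta = deltas[cursor[label]]
        cursor[label] += 1
        out.append(tuple(b + d for b, d in zip(base, delta)))
    return out
```

For a tile containing two dark and two bright pixels, each group in this sketch needs only a few bits per delta, whereas encoding the whole tile against a single base would need close to the full channel width. That is the sense in which lowering the variation within each group helps compression.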
Abstract:
Depth of field may be rasterized by culling half-space regions on a lens from which a triangle to be rendered is not visible. Then, inside tests are only performed on the remaining unculled half-space regions. Separating planes between the triangle to be rendered and the tile being processed can be used to define the half-space regions.
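The filtering of lens samples by culled half-spaces can be sketched in Python as follows. How the half-spaces are derived from the separating planes is not shown; the sketch assumes each culled region has already been expressed as a line `a*u + b*v + c < 0` in lens coordinates, and the function names are hypothetical.

```python
def in_half_space(half_space, u, v):
    """True if lens point (u, v) lies in the culled side of the line."""
    a, b, c = half_space
    return a * u + b * v + c < 0


def unculled_samples(lens_samples, culled_half_spaces):
    """Keep only lens samples that fall in no culled half-space.

    Inside tests then need to run only for the samples returned here.
    """
    return [
        (u, v)
        for (u, v) in lens_samples
        if not any(in_half_space(h, u, v) for h in culled_half_spaces)
    ]
```

With a single half-space `(1, 0, 0)`, for example, every sample with `u < 0` is dropped before any inside test runs.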
Abstract:
In accordance with some embodiments, caching may be improved for tiles on shared edges between triangles. In some embodiments, the technique may be used for the color cache, the depth cache, or both.