摘要:
One embodiment of the present invention sets forth a compression status cache configured to store compression information for blocks of memory stored within an external memory. A data cache unit is configured to request, in response to a cache miss, compressed data from the external memory based on compression information stored in the compression status bit cache. The compression status for active buffers is dynamically swapped into the compression status cache as needed. Different compression formats may be specified for one or more tiles within an active buffer. One advantage of the disclosed compression status cache is that a lame amount of attached memory may be allocated as compressible memory blocks, without incurring a corresponding die area cost because a portion of the compression status stored off chip in attached memory is cached in the compression status cache.
摘要:
One embodiment of the present invention sets forth a technique for performing a memory access request to compressed data within a virtually mapped memory system comprising an arbitrary number of partitions. A virtual address is mapped to a linear physical address, specified by a page table entry (PTE). The PTE is configured to store compression attributes, which are used to locate compression status for a corresponding physical memory page within a compression status bit cache. The compression status bit cache operates in conjunction with a compression status bit backing store. If compression status is available from the compression status bit cache, then the memory access request proceeds using the compression status. If the compression status bit cache misses, then the miss triggers a fill operation from the backing store. After the fill completes, memory access proceeds using the newly filled compression status information.
摘要:
A system and method for dynamically adjusting the pixel sampling rate during primitive shading can improve image quality or increase shading performance. Hybrid antialiasing is performed by selecting a number of shaded samples per pixel fragment. A combination of supersample and multisample antialiasing is used where a cluster of sub-pixel samples (multisamples) is processed for each pass through a fragment shader pipeline. The number of shader passes and multisamples in each cluster can be determined dynamically for each primitive based on rendering state.
摘要:
A system and method for dynamically adjusting the pixel sampling rate during primitive shading can improve image quality or increase shading performance. Hybrid antialiasing is performed by selecting a number of shaded samples per pixel fragment. A combination of supersample and multisample antialiasing is used where a cluster of sub-pixel samples (multisamples) is processed for each pass through a fragment shader pipeline. The number of shader passes and multisamples in each cluster can be determined dynamically for each primitive based on rendering state.
摘要:
A system and method for performing zero-bandwidth-clears reduces external memory accesses by a graphics processor when performing clears and subsequent read operations. A set of clear values is stored in the graphics processor. Each portion of a color or z buffer may be configured using a zero-bandwidth-clear command to reference a clear value without writing the external memory. The clear value is provided to a requestor without accessing the external memory when a read access is performed.
摘要:
A method and system for improving data coherency in a parallel rendering system is disclosed. Specifically, one embodiment of the present invention sets forth a method for managing a plurality of independently processed texture streams in a parallel rendering system that includes the steps of maintaining a time stamp for a group of tiles of work that are associated with each of the plurality of the texture streams and are associated with a specified area in screen space, and utilizing the time stamps to counter divergences in the independent processing of the plurality of texture streams.
摘要:
Multisampling techniques provide temporal as well as spatial antialiasing. Coverage for a primitive is determined at multiple sample locations for a pixel. In one embodiment, coverage is determined using boundary equations representing a boundary surface of the primitive in a three-dimensional space-time. A shading value for the primitive is computed for the pixel and stored for each coverage sample location of the pixel that is covered by the primitive. The sample locations are distributed in both space and time, and multiple sample locations share a single shading computation. The multisampling techniques are extendable to other dimensions that correspond to other image attributes.
摘要:
Multisampling techniques provide temporal as well as spatial antialiasing. Coverage for a primitive is be determined at multiple sample locations for a pixel. In one embodiment, coverage is determined using boundary equations representing a boundary surface of the primitive in a three-dimensional space-time. A shading value for the primitive is computed for the pixel and stored for each coverage sample location of the pixel that is covered by the primitive. The sample locations are distributed in both space and time, and multiple sample locations share a single shading computation. The multisampling techniques are extendable to other dimensions that correspond to other image attributes.
摘要:
A large non-patterned noise texture occupies a relatively small physical memory space. Each of a small set of physical pages in physical memory includes noise texels forming part of a noise texture. A large “virtual” noise texture is created by mapping each one of a large number of pages in virtual address space to one of the small set of physical pages; multiple virtual pages may be mapped to the same physical page. The physical page that each virtual page maps to is randomly or pseudo-randomly selected such that the resulting noise texture appears to be non-repeating. When a noise texel is requested by reference to a virtual address during rendering, the virtual address of the virtual page is translated to the corresponding physical address, and the noise texel is retrieved.
摘要:
One embodiment of the present invention sets forth a technique to perform fine-grained rendering predication using an IGPU and a DGPU. A graphics driver divides a 3D object into batches of triangles. The IGPU processes each batch of triangles through a modified rendering pipeline to determine if the batch is culled. The IGPU writes bits into a bitstream corresponding to the visibility of the batches. The DGPU reads bits from the bitstream and performs full-blown rendering, including shading, but only on the batches of triangles whose bit indicates that the batch is visible. Advantageously, this approach to rendering predication provides fine-grained culling without adding unnecessary overhead, thereby optimizing both hardware resources and performance.