Abstract:
A system and method for performing zero-bandwidth-clears reduces external memory accesses by a graphics processor when performing clears and subsequent read operations. A set of clear values is stored in the graphics processor. Each portion of a color or z buffer may be configured using a zero-bandwidth-clear command to reference a clear value without writing the external memory. The clear value is provided to a requestor without accessing the external memory when a read access is performed.
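The read path described above can be illustrated with a small software model. This is only a sketch under stated assumptions: the class `ZbcBuffer`, its per-tile granularity, and the `external_accesses` counter are hypothetical names introduced for illustration, not taken from the abstract.

```python
class ZbcBuffer:
    """Toy model of a tiled color or z buffer with zero-bandwidth clears.

    All names here are hypothetical; the abstract does not specify them.
    """

    def __init__(self, num_tiles, clear_values):
        self.clear_values = list(clear_values)  # clear-value table kept "on chip"
        self.external = [0] * num_tiles         # stands in for external memory
        self.clear_index = [None] * num_tiles   # per-tile reference into the table
        self.external_accesses = 0              # counts simulated memory traffic

    def zbc_clear(self, tile, index):
        # Only on-chip state changes; external memory is not written.
        self.clear_index[tile] = index

    def write(self, tile, value):
        # A real write goes to external memory and drops the clear reference.
        self.external[tile] = value
        self.clear_index[tile] = None
        self.external_accesses += 1

    def read(self, tile):
        index = self.clear_index[tile]
        if index is not None:
            # Cleared tile: answer from the table, no external access.
            return self.clear_values[index]
        self.external_accesses += 1
        return self.external[tile]
```

In this model, clearing a tile and then reading it back leaves `external_accesses` unchanged, which is the bandwidth saving the abstract describes.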
Abstract:
A system, method, and computer program product are provided for controlling a shader to gather statistics. In use, instructions are received utilizing a programmable interface. A shader is then controlled to gather statistics based on the instructions. Such statistics are further output to memory utilizing the shader.
Abstract:
A method and system for improving data coherency in a parallel rendering system are disclosed. Specifically, one embodiment of the present invention sets forth a method for managing a plurality of independently processed texture streams in a parallel rendering system. The method includes the steps of maintaining a time stamp for a group of tiles of work that are associated with each of the plurality of texture streams and with a specified area in screen space, and utilizing the time stamps to counter divergences in the independent processing of the plurality of texture streams.
Abstract:
Systems and methods for using multiple versions of programmable constants within a multi-threaded processor allow a programmable constant to be changed before a program using the constant has completed execution. Processing performance may be improved because programs using different values for a programmable constant may execute simultaneously. The programmable constants are stored in a constant buffer, and an entry of a constant buffer table is bound to the constant buffer. When a programmable constant is changed, its old version is copied to an entry in a page pool, and address translation for the page pool is updated to correspond to that old version (copy). An advantage is that the constant buffer always stores the newest version of the programmable constant.
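The copy-on-change scheme can be sketched in software as follows. This is a minimal model, assuming a global version counter and a dictionary standing in for the page pool and its address translation; the class `VersionedConstants` and its methods are hypothetical and not part of the abstract.

```python
class VersionedConstants:
    """Toy model: the buffer holds the newest values, old copies go to a pool."""

    def __init__(self, values):
        self.buffer = dict(values)  # constant buffer: always the newest version
        self.version = 0            # assumed global version counter
        self.page_pool = {}         # (name, version) -> preserved old value

    def begin_program(self):
        # A program captures the version that is current when it starts.
        return self.version

    def update(self, name, value):
        # Preserve the old value before overwriting it in the constant buffer.
        self.page_pool[(name, self.version)] = self.buffer[name]
        self.buffer[name] = value
        self.version += 1

    def read(self, name, program_version):
        # Stand-in for address translation: find the first copy saved at or
        # after the program's version; if none exists, the buffer is current.
        for v in range(program_version, self.version):
            if (name, v) in self.page_pool:
                return self.page_pool[(name, v)]
        return self.buffer[name]
```

A program started before an update keeps reading the preserved copy, while a program started afterwards reads the constant buffer directly, so both can run at the same time.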
Abstract:
A method and system for improving data coherency in a parallel rendering system are disclosed. Specifically, one embodiment of the present invention sets forth a method, which includes the steps of receiving a common input stream, tracking a periodic event associated with the common input stream, generating a plurality of fragment streams from the common input stream, inserting a marker based on an occurrence of the periodic event in a first fragment stream of the plurality of fragment streams, and utilizing the marker to influence the processing of the first fragment stream so that a plurality of raster operation (ROP) request streams maintain substantially the same coherence as the common input stream. Each fragment stream is independently processed and corresponds to one of the ROP request streams.
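The marker insertion step can be sketched as below. This is an illustration only: it assumes the periodic event is "every `period` items of the common input stream" and that a marker is placed in every fragment stream, neither of which the abstract states. The names `MARKER`, `split_with_markers`, and `route` are hypothetical.

```python
MARKER = "<marker>"  # hypothetical sentinel value


def split_with_markers(input_stream, num_streams, period, route):
    """Split one input stream into several and mark every `period` inputs.

    `route(item)` returns the index of the fragment stream for an item.
    """
    streams = [[] for _ in range(num_streams)]
    for count, item in enumerate(input_stream, start=1):
        streams[route(item)].append(item)
        if count % period == 0:
            # The periodic event occurred: record the same point in each stream.
            for stream in streams:
                stream.append(MARKER)
    return streams
```

A downstream consumer could then hold back a stream that reaches a marker until the other streams reach theirs, which keeps the independently processed streams roughly aligned with the order of the common input.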
Abstract:
Methods and systems for reusing memory addresses in a graphics system are disclosed, so that instances of address translation hardware can be reduced. One embodiment of the present invention sets forth a method, which includes mapping a footprint on a display screen to a group of contiguous physical memory locations in a memory system, determining an anchor physical memory address from a first transaction associated with the footprint, wherein the anchor physical memory address corresponds to an anchor in the group of contiguous physical memory locations, determining a second transaction that is also associated with the footprint, determining a set of least significant bits (LSBs) associated with the second transaction, and combining the anchor physical memory address with the set of LSBs associated with the second transaction to generate a second physical memory address for the second transaction, thereby avoiding a second full address translation.
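The address reuse step amounts to simple bit manipulation, sketched below. The sketch assumes a footprint of 2^8 contiguous bytes that is aligned in both address spaces; `LSB_BITS`, `full_translate`, `reuse_anchor`, and the dictionary used as a page table are hypothetical stand-ins.

```python
LSB_BITS = 8                      # assumed footprint size: 256 contiguous bytes
LSB_MASK = (1 << LSB_BITS) - 1


def full_translate(virtual_addr, page_table):
    """Stand-in for a full (expensive) address translation."""
    base = page_table[virtual_addr >> LSB_BITS]
    return base | (virtual_addr & LSB_MASK)


def reuse_anchor(anchor_phys, virtual_addr):
    """Derive a physical address from the anchor without a second translation."""
    # Keep the anchor's upper bits, take the LSBs from the new transaction.
    return (anchor_phys & ~LSB_MASK) | (virtual_addr & LSB_MASK)
```

For two transactions in the same footprint, `reuse_anchor` applied to the first transaction's physical address gives the same result as a second call to `full_translate`, but without consulting the page table again.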
Abstract:
Embodiments of the present invention set forth systems and methods for compressing thread group data written to frame buffer memory to increase overall memory performance. A compression/decompression engine within the frame buffer memory interface includes logic configured to identify situations where the threads of a thread group are writing similar scalar values to memory. Upon recognizing such a situation, the engine is configured to compress the scalar data into a form that allows all of the scalar data to be written to or read from the frame buffer memory in fewer clock cycles than would be required to transmit the data in uncompressed form to or from memory. Consequently, the disclosed systems and methods are able to effectively increase memory performance when executing thread group STORE and LOAD operations.
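One way to picture the compression is the sketch below. It is an assumption-laden illustration: the abstract says only that threads write "similar" scalar values, so the two encodings used here (all values identical, or values separated by a constant stride) and the function names `compress` and `decompress` are hypothetical.

```python
def compress(values):
    """Compress the per-thread scalars of one thread group (non-empty list)."""
    first = values[0]
    if all(v == first for v in values):
        # Every thread writes the same value: store it once.
        return ("uniform", first, len(values))
    stride = values[1] - first
    if all(values[i] - values[i - 1] == stride for i in range(1, len(values))):
        # Values differ by a constant step: store the base and the step.
        return ("stride", first, stride, len(values))
    # No pattern recognized: fall back to the uncompressed data.
    return ("raw", list(values))


def decompress(packet):
    """Rebuild the per-thread values from a packet produced by compress()."""
    kind = packet[0]
    if kind == "uniform":
        _, value, count = packet
        return [value] * count
    if kind == "stride":
        _, base, stride, count = packet
        return [base + i * stride for i in range(count)]
    return list(packet[1])
```

A compressed packet carries a handful of numbers instead of one per thread, which is the software analogue of transferring the data in fewer clock cycles.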
Abstract:
Embodiments of methods, apparatuses, devices, and/or systems for load balancing two processors, for example for graphics and/or video processing, are described.
Abstract:
A graphics processing subsystem is programmed with shader programs that make calls to an abstract interface. One or more subshaders implementing the functions of the abstract interface can also be defined. The binding of interfaces to functions is resolved by a language runtime module that compiles the subshaders. As shader programs are compiled, the runtime module determines whether each method call is associated with an interface function. For each interface method call, the runtime module determines the appropriate implementation of the interface to be bound to the method call. Once the appropriate implementation is identified, the interface binding is created using string substitution or indirect addressing instructions. At the time of compilation, which may be during the execution of the rendering application, the desired combinations of subshaders are specified and compiled into a combined shader program, which can then be executed by the graphics processing subsystem.
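Binding by string substitution can be sketched very simply. The function `bind_interfaces`, the shader text, and the interface and subshader names below are hypothetical examples, not drawn from the abstract, and the plain text replacement assumes interface call names do not appear inside other identifiers.

```python
def bind_interfaces(shader_source, bindings):
    """Replace each interface method call with its chosen implementation.

    `bindings` maps an interface call name to a subshader function name.
    """
    for interface_call, implementation in bindings.items():
        shader_source = shader_source.replace(interface_call, implementation)
    return shader_source


# Hypothetical shader text that calls an abstract interface.
source = "color = Light.illuminate(position);"
combined = bind_interfaces(source, {"Light.illuminate": "PointLight_illuminate"})
```

Choosing a different entry in `bindings` at compile time yields a different combined shader from the same source, which mirrors how the desired subshader combination is selected when the program is compiled.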
Abstract:
A method and apparatus for processing fragment data are disclosed. In one embodiment, the method includes processing the fragment data to generate one or more texture map addresses for one or more texels, determining relevance information that corresponds to the texture map addresses, and translating the relevance information into a rendering constraint data structure.