摘要:
A method and system for improving data coherency in a parallel rendering system is disclosed. Specifically, one embodiment of the present invention sets forth a method for managing a plurality of independently processed texture streams in a parallel rendering system that includes the steps of maintaining a time stamp for a group of tiles of work that are associated with each of the plurality of the texture streams and are associated with a specified area in screen space, and utilizing the time stamps to counter divergences in the independent processing of the plurality of texture streams.
摘要:
A method and system for improving data coherency in a parallel rendering system is disclosed. Specifically, one embodiment of the present invention sets forth a method for managing a plurality of independently processed texture streams in a parallel rendering system that includes the steps of maintaining a time stamp for a group of tiles of work that are associated with each of the plurality of the texture streams and are associated with a specified area in screen space, and utilizing the time stamps to counter divergences in the independent processing of the plurality of texture streams.
摘要:
A method and system for improving data coherency in a parallel rendering system is disclosed. Specifically, one embodiment of the present invention sets forth a method for managing a plurality of independently processed texture streams in a parallel rendering system that includes the steps of maintaining a time stamp for a group of tiles of work that are associated with each of the plurality of the texture streams and are associated with a specified area in screen space, and utilizing the time stamps to counter divergences in the independent processing of the plurality of texture streams.
摘要:
A method and system for improving data coherency in a parallel rendering system is disclosed. Specifically, one embodiment of the present invention sets forth a method, which includes the steps of receiving a common input stream, tracking a periodic event associated with the common input stream, generating a plurality of fragment streams from the common input stream, inserting a marker based on an occurrence of the periodic event in a first fragment stream in the multiple fragment streams, and utilizing the marker to influence the processing of the first fragment stream so that a plurality of raster operation (ROP) request streams maintains substantially the same coherence as the common input stream. Each fragment stream is independently processed and corresponds to one of the ROP request streams.
摘要:
A multi-mode texture compression algorithm is provided for effective compression and decompression texture data during graphics processing. Initially, a request is sent to memory for compressed texture data. Such compressed texture data is then received from the memory in response to the request. At least one of a plurality of compression algorithms associated with the compressed texture data is subsequently identified. Thereafter, the compressed texture data is decompressed in accordance with the identified compression algorithm.
摘要:
A system and method for providing antialiased memory access includes receiving a request to access a memory address. The memory address is examined to determine if the memory address is within a virtual frame buffer. If the memory address is within a virtual frame buffer then the memory address is transformed into one or more physical addresses within a frame buffer that is utilized for antialiasing. The frame buffer may be a single memory space containing subpixel information corresponding to pixels of the virtual frame buffer. Subpixels located at the physical addresses within the frame buffer are then accessed. The disclosed invention provides for direct access by a software application.
摘要:
A scalable shader architecture is disclosed. In accord with that architecture, a shader includes multiple shader pipelines, each of which can perform processing operations on rasterized pixel data. Shader pipelines can be functionally removed as required, thus preventing a defective shader pipeline from causing a chip rejection. The shader includes a shader distributor that processes rasterized pixel data and then selectively distributes the processed rasterized pixel data to the various shader pipelines, beneficially in a manner that balances workloads. A shader collector formats the outputs of the various shader pipelines into proper order to form shaded pixel data. A shader instruction processor (scheduler) programs the individual shader pipelines to perform their intended tasks. Each shader pipeline has a shader gatekeeper that interacts with the shader distributor and with the shader instruction processor such that pixel data that passes through the shader pipelines is controlled and processed as required.
摘要:
A scalable shader architecture is disclosed. In accord with that architecture, a shader includes multiple shader pipelines, each of which can perform processing operations on rasterized pixel data. Shader pipelines can be functionally removed as required, thus preventing a defective shader pipeline from causing a chip rejection. The shader includes a shader distributor that processes rasterized pixel data and then selectively distributes the processed rasterized pixel data to the various shader pipelines, beneficially in a manner that balances workloads. A shader collector formats the outputs of the various shader pipelines into proper order to form shaded pixel data. A shader instruction processor (scheduler) programs the individual shader pipelines to perform their intended tasks. Each shader pipeline has a shader gatekeeper that interacts with the shader distributor and with the shader instruction processor such that pixel data that passes through the shader pipelines is controlled and processed as required.
摘要:
An apparatus and method for enlarging a source image on a computer display is described. A scanline buffer stores pixels in a first row of the source image. Coordination circuitry receives from a processor and stores pixels in a second row of the source image. Blend circuitry generates output pixels, such that each of the output pixels corresponds to a weighted average of a plurality of input pixels. The plurality of input pixels comprises two pixels stored in the scanline buffer and two pixels stored in the coordination circuitry. Control circuitry determines weight values and provides the weight values to the blend circuitry for calculating the weighted average. The control circuitry also determines, based on the weight values, which pixels stored in the scanline buffer and in the coordination circuitry are used by the blend circuitry as the input pixels. In addition, the control circuitry replaces the pixels in the scanline buffer with corresponding pixels from the coordination circuitry after the last time the pixels in the scanline buffer are used as the input pixels.
摘要:
A processor buffers asynchronous threads. Instructions requiring operations provided by a plurality of execution units are divided into phases, each phase having at least one computation operation and at least one memory access operation. Instructions within each phase are qualified and prioritized. The instructions may be qualified based on the status of the execution unit needed to execute one or more of the current instructions. The instructions may also be qualified based on an age of each instruction, status of the execution units, a divergence potential, locality, thread diversity, and resource requirements. Qualified instructions may be prioritized based on execution units needed to execute instructions and the execution units in use. One or more of the prioritized instructions is issued per cycle to the plurality of execution units.