Abstract:
A 3D rendering texture caching scheme that minimizes external bandwidth requirements for texture data and increases the rate at which textured pixels are available. The texture caching scheme efficiently pre-fetches data at the main memory access granularity and stores it in cache memory. The data in the main memory and the texture cache memory is organized to achieve large reuse of texels with a minimum of cache memory, thereby minimizing cache misses. The texture main memory stores a two-dimensional array of texels, each texel having an address and one of N identifiers. The texture cache memory has addresses partitioned into N banks, each bank containing texels transferred from the main memory that have the corresponding identifier. A cache controller determines which texels need to be transferred from the texture main memory to the texture cache memory and which texels are currently in the cache using a least most recently used algorithm. By labeling the texture map blocks (double quad words), a partitioning scheme is developed which allows the cache controller structure to be very modular and easily realized. The texture cache arbiter schedules and controls the actual transfer of texels from the texture main memory into the texture cache memory, and controls the output of texels for each pixel from the cache memory to an interpolating filter.
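The bank partitioning described above can be illustrated with a small model. This is only a sketch under stated assumptions: N is taken to be 4, blocks are labeled with a 2x2 checkerboard pattern, and each bank evicts with a plain least-recently-used policy. None of these choices, nor the names `block_label` and `BankedTextureCache`, come from the abstract.

```python
from collections import OrderedDict

N_BANKS = 4  # assumption: N = 4 identifiers, one per cache bank


def block_label(bx, by):
    """Assumed labeling: a 2x2 checkerboard over block coordinates, so the
    four blocks of any 2x2 neighborhood carry four different identifiers."""
    return (by % 2) * 2 + (bx % 2)


class BankedTextureCache:
    """Toy cache with one small store per identifier (bank)."""

    def __init__(self, lines_per_bank=2):
        self.lines_per_bank = lines_per_bank
        self.banks = [OrderedDict() for _ in range(N_BANKS)]
        self.misses = 0

    def fetch(self, bx, by, main_memory):
        """Return the block at (bx, by), loading it from main memory on a miss."""
        bank = self.banks[block_label(bx, by)]
        key = (bx, by)
        if key in bank:
            bank.move_to_end(key)  # mark as most recently used
            return bank[key]
        self.misses += 1
        if len(bank) >= self.lines_per_bank:
            bank.popitem(last=False)  # evict the least recently used block (assumed policy)
        bank[key] = main_memory[key]
        return bank[key]


if __name__ == "__main__":
    memory = {(x, y): f"block({x},{y})" for x in range(4) for y in range(4)}
    cache = BankedTextureCache()
    for y in (1, 2):
        for x in (1, 2):
            cache.fetch(x, y, memory)
    print(cache.misses)  # one miss per bank for a 2x2 neighborhood
```

Because a 2x2 neighborhood of blocks never maps two blocks to the same bank under this labeling, the four blocks needed by a filter footprint can in principle be read from four banks in parallel.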
Abstract:
A rasterizer comprising a bounding box calculator, a plane converter, a windower, and incrementers. For each polygon to be processed, a bounding box calculation determines the display screen area, in spans, that totally encloses the polygon, and passes the data to the plane converter. The plane converter also receives as input the attribute values for each vertex of the polygon. The plane converter computes planar coefficients for each attribute of the polygon, for each of the edges of the polygon. The plane converter computes the start pixel center location at a start span and a starting coefficient value at that pixel center. The computed coefficients also include the rate of change, or gradient, of each polygon attribute in the x and y directions. The plane converter also computes line coefficients for each of the edges of the polygon. Line equation values are passed to the windower, where further calculations allow the windower to determine which spans are either covered or intersected by the polygon. The incrementers receive the span coverage data from the windower and the planar coefficient values from the plane converter. The incrementers use the data from both the windower and the plane converter to walk, or traverse, the polygon in those intersected spans, pixel by pixel. As the incrementer visits each pixel, vertex attribute values are interpolated to that pixel.
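The bounding box, plane converter, and incrementer steps above can be sketched for a single triangle attribute. The sketch assumes a 4-pixel span size, pixel centers at half-integer coordinates, and a plane of the form a(x, y) = dadx*x + dady*y + c0; the function names are illustrative and not taken from the abstract.

```python
import math

SPAN = 4  # assumption: spans are 4 pixels on a side


def bounding_box_spans(verts):
    """Span indices (min_x, min_y, max_x, max_y) that enclose the polygon."""
    xs = [v[0] for v in verts]
    ys = [v[1] for v in verts]
    return (math.floor(min(xs)) // SPAN, math.floor(min(ys)) // SPAN,
            math.floor(max(xs)) // SPAN, math.floor(max(ys)) // SPAN)


def plane_coefficients(verts, attrs):
    """Gradients in x and y plus the constant term of one attribute plane."""
    (x0, y0), (x1, y1), (x2, y2) = verts
    a0, a1, a2 = attrs
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    if det == 0:
        raise ValueError("degenerate triangle")
    dadx = ((a1 - a0) * (y2 - y0) - (a2 - a0) * (y1 - y0)) / det
    dady = ((a2 - a0) * (x1 - x0) - (a1 - a0) * (x2 - x0)) / det
    c0 = a0 - dadx * x0 - dady * y0
    return dadx, dady, c0


def walk_row(coeffs, y, x_start, count):
    """Incrementer: evaluate once at the first pixel center, then add dadx per step."""
    dadx, dady, c0 = coeffs
    value = dadx * (x_start + 0.5) + dady * (y + 0.5) + c0
    values = []
    for _ in range(count):
        values.append(value)
        value += dadx
    return values


if __name__ == "__main__":
    tri = [(0, 0), (8, 0), (0, 8)]
    coeffs = plane_coefficients(tri, (0.0, 16.0, 8.0))
    print(bounding_box_spans(tri), coeffs, walk_row(coeffs, 0, 0, 3))
```

The point of the incremental form is that, once the gradients are known, each additional pixel costs one addition per attribute rather than a full plane evaluation.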
Abstract:
Method and apparatus for rendering texture to an object to be displayed on a pixel screen display. This technique uses linear interpolation between perspectively correct texture addresses to calculate rates of change of individual texture address components, which determine both the selection of the correct LOD map and the intermediate texture addresses for pixels of the object between the perspectively correct addresses. The method first determines perspectively correct texture address values associated with the four corners of a predefined span or grid of pixels. Then, a linear interpolation technique is used to calculate a rate of change of texture address components in the screen x and y directions for pixels between the perspectively bound span corners. This linear interpolation is performed in both screen directions to create a potentially unique level of detail value for each pixel, which is then used as an index to select the correct pre-filtered LOD texture map. Mapping an individually determined LOD value per pixel avoids the undesirable artifacts that may appear if a single LOD is used for an entire span or polygon.
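A rough sketch of the per-pixel LOD idea follows. It assumes a 4x4 span, takes the four perspectively correct corner addresses as given inputs, and uses the common log2-of-largest-gradient rule to turn rates of change into an LOD index. That rule and the names `span_lods` and `lerp2` are assumptions for illustration; the abstract does not state the exact formula.

```python
import math


def lerp(a, b, t):
    return a + (b - a) * t


def lerp2(p, q, t):
    """Linear interpolation between two (u, v) texture addresses."""
    return (lerp(p[0], q[0], t), lerp(p[1], q[1], t))


def span_lods(tl, tr, bl, br, span=4, max_lod=7):
    """Per-pixel LOD indices for one span, given perspectively correct
    (u, v) addresses at its top-left, top-right, bottom-left and
    bottom-right corners."""
    lods = []
    for j in range(span):
        fy = j / span
        # Rate of change in screen x along this row of the span.
        left, right = lerp2(tl, bl, fy), lerp2(tr, br, fy)
        dudx = (right[0] - left[0]) / span
        dvdx = (right[1] - left[1]) / span
        row = []
        for i in range(span):
            fx = i / span
            # Rate of change in screen y along this column of the span.
            top, bottom = lerp2(tl, tr, fx), lerp2(bl, br, fx)
            dudy = (bottom[0] - top[0]) / span
            dvdy = (bottom[1] - top[1]) / span
            rho = max(math.hypot(dudx, dvdx), math.hypot(dudy, dvdy))
            # Assumed mapping: LOD = floor(log2(rho)), clamped to [0, max_lod].
            row.append(0 if rho <= 1.0 else min(max_lod, int(math.log2(rho))))
        lods.append(row)
    return lods


if __name__ == "__main__":
    for row in span_lods((0, 0), (4, 0), (0, 4), (32, 4)):
        print(row)
```

Because the x gradient depends on the row and the y gradient depends on the column, pixels in the same span can end up with different LOD values whenever the corner addresses are not affinely related.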
Abstract:
A computationally efficient method for minimizing the visible effects of texture LOD transitions across a polygon. The minimization is accomplished by adding a dithering offset value to the LOD value computed for each pixel covered by a graphics primitive to produce a dithered pixel LOD value. The dithering offsets may be generated from a table look-up based on the location of the pixel within a span of pixels. The dithered pixel LOD value is used as an index in the selection of a single LOD texture map from which a textured pixel value is retrieved. The range of dithering offset values can be adjusted by modulating the values in the table look-up.
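The dithering step can be sketched as follows. The 4x4 ordered-dither table, the `scale` parameter used to modulate the offset range, and the floor-based selection of a single map are all assumptions chosen for illustration; the abstract only says that offsets come from a table indexed by pixel position within a span.

```python
import math

# Assumed 4x4 ordered-dither table with offsets in [0, 1).
DITHER_TABLE = [
    [0 / 16, 8 / 16, 2 / 16, 10 / 16],
    [12 / 16, 4 / 16, 14 / 16, 6 / 16],
    [3 / 16, 11 / 16, 1 / 16, 9 / 16],
    [15 / 16, 7 / 16, 13 / 16, 5 / 16],
]


def dithered_lod(lod, x, y, scale=1.0, max_lod=7):
    """Add a position-dependent offset to a fractional LOD and pick one map.

    scale modulates the table values, which widens or narrows the band of
    pixels over which two neighboring LOD maps are mixed."""
    offset = DITHER_TABLE[y % 4][x % 4] * scale
    return max(0, min(max_lod, math.floor(lod + offset)))


if __name__ == "__main__":
    for y in range(4):
        print([dithered_lod(2.5, x, y) for x in range(4)])
```

With a fractional LOD of 2.5, half of the pixels in each span select map 3 and half select map 2, so the transition is spread spatially instead of appearing as a hard line, while each pixel still reads only one map.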
Abstract:
An apparatus with circuit redundancy includes a set of parallel arithmetic logic units (ALUs), a redundant parallel ALU, and input data shifting logic that is coupled to the set of parallel ALUs and operatively coupled to the redundant parallel ALU. The input data shifting logic shifts input data for a defective ALU, in a first direction, to a neighboring ALU in the set. When the neighboring ALU is the last or end ALU in the set, the shifting logic continues to shift the input data for the end ALU, which is not defective, to the redundant parallel ALU. The redundant parallel ALU then operates in place of the defective ALU. Output data shifting logic is coupled to the output of the redundant parallel ALU and to all other ALU outputs, and shifts the output data in a second direction, opposite to that of the input data shifting logic, to realign the output data for continued processing, including storage or further processing by other circuitry.
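The input and output shifting can be modeled in software as index remapping. In this sketch the lanes are a Python list whose final slot stands in for the redundant ALU, and `route_inputs`, `realign_outputs`, and `run_with_redundancy` are hypothetical names used only for illustration.

```python
def route_inputs(inputs, defective):
    """Input shifting: data for the defective lane and every lane after it
    moves one lane over; the last lane's data lands on the redundant ALU."""
    lanes = [None] * (len(inputs) + 1)  # final slot models the redundant ALU
    for i, value in enumerate(inputs):
        lane = i if defective is None or i < defective else i + 1
        lanes[lane] = value
    return lanes


def realign_outputs(lane_outputs, defective):
    """Output shifting in the opposite direction, restoring the original order."""
    n = len(lane_outputs) - 1
    return [lane_outputs[i if defective is None or i < defective else i + 1]
            for i in range(n)]


def run_with_redundancy(inputs, operation, defective=None):
    """Apply operation on every lane that received data, then realign."""
    lanes = route_inputs(inputs, defective)
    outputs = [None if value is None else operation(value) for value in lanes]
    return realign_outputs(outputs, defective)


if __name__ == "__main__":
    print(route_inputs([1, 2, 3, 4], defective=1))
    print(run_with_redundancy([1, 2, 3, 4], lambda v: v * v, defective=1))
```

Downstream logic sees results in their original positions regardless of which lane is defective, which is the purpose of shifting the outputs back.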
Abstract:
In accordance with the present invention, the rate of change of texture addresses when mapped to individual pixels of a polygon is used to obtain the correct level of detail (LOD) map from a set of prefiltered maps. The method comprises a first determination of perspectively correct texture address values found at four corners of a predefined span or grid of pixels. Then, a linear interpolation technique is implemented to calculate a rate of change of texture addresses for pixels between the perspectively bound span corners. This linear interpolation technique is performed in both screen directions to thereby create a level of detail value for each pixel. The YUV formats described above have Y components for every pixel sample, and U/V components (also named Cr and Cb) for every fourth sample. Every U/V sample coincides with four (2×2) Y samples. This is identical to the organization of texels in U.S. Pat. No. 4,965,745 “YIQ-Based Color Cell Texturing”, incorporated herein by reference. The improvement of this algorithm is that a single 32-bit word contains four packed Y values, one value each for U and V, and optionally four one-bit Alpha components:
YUV_0566: 5 bits each for four Y values, 6 bits each for U and V
YUV_1544: 5 bits each for four Y values, 4 bits each for U and V, four 1-bit Alphas
These components are converted from 4-, 5-, or 6-bit values to 8-bit values by the concept of color promotion. The reconstructed texels consist of Y components for every texel, and U/V components repeated for every block of 2×2 texels. The combination of the YIQ-Based Color Cell Texturing concept, the packing of components into convenient 32-bit words, and color promoting the components to 8-bit values yields a compression from 96 bits down to 32 bits, or 3:1.
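To make the packing arithmetic concrete, here is a sketch that unpacks one YUV_0566 word into four texels. The bit order within the word (Y0 in the top five bits, then Y1, Y2, Y3, U, and V in the low six bits) is an assumption, as is implementing color promotion by replicating the high-order bits into the vacated low bits; the text names the field widths but not their positions or the promotion rule.

```python
def promote(value, bits):
    """Promote a 4-, 5-, or 6-bit component to 8 bits by bit replication
    (assumed rule), so that zero maps to 0 and full scale maps to 255."""
    widened = value << (8 - bits)
    return (widened | (widened >> bits)) & 0xFF


def unpack_yuv_0566(word):
    """Unpack a 32-bit YUV_0566 word into four (Y, U, V) texels.

    Assumed layout: [31:27] Y0, [26:22] Y1, [21:17] Y2, [16:12] Y3,
    [11:6] U, [5:0] V. U and V are shared by the 2x2 texel block."""
    ys = [(word >> shift) & 0x1F for shift in (27, 22, 17, 12)]
    u = promote((word >> 6) & 0x3F, 6)
    v = promote(word & 0x3F, 6)
    return [(promote(y, 5), u, v) for y in ys]


if __name__ == "__main__":
    word = (31 << 27) | (0 << 22) | (16 << 17) | (1 << 12) | (63 << 6) | 0
    print(unpack_yuv_0566(word))
```

Four reconstructed texels at 24 bits each total 96 bits, which is where the 96-to-32-bit (3:1) figure comes from.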
There is a similarity between the trilinear filtering equation (performing bilinear filtering of four samples at each of two LODs, then linearly filtering those two results) and the motion compensation filtering equation (performing bilinear filtering of four samples from each of a “previous picture” and a “future picture”, then averaging those two results). Thus, some of the texture filtering hardware can do double duty and perform the motion compensation filtering when those primitives are sent through the pipeline. The palette RAM area is conveniently used to store correction data (used to “correct” the predicted images that fall between the “I” images in an MPEG data stream) since, during motion compensation, the texture palette memory would otherwise be unused.
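The similarity between the two filters can be shown by writing both in terms of one shared routine. This is a minimal floating-point sketch; the function names are illustrative, and real hardware would use fixed-point arithmetic and would also add the correction data, which is omitted here.

```python
def bilinear(s00, s10, s01, s11, fx, fy):
    """Bilinear filter of four samples with fractional weights fx and fy."""
    top = s00 + (s10 - s00) * fx
    bottom = s01 + (s11 - s01) * fx
    return top + (bottom - top) * fy


def blend_two(samples_a, frac_a, samples_b, frac_b, weight):
    """Shared datapath: two bilinear results combined by a linear blend."""
    a = bilinear(*samples_a, *frac_a)
    b = bilinear(*samples_b, *frac_b)
    return a + (b - a) * weight


def trilinear(lod_lo, lod_hi, fx, fy, lod_frac):
    """Trilinear filtering: blend weight is the fractional LOD."""
    return blend_two(lod_lo, (fx, fy), lod_hi, (fx, fy), lod_frac)


def motion_compensate(previous, frac_prev, future, frac_future):
    """Bidirectional prediction: the same datapath with a fixed weight of 1/2."""
    return blend_two(previous, frac_prev, future, frac_future, 0.5)


if __name__ == "__main__":
    print(trilinear((0, 10, 20, 30), (40, 40, 40, 40), 0.5, 0.5, 0.25))
    print(motion_compensate((0, 10, 20, 30), (0.5, 0.5), (40, 40, 40, 40), (0.0, 0.0)))
```

The only difference between the two callers is where the blend weight comes from, which is why the same filter stages can serve both purposes.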
Abstract:
A computation module and/or geometric engine for use in a video graphics processing circuit includes memory, a computation engine, a plurality of thread controllers, and an arbitration module. The computation engine is operably coupled to perform an operation based on an operation code and to provide a corresponding result to the memory as indicated by the operation code. Each of the plurality of thread controllers manages at least one corresponding thread of a plurality of threads. The plurality of threads constitutes an application. The arbitration module is coupled to the plurality of thread controllers and utilizes an application specific prioritization scheme to provide operation codes from the plurality of thread controllers to the computation engine such that idle time of the computation engine is minimized. The prioritization scheme prioritizes certain threads over other threads such that the throughput through the computation module is maximized.
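The arbitration idea can be illustrated with a toy cycle-by-cycle model. Everything specific here is hypothetical: the fixed numeric priorities, the per-thread latency before its next operation code is ready, and the names `ThreadController` and `arbitrate`. The abstract only states that an application-specific prioritization scheme is used to keep the computation engine busy.

```python
from collections import deque


class ThreadController:
    """Holds the pending operation codes of one thread (toy model)."""

    def __init__(self, name, priority, ops, latency):
        self.name = name
        self.priority = priority  # hypothetical: higher value wins arbitration
        self.ops = deque(ops)
        self.latency = latency    # hypothetical: cycles until the next op is ready
        self.ready_at = 0


def arbitrate(controllers):
    """Each cycle, issue the op of the highest-priority ready thread.

    Returns a trace of (cycle, thread name, op code); an idle cycle is
    recorded as (cycle, None, "idle")."""
    trace = []
    cycle = 0
    while any(c.ops for c in controllers):
        ready = [c for c in controllers if c.ops and c.ready_at <= cycle]
        if ready:
            chosen = max(ready, key=lambda c: c.priority)
            trace.append((cycle, chosen.name, chosen.ops.popleft()))
            chosen.ready_at = cycle + chosen.latency
        else:
            trace.append((cycle, None, "idle"))
        cycle += 1
    return trace


if __name__ == "__main__":
    threads = [ThreadController("A", 2, ["a0", "a1"], latency=2),
               ThreadController("B", 1, ["b0", "b1"], latency=1)]
    for entry in arbitrate(threads):
        print(entry)
```

In this example the lower-priority thread fills the cycles in which the higher-priority thread is not ready, so the engine never idles even though neither thread could keep it busy alone.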