摘要:
One embodiment of the present invention sets forth a technique providing an optimized way to allocate and access memory across a plurality of thread/data lanes. Specifically, the device driver receives an instruction targeted to a memory set up as an array of structures of arrays. The device driver computes an address within the memory using information about the number of thread/data lanes and parameters from the instruction itself. The result is a memory allocation and access approach where the device driver properly computes the target address in the memory. Advantageously, processing efficiency is improved where memory in a parallel processing subsystem is internally stored and accessed as an array of structures of arrays, proportional to the SIMT/SIMD group width (the number of threads or lanes per execution group).
摘要:
One embodiment of the present invention sets forth a technique providing an optimized way to allocate and access memory across a plurality of thread/data lanes. Specifically, the device driver receives an instruction targeted to a memory set up as an array of structures of arrays. The device driver computes an address within the memory using information about the number of thread/data lanes and parameters from the instruction itself. The result is a memory allocation and access approach where the device driver properly computes the target address in the memory. Advantageously, processing efficiency is improved where memory in a parallel processing subsystem is internally stored and accessed as an array of structures of arrays, proportional to the SIMT/SIMD group width (the number of threads or lanes per execution group).
摘要:
One embodiment of the present invention sets forth technique for watertight evaluation of Gregory patches for Catmull-Clark subdivision surfaces. Each boundary of each patch within a subdivision surface is configured to be owned by one related patch. In general, a given patch may own specific control points for the patch, while certain other control points for the patch may need to be reconstructed because the control points are owned by an adjacent patch. For a given patch, each control point along to a shared boundary is consistently generated using reconstruction data available to the patch. The reconstruction data is generated from values associated with a patch that owns the shared boundary. Because numerically identical data is used to evaluate each patch at each boundary, the boundaries are watertight. One advantage of the present invention is that watertight evaluation may be achieved using similar computational effort versus conventional non-watertight evaluation techniques.
摘要:
A line rasterization technique in accordance with one embodiment includes conditioning a line by pulling in the ending vertex of the line or pushing out the starting vertex of the line. Thereafter, if the line exits a diamond test area of each pixel that it touches, the pixel may be lit.
摘要:
Heterogeneous processors can cooperate for distributed processing tasks in a multiprocessor computing system. Each processor is operable in a “compatible” mode, in which all processors within a family accept the same baseline command set and produce identical results upon executing any command in the baseline command set. The processors also have a “native” mode of operation in which the command set and/or results may differ in at least some respects from the baseline command set and results. Heterogeneous processors with a compatible mode defined by reference to the same baseline can be used cooperatively for distributed processing by configuring each processor to operate in the compatible mode.
摘要:
A method for providing state inheritance across command lists in a multi-threaded processing environment. The method includes receiving an application program that includes a plurality of parallel threads; generating a command list for each thread of the plurality of parallel threads; causing a first command list associated with a first thread of the plurality of parallel threads to be executed by a processing unit; and causing a second command list associated with a second thread of the plurality of parallel threads to be executed by the processing unit, where the second command list inherits from the first command list state associated with the processing unit.
摘要:
Apparatus, system, and method for clipping graphics primitives are described. In one embodiment, a clipping module includes a mapping unit and a clipping engine that is connected to the mapping unit. The mapping unit is configured to map a graphics primitive onto a canonical representation that is defined with respect to a clipping plane. The clipping engine is configured to clip the graphics primitive with respect to the clipping plane based on the canonical representation.
摘要:
The VPC unit and setup unit of a graphics processing subsystem perform culling operations. The VPC unit performs culling operations on geometric primitives falling within a specific criteria, such as having a property within of a numerical range limit. This limit reduces the complexity of the VPC unit. As increasing rendering complexity typically produces a large number of small primitives, the VPC unit culls many primitives despite its limitations. The VPC unit also includes a cache for storing previously processed vertices in their transformed form, along with previously computed culling information. This increases the VPC unit throughput by reducing the number of memory accesses and culling operations to be performed. The setup unit performs culling operations on any general primitive that cannot be culled by the VPC unit. By performing a first series of culling operations in the VPC unit, the processing burden on the setup unit is decreased.
摘要:
A method for providing state inheritance across command lists in a multi-threaded processing environment. The method includes receiving an application program that includes a plurality of parallel threads; generating a command list for each thread of the plurality of parallel threads; causing a first command list associated with a first thread of the plurality of parallel threads to be executed by a processing unit; and causing a second command list associated with a second thread of the plurality of parallel threads to be executed by the processing unit, where the second command list inherits from the first command list state associated with the processing unit.
摘要:
One embodiment of the present invention sets forth a technique controlling the pixel location at which the plane equation is evaluated. Multiple pixel offsets (dx, dy) may be specified that each define to a sub-pixel sample position. Attributes are then calculated for each sub-pixel sample position that is covered by a geometric primitive. One advantage of the technique is that anti-aliasing quality may be improved since high frequency color components may be selectively supersampled for particular geometric primitives.