摘要:
One embodiment of the present invention sets forth a technique for managing the allocation and release of resources during multi-threaded program execution. Programmable reference counters are initialized to values that limit the amount of resources for allocation to tasks that share the same reference counter. Resource parameters are specified for each task to define the amount of resources allocated for consumption by each array of execution threads that is launched to execute the task. The resource parameters also specify the behavior of the array for acquiring and releasing resources. Finally, during execution of each thread in the array, an exit instruction may be configured to override the release of the resources that were allocated to the array. The resources may then be retained for use by a child task that is generated during execution of a thread.
摘要:
One embodiment of the present invention sets forth a technique for storing only the enabled components for each enabled vector and writing only enabled components to one or more specified render targets. A shader program header (SPH) file provides per-component mask bits for each render target. Each enabled mask bit indicates that the pixel shader generates the corresponding component as an output to the raster operations unit. In the hardware, the per-component mask bits are combined with the applications programming interface (API)-level per-component write masks to determine the components that are updated by the shader program. The combined mask is used as the write enable bits for components in one or more render targets. One advantage of the combined mask is that the components that are not updated are not forwarded from the pixel shader to the ROP, thereby saving bandwidth between those processing units.
摘要:
One embodiment of the present invention sets forth a technique for configuring a graphics processing pipeline (GPP) to process data according to one or more shader programs. The method includes receiving a plurality of pointers, where each pointer references a different shader program header (SPH) included in a plurality of SPHs, and each SPH is associated with a different shader program that executes within the GPP. For each SPH included in the plurality of SPHs, one or more GPP configuration parameters included in the SPH are identified, and the GPP is adjusted based on the one or more GPP configuration parameters.
摘要:
A method and system for overriding state information programmed into a processor using an application programming interface (API) avoids introducing error conditions in the processor. An override monitor unit within the processor stores the programmed state for any setting that is overridden so that the programmed state can be restored when the error condition no longer exists. The override monitor unit overrides the programmed state by forcing the setting to a legal value that does not cause an error condition. The processor is able to continue operating without notifying a device driver that an error condition has occurred since the error condition is avoided.
摘要:
Structure, apparatus, and method for performing conservative hidden surface removal in a graphics processor. Culling is divided into two steps, a magnitude comparison content addressable memory cull operation (MCCAM Cull), and a subpixel cull operation. The MCCAM Cull discards primitives that are hidden completely by previously processed geometry. The Subpixel Cull takes the remaining primitives (which are partly or entirely visible), and determines the visible fragments. In one embodiment the method of performing hidden surface removal includes: selecting a current primitive comprising a plurality of stamps; comparing stamps to stamps from previously evaluated primitives; selecting a first stamp as a currently potentially visible stamp (CPVS) based on a relationship of depth states of samples in the first stamp with depth states of samples of previously evaluated stamps; comparing the CPVS to a second stamp; discarding the second stamp when no part of the second stamp would affect a final graphics display image based on the stamps that have been evaluated; discarding the CPVS and making the second stamp the CPVS, when the second stamp hides the CPVS; dispatching the CPVS and making the second stamp the CPVS when both the second stamp and the CPVS are at least partially visible in the final graphics display image; and dispatching the second stamp and the CPVS when the visibility of the second stamp and the CPVS depends on parameters evaluated later in the computer graphics pipeline.
摘要:
Apparatus and method for a Parallel Query Z-coordinate Buffer are described. The apparatus and method perform a keep/discard decision on screen coordinate geometry before the geometry is converted or rendered into individual display screen pixels by implementing a parallel searching technique within a novel z-coordinate buffer based on a novel magnitude comparison content addressable memory (MCCAM) structure. The MCCAM provides means structure and method for performing simultaneous arithmetic magnitude comparisons on numerical quantities. These arithmetic magnitude comparisons include arithmetic less-than, greater-than, less-than-or-equal to, and greater-than-or-equal-to operations between coordinate values of a selected graphical object and the coordinate values of other objects in the image scene which may or may not occult the selected graphical object. Embodiments of the method and apparatus utilizing variations The structure and method support variations and combinations of bounding box occulting tests, vertex bounding box occulting tests, span occulting tests, and raster-write occulting tests, as well as combinations of these tests are described .
摘要:
A deferred graphics pipeline processor comprised of a mode extraction unit and a Polygon Memory associated with the polygon unit. The mode extraction unit receives a data stream from a geometry unit and separates the data stream into vertices data, and non-vertices data which is sent to the Polygon Memory for storage. A mode injection unit receives inputs from the Polygon Memory and communicates the mode information to one or more other processing units. The mode injection unit maintains status information identifying the information that is already cached and not sending information that is already cached, thereby reducing communication bandwidth.
摘要:
Structure, apparatus, and method for performing conservative hidden surface removal in a graphics processor. Culling is divided into two steps, a magnitude comparison content addressable memory cull operation (MCCAM Cull), and a subpixel cull operation. The MCCAM Cull discards primitives that are hidden completely by previously processed geometry. The Subpixel Cull takes the remaining primitives (which are partly or entirely visible), and determines the visible fragments. In one embodiment the method of performing hidden surface removal includes: selecting a current primitive comprising a plurality of stamps; comparing stamps to stamps from previously evaluated primitives; selecting a first stamp as a currently potentially visible stamp (CPVS) based on a relationship of depth states of samples in the first stamp with depth states of samples of previously evaluated stamps; comparing the CPVS to a second stamp; discarding the second stamp when no part of the second stamp would affect a final graphics display image based on the stamps that have been evaluated; discarding the CPVS and making the second stamp the CPVS, when the second stamp hides the CPVS; dispatching the CPVS and making the second stamp the CPVS when both the second stamp and the CPVS are at least partially visible in the final graphics display image; and dispatching the second stamp and the CPVS when the visibility of the second stamp and the CPVS depends on parameters evaluated later in the computer graphics pipeline.
摘要:
Three-dimensional computer graphics systems and methods and more particularly to structure and method for a three-dimensional graphics processor and having other enhanced graphics processing features. In one embodiment the graphics processor is a Deferred Shading Graphics Processor (DSGP) comprising an AGP interface, a command fetch & decode (2000), a geometry unit (3000), a mode extraction (4000) and polygon memory (5000), a sort unit (6000) and sort memory (7000), a setup unit (8000), a cull unit (9000), a mode injection (10000), a fragment unit (11000), a texture (12000) and texture memory (13000) a phong shading (14000), a pixel unit (15000), a backend unit (1600) coupled to a frame buffer (17000). Other embodiments need not include all of these functional units, and the structures and methods of these units are applicable to other computational processes and systems as well as deferred and non-deferred shading graphical processors.
摘要:
Apparatus and method for detecting unconstrained collisions between three-dimensional moving objects are described. The apparatus and method addresses the problems associated with handling objects with substance passing through each other in three-dimensional space. When objects collide in a three-dimensional simulation, it is important to identify such collisions in real-time so that the behavior of the colliding objects may be adjusted appropriately. Native vertices are stored and novel structure is provided so that the stored words containing native vertices work together to form polygons, or other object primitives, that work together. For triangle object primitives, three vertices form the first triangle primitive, but a second triangle primitive is formed by receiving and storing only one additional vertex, the other two vertices needed to form the second triangle primitive being shared with the first triangle primitive. The apparatus and method also provides structure for storing and communicating polygon vertex relationship information between multiple object primitives and objects, and structure and method for comparing the extent of an object primitive with all other previously stored object primitive extents simultaneous with receipt and storage of the object primitive vertex data. The ability to store each coordinate vertex only once and to share the vertex coordinate information among multiple objects radically reduces the vertex storage requirements, simplifies unconstrained object collision determinations, and increases data throughput so that real-time, or near real-time, computations appropriate for simulation are achieved.