摘要:
This disclosure describes a multi-stage tessellation technique for tessellating a curve during graphics rendering. In particular, a first tessellation stage tessellates the curve into a first set of line segments that each represents a portion of the curve. A second tessellation stage further tessellates the portion of the curve represented by each of the line segments of the first set into additional line segments that more finely represent the shape of the curve. In this manner, each portion of the curve that was represented by only one line segment after the first tessellation stage is represented by more than one line segment after the second tessellation stage. In some instances, more than two tessellation stages may be performed to tessellate the curve.
摘要:
This disclosure describes a multi-stage tessellation technique for tessellating a curve during graphics rendering. In particular, a first tessellation stage tessellates the curve into a first set of line segments that each represents a portion of the curve. A second tessellation stage further tessellates the portion of the curve represented by each of the line segments of the first set into additional line segments that more finely represent the shape of the curve. In this manner, each portion of the curve that was represented by only one line segment after the first tessellation stage is represented by more than one line segment after the second tessellation stage. In some instances, more than two tessellation stages may be performed to tessellate the curve.
摘要:
The disclosure relates to techniques for locking and unlocking cache lines in a cache included within a multi-media processor that performs read-modify-write functions using batch read and write requests for data stored in either an external memory or an embedded memory. The techniques may comprise receiving a read request in a batch of read requests for data included in a section of a cache line and setting a lock bit associated with the section in response to the read request. When the lock bit is set, additional read requests in the batch of read requests are unable to access data in that section of the cache line. The lock bit may be unset in response to a write request in a batch of write requests to update the data previously read out from that section of the cache line.
摘要:
The disclosure relates to techniques for locking and unlocking cache lines in a cache included within a multi-media processor that performs read-modify-write functions using batch read and write requests for data stored in either an external memory or an embedded memory. The techniques may comprise receiving a read request in a batch of read requests for data included in a section of a cache line and setting a lock bit associated with the section in response to the read request. When the lock bit is set, additional read requests in the batch of read requests are unable to access data in that section of the cache line. The lock bit may be unset in response to a write request in a batch of write requests to update the data previously read out from that section of the cache line.
摘要:
Techniques are described for reordering commands to improve the speed at which at least one command stream may execute. Prior to distributing commands in the at least one command stream to multiple pipelines, a multimedia processor analyzes any inter-pipeline dependencies and determines the current execution state of the pipelines. The processor may, based on this information, reorder the at least one command stream by prioritizing commands that lack any current dependencies and therefore may be executed immediately by the appropriate pipeline. Such out of order execution of commands in the at least one command stream may increase the throughput of the multimedia processor by increasing the rate at which the command stream is executed.
摘要:
A multi-threaded processor is provided that internally reorders output threads thereby avoiding the need for an external output reorder buffer. The multi-threaded processor writes its thread results back to an internal memory buffer to guarantee that thread results are outputted in the same order in which the threads are received. A thread scheduler within the multi-threaded processor manages thread ordering control to avoid the need for an external reorder buffer. A compiler for the multi-threaded processor converts instructions that would normally send processed results directly to an external reorder buffer so that the processed thread results are instead sent to the internal memory buffer of the multi-threaded processor.
摘要:
A multi-threaded processor is provided, such as a shader processor, having an internal unified memory space that is shared by a plurality of threads and is dynamically assigned to threads as needed. A mapping table that maps virtual registers to available internal addresses in the unified memory space so that thread registers can be stored in contiguous or non-contiguous memory addresses. Dynamic sizing of the virtual registers allows flexible allocation of the unified memory space depending on the type and size of data in a thread register. Yet another feature provides an efficient method for storing graphics data in the unified memory space to improve fetch and store operations from the memory space. In particular, pixel data for four pixels in a thread are stored across four memory devices having independent input/output ports that permit the four pixels to be read in a single clock cycle for processing.
摘要:
The present disclosure includes a shader compiler system and method. In an embodiment, a shader compiler includes a decoder to translate an instruction having a vector representation to a unified instruction representation. The shader compiler also includes an encoder to translate an instruction having a unified instruction representation to a processor executable instruction.
摘要:
A graphics processor capable of parallel scheduling and execution of multiple threads, and techniques for achieving parallel scheduling and execution, are described. The graphics processor may include multiple hardware units and a scheduler. The hardware units are operable in parallel, with each hardware unit supporting a respective set of operations. The hardware units may include an ALU core, an elementary function core, a logic core, a texture sampler, a load control unit, some other hardware unit, or a combination thereof. The scheduler dispatches instructions for multiple threads to the hardware units concurrently. The graphics processor may further include an instruction cache to store instructions for threads and register banks to store data. The instruction cache and register banks may be shared by the hardware units.
摘要:
This disclosure describes techniques of loading batch commands into a graphics processing unit (GPU). As described herein, a GPU driver for the GPU identifies one or more graphics processing objects to be used by the GPU in order to render a batch of graphics primitives. The GPU driver may insert indexes associated with the identified graphics processing objects into a batch command. The GPU driver may then issue the batch command to the GPU. The GPU may use the indexes in the batch command to retrieve the graphics processing objects from memory. After retrieving the graphics processing objects from memory, the GPU may use the graphics processing objects to render the batch of graphics primitives.