Abstract:
A graphics system includes a transposer. A read scheduler utilizes a minimum cost analysis to schedule a read transfer order for the transposer to minimize the total number of passes required to process a set of input vectors.
Abstract:
A vertex cache within a graphics processor is configured to operate as a conventional round-robin streaming cache when per-vertex state changes are not used and as a random access storage buffer when per-vertex state changes are used. Batches of vertices that define primitives and state changes are output to parallel processing units for processing according to a vertex shader program. In addition to allowing per-vertex state changes, the vertex cache is configured to store vertices for primitive topologies that use anchor points, such as triangle strips, line loops, and polygons.
Abstract:
Disclosed are an apparatus, a method, a programmable graphics processing unit (“GPU”), a computer device, and a computer medium to facilitate, among other things, the generation of parallel data streams to effect parallel processing in at least a portion of a graphics pipeline of a GPU. In one embodiment, an input of the apparatus receives graphics elements in a data stream of graphics elements. The graphics pipeline can use the graphics elements to form computer-generated images. The apparatus also can include a transposer configured to produce parallel attribute streams. Each of the parallel attribute streams includes a type of attribute common to the graphics elements. In one embodiment, the transposer can be configured to convert at least a portion of the graphics pipeline from a single data stream to multiple data streams (e.g., executable by multiple threads of execution) while reducing the memory size requirements to implement such a conversion.
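The transposition described above can be pictured as turning a stream of per-element records into one stream per attribute type. The following is only a minimal software sketch of that idea, not the disclosed hardware design; the function name `transpose_attributes` and the attribute names `position` and `color` are assumptions made for illustration.

```python
from collections import defaultdict


def transpose_attributes(elements):
    """Turn a stream of per-element attribute records into one stream per attribute type."""
    streams = defaultdict(list)
    for element in elements:
        for name, value in element.items():
            # Each attribute type gets its own stream, in arrival order.
            streams[name].append(value)
    return dict(streams)


# Hypothetical input: three graphics elements (vertices) with two attributes each.
vertices = [
    {"position": (0.0, 0.0, 0.0), "color": (255, 0, 0)},
    {"position": (1.0, 0.0, 0.0), "color": (0, 255, 0)},
    {"position": (0.0, 1.0, 0.0), "color": (0, 0, 255)},
]
streams = transpose_attributes(vertices)
```

Each resulting stream holds a single attribute type, so separate threads could in principle consume `streams["position"]` and `streams["color"]` independently.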
Abstract:
Disclosed are an apparatus, a system, a method, a graphics processing unit (“GPU”), a computer device, and a computer medium to implement a pool of independent enhanced tags to, among other things, decouple a dependency between tags and cachelines. In one embodiment, an enhanced tag-based cache structure includes a tag repository configured to maintain a pool of enhanced tags. Each enhanced tag can have a match portion configured to form an association between the enhanced tag and an incoming address. Also, an enhanced tag can have a data locator portion configured to locate a cacheline in the cache in response to the formation of the association. The data locator portion enables the enhanced tag to locate multiple cachelines. Advantageously, the enhanced tag-based cache structure can be formed to adjust the degree of reusability of the enhanced tags independent from the degree of latency tolerance for the cacheline repository.
Abstract:
A graphics system has parallel processing units that do not share vertex information. The graphics system constructs independent batches of work for the parallel processing units in which each batch of work has a list of vertices for a set of primitives.
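One way to picture independent batches is a builder that gives every batch its own vertex list and rewrites primitive indices to be local to that list. This is a hedged sketch under assumptions not stated in the abstract: primitives are triangles given as tuples of global vertex indices, and `build_batches` and `max_vertices` are hypothetical names.

```python
def build_batches(triangles, max_vertices):
    """Group triangles into batches that each carry their own vertex list."""
    batches = []
    vertex_list, local_index, prims = [], {}, []
    for tri in triangles:
        # Vertices of this triangle that the current batch does not hold yet.
        new = [v for v in dict.fromkeys(tri) if v not in local_index]
        if prims and len(vertex_list) + len(new) > max_vertices:
            # Close the current batch and start a fresh one with no shared state.
            batches.append({"vertices": vertex_list, "primitives": prims})
            vertex_list, local_index, prims = [], {}, []
            new = list(dict.fromkeys(tri))
        for v in new:
            local_index[v] = len(vertex_list)
            vertex_list.append(v)
        # Store the primitive using indices local to this batch.
        prims.append(tuple(local_index[v] for v in tri))
    if prims:
        batches.append({"vertices": vertex_list, "primitives": prims})
    return batches


# Hypothetical input: four triangles forming a strip over vertices 0..5.
triangles = [(0, 1, 2), (2, 1, 3), (2, 3, 4), (4, 3, 5)]
batches = build_batches(triangles, max_vertices=4)
```

In this sketch, vertices 2 and 3 appear in both batches. That duplication is the price of letting each processing unit work without access to another unit's vertex data.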
Abstract:
A graphics processing system is provided. The graphics processing system includes a front end module for receiving pixel data. A setup unit is coupled to the front end module and generates parameter coefficients. A raster unit is coupled to the setup unit and generates stepping information. A virtual texturing array engine textures and colors the pixel data based on the parameter coefficients and stepping information. Also provided is a pixel engine adapted for processing the textured and colored pixel data received from the virtual texturing array engine.
Abstract:
One embodiment of the present invention sets forth a crossbar unit that is coupled to a plurality of client subsystems. The crossbar unit is configured to transmit data packets between the client subsystems and includes a high-bandwidth channel and a narrow-bandwidth channel. The high-bandwidth channel is used for transmitting large data packets, while the narrow-bandwidth channel is used for transmitting smaller data packets. The transmission of data packets may be prioritized based on the source and destination clients as well as the type of data being transmitted. Further, the crossbar unit includes a buffer mechanism for buffering data packets received from source clients until those data packets can be received by the destination clients.
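The channel selection and buffering can be sketched in software as follows. This is an illustrative model only: the `Crossbar` class, the byte-size threshold, and the client names are assumptions, and the prioritization by source, destination, and data type mentioned in the abstract is left out.

```python
from collections import deque


class Crossbar:
    """Toy model: route packets by size and buffer them until the destination is ready."""

    def __init__(self, size_threshold):
        self.size_threshold = size_threshold  # hypothetical cutoff in bytes
        self.buffers = {"high": deque(), "narrow": deque()}

    def send(self, src, dst, payload):
        # Large packets use the high-bandwidth channel, small ones the narrow channel.
        channel = "high" if len(payload) >= self.size_threshold else "narrow"
        self.buffers[channel].append((src, dst, payload))
        return channel

    def deliver(self, channel, ready):
        """Deliver buffered packets whose destination is in `ready`; keep the rest buffered."""
        delivered, pending = [], deque()
        for packet in self.buffers[channel]:
            (delivered if packet[1] in ready else pending).append(packet)
        self.buffers[channel] = pending
        return delivered


xbar = Crossbar(size_threshold=64)
big = xbar.send("client_a", "client_b", bytes(128))
small = xbar.send("client_c", "client_b", bytes(8))
held = xbar.deliver("narrow", ready=set())          # destination not ready yet
sent = xbar.deliver("narrow", ready={"client_b"})   # destination now ready
```

The small packet stays in the narrow-channel buffer during the first `deliver` call and is only released once `client_b` is reported as ready.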
Abstract:
The invention sets forth a crossbar unit that includes multiple virtual channels, each virtual channel being a logical flow of data within the crossbar unit. Arbitration logic coupled to source client subsystems is configured to select a virtual channel for transmitting a data request or a data packet to a destination client subsystem based on the type of the source client subsystem and/or the type of data request. Higher priority traffic is transmitted over virtual channels that are configured to transmit data without causing deadlocks and/or stalls. Lower priority traffic is transmitted over virtual channels that can be stalled.
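A rough sketch of the arbitration idea follows: pick a virtual channel from the source type and request type, then let only the lower-priority channel be stalled. The channel names, the type labels (`latency_critical`, `bulk`, `response`), and the selection rule itself are assumptions for illustration; the abstract does not specify them.

```python
from collections import deque

HIGH, LOW = "vc_high", "vc_low"


def select_virtual_channel(source_type, request_type):
    """Hypothetical rule: critical sources and responses use the non-stalling channel."""
    if source_type == "latency_critical" or request_type == "response":
        return HIGH
    return LOW


def drain(queues, low_stalled):
    """Forward all high-priority traffic; forward low-priority traffic only if not stalled."""
    out = list(queues[HIGH])
    queues[HIGH].clear()
    if not low_stalled:
        out.extend(queues[LOW])
        queues[LOW].clear()
    return out


queues = {HIGH: deque(), LOW: deque()}
traffic = [("bulk", "write"), ("latency_critical", "read"), ("bulk", "response")]
for src, req in traffic:
    queues[select_virtual_channel(src, req)].append((src, req))
sent = drain(queues, low_stalled=True)
```

With the low-priority channel stalled, the bulk write stays queued while the other two requests still go through, which is the behavior the abstract attributes to the higher-priority channels.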