Abstract:
Techniques are disclosed for peer-to-peer data transfers where a source device receives a request to read data words from a target device. The source device creates first and second read commands for reading a first portion and a second portion of a plurality of data words from the target device, respectively. The source device transmits the first read command to the target device, and, before a first read operation associated with the first read command is complete, transmits the second read command to the target device. The first and second portions of the plurality of data words are stored in first and second portions of a buffer memory, respectively. Advantageously, an arbitrary number of read operations may be in progress at a given time without using multiple peer-to-peer memory buffers. Performance for large data block transfers is improved without consuming peer-to-peer memory buffers needed by other peer GPUs.
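The overlapped read commands described above can be illustrated with a minimal sketch. It is a software simulation only, not the disclosed hardware mechanism: the function name `pipelined_read`, the `split` parameter, and the use of a queue to stand in for outstanding commands are all assumptions made for illustration.

```python
from collections import deque


def pipelined_read(target_memory, start, count, split):
    """Read `count` words from `target_memory` using two overlapping read commands.

    Hypothetical model: each command is (target offset, length, buffer offset).
    """
    buffer = [None] * count  # one buffer memory shared by both commands

    commands = [
        (start, split, 0),                      # first portion -> first part of buffer
        (start + split, count - split, split),  # second portion -> second part of buffer
    ]

    # Transmit both commands before either read operation has completed.
    in_flight = deque()
    for command in commands:
        in_flight.append(command)
    issued_before_completion = len(in_flight)

    # Completions arrive later; each one fills its own region of the buffer.
    while in_flight:
        src, length, dst = in_flight.popleft()
        buffer[dst:dst + length] = target_memory[src:src + length]

    return buffer, issued_before_completion
```

Because each command carries its own buffer offset, the two portions never overlap, which is what lets several reads stay outstanding against a single buffer in this sketch.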
Abstract:
One embodiment of the present invention sets forth a technique to perform fine-grained rendering predication using an IGPU and a DGPU. A graphics driver divides a 3D object into batches of triangles. The IGPU processes each batch of triangles through a modified rendering pipeline to determine if the batch is culled. The IGPU writes bits into a bitstream corresponding to the visibility of the batches. The DGPU reads bits from the bitstream and performs full-blown rendering, including shading, but only on the batches of triangles whose bit indicates that the batch is visible. Advantageously, this approach to rendering predication provides fine-grained culling without adding unnecessary overhead, thereby optimizing both hardware resources and performance.
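The visibility bitstream can be sketched as follows. This is an illustrative model under stated assumptions: one bit per batch packed least-significant-bit first, with hypothetical callbacks `is_culled` (standing in for the IGPU's modified pipeline) and `shade` (standing in for the DGPU's full rendering). The abstract does not specify the bit layout.

```python
def write_visibility_bits(batches, is_culled):
    """IGPU side: set bit i when batch i is visible (assumed LSB-first packing)."""
    bits = bytearray((len(batches) + 7) // 8)
    for i, batch in enumerate(batches):
        if not is_culled(batch):
            bits[i // 8] |= 1 << (i % 8)
    return bytes(bits)


def render_visible(batches, bits, shade):
    """DGPU side: fully render only the batches whose bit is set."""
    rendered = []
    for i, batch in enumerate(batches):
        if (bits[i // 8] >> (i % 8)) & 1:
            rendered.append(shade(batch))
    return rendered
```

For example, with nine batches of which the second and ninth are culled, `write_visibility_bits` yields two bytes and `render_visible` shades the remaining seven batches in order.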
Abstract:
One embodiment of the present invention sets forth a technique to perform fine-grained rendering predication using an IGPU. A graphics driver divides a 3D object into batches of triangles. The IGPU processes each batch of triangles through a modified rendering pipeline to determine if the batch is culled. The IGPU writes bits into a bitstream corresponding to the visibility of the batches. Advantageously, this approach to rendering predication provides fine-grained culling without adding unnecessary overhead, thereby optimizing both hardware resources and performance.
Abstract:
A method of displaying graphics data is described. The method involves accessing the graphics data in a memory subsystem associated with one graphics subsystem. The graphics data is transmitted to a second graphics subsystem, where it is displayed on a monitor coupled to the second graphics subsystem.
Abstract:
A system and method uses the capabilities of a geometry shader unit within a multi-threaded graphics processor to offload data compression computations from a central processing unit (CPU), reduce the memory needed to store image data, and reduce the bandwidth needed to transfer image data between graphics processors and between a graphics processor and a system memory. The multi-threaded graphics processor is also configured to perform decompression of the variable-length compressed data using the geometry shader unit.
Abstract:
A system and method uses the capabilities of a geometry shader unit within a multi-threaded graphics processor to offload data compression computations from a central processing unit (CPU), reduce the memory needed to store image data, and reduce the bandwidth needed to transfer image data between graphics processors and between a graphics processor and a system memory.
Abstract:
Multiprocessor graphics systems support distributed antialiasing. In one embodiment, two (or more) graphics processors each render a version of the same image, with a difference in the sampling location (or locations) used for each pixel. A display head combines corresponding pixels generated by different graphics processors to produce an antialiased image. This distributed antialiasing technique can be scaled to any number of graphics processors.
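A minimal sketch of the distributed antialiasing idea follows. It assumes the display head combines corresponding pixels by a simple average, which the abstract does not state; `render`, `combine`, and the `scene` callback are hypothetical names, and pixel values are single floats for brevity.

```python
def render(scene, width, height, offset):
    """One graphics processor: sample each pixel at its own sub-pixel offset."""
    ox, oy = offset
    return [[scene(x + ox, y + oy) for x in range(width)] for y in range(height)]


def combine(images):
    """Display head: average corresponding pixels (averaging is an assumption)."""
    n = len(images)
    height, width = len(images[0]), len(images[0][0])
    return [
        [sum(image[y][x] for image in images) / n for x in range(width)]
        for y in range(height)
    ]


def edge(x, y):
    """Toy scene: a vertical edge at x = 0.5."""
    return 1.0 if x >= 0.5 else 0.0


# Two processors render the same image with different sampling locations.
gpu0 = render(edge, 3, 1, (0.25, 0.25))
gpu1 = render(edge, 3, 1, (0.75, 0.75))
antialiased = combine([gpu0, gpu1])
```

The pixel that straddles the edge receives an intermediate value because the two processors disagree on it, while fully covered pixels are unchanged. Adding more processors with further offsets extends the same pattern.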
Abstract:
Multichip graphics processing subsystems include at least three distinct graphics devices (e.g., expansion cards) coupled to a high-speed bus (e.g., a PCI Express bus) and operable in a distributed rendering mode. One of the graphics devices provides pixel data to a display device, and at least one of the other graphics devices transfers the pixel data it generates to another of the devices via the bus to be displayed. Where the high-speed bus provides data transfer lanes, allocation of lanes among the graphics devices can be optimized.
Abstract:
Methods, apparatuses, and systems are presented for processing an ordered sequence of images for display using a display device. A plurality of graphics devices is operated, including at least one first graphics device that processes certain ones of the ordered sequence of images, including a first image, and at least one second graphics device that processes certain other ones of the ordered sequence of images, including a second image, the first image preceding the second image in the ordered sequence. At least one operation of the at least one second graphics device is delayed to allow processing by the at least one first graphics device to advance relative to processing by the at least one second graphics device, in order to maintain sequentially correct output of the ordered sequence of images. Output from the graphics devices is selectively provided to the display device.
Abstract:
Coherence of displayed images is provided for graphics processing systems having multiple processors operating to render different portions of a current image in parallel. As each processor completes rendering of its portion of the current image, it generates a local ready event, then pauses its rendering operations. A synchronizing agent detects the local ready events and generates a global ready event after all of the graphics processors have generated local ready events. The global ready event is transmitted to each graphics processor, which responds by resuming its rendering activity.
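The local-ready and global-ready handshake can be sketched with threads standing in for graphics processors. This is an illustrative software analogy, not the disclosed implementation; `render_frame` and the event log are hypothetical, and `threading.Event` is used to model both kinds of ready event.

```python
import threading


def render_frame(num_gpus):
    """Simulate one frame; return the order in which events were observed."""
    log = []
    log_lock = threading.Lock()
    local_ready = [threading.Event() for _ in range(num_gpus)]
    global_ready = threading.Event()

    def gpu(index):
        with log_lock:
            log.append(("rendered", index))  # finished this processor's portion
        local_ready[index].set()             # generate the local ready event
        global_ready.wait()                  # pause until the global ready event
        with log_lock:
            log.append(("resumed", index))   # resume rendering activity

    def synchronizing_agent():
        for event in local_ready:            # wait for every local ready event
            event.wait()
        global_ready.set()                   # then generate the global ready event

    threads = [threading.Thread(target=gpu, args=(i,)) for i in range(num_gpus)]
    threads.append(threading.Thread(target=synchronizing_agent))
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return log
```

In this model no processor resumes until every portion of the current image has been rendered, so every "rendered" entry in the log precedes every "resumed" entry regardless of thread scheduling.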