Abstract:
In general, techniques are described for visibility-based state updates in graphics processing units (GPUs). A device that renders image data, comprising a memory configured to store state data and a GPU, may implement the techniques. The GPU may be configured to perform a multi-pass rendering process to render an image from the image data. The GPU determines visibility information for a plurality of objects defined by the image data during a first pass of the multi-pass rendering process. The visibility information indicates whether each of the plurality of objects will be visible in the image rendered from the image data during a second pass of the multi-pass rendering process. The GPU then retrieves the state data from the memory, based on the visibility information, for use by the second pass of the multi-pass rendering process in rendering the plurality of objects of the image data.
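As a rough illustration of the idea (not the patented implementation), the following C++ simulation gates per-object state fetches on visibility flags recorded by a first pass; the ObjectState fields and the occlusion test are hypothetical stand-ins for the bin-based visibility determination the abstract describes.

```cpp
#include <cstdio>
#include <vector>

// Hypothetical per-object state that would be fetched from memory
// before drawing each object.
struct ObjectState { int shaderId; int textureId; };

int main() {
    std::vector<ObjectState> stateMemory = {{0, 10}, {1, 11}, {2, 12}};

    // Pass 1 (visibility): record a visibility flag per object.
    // A stand-in test; real hardware would rasterize into a bin.
    std::vector<bool> visible(stateMemory.size());
    for (size_t i = 0; i < visible.size(); ++i)
        visible[i] = (i != 1);  // pretend object 1 is fully occluded

    // Pass 2 (rendering): fetch state and draw only visible objects,
    // so occluded objects cost no state-memory traffic.
    for (size_t i = 0; i < stateMemory.size(); ++i) {
        if (!visible[i]) continue;              // skip the state fetch entirely
        const ObjectState& s = stateMemory[i];  // state read gated on visibility
        std::printf("draw object %zu shader=%d tex=%d\n",
                    i, s.shaderId, s.textureId);
    }
}
```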
Abstract:
This disclosure describes techniques for extending the architecture of a general purpose graphics processing unit (GPGPU) with parallel processing units to allow efficient processing of pipeline-based applications. The techniques include configuring local memory buffers connected to parallel processing units operating as stages of a processing pipeline to hold data for transfer between the parallel processing units. The local memory buffers allow on-chip, low-power, direct data transfer between the parallel processing units. The local memory buffers may include hardware-based data flow control mechanisms to enable transfer of data between the parallel processing units. In this way, data may be passed directly from one parallel processing unit to the next parallel processing unit in the processing pipeline via the local memory buffers, in effect transforming the parallel processing units into a series of pipeline stages.
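A minimal CPU-side analogue of this buffering, where a small bounded FIFO with blocking push/pop stands in for the on-chip local memory buffer and its hardware flow control (all types and sizes are illustrative):

```cpp
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <queue>
#include <thread>

// Software stand-in for an on-chip buffer with hardware flow control:
// the producer blocks when the buffer is full, the consumer when empty.
class StageBuffer {
    std::queue<int> q_;
    const size_t capacity_ = 4;   // small, like an on-chip FIFO
    std::mutex m_;
    std::condition_variable cv_;
public:
    void push(int v) {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [&] { return q_.size() < capacity_; });  // back-pressure
        q_.push(v);
        cv_.notify_all();
    }
    int pop() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [&] { return !q_.empty(); });
        int v = q_.front(); q_.pop();
        cv_.notify_all();
        return v;
    }
};

int main() {
    StageBuffer buf;  // "local memory buffer" between two pipeline stages
    std::thread stageA([&] {                   // first parallel processing unit
        for (int i = 0; i < 8; ++i) buf.push(i * i);
    });
    std::thread stageB([&] {                   // next unit consumes directly
        for (int i = 0; i < 8; ++i) std::printf("stage B got %d\n", buf.pop());
    });
    stageA.join(); stageB.join();
}
```

The back-pressure in push() models the hardware flow-control mechanism: the upstream stage stalls when the buffer is full rather than spilling data to off-chip memory.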
Abstract:
This disclosure describes techniques for reducing memory access bandwidth in a graphics processing system based on destination alpha values. The techniques may include retrieving a destination alpha value from a bin buffer, the destination alpha value being generated in response to processing a first pixel associated with a first primitive. The techniques may further include determining, based on the destination alpha value, whether to perform an action that causes one or more texture values for a second pixel to not be retrieved from a texture buffer. In some examples, the action may include discarding the second pixel from a pixel processing pipeline prior to the second pixel arriving at a texture mapping stage of the pixel processing pipeline. The second pixel may be associated with a second primitive different than the first primitive.
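A sketch of the decision logic, under the assumed convention that a destination alpha of 1.0 in the bin buffer marks a pixel already opaquely covered by the first primitive (the abstract does not fix a convention, and all names here are illustrative):

```cpp
#include <cstdio>
#include <vector>

struct Pixel { int x, y; };

// Assumed convention: destination alpha 1.0 in the bin buffer means the
// first primitive already covers this pixel opaquely, so a later pixel
// cannot contribute and its texture fetch can be skipped entirely.
bool shouldDiscardBeforeTexturing(const std::vector<float>& destAlpha,
                                  int width, const Pixel& p) {
    return destAlpha[p.y * width + p.x] >= 1.0f;
}

int main() {
    const int width = 2;
    std::vector<float> destAlpha = {1.0f, 0.25f};  // written for the first primitive

    Pixel secondPrim[] = {{0, 0}, {1, 0}};         // pixels of a second primitive
    for (const Pixel& p : secondPrim) {
        if (shouldDiscardBeforeTexturing(destAlpha, width, p)) {
            std::printf("pixel (%d,%d): discarded before texture mapping\n",
                        p.x, p.y);
            continue;                              // no texture values retrieved
        }
        std::printf("pixel (%d,%d): fetch texels and shade\n", p.x, p.y);
    }
}
```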
Abstract:
This disclosure describes techniques for selectively activating a resume check operation in a single instruction, multiple data (SIMD) processing system. A processor is described that is configured to selectively enable or disable a resume check operation for a particular instruction based on information included in the instruction that indicates whether a resume check operation is to be performed for the instruction. A compiler is also described that is configured to generate compiled code which, when executed, causes a resume check operation to be selectively enabled or disabled for particular instructions. The compiled code may include one or more instructions that each specify whether a resume check operation is to be performed for the respective instruction. The techniques of this disclosure may be used to reduce the power consumption of and/or improve the performance of a SIMD system that utilizes a resume check operation to manage the reactivation of deactivated threads.
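A simulation of the per-instruction gating, with a hypothetical resumeCheck flag standing in for the information the compiler would embed in each compiled instruction:

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Simulated SIMD state: an active-lane mask plus a "resume" PC per
// deactivated lane telling where that lane should be reactivated.
struct Instruction {
    const char* op;
    bool resumeCheck;   // per-instruction flag the compiler would set
    uint32_t pc;
};

int main() {
    uint64_t activeMask = 0b0101;                     // lanes 0 and 2 running
    std::vector<uint32_t> resumePC = {0, 20, 0, 20};  // lanes 1, 3 wake at pc 20

    std::vector<Instruction> program = {
        {"add",  false, 10},  // compiler knows no lane can wake here: check off
        {"mul",  false, 11},
        {"join", true,  20},  // potential reconvergence point: check on
    };

    for (const Instruction& ins : program) {
        if (ins.resumeCheck) {            // the scan runs only when flagged
            for (size_t lane = 0; lane < resumePC.size(); ++lane)
                if (!((activeMask >> lane) & 1) && resumePC[lane] == ins.pc)
                    activeMask |= (1ull << lane);   // reactivate the lane
        }
        std::printf("%s @pc=%u mask=%llx\n", ins.op, ins.pc,
                    (unsigned long long)activeMask);
    }
}
```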
Abstract:
This disclosure is directed to deferred preemption techniques for scheduling graphics processing unit (GPU) command streams for execution on a GPU. A host CPU is described that is configured to control a GPU to perform deferred-preemption scheduling. For example, a host CPU may select one or more locations in a GPU command stream as being one or more locations at which preemption is allowed to occur in response to receiving a preemption notification, and may place one or more tokens in the GPU command stream based on the selected one or more locations. The tokens may indicate to the GPU that preemption is allowed to occur at the selected one or more locations. This disclosure further describes a GPU configured to preempt execution of a GPU command stream based on one or more tokens placed in a GPU command stream.
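A sketch of the token mechanism over a simulated command stream (the command types, token placement policy, and timing of the notification are illustrative):

```cpp
#include <cstdio>
#include <vector>

enum class CmdType { Draw, PreemptToken };
struct Cmd { CmdType type; int id; };

int main() {
    // Host CPU builds the stream and places tokens where preemption is
    // allowed, e.g. between logical groups of draws (policy is illustrative).
    std::vector<Cmd> stream = {
        {CmdType::Draw, 0}, {CmdType::Draw, 1},
        {CmdType::PreemptToken, 0},
        {CmdType::Draw, 2}, {CmdType::Draw, 3},
        {CmdType::PreemptToken, 1},
    };

    bool preemptRequested = false;
    for (size_t i = 0; i < stream.size(); ++i) {
        if (i == 1) preemptRequested = true;  // notification arrives mid-stream

        if (stream[i].type == CmdType::PreemptToken) {
            if (preemptRequested) {
                std::printf("token %d: safe point, switching streams\n",
                            stream[i].id);
                break;      // save context, run the higher-priority stream
            }
            continue;       // no request pending: token is a no-op
        }
        // Execution is never interrupted between tokens.
        std::printf("execute draw %d\n", stream[i].id);
    }
}
```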
Abstract:
A graphics processing system comprises at least one memory device storing a plurality of pixel command threads and a plurality of vertex command threads. An arbiter coupled to the at least one memory device selects a command thread from either the plurality of pixel command threads or the plurality of vertex command threads, based on the relative priorities of the pixel command threads and the vertex command threads. The selected command thread is provided to a command processing engine capable of processing both pixel command threads and vertex command threads.
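A software sketch of the arbitration, assuming a simple highest-priority-wins policy between the two thread pools (the abstract does not specify the policy; all names are illustrative):

```cpp
#include <cstdio>
#include <queue>
#include <string>
#include <vector>

struct CommandThread { std::string name; int priority; };

struct ByPriority {
    bool operator()(const CommandThread& a, const CommandThread& b) const {
        return a.priority < b.priority;   // max-heap on priority
    }
};

int main() {
    // Pending pixel and vertex command threads (illustrative contents).
    std::priority_queue<CommandThread, std::vector<CommandThread>, ByPriority>
        pixel, vertex;
    pixel.push({"pix0", 3});  pixel.push({"pix1", 7});
    vertex.push({"vtx0", 5}); vertex.push({"vtx1", 2});

    // Arbiter: compare the best pending thread of each kind and hand the
    // winner to the unified command processing engine.
    while (!pixel.empty() || !vertex.empty()) {
        bool takePixel = !pixel.empty() &&
            (vertex.empty() || pixel.top().priority >= vertex.top().priority);
        CommandThread t = takePixel ? pixel.top() : vertex.top();
        if (takePixel) pixel.pop(); else vertex.pop();
        std::printf("engine runs %s (priority %d)\n",
                    t.name.c_str(), t.priority);
    }
}
```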
Abstract:
A method and apparatus for dynamic issuing of memory access instructions. In particular, a specific data access request that is about to be sent to a memory, such as a frame buffer, is dynamically chosen based upon pending requests within a pipeline. Video data requests can be optimized by dynamically selecting a memory access request at the time the request is made to the memory. In particular, if it is recognized that the memory bank about to be accessed will no longer be needed by subsequent memory requests, the request can be changed from a normal access request to an access request with an auto-close option. With the auto-close option, the memory bank being accessed is closed after the access, without issuing a separate memory-close instruction.
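A sketch of the look-ahead decision, assuming the pending requests in the pipeline can be scanned for later uses of the same bank (the bank/row model is illustrative):

```cpp
#include <cstdio>
#include <vector>

struct Request { int bank; int row; };

// Look ahead in the pending pipeline: if no later request touches the same
// bank, tag this access auto-close so the bank is closed as part of the
// access itself, with no separate close instruction issued.
bool lastUseOfBank(const std::vector<Request>& pipe, size_t i) {
    for (size_t j = i + 1; j < pipe.size(); ++j)
        if (pipe[j].bank == pipe[i].bank) return false;
    return true;
}

int main() {
    std::vector<Request> pipeline = {
        {0, 5}, {1, 9}, {0, 5}, {2, 1},   // bank 0 used twice, then never again
    };
    for (size_t i = 0; i < pipeline.size(); ++i) {
        if (lastUseOfBank(pipeline, i))
            std::printf("access bank %d row %d [auto-close]\n",
                        pipeline[i].bank, pipeline[i].row);
        else
            std::printf("access bank %d row %d [keep open]\n",
                        pipeline[i].bank, pipeline[i].row);
    }
}
```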
Abstract:
The example techniques described in this disclosure may be directed to synchronization between producer shaders and consumer shaders. For example, a graphics processing unit (GPU) may execute a producer shader to produce graphics data. After production of the graphics data completes, the producer shader may store a value indicating the amount of graphics data produced. The GPU may then execute one or more consumer shaders, after the value indicating the amount of produced graphics data has been stored, to consume the produced graphics data.
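A CPU-thread analogue of this ordering, where std::atomic release/acquire stands in for whatever memory operations the GPU would use, and the consumer is launched only after the producer has stored the count:

```cpp
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

int main() {
    std::vector<int> buffer(16);        // shared "produced data" memory
    std::atomic<int> producedCount{0};  // value the producer publishes

    std::thread producer([&] {          // stands in for the producer shader
        int n = 10;
        for (int i = 0; i < n; ++i) buffer[i] = i * 2;     // produce data
        producedCount.store(n, std::memory_order_release); // publish count last
    });
    producer.join();  // consumers start only after production completes

    int n = producedCount.load(std::memory_order_acquire);
    std::thread consumer([&] {          // stands in for a consumer shader
        int sum = 0;
        for (int i = 0; i < n; ++i) sum += buffer[i];  // consume exactly n items
        std::printf("consumed %d items, sum=%d\n", n, sum);
    });
    consumer.join();
}
```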
Abstract:
A method and system for higher-level filtering uses a native bilinear filter, typically found in a texture mapper, and combines a plurality of bilinear filter results from the bilinear filter to produce a higher-level filtered texel value. The native bilinear filter is operative to generate bilinear-filtered texel values by performing a plurality of bilinear-filtered texture fetches using bilinear filter fetch coordinates. The method and system combine the plurality of bilinear-filtered texel values with a plurality of weights to generate the higher-level filtered texel value.
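A worked sketch in C++: a software bilinear fetch stands in for the texture mapper's native filter, and three bilinear results taken at offset coordinates are combined with outer weights to form a wider, higher-level filter. The tap positions and the 1-2-1 weights are illustrative only.

```cpp
#include <cstdio>

// Tiny 4x4 single-channel texture (illustrative data).
static const float tex[4][4] = {
    {0, 1, 2, 3}, {1, 2, 3, 4}, {2, 3, 4, 5}, {3, 4, 5, 6},
};

// Native bilinear fetch: one weighted read of a 2x2 texel footprint,
// as a texture mapper performs in hardware.
float bilinear(float x, float y) {
    int x0 = (int)x, y0 = (int)y;
    float fx = x - x0, fy = y - y0;
    return tex[y0][x0]         * (1 - fx) * (1 - fy)
         + tex[y0][x0 + 1]     * fx       * (1 - fy)
         + tex[y0 + 1][x0]     * (1 - fx) * fy
         + tex[y0 + 1][x0 + 1] * fx       * fy;
}

int main() {
    // Higher-level filter: several bilinear fetches at offset fetch
    // coordinates, combined with outer weights (a 1-2-1 tent here).
    float fetches[3] = { bilinear(0.5f, 1.0f), bilinear(1.5f, 1.0f),
                         bilinear(2.5f, 1.0f) };
    float weights[3] = { 0.25f, 0.5f, 0.25f };
    float result = 0;
    for (int i = 0; i < 3; ++i) result += weights[i] * fetches[i];
    std::printf("higher-level filtered value = %f\n", result);
}
```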