摘要:
A method of occlusion culling of graphic objects, comprising the steps of storing a first mask and one or more depth values associated with areas inside and outside the mask for a pre-defined region, and evaluating the visibility of the primitive covering the same region, wherein visibility evaluation begins after the computation of the coverage mask of the primitive in the region, and the computation of one or more depth values representing the pixels of the primitive. The method of the present invention is a real-time method of generating per-region coverage mask and associated Z values after the second primitive is rendered in the same region, which can maximize the bandwidth savings for Z read for both overlapping and non-overlapping primitives, with different relations between their depth values.
摘要:
In general, aspects of this disclosure describe example techniques for efficient usage of the fixed data rate processing of a graphics processing unit (GPU) for a variable data rate processing. For example, the GPU may be coupled to a pixel value processing unit that receives pixel values for pixels in an image processed by the GPU. The pixel value processing unit may determine whether the pixel values are for pixels that require further processing, and store the pixel values for the pixels that are required for further processing in a buffer.
摘要:
Scissoring for any number of scissoring regions is performed in a sequential order by drawing one scissoring region at a time on a drawing surface and updating scissor values for pixels within each scissoring region. A scissor value for a pixel may indicate the number of scissoring regions covering the pixel and may be incremented for each scissoring region covering the pixel. A scissor value for a pixel may also be a bitmap, and a bit for a scissoring region may be set to one if the pixel is within the scissoring region. Pixels within a region of interest are passed and rendered, and pixels outside of the region are discarded. This region may be defined by a reference value, which may be set to (a) one for the union of all scissoring regions, for a scissoring UNION operation, or (b) larger than one for the intersection of multiple (e.g., all) scissoring regions, for a scissoring AND operation.
摘要:
This disclosure describes techniques for removing vertex points during two-dimensional (2D) graphics rendering using three-dimensional (3D) graphics hardware. In accordance with the described techniques one or more vertex points may be removed during 2D graphics rendering using 3D graphics hardware. For example, the techniques may remove redundant vertex points in the display coordinate space by discarding vertex points that have the substantially same positional coordinates in the display coordinate space as a previous vertex point. Alternatively or additionally, the techniques may remove excess vertex points that lie in a straight line. Removing the redundant vertex points or vertex points that lie in a straight line allow for more efficient utilization of the hardware resources of the GPU and increase the speed at which the GPU renders the image for display.
摘要:
Techniques are described for reordering commands to improve the speed at which at least one command stream may execute. Prior to distributing commands in the at least one command stream to multiple pipelines, a multimedia processor analyzes any inter-pipeline dependencies and determines the current execution state of the pipelines. The processor may, based on this information, reorder the at least one command stream by prioritizing commands that lack any current dependencies and therefore may be executed immediately by the appropriate pipeline. Such out of order execution of commands in the at least one command stream may increase the throughput of the multimedia processor by increasing the rate at which the command stream is executed.
摘要:
Methods and apparatuses for accessing data within programmable graphics hardware are provided. According to one aspect, a user inserts special log commands into a software program, which is compiled into instructions for the programmable graphics hardware to execute. The hardware writes data to an external memory during runtime according to a flow control protocol, and the software driver reads the data from the memory to display to the user.
摘要:
Provided is a system for monitoring the performance in a computer graphics processor having a plurality of pipeline processing blocks in a graphics pipeline. The system includes: performance monitoring logic, configured to gather data corresponding to graphics pipeline performance; a plurality of counting logic blocks, located within the performance monitoring logic; a plurality of logical counters, located in each of the plurality of pipeline processing blocks, configured to transmit a plurality of count signals to the performance monitoring logic; a plurality of counter configuration registers, configured to map a portion of the plurality of logical counters to the plurality of counting logic blocks; and a command processor configured to provide a plurality of commands to the performance monitoring logic.
摘要:
This disclosure describes techniques for handling divergent thread conditions in a multi-threaded processing system. In some examples, a control flow unit may obtain a control flow instruction identified by a program counter value stored in a program counter register. The control flow instruction may include a target value indicative of a target program counter value for the control flow instruction. The control flow unit may select one of the target program counter value and a minimum resume counter value as a value to load into the program counter register. The minimum resume counter value may be indicative of a smallest resume counter value from a set of one or more resume counter values associated with one or more inactive threads. Each of the one or more resume counter values may be indicative of a program counter value at which a respective inactive thread should be activated.
摘要:
This disclosure describes techniques for handling divergent thread conditions in a multi-threaded processing system. In some examples, a control flow unit may obtain a control flow instruction identified by a program counter value stored in a program counter register. The control flow instruction may include a target value indicative of a target program counter value for the control flow instruction. The control flow unit may select one of the target program counter value and a minimum resume counter value as a value to load into the program counter register. The minimum resume counter value may be indicative of a smallest resume counter value from a set of one or more resume counter values associated with one or more inactive threads. Each of the one or more resume counter values may be indicative of a program counter value at which a respective inactive thread should be activated.
摘要:
The present disclosure includes system and method of mapping shader variables into physical registers. In an embodiment, a graphics processing unit (GPU) and a memory coupled to the GPU are disclosed. The memory includes a processor readable data file that has a register file portion. The register file portion has a rectangular structure including a plurality of data items. At least two of the plurality of data items corresponding to data elements of a shader program. The data elements have different data storage types.