GPU hardware-based depth buffer direction tracking

    Publication No.: US11176734B1

    Publication Date: 2021-11-16

    Application No.: US17064188

    Filing Date: 2020-10-06

    Abstract: The present disclosure relates to methods and apparatus for graphics processing. An example method generally includes receiving, at a graphics processing unit (GPU), a plurality of commands corresponding to a plurality of draws across a frame, each of the plurality of commands indicating a depth test direction with respect to a low-resolution depth (LRZ) buffer for the corresponding draw. The method generally includes maintaining, at the GPU, an LRZ status buffer to store the depth test direction of the first command, in time, of the plurality of commands processed by the GPU. The method generally includes disabling, at the GPU, use of the LRZ buffer for depth testing for any of the plurality of commands remaining unprocessed after processing a command of the plurality of commands whose depth test direction differs from the depth test direction stored in the LRZ status buffer.
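    A minimal sketch of the tracking rule, assuming only two directions (nearer-passes and farther-passes) and a tracker that is reset once per frame. The names `DepthDir`, `DrawCommand`, `LrzStatusTracker`, and `lrz_usage` are hypothetical, and treating the mismatching draw itself as LRZ-disabled is an assumption; compare functions such as ALWAYS or EQUAL are not modeled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class DepthDir(Enum):
    LESS = "less"        # nearer fragments pass (e.g. LESS/LEQUAL compare)
    GREATER = "greater"  # farther fragments pass (e.g. GREATER/GEQUAL compare)


@dataclass
class DrawCommand:
    draw_id: int
    depth_dir: DepthDir


class LrzStatusTracker:
    """Hypothetical model of a per-frame LRZ status buffer."""

    def __init__(self) -> None:
        self.status: Optional[DepthDir] = None  # direction of the first command processed
        self.lrz_enabled = True

    def process(self, cmd: DrawCommand) -> bool:
        """Return True if this draw may still use the LRZ buffer for depth testing."""
        if self.status is None:
            self.status = cmd.depth_dir  # record the first direction seen this frame
        elif cmd.depth_dir != self.status:
            # Direction changed: assumption, LRZ stays off for this draw and the rest of the frame
            self.lrz_enabled = False
        return self.lrz_enabled


def lrz_usage(dirs: List[DepthDir]) -> List[bool]:
    """Run one frame's worth of draws through a fresh tracker."""
    tracker = LrzStatusTracker()
    return [tracker.process(DrawCommand(i, d)) for i, d in enumerate(dirs)]
```

    For example, a frame whose draws use LESS, LESS, GREATER, LESS keeps LRZ for the first two draws only; once disabled, LRZ is not re-enabled even when the direction flips back.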

    SHADER CONTROLLED WAVE SCHEDULING PRIORITY

    Publication No.: US20210103467A1

    Publication Date: 2021-04-08

    Application No.: US16591349

    Filing Date: 2019-10-02

    Abstract: A graphics processing unit (GPU) may execute a shader program that may include instructions for prioritization and scheduling of waves processed in parallel. According to some aspects of the described techniques, instruction variants (e.g., set-lowest-priority, set-highest-priority, set-priority-to-N, etc.) may be executed by hardware during processing of a wave to control (e.g., modify) the processing priority of that wave. As such, the described techniques for shader controlled wave scheduling priority may allow waves to be processed without interfering with, or taking resources from, lagging waves. In one example, when a set-lowest-priority instruction is executed by hardware during execution of a first loop of a first wave, the instruction may push the current wave's priority to the bottom of the list. As a result, pending loops from other waves may be processed before processing returns to a second loop of the first wave.
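    The set-lowest-priority example can be sketched as a toy scheduler that always runs the wave at the front of a priority list and moves a wave to the back when its shader yields. This is a simplified model under stated assumptions: one loop iteration per scheduling step, waves listed in initial priority order, and a hypothetical `schedule` function; it does not model real hardware arbitration.

```python
from collections import deque
from typing import Deque, Dict, List, Set, Tuple


def schedule(loops: Dict[str, int], yielding: Set[str]) -> List[Tuple[str, int]]:
    """Simulate a hypothetical priority list of waves.

    loops: wave name -> number of loop iterations the wave must run,
           listed from highest to lowest initial priority.
    yielding: waves whose shader executes a set-lowest-priority
              instruction at the end of every loop iteration.
    Returns the order in which (wave, loop_index) pairs execute.
    """
    # Front of the deque = highest priority; waves with no work are skipped.
    priority: Deque[str] = deque(w for w, n in loops.items() if n > 0)
    done: Dict[str, int] = {w: 0 for w in loops}
    trace: List[Tuple[str, int]] = []
    while priority:
        wave = priority[0]  # the scheduler always picks the highest-priority wave
        trace.append((wave, done[wave]))
        done[wave] += 1
        if done[wave] == loops[wave]:
            priority.popleft()  # wave finished; remove it from the list
        elif wave in yielding:
            priority.rotate(-1)  # set-lowest-priority: move this wave to the back
    return trace
```

    With waves A (two loops), B, and C (one loop each), a yielding A produces the order A0, B0, C0, A1, whereas a non-yielding A runs A0, A1 before the others.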

    Uniform predicates in shaders for graphics processing units

    Publication No.: US10706494B2

    Publication Date: 2020-07-07

    Application No.: US16103336

    Filing Date: 2018-08-14

    Abstract: A method for processing data in a graphics processing unit (GPU) includes receiving an indication that all threads of a warp in the GPU are to execute the same branch in a first set of instructions; storing one or more predicate bits in a memory as a single set of predicate bits, wherein the single set of predicate bits applies to all of the threads in the warp; and executing a portion of the first set of instructions in accordance with the single set of predicate bits. The first set of instructions may be executed in accordance with the single set of predicate bits using a single instruction, multiple data (SIMD) processing core and/or a scalar processing unit.
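    To illustrate the storage difference, the sketch below contrasts a per-thread predicate mask with a single uniform predicate bit. The `uniform` flag stands in for the "indication" in the abstract (for example, from the compiler); the function names and the toy doubling branch are hypothetical.

```python
from typing import List, Tuple


def predicate_bits(conds: List[bool], uniform: bool) -> Tuple[int, int]:
    """Return (predicate_value, bit_width) for one warp.

    Non-uniform: one bit per thread, packed so that bit i belongs to thread i.
    Uniform: every thread is known to take the same branch, so one bit,
    evaluated once (e.g. on a scalar unit), covers the whole warp.
    """
    if uniform:
        return int(conds[0]), 1
    mask = 0
    for i, taken in enumerate(conds):
        if taken:
            mask |= 1 << i
    return mask, len(conds)


def execute_branch(values: List[int], conds: List[bool], uniform: bool) -> List[int]:
    """Apply a toy 'then' branch (doubling) to threads whose predicate is set."""
    pred, _width = predicate_bits(conds, uniform)
    out = []
    for i, v in enumerate(values):
        # Uniform case reads the single shared bit; otherwise each thread reads its own bit.
        taken = bool(pred) if uniform else bool((pred >> i) & 1)
        out.append(v * 2 if taken else v)
    return out
```

    A 32-thread uniform warp needs one predicate bit instead of 32, and every thread reads the same bit.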

    Patched shading in graphics processing

    Publication No.: US10535185B2

    Publication Date: 2020-01-14

    Application No.: US13830075

    Filing Date: 2013-03-14

    Abstract: Aspects of this disclosure relate to a process for rendering graphics that includes performing, with a hardware unit of a graphics processing unit (GPU) designated for vertex shading, a vertex shading operation to shade input vertices so as to output vertex-shaded vertices, wherein the hardware unit adheres to an interface that receives a single vertex as an input and generates a single vertex as an output. The process also includes performing, with the hardware unit of the GPU designated for vertex shading, a hull shading operation to generate one or more control points based on one or more of the vertex-shaded vertices, wherein the hull shading operation operates on at least one of the one or more vertex-shaded vertices to output the one or more control points.
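    A minimal sketch of reusing one single-vertex-in, single-vertex-out unit for two stages: the same `run_shader_unit` executes a vertex pass and then a hull pass that emits one control point per invocation. All names are hypothetical, and the midpoint hull stage is an illustrative assumption, not the disclosed hull shader.

```python
from typing import Callable, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def run_shader_unit(kernel: Callable[[int], Vec3], invocations: int) -> List[Vec3]:
    """One-in/one-out interface: each invocation emits exactly one vertex."""
    return [kernel(i) for i in range(invocations)]


def vertex_shader(positions: Sequence[Vec3], scale: float) -> Callable[[int], Vec3]:
    # Toy vertex stage: uniformly scale each input position.
    return lambda i: tuple(c * scale for c in positions[i])


def hull_shader(shaded: Sequence[Vec3], patch: Sequence[int]) -> Callable[[int], Vec3]:
    # Hypothetical hull stage: control point i is the midpoint of the
    # edge from patch vertex i to patch vertex i + 1 (wrapping around).
    def kernel(i: int) -> Vec3:
        a = shaded[patch[i]]
        b = shaded[patch[(i + 1) % len(patch)]]
        return tuple((x + y) / 2 for x, y in zip(a, b))
    return kernel


# The same unit runs both passes: vertex shading first, then hull shading
# on the vertex-shaded results.
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
shaded = run_shader_unit(vertex_shader(positions, 2.0), len(positions))
control_points = run_shader_unit(hull_shader(shaded, [0, 1, 2]), 3)
```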

    FIXED-STRIDE DRAW TABLES FOR TILED RENDERING
    Patent application

    Publication No.: US20200013137A1

    Publication Date: 2020-01-09

    Application No.: US16028151

    Filing Date: 2018-07-05

    Abstract: Methods, systems, and devices for rendering are described. A device may divide a frame into a plurality of bins. The device may generate a command stream containing multiple repetitions of a fixed-stride draw table (FSDT), where each repetition of the FSDT includes a respective state vector for one or more hardware registers of a set of hardware registers. The device may identify, for each bin, a subset of the multiple repetitions of the FSDT in the command stream that include a live draw call. The device may execute, using the set of hardware registers, one or more rendering commands for each bin based at least in part on the corresponding subset of the multiple repetitions of the FSDT.
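    The benefit of a fixed stride can be sketched as follows: if every repetition of the draw table has the same size, the entry for draw k starts at a computable offset, so each bin can jump straight to the repetitions containing its live draws. The register count, `build_stream`, and `render_bin` are hypothetical, and the per-bin live-draw sets are assumed to come from an earlier binning pass.

```python
from typing import List, Sequence, Set

NUM_REGS = 4        # hypothetical number of hardware registers in each state vector
STRIDE = NUM_REGS   # every FSDT repetition occupies the same number of words


def build_stream(state_vectors: Sequence[Sequence[int]]) -> List[int]:
    """Emit one fixed-stride FSDT repetition (a register state vector) per draw."""
    stream: List[int] = []
    for sv in state_vectors:
        if len(sv) != NUM_REGS:
            raise ValueError("fixed stride requires equal-length state vectors")
        stream.extend(sv)
    return stream


def render_bin(stream: Sequence[int], live_draws: Set[int]) -> List[List[int]]:
    """Visit only the repetitions whose draw is live in this bin.

    Because every repetition has the same stride, the entry for draw k
    starts at k * STRIDE and can be located without parsing earlier entries.
    """
    programmed = []
    for draw in sorted(live_draws):  # keep submission order
        offset = draw * STRIDE
        regs = list(stream[offset:offset + NUM_REGS])
        programmed.append(regs)  # stand-in for writing registers and issuing the draw
    return programmed
```

    For a three-draw stream, a bin whose live draws are {0, 2} loads only the first and third state vectors and skips the second entirely.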

    Per-vertex variable rate shading
    Granted patent

    Publication No.: US10192280B2

    Publication Date: 2019-01-29

    Application No.: US15434851

    Filing Date: 2017-02-16

    Abstract: A graphics processing unit (GPU) may rasterize a primitive into a plurality of samples, wherein vertices of the primitive are associated with variable rate shading (VRS) parameters. The GPU may determine a VRS quality group that comprises one or more sub-regions of the plurality of samples based at least in part on the VRS parameters. The GPU may fragment-shade a VRS tile that represents the VRS quality group, wherein the VRS tile comprises fewer samples than the VRS quality group. The GPU may amplify the stored VRS tile into shaded fragments that correspond to the VRS quality group.
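    The shade-fewer-then-amplify idea can be sketched for a rectangular group of samples: one fragment-shader invocation per rate-by-rate block forms a smaller tile, and amplification copies each coarse result back to every sample it covers. The rule for combining per-vertex rates (finest wins), the sample-center convention, and all function names are assumptions for illustration.

```python
from typing import Callable, List


def primitive_rate(vertex_rates: List[int]) -> int:
    # Assumption: the finest (smallest) per-vertex rate is used for the group.
    return min(vertex_rates)


def shade_vrs_group(width: int, height: int, rate: int,
                    shade: Callable[[float, float], float]) -> List[List[float]]:
    """Shade a width x height group of samples with one invocation per
    rate x rate block (the 'VRS tile'), then amplify back to full size."""
    tw = (width + rate - 1) // rate
    th = (height + rate - 1) // rate
    # VRS tile: tw * th fragment-shader invocations instead of width * height
    tile = [[shade((tx + 0.5) * rate, (ty + 0.5) * rate) for tx in range(tw)]
            for ty in range(th)]
    # Amplification: every sample takes the value of the coarse fragment covering it
    return [[tile[y // rate][x // rate] for x in range(width)] for y in range(height)]
```

    With a 4 x 4 group and a rate of 2, the shader runs 4 times instead of 16, and each result is replicated across a 2 x 2 block of samples.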

    Per-shader preamble for graphics processing

    Publication No.: US09799089B1

    Publication Date: 2017-10-24

    Application No.: US15162272

    Filing Date: 2016-05-23

    CPC classification number: G06T1/20 G06T1/60 G06T15/80

    Abstract: A method for processing data in a graphics processing unit includes receiving a code block of instructions common to a plurality of groups of threads of a shader; executing, by a first group of threads of the plurality of groups of threads, the common code block to create a result; storing the result in on-chip random access memory (RAM) that is accessible by each of the plurality of groups of threads; and, upon a determination that storing the result has completed, returning the result of the common code block from the on-chip RAM.
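    A CPU-side sketch of the preamble pattern, using Python threads to stand in for thread groups and a shared object to stand in for on-chip RAM. The first group to arrive runs the common block once; every group reads the stored result only after an event signals that the store has completed. `PreambleCache` and `run_groups` are hypothetical names, and error handling (for example, a preamble that raises) is omitted.

```python
import threading
from typing import Callable, List, Optional


class PreambleCache:
    """Models on-chip RAM shared by all thread groups of one shader (hypothetical)."""

    def __init__(self, preamble: Callable[[], float]) -> None:
        self._preamble = preamble
        self._lock = threading.Lock()
        self._ready = threading.Event()  # set once the store has completed
        self._claimed = False
        self._value: Optional[float] = None
        self.executions = 0

    def get(self) -> float:
        with self._lock:
            first = not self._claimed
            self._claimed = True
        if first:
            self._value = self._preamble()  # only the first group runs the common block
            self.executions += 1
            self._ready.set()               # signal that the store to "on-chip RAM" finished
        self._ready.wait()                  # every group reads only after the store completes
        return self._value


def run_groups(n_groups: int, cache: PreambleCache) -> List[float]:
    results: List[float] = [0.0] * n_groups

    def group(i: int) -> None:
        results[i] = cache.get() * (i + 1)  # per-group main body uses the shared result

    threads = [threading.Thread(target=group, args=(i,)) for i in range(n_groups)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

    Running four groups against a preamble that returns 1.5 yields [1.5, 3.0, 4.5, 6.0], with the preamble executed exactly once.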

    Selectively merging partially-covered tiles to perform hierarchical z-culling
    Granted patent
    Status: in force

    Publication No.: US09311743B2

    Publication Date: 2016-04-12

    Application No.: US14061506

    Filing Date: 2013-10-23

    CPC classification number: G06T15/405 G06T1/60 G06T15/005

    Abstract: This disclosure describes techniques for performing hierarchical z-culling in a graphics processing system. In some examples, the techniques for performing hierarchical z-culling may involve selectively merging partially-covered source tiles for a tile location into a fully-covered merged source tile based on whether conservative farthest z-values for the partially-covered source tiles are nearer than a culling z-value for the tile location, and using a conservative farthest z-value associated with the fully-covered merged source tile to update the culling z-value for the tile location. In further examples, the techniques for performing hierarchical z-culling may use a cache unit that is not associated with an underlying memory to store conservative farthest z-values and coverage masks for merged source tiles. The capacity of the cache unit may be smaller than the size of cache needed to store merged source tile data for all of the tile locations in a render target.
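    The merge rule can be sketched for a single tile location, assuming a LESS depth test (smaller z is nearer), depths in [0, 1], and a hypothetical 4-bit coverage mask. Partially covered sources are merged only if their conservative farthest z is nearer than the current culling z; once the merged coverage is full, its farthest z can tighten the culling z. `HiZTile` and `MergedTile` are illustrative names, not the disclosed hardware.

```python
from dataclasses import dataclass
from typing import Optional

FULL = 0xF  # hypothetical coverage mask: 4 sub-blocks per tile


@dataclass
class MergedTile:
    coverage: int = 0
    far_z: float = 0.0  # conservative farthest z over all merged sources


class HiZTile:
    def __init__(self, cull_z: float = 1.0) -> None:
        self.cull_z = cull_z  # conservative farthest z for this tile location
        # Merged data lives in a small cache not backed by memory; losing it on
        # eviction only delays a cull-z update, so culling stays conservative.
        self.merged: Optional[MergedTile] = None

    def add_source(self, coverage: int, far_z: float) -> None:
        if coverage == FULL:
            self.cull_z = min(self.cull_z, far_z)  # fully covered source: update directly
            return
        if far_z >= self.cull_z:
            return  # not nearer than the culling z-value: do not merge
        m = self.merged or MergedTile()
        m.coverage |= coverage
        m.far_z = max(m.far_z, far_z)
        if m.coverage == FULL:
            self.cull_z = min(self.cull_z, m.far_z)  # merged tile is now fully covered
            self.merged = None  # free the cache slot
        else:
            self.merged = m

    def is_culled(self, near_z: float) -> bool:
        # Incoming geometry whose nearest z is behind the culling z is hidden.
        return near_z > self.cull_z
```

    Two half-covering sources at z = 0.4 and z = 0.6 together fully cover the tile, so the culling z drops from 1.0 to 0.6, and geometry whose nearest z is 0.7 can then be culled.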

    SELECTIVELY MERGING PARTIALLY-COVERED TILES TO PERFORM HIERARCHICAL Z-CULLING
    Patent application
    Status: in force

    Publication No.: US20150109293A1

    Publication Date: 2015-04-23

    Application No.: US14061506

    Filing Date: 2013-10-23

    CPC classification number: G06T15/405 G06T1/60 G06T15/005

    Abstract: This disclosure describes techniques for performing hierarchical z-culling in a graphics processing system. In some examples, the techniques for performing hierarchical z-culling may involve selectively merging partially-covered source tiles for a tile location into a fully-covered merged source tile based on whether conservative farthest z-values for the partially-covered source tiles are nearer than a culling z-value for the tile location, and using a conservative farthest z-value associated with the fully-covered merged source tile to update the culling z-value for the tile location. In further examples, the techniques for performing hierarchical z-culling may use a cache unit that is not associated with an underlying memory to store conservative farthest z-values and coverage masks for merged source tiles. The capacity of the cache unit may be smaller than the size of cache needed to store merged source tile data for all of the tile locations in a render target.

    Single pass anti-ringing clamping enabled image processing

    Publication No.: US12254555B2

    Publication Date: 2025-03-18

    Application No.: US18194324

    Filing Date: 2023-03-31

    Abstract: This disclosure provides systems, devices, apparatus, and methods, including computer programs encoded on storage media, for single pass anti-ringing clamping enabled image processing. A graphics processor may perform a filtering operation on a set of texture samples. The graphics processor may select, during a single sampling operation, a minimum value and a maximum value associated with the set of texture samples during the performance of the filtering operation on the set of texture samples. The graphics processor may adjust, during the single sampling operation, a value of a filtered texture sample associated with the set of texture samples based on the minimum value and the maximum value. The graphics processor may output an indication of the adjusted value of the filtered texture sample.
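    A minimal sketch of single-pass anti-ringing clamping for one output texel: while accumulating the weighted filter sum, the same loop tracks the minimum and maximum of the input samples, and the filtered value is then clamped into that range. The function name and the 4-tap kernel with negative lobes used in the example are hypothetical; any kernel whose negative weights can overshoot behaves the same way.

```python
from typing import Sequence, Tuple


def filter_with_clamp(samples: Sequence[float], weights: Sequence[float]) -> Tuple[float, float]:
    """One pass over the footprint: accumulate the weighted sum and track min/max.

    Returns (unclamped, clamped), where clamped lies within [min(samples), max(samples)].
    """
    if not samples or len(samples) != len(weights):
        raise ValueError("need one weight per sample")
    acc = 0.0
    lo = hi = samples[0]
    for s, w in zip(samples, weights):
        acc += s * w       # filtering
        lo = min(lo, s)    # min/max gathered in the same sampling pass
        hi = max(hi, s)
    return acc, min(max(acc, lo), hi)
```

    With weights [-0.125, 0.625, 0.625, -0.125] across a sharp edge [0.0, 1.0, 1.0, 0.0], the raw result overshoots to 1.25 (ringing), and the clamp returns 1.0.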