MULTI-SAMPLE INSTRUCTIONS FOR DISTRIBUTION OF IMAGE PROCESSING WORKLOAD BETWEEN TEXTURE AND SHARED PROCESSORS

    Publication No.: US20210104009A1

    Publication date: 2021-04-08

    Application No.: US16591528

    Filing date: 2019-10-02

    Abstract: Methods, systems, and devices for image processing are described. A device may identify a target pixel having a texel coordinate in an image. The device may select, based on the texel coordinate, a first texel sample of a first set of texel samples and a second texel sample of a second set of texel samples. In some examples, the device may group the first texel sample and the second texel sample into a third set of texel samples. The device may generate an instruction including the third set of texel samples and a weighted sum associated with the first texel sample and the second texel sample, and process the third set of texel samples based on the instruction. In some examples, the instruction may be a macro instruction.
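The grouping step in this abstract can be pictured with a small sketch. It assumes, purely for illustration, that the two sample sets are two adjacent texel rows and that the weights come from the fractional part of the vertical coordinate; `MacroInstruction`, `build_instruction`, and `execute` are hypothetical names, not terms from the patent.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MacroInstruction:
    """Hypothetical container: grouped texel coordinates plus one weight each."""
    samples: tuple  # ((x, y), ...) texel coordinates
    weights: tuple  # weights for the weighted sum, same order as samples


def build_instruction(u, v):
    """Select one texel from each of two rows and group them (assumed scheme)."""
    x = math.floor(u)
    y0 = math.floor(v)
    frac = v - y0
    first = (x, y0)        # sample from the first set (row y0)
    second = (x, y0 + 1)   # sample from the second set (row y0 + 1)
    return MacroInstruction(samples=(first, second), weights=(1.0 - frac, frac))


def execute(instr, image):
    """Process the grouped samples as a single weighted sum."""
    return sum(w * image[y][x] for (x, y), w in zip(instr.samples, instr.weights))


image = [[0.0, 10.0],
         [4.0, 20.0]]
instr = build_instruction(1.0, 0.25)
result = execute(instr, image)
```

With the sample image, the instruction groups texels (1, 0) and (1, 1) with weights 0.75 and 0.25.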

    Dynamic shader instruction nullification for graphics processing

    Publication No.: US10430912B2

    Publication date: 2019-10-01

    Application No.: US15432170

    Filing date: 2017-02-14

    Abstract: A GPU may be configured to detect and nullify unnecessary instructions. Nullifying unnecessary instructions may include overwriting a detected unnecessary instruction with a no operation (NOP) instruction. In another example, nullifying unnecessary instructions may include writing a value to a 1-bit instruction memory. Each bit of the 1-bit instruction memory may be associated with a particular instruction of the draw call. If the 1-bit instruction memory has a true value (e.g., 1), the GPU is configured to not execute the particular instruction.
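The two nullification variants named in the abstract can be contrasted in a toy interpreter. This is only a sketch: instructions are plain strings, the set of unnecessary indices is supplied by the caller (how the GPU detects them is not modeled), and `nullify_by_overwrite` and `run` are hypothetical helper names.

```python
NOP = "nop"


def nullify_by_overwrite(program, unnecessary):
    """Variant 1: replace each unnecessary instruction with a NOP."""
    return [NOP if i in unnecessary else ins for i, ins in enumerate(program)]


def run(program, skip_bits=None):
    """Return the instructions that actually execute.

    Variant 2: skip_bits holds one bit per instruction; a true bit
    means the instruction is not executed.
    """
    executed = []
    for i, ins in enumerate(program):
        if skip_bits is not None and skip_bits[i]:
            continue
        if ins == NOP:
            continue
        executed.append(ins)
    return executed


program = ["load", "mul_by_one", "store"]
overwritten = nullify_by_overwrite(program, {1})
via_nop = run(overwritten)
via_bits = run(program, skip_bits=[0, 1, 0])
```

Both paths execute the same instructions; the bit-mask variant leaves the original program untouched.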

    ADDING METADATA TO TEXTURE SURFACES FOR BANDWIDTH COMPRESSION

    Publication No.: US20190087930A1

    Publication date: 2019-03-21

    Application No.: US15707608

    Filing date: 2017-09-18

    Abstract: A method for memory bandwidth compression comprising analyzing a texture surface to identify one or more areas of the texture surface that are fetchable with lower memory bandwidth consumption as compared to other areas of the texture surface, adding metadata to a metadata surface associated with the texture surface based on the analysis, the metadata indicating the one or more areas of the texture surface that are fetchable with lower memory bandwidth consumption as compared to other areas of the texture surface, and fetching the texture surface in accordance with the metadata.
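As a rough illustration of the analyze, annotate, and fetch steps, the sketch below assumes that a "cheaply fetchable" area is a tile whose texels all share one value, so a single read can stand in for the whole tile. That criterion, the tile layout, and the names `build_metadata` and `fetch` are assumptions for this example, not details from the abstract.

```python
def build_metadata(texture, tile):
    """Analyze the texture: one flag per tile, True if the tile is uniform."""
    h, w = len(texture), len(texture[0])
    meta = []
    for ty in range(0, h, tile):
        row = []
        for tx in range(0, w, tile):
            first = texture[ty][tx]
            row.append(all(texture[y][x] == first
                           for y in range(ty, min(ty + tile, h))
                           for x in range(tx, min(tx + tile, w))))
        meta.append(row)
    return meta


def fetch(texture, meta, tile):
    """Fetch according to the metadata, counting texel reads."""
    h, w = len(texture), len(texture[0])
    out = [[None] * w for _ in range(h)]
    reads = 0
    for ti, ty in enumerate(range(0, h, tile)):
        for tj, tx in enumerate(range(0, w, tile)):
            ys = range(ty, min(ty + tile, h))
            xs = range(tx, min(tx + tile, w))
            if meta[ti][tj]:
                value = texture[ty][tx]  # one read covers the uniform tile
                reads += 1
                for y in ys:
                    for x in xs:
                        out[y][x] = value
            else:
                for y in ys:
                    for x in xs:
                        out[y][x] = texture[y][x]
                        reads += 1
    return out, reads


texture = [[1, 1, 2, 3],
           [1, 1, 4, 5]]
meta = build_metadata(texture, tile=2)
fetched, reads = fetch(texture, meta, tile=2)
```

Here the left tile is uniform, so the fetch touches 5 texels instead of 8 while reproducing the same surface.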

    Deferred batching of incremental constant loads

    Publication No.: US10157443B1

    Publication date: 2018-12-18

    Application No.: US15662933

    Filing date: 2017-07-28

    Abstract: The techniques of this disclosure include deferred batching of incremental constant loads. Graphics APIs include the ability to use lightweight constants for use by shaders. A graphics processing unit (GPU) driver allocates a buffer that contains a snapshot of the current lightweight constants. This may provide a complete set of state to serve as a starting point. From then on, updates to the lightweight constants may be appended to this buffer incrementally: a command processor on the GPU inserts each update and increases the size of the buffer. Capturing the updates incrementally removes the need to issue them on every draw call; instead, the incremental updates may be batch processed when a live draw call is encountered.
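The deferral described here can be sketched as a small state machine: updates accumulate, dead draw calls are ignored, and the first live draw call applies everything pending in one batch. The class `ConstantBuffer`, its fields, and the `live` flag are hypothetical simplifications; the split between driver and command processor is not modeled.

```python
class ConstantBuffer:
    """Toy model: a snapshot of constants plus appended incremental updates."""

    def __init__(self, snapshot):
        self.snapshot = dict(snapshot)  # complete starting state
        self.pending = []               # updates appended since the last batch
        self.batches_issued = 0         # how many batched loads were performed

    def update(self, name, value):
        """Append an incremental update; nothing is loaded yet."""
        self.pending.append((name, value))

    def draw(self, live):
        """Process pending updates only when a live draw call arrives."""
        if not live:
            return None  # dead draw call: updates stay deferred
        if self.pending:
            for name, value in self.pending:
                self.snapshot[name] = value
            self.pending.clear()
            self.batches_issued += 1
        return dict(self.snapshot)


buf = ConstantBuffer({"color": 1, "scale": 2})
buf.update("color", 5)
dead = buf.draw(live=False)
buf.update("scale", 9)
state = buf.draw(live=True)
```

Two updates and two draw calls result in a single batched load, issued at the live draw.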

    Uniform predicates in shaders for graphics processing units

    Publication No.: US10115175B2

    Publication date: 2018-10-30

    Application No.: US15048599

    Filing date: 2016-02-19

    Abstract: A method for processing data in a graphics processing unit including receiving an indication that all threads of a warp in a graphics processing unit (GPU) are to execute a same branch in a first set of instructions, storing one or more predicate bits in a memory as a single set of predicate bits, wherein the single set of predicate bits applies to all of the threads in the warp, and executing a portion of the first set of instructions in accordance with the single set of predicate bits. Executing the first set of instructions may include executing the first set of instructions in accordance with the single set of predicate bits using a single instruction, multiple data (SIMD) processing core and/or executing the first set of instructions in accordance with the single set of predicate bits using a scalar processing unit.
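To make the idea concrete, the following sketch stores a single predicate for the whole warp when the branch is known to be uniform, and falls back to per-thread predicates otherwise. The dictionary layout and the names `store_predicates` and `execute_branch` are assumptions for illustration; real hardware predicate registers are not modeled.

```python
def store_predicates(per_thread_preds, uniform):
    """Store one shared predicate if the warp is uniform, else one per thread."""
    if uniform:
        return {"uniform": True, "bits": per_thread_preds[0]}
    return {"uniform": False, "bits": list(per_thread_preds)}


def execute_branch(values, pred, then_fn, else_fn):
    """Run one branch for every thread using the stored predicate(s)."""
    if pred["uniform"]:
        # A single decision applies to all threads in the warp.
        fn = then_fn if pred["bits"] else else_fn
        return [fn(v) for v in values]
    # Divergent case: each thread consults its own predicate bit.
    return [then_fn(v) if p else else_fn(v)
            for v, p in zip(values, pred["bits"])]


values = [1, 2, 3, 4]
uniform_pred = store_predicates([True] * 4, uniform=True)
uniform_out = execute_branch(values, uniform_pred, lambda v: v * 2, lambda v: -v)
mixed_pred = store_predicates([True, False, True, False], uniform=False)
mixed_out = execute_branch(values, mixed_pred, lambda v: v * 2, lambda v: -v)
```

In the uniform case only one predicate value is stored and tested, regardless of warp width.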

    UNIFORM PREDICATES IN SHADERS FOR GRAPHICS PROCESSING UNITS

    Publication No.: US20170243320A1

    Publication date: 2017-08-24

    Application No.: US15048599

    Filing date: 2016-02-19

    CPC classification number: G06T1/20 G06F9/30072 G06F9/3851 G06F9/3887 G06T5/008

    Abstract: A method for processing data in a graphics processing unit including receiving an indication that all threads of a warp in a graphics processing unit (GPU) are to execute a same branch in a first set of instructions, storing one or more predicate bits in a memory as a single set of predicate bits, wherein the single set of predicate bits applies to all of the threads in the warp, and executing a portion of the first set of instructions in accordance with the single set of predicate bits. Executing the first set of instructions may include executing the first set of instructions in accordance with the single set of predicate bits using a single instruction, multiple data (SIMD) processing core and/or executing the first set of instructions in accordance with the single set of predicate bits using a scalar processing unit.

    Emulation of fused multiply-add operations

    Publication No.: US09645792B2

    Publication date: 2017-05-09

    Application No.: US14461890

    Filing date: 2014-08-18

    CPC classification number: G06F7/5443 G06F5/01 G06F7/483 G06F7/57

    Abstract: At least one processor may emulate a fused multiply-add operation for a first operand, a second operand, and a third operand. The at least one processor may determine an intermediate value based at least in part on multiplying the first operand with the second operand, determine at least one of an upper intermediate value or a lower intermediate value, wherein determining the upper intermediate value comprises rounding, towards zero, the intermediate value by a specified number of bits, and wherein determining the lower intermediate value comprises subtracting the upper intermediate value from the intermediate value, determine an upper value and a lower value based at least in part on adding or subtracting the third operand to one of the upper intermediate value or the lower intermediate value, and determine an emulated fused multiply-add result by adding the upper value and the lower value.
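The data flow in this abstract (split the product, add the third operand to one part, then recombine) can be traced with ordinary Python floats. This is a sketch of the sequence of steps only, not a bit-accurate emulation: it assumes the product `a * b` is exactly representable, always adds the third operand to the upper part, and uses a hypothetical helper `truncate_to_bits` with an arbitrary default bit count.

```python
import math


def truncate_to_bits(x, bits):
    """Round x towards zero, keeping `bits` significant binary digits."""
    if x == 0.0:
        return 0.0
    mantissa, exponent = math.frexp(x)          # x == mantissa * 2**exponent
    kept = math.trunc(math.ldexp(mantissa, bits))  # drop the low-order bits
    return math.ldexp(kept, exponent - bits)


def emulated_fma(a, b, c, bits=26):
    """Follow the abstract's steps for a * b + c (illustrative only)."""
    intermediate = a * b                      # assumed exact in this sketch
    upper = truncate_to_bits(intermediate, bits)
    lower = intermediate - upper              # remainder after truncation
    upper_value = upper + c                   # third operand joins the upper part
    lower_value = lower
    return upper_value + lower_value


upper_part = truncate_to_bits(3.75, 2)
negative_part = truncate_to_bits(-3.75, 2)
result = emulated_fma(1.5, 2.5, 0.25, bits=2)
```

With `bits=2`, the product 3.75 splits into an upper part of 3.0 and a lower part of 0.75, and the recombined result matches 1.5 * 2.5 + 0.25.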

    Rendering graphics to overlapping bins

    Patent type: Invention grant (legal status: in force)

    Publication No.: US09569811B2

    Publication date: 2017-02-14

    Application No.: US14316275

    Filing date: 2014-06-26

    Abstract: In an example, a method for rendering graphics data includes rendering pixels of a first bin of a plurality of bins, wherein the pixels of the first bin are associated with a first portion of an image, and rendering, to the first bin, one or more pixels that are located outside the first portion of the image and associated with a second, different bin of the plurality of bins. The method also includes rendering the one or more pixels associated with the second bin to the second bin, such that the one or more pixels are rendered to both the first bin and the second bin.
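A one-dimensional sketch shows how a pixel near a bin boundary ends up rendered to both bins. The fixed overlap margin, the 1D layout, and the names `bin_regions` and `render` are assumptions made to keep the example short; the abstract does not specify how the overlap is chosen.

```python
def bin_regions(width, bin_width, overlap):
    """Return (start, end) pixel ranges per bin, widened by `overlap` pixels."""
    regions = []
    for start in range(0, width, bin_width):
        end = min(start + bin_width, width)
        regions.append((max(0, start - overlap), min(width, end + overlap)))
    return regions


def render(width, bin_width, overlap, shade):
    """Render each bin's widened region; boundary pixels are shaded twice."""
    return [{x: shade(x) for x in range(lo, hi)}
            for lo, hi in bin_regions(width, bin_width, overlap)]


regions = bin_regions(8, 4, 1)
bins = render(8, 4, 1, shade=lambda x: x * x)
```

Pixel 4 belongs to the second bin's portion of the image, yet it is present in the first bin as well, with the same shaded value in both.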
