Per-shader preamble for graphics processing

    Publication number: US09799089B1

    Publication date: 2017-10-24

    Application number: US15162272

    Application date: 2016-05-23

    CPC classification number: G06T1/20 G06T1/60 G06T15/80

    Abstract: A method for processing data in a graphics processing unit including receiving a code block of instructions common to a plurality of groups of threads of a shader, executing the code block of instructions common to the plurality of groups of threads of the shader creating a result by a first group of threads of the plurality of groups of threads, storing the result of the code block of instructions common to the plurality of groups of threads of the shader in on-chip random access memory (RAM), the on-chip RAM accessible by each of the plurality of groups of threads, and upon a determination that storing the result of the code block of instructions common to the plurality of groups of threads of the shader has completed, returning the result of the code block of instructions common to the plurality of groups of threads of the shader from on-chip RAM.
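
The flow in this abstract can be sketched as a small host-side simulation. This is only an illustration under stated assumptions: `PreambleCache` stands in for the on-chip RAM, Python threads stand in for groups of threads, and all names are hypothetical rather than taken from the patent.

```python
import threading


class PreambleCache:
    """Hypothetical stand-in for on-chip RAM visible to every thread group."""

    def __init__(self):
        self._lock = threading.Lock()
        self._stored = threading.Event()  # set once the result is stored
        self._claimed = False
        self.result = None
        self.executions = 0  # how many times the common code block ran

    def run_preamble(self, preamble):
        # Exactly one group claims the common code block.
        with self._lock:
            first = not self._claimed
            self._claimed = True
        if first:
            self.result = preamble()  # first group executes the block
            self.executions += 1
            self._stored.set()        # storing has completed
        else:
            self._stored.wait()       # other groups wait for the store
        return self.result            # result is returned from shared storage


def run_shader(num_groups, preamble):
    """Run num_groups simulated thread groups that share one preamble result."""
    cache = PreambleCache()
    out = [None] * num_groups

    def group_main(group_id):
        shared = cache.run_preamble(preamble)
        out[group_id] = shared * group_id  # per-group work using the result

    threads = [threading.Thread(target=group_main, args=(g,))
               for g in range(num_groups)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out, cache.executions
```

Calling `run_shader(4, lambda: 3 + 4)` runs the common block once and lets all four simulated groups reuse the stored value.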

    General purpose register allocation in streaming processor

    Publication number: US10558460B2

    Publication date: 2020-02-11

    Application number: US15379195

    Application date: 2016-12-14

    Abstract: Systems and techniques are disclosed for dynamic allocation of general purpose registers based on latency associated with instructions in processor threads. A streaming processor can include general purpose registers configured to store data associated with threads, and a thread scheduler configured to receive allocation information for the general purpose registers, the information describing general purpose registers that are to be assigned as persistent general purpose registers (pGPRs) and volatile general purpose registers (vGPRs). The plurality of general purpose registers can be allocated according to the received information. The streaming processor can include the general purpose registers allocated according to the received information, the allocation being based on execution latencies of instructions included in the threads.

    Utilizing pipeline registers as intermediate storage

    Publication number: US09747104B2

    Publication date: 2017-08-29

    Application number: US14275047

    Application date: 2014-05-12

    Abstract: In one example, a method includes responsive to receiving, by a processing unit, one or more instructions requesting that a first value be moved from a first general purpose register (GPR) to a third GPR and that a second value be moved from a second GPR to a fourth GPR, copying, by an initial logic unit and during a first clock cycle, the first value to an initial pipeline register, copying, by the initial logic unit and during a second clock cycle, the second value to the initial pipeline register, copying, by a final logic unit and during a third clock cycle, the first value from a final pipeline register to the third GPR, and copying, by the final logic unit and during a fourth clock cycle, the second value from the final pipeline register to the fourth GPR.
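
The four-cycle sequence in this abstract can be traced with a toy model. The sketch below assumes a pipeline of exactly two registers (initial and final) and one read and one write per cycle; the function name, the dictionary-based register file, and the pipeline depth are illustrative assumptions, not details from the patent.

```python
def move_via_pipeline(gprs, moves):
    """Move values between GPRs through a two-stage pipeline (assumed depth).

    gprs  -- dict mapping register name to value, updated in place
    moves -- list of (source, destination) register names
    Returns the number of simulated clock cycles.
    """
    initial = None  # initial pipeline register: (destination, value) or None
    final = None    # final pipeline register: (destination, value) or None
    reads = iter(moves)
    cycles = len(moves) + 2
    for _ in range(cycles):
        # Final logic unit: write the value held in the final register.
        if final is not None:
            dst, value = final
            gprs[dst] = value
        # The pipeline advances by one stage.
        final = initial
        # Initial logic unit: read the next source GPR, if any remain.
        move = next(reads, None)
        initial = (move[1], gprs[move[0]]) if move is not None else None
    return cycles
```

With two moves the model takes four cycles: reads happen in cycles 1 and 2, writes in cycles 3 and 4. Because both reads complete before the first write, `move_via_pipeline(g, [("r0", "r1"), ("r1", "r0")])` swaps two registers without a temporary GPR. With three or more moves, a read could coincide with an earlier write in this simplified model, so ordering would need more care.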

    GPU divergence barrier
    Type: granted invention patent

    Publication number: US09652284B2

    Publication date: 2017-05-16

    Application number: US14043562

    Application date: 2013-10-01

    CPC classification number: G06F9/4843 G06F9/3887 G06F9/522 G06T1/20

    Abstract: A device includes a memory, and at least one programmable processor configured to determine, for each warp of a plurality of warps, whether a Boolean expression is true for a corresponding thread of each warp, pause execution of each warp having a corresponding thread for which the expression is true, determine a number of active threads for each of the plurality of warps for which the expression is true, sort the plurality of warps for which the expression is true based on the number of active threads in each of the plurality of warps, swap thread data of an active thread of a first warp of the plurality of warps with thread data of an inactive thread of a second warp of the plurality of warps, and resume execution of the at least one of the plurality of warps for which the expression is true.
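
The sort-and-swap step can be sketched on the host as follows. This is a minimal illustration, assuming each paused warp is a list of per-thread data with `None` marking an inactive lane; the two-pointer pairing of the fullest and emptiest warps is an assumed policy, not one stated in the abstract.

```python
def compact_warps(warps):
    """Move active threads from sparse warps into idle lanes of dense warps.

    warps -- list of warps, each a list of thread data (None = inactive lane)
    Returns new warp lists; the input is not modified.
    """
    warps = [list(w) for w in warps]

    def active_count(w):
        return sum(t is not None for t in warps[w])

    # Sort warp indices by number of active threads, densest first.
    order = sorted(range(len(warps)), key=active_count, reverse=True)

    lo, hi = 0, len(order) - 1
    while lo < hi:
        dst, src = warps[order[lo]], warps[order[hi]]
        holes = [i for i, t in enumerate(dst) if t is None]
        live = [i for i, t in enumerate(src) if t is not None]
        if not holes:       # dense warp is full, move to the next one
            lo += 1
        elif not live:      # sparse warp is drained, move to the next one
            hi -= 1
        else:               # swap active thread data with an inactive lane
            i, j = holes[0], live[0]
            dst[i], src[j] = src[j], dst[i]
    return warps
```

After compaction, fewer warps hold active threads, so fewer warps need to be resumed.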

    Skipping of data storage
    Type: patent application
    Legal status: in force

    Publication number: US20160054998A1

    Publication date: 2016-02-25

    Application number: US14462932

    Application date: 2014-08-19

    Abstract: Techniques are described in which an indication is included to indicate that an intermediate value, generated as part of determining a final value and reaching its last use, is not to be stored in a general purpose register (GPR). A processing unit avoids storing the intermediate value in the GPR based on the indication because the intermediate value is no longer needed for determining the final value.
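
A toy interpreter can show the effect of such an indication. In this sketch the `skip_store` flag, the `"fwd"` operand name, and the single forwarding latch are all hypothetical modeling choices; the abstract does not say how the value reaches its consumer.

```python
import operator

OPS = {"mul": operator.mul, "add": operator.add}


def run(program, gprs):
    """Execute a tiny program and count GPR writes.

    program -- list of (op, dst, srcs, skip_store) tuples
    gprs    -- dict of register name to value, updated in place
    A source named "fwd" reads the forwarded intermediate value.
    """
    forwarded = None
    writes = 0
    for op, dst, srcs, skip_store in program:
        values = [forwarded if s == "fwd" else gprs[s] for s in srcs]
        result = OPS[op](*values)
        if skip_store:
            forwarded = result   # intermediate value bypasses the GPR file
        else:
            gprs[dst] = result   # final value is stored normally
            writes += 1
    return writes


# (r0 * r1) + r2: the product is consumed once, so it is never stored.
PROGRAM = [
    ("mul", None, ["r0", "r1"], True),
    ("add", "r3", ["fwd", "r2"], False),
]
```

Running `PROGRAM` on `{"r0": 2, "r1": 3, "r2": 4}` performs one GPR write instead of two.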

    Emulation of fused multiply-add operations
    Type: patent application
    Legal status: in force

    Publication number: US20160048374A1

    Publication date: 2016-02-18

    Application number: US14461890

    Application date: 2014-08-18

    CPC classification number: G06F7/5443 G06F5/01 G06F7/483 G06F7/57

    Abstract: At least one processor may emulate a fused multiply-add operation for a first operand, a second operand, and a third operand. The at least one processor may determine an intermediate value based at least in part on multiplying the first operand with the second operand, determine at least one of an upper intermediate value or a lower intermediate value, wherein determining the upper intermediate value comprises rounding, towards zero, the intermediate value by a specified number of bits, and wherein determining the lower intermediate value comprises subtracting the upper intermediate value from the intermediate value, determine an upper value and a lower value based at least in part on adding or subtracting the third operand to one of the upper intermediate value or the lower intermediate value, and determine an emulated fused multiply-add result by adding the upper value and the lower value.
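
The data flow of the split can be illustrated in Python. This sketch makes several assumptions that the abstract does not state: the operands are values exactly representable in single precision, the intermediate product is held in a Python float (double precision), 26 low mantissa bits are truncated, and the third operand is always added to the upper half. It shows the sequence of steps, not a proof of correct rounding.

```python
import struct
from fractions import Fraction


def truncate_low_bits(x, nbits):
    """Round a double toward zero by clearing its nbits lowest mantissa bits."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    bits &= ~((1 << nbits) - 1)
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def split_product(p, nbits=26):
    """Split an intermediate value into upper and lower parts."""
    upper = truncate_low_bits(p, nbits)  # rounded toward zero
    lower = p - upper                    # remainder, exactly representable
    return upper, lower


def emulated_fma(a, b, c, nbits=26):
    """Sketch of a*b + c via the upper/lower split (assumed policy)."""
    p = a * b                        # intermediate value
    upper, lower = split_product(p, nbits)
    upper_value = upper + c          # third operand joins the upper half
    lower_value = lower
    return upper_value + lower_value


def exact_fma(a, b, c):
    """Reference result computed with exact rational arithmetic."""
    return Fraction(a) * Fraction(b) + Fraction(c)
```

For example, with `a = b = 1 + 2**-20` and `c = -1`, the upper half is `1 + 2**-19`, the lower half is `2**-40`, and the emulated result matches `exact_fma` exactly.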

    Vector scaling instructions for use in an arithmetic logic unit
    Type: patent application
    Legal status: under examination (published)

    Publication number: US20160019027A1

    Publication date: 2016-01-21

    Application number: US14331991

    Application date: 2014-07-15

    Abstract: At least one processor may receive components of a vector, wherein each of the components of the vector comprises at least an exponent. The at least one processor may further determine a maximum exponent out of the respective exponents of the components of the vector, and may determine a scaling value based at least in part on the maximum exponent. An arithmetic logic unit of the at least one processor may scale the vector by subtracting the scaling value from each of the respective exponents of the components of the vector.
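
The exponent-based scaling can be sketched with the standard library functions `math.frexp` and `math.ldexp`. Choosing the scaling value equal to the maximum exponent is an assumption made here for simplicity; the abstract only says the scaling value is based on it.

```python
import math


def scale_vector(v):
    """Scale a vector by subtracting a common value from every exponent.

    Returns (scaled_vector, scale), where each scaled component equals
    the original component times 2**-scale.
    """
    # frexp(x) returns (m, e) with x == m * 2**e and 0.5 <= |m| < 1.
    exponents = [math.frexp(x)[1] for x in v if x != 0.0]
    if not exponents:
        return list(v), 0
    scale = max(exponents)  # assumed: scaling value = maximum exponent
    # ldexp(x, -scale) subtracts scale from the exponent of x.
    return [math.ldexp(x, -scale) for x in v], scale


def safe_norm(v):
    """Euclidean norm that avoids overflow by scaling first."""
    scaled, scale = scale_vector(v)
    return math.ldexp(math.sqrt(sum(x * x for x in scaled)), scale)
```

After scaling, every component has magnitude below 1, so squaring cannot overflow. For instance, `safe_norm([3e200, 4e200])` stays finite, whereas squaring `3e200` directly overflows double precision.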
