Runtime mechanism to optimize shader execution flow

    公开(公告)号:US12229864B2

    公开(公告)日:2025-02-18

    申请号:US17817815

    申请日:2022-08-05

    Abstract: This disclosure provides systems, devices, apparatus, and methods, including computer programs encoded on storage media, for runtime optimization of the shader execution flow. A graphics processor may obtain instruction execution data associated with a graphics workload, the instruction execution data including graphics data for a set of shader operations. The graphics processor may configure, at a first iteration, at least one predication value based on the instruction execution data including the graphics data for the set of shader operations. The graphics processor may adjust, at a second iteration, an execution flow of the graphics workload based on the configured at least one predication value, the execution flow of the graphics workload including the set of shader operations. The graphics processor may execute or refrain from executing, at the second iteration, each of the set of shader operations based on the adjusted execution flow of the graphics workload.

    Out of order wave slot release for a terminated wave

    公开(公告)号:US11094032B2

    公开(公告)日:2021-08-17

    申请号:US16734252

    申请日:2020-01-03

    Abstract: Methods, systems, and devices for image processing are described. A device may determine, based on a test operation, to terminate a first wave associated with a first slot of a set of slots. The device may update a terminated wave bit associated with the first slot based on the determination to terminate the first wave. In some aspects, the device may update a number of invocations field associated with the first wave based on the determination to terminate the first wave. The device may release the first slot based on updating the terminated wave bit and the number of invocations field. In some examples, the device may output the number of invocations field to a rendering backend of the device based on the terminated wave bit.

    General purpose register allocation in streaming processor

    公开(公告)号:US10558460B2

    公开(公告)日:2020-02-11

    申请号:US15379195

    申请日:2016-12-14

    Abstract: Systems and techniques are disclosed for general purpose register dynamic allocation based on latency associated with of instructions in processor threads. A streaming processor can include a general purpose registers configured to stored data associated with threads, and a thread scheduler configured to receive allocation information for the general purpose registers, the information describing general purpose registers that are to be assigned as persistent general purpose registers (pGPRs) and volatile general purpose registers (vGPRs). The plurality of general purpose registers can be allocated according to the received information. The streaming processor can include the general purpose registers allocated according to the received information, the allocated based on execution latencies of instructions included in the threads.

    Methods and apparatus for wave slot management

    公开(公告)号:US11055808B2

    公开(公告)日:2021-07-06

    申请号:US16455641

    申请日:2019-06-27

    Abstract: The present disclosure relates to methods and apparatus for graphics processing. In some aspects, the apparatus can determine one or more context states of at least one context register in each of multiple wave slots. The apparatus can also send information corresponding to the one or more context states in one of the multiple wave slots to a context queue. Further, the apparatus can convert the information corresponding to the one or more context states to context information compatible with the context queue. The apparatus can also store the context information compatible with the context queue in the context queue. In some aspects, the apparatus can send the context information compatible with the context queue to one of the multiple wave slots. Additionally, the apparatus can convert the context information compatible with the context queue to the information corresponding to the one or more context states.

    GENERAL PURPOSE REGISTER ALLOCATION IN STREAMING PROCESSOR

    公开(公告)号:US20180165092A1

    公开(公告)日:2018-06-14

    申请号:US15379195

    申请日:2016-12-14

    Abstract: Systems and techniques are disclosed for general purpose register dynamic allocation based on latency associated with of instructions in processor threads. A streaming processor can include a general purpose registers configured to stored data associated with threads, and a thread scheduler configured to receive allocation information for the general purpose registers, the information describing general purpose registers that are to be assigned as persistent general purpose registers (pGPRs) and volatile general purpose registers (vGPRs). The plurality of general purpose registers can be allocated according to the received information. The streaming processor can include the general purpose registers allocated according to the received information, the allocated based on execution latencies of instructions included in the threads.

    Dynamic wave pairing
    9.
    发明授权

    公开(公告)号:US11954758B2

    公开(公告)日:2024-04-09

    申请号:US17652478

    申请日:2022-02-24

    CPC classification number: G06T1/20 G06F9/505

    Abstract: This disclosure provides systems, devices, apparatus, and methods, including computer programs encoded on storage media, for dynamic wave pairing. A graphics processor may allocate one or more GPU workloads to one or more wave slots of a plurality of wave slots. The graphics processor may select a first execution slot of a plurality of execution slots for executing the one or more GPU workloads. The selection may be based on one of a plurality of granularities. The graphics processor may execute, at the selected first execution slot, the one or more GPU workloads at the one of the plurality of granularities.

Patent Agency Ranking