GPU predication
    42.
    Granted Patent

    Publication No.: US09633409B2

    Publication Date: 2017-04-25

    Application No.: US13975520

    Filing Date: 2013-08-26

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to predication. In one embodiment, a graphics processing unit is disclosed that includes a first set of architecturally-defined registers configured to store predication information. The graphics processing unit further includes a second set of registers configured to mirror the first set of registers and an execution pipeline configured to discontinue execution of an instruction sequence based on predication information in the second set of registers. In one embodiment, the second set of registers includes one or more registers proximal to an output of the execution pipeline. In some embodiments, the execution pipeline writes back a predicate value determined for a predicate writer to the second set of registers. The first set of architecturally-defined registers is then updated with the predicate value written back to the second set of registers. In some embodiments, the execution pipeline discontinues execution of the instruction sequence without stalling.
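    The register-mirroring scheme in this abstract can be illustrated with a small, hypothetical model (names and structure are illustrative assumptions, not Apple's implementation): a predicate value written back near the pipeline output lands in the mirror set, the architectural set is updated from it, and a predicated instruction sequence is simply dropped rather than stalled.

    ```python
    # Hypothetical model of mirrored predicate registers (illustrative only).
    class PredicationModel:
        def __init__(self):
            self.arch_pred = {}    # first set: architecturally-defined registers
            self.mirror_pred = {}  # second set: registers near the pipeline output

        def write_back(self, reg, value):
            # The predicate writer's result lands in the mirror set first...
            self.mirror_pred[reg] = value
            # ...then the architectural set is updated from the mirror.
            self.arch_pred[reg] = self.mirror_pred[reg]

        def execute(self, sequence, pred_reg):
            # Discontinue the sequence when the mirrored predicate is False;
            # no stall is modeled -- the sequence is simply not executed.
            if not self.mirror_pred.get(pred_reg, True):
                return []
            return [op() for op in sequence]

    model = PredicationModel()
    model.write_back("p0", False)
    results = model.execute([lambda: "add", lambda: "mul"], "p0")
    # results == []: the predicated sequence was discontinued
    ```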

    Apparatus implementing instructions that impose pipeline interdependencies
    44.
    Granted Patent (In Force)

    Publication No.: US09183611B2

    Publication Date: 2015-11-10

    Application No.: US13935299

    Filing Date: 2013-07-03

    Applicant: Apple Inc.

    CPC classification number: G06T1/20 G06F9/3838 G06F9/3851 G06F9/3867 G06F9/3885

    Abstract: Techniques are disclosed relating to implementation of gradient-type graphics instructions. In one embodiment, an apparatus includes first and second execution pipelines and a register file. In this embodiment, the register file is coupled to the first and second execution pipelines and configured to store operands for the first and second execution pipelines. In this embodiment, the apparatus is configured to determine that a graphics instruction imposes a dependency between the first and second pipeline. In response to this determination, the apparatus is configured to read a plurality of operands from the register file including an operand assigned to the second execution pipeline and to select the operand assigned to the second execution pipeline as an input operand for the first execution pipeline. The apparatus may be configured such that operands assigned to the second execution pipeline are accessible by the first execution pipeline only via the register file and not from other locations.
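    A minimal sketch of the cross-pipeline operand selection described above, under illustrative assumptions (the register file layout and names are not from the patent): the first pipeline reads an operand assigned to the second pipeline through the shared register file and selects it as an input, e.g. to form a finite difference for a gradient-type instruction.

    ```python
    # Hypothetical register file shared by two execution pipelines.
    class RegisterFile:
        def __init__(self):
            self.slots = {}  # (pipeline, reg) -> operand value

        def write(self, pipeline, reg, value):
            self.slots[(pipeline, reg)] = value

        def read(self, pipeline, reg):
            return self.slots[(pipeline, reg)]

    def gradient_input(rf, reg):
        # The first pipeline reads both its own operand and the operand
        # assigned to the second pipeline -- only via the register file.
        own = rf.read(0, reg)
        neighbor = rf.read(1, reg)
        # Select the second pipeline's operand as an input, here used to
        # form a finite difference (an assumed use, typical of gradients).
        return neighbor - own

    rf = RegisterFile()
    rf.write(0, "r0", 2.0)  # operand assigned to pipeline 0
    rf.write(1, "r0", 5.0)  # operand assigned to pipeline 1
    print(gradient_input(rf, "r0"))  # prints 3.0
    ```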


    GPU PREDICATION
    45.
    Patent Application (In Force)

    Publication No.: US20150054837A1

    Publication Date: 2015-02-26

    Application No.: US13975520

    Filing Date: 2013-08-26

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to predication. In one embodiment, a graphics processing unit is disclosed that includes a first set of architecturally-defined registers configured to store predication information. The graphics processing unit further includes a second set of registers configured to mirror the first set of registers and an execution pipeline configured to discontinue execution of an instruction sequence based on predication information in the second set of registers. In one embodiment, the second set of registers includes one or more registers proximal to an output of the execution pipeline. In some embodiments, the execution pipeline writes back a predicate value determined for a predicate writer to the second set of registers. The first set of architecturally-defined registers is then updated with the predicate value written back to the second set of registers. In some embodiments, the execution pipeline discontinues execution of the instruction sequence without stalling.


    Dynamic buffering control for compute work distribution

    Publication No.: US11500692B2

    Publication Date: 2022-11-15

    Application No.: US17021720

    Filing Date: 2020-09-15

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to dynamically adjusting buffering for distributing compute work in a graphics processor. In some embodiments, the graphics processor includes shader circuitry configured to process compute work from a compute kernel, multiple distributed workload parser circuits configured to send compute work to the shader circuitry, primary workload parser circuitry configured to send, via a communications fabric, compute work from the compute kernel to the distributed workload parser circuits, and buffer circuitry configured to buffer compute work received by one or more of the distributed workload parser circuits from the primary workload parser circuitry. In some embodiments, the graphics processor is configured to dynamically adjust a limit on the number of entries used in the buffer circuitry based on information indicating complexity of the compute kernel. This may advantageously maintain launch rates while reducing or avoiding workload imbalances, in some embodiments.
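    The dynamic limit on buffer entries can be sketched as a simple policy function. The complexity thresholds and entry counts below are illustrative assumptions, not values from the patent; the idea is only that simple kernels (which launch quickly) warrant deeper buffering to sustain launch rate, while complex kernels get a lower cap so one distributed parser does not hoard work and create imbalance.

    ```python
    # Hypothetical buffering policy (thresholds/values are illustrative).
    def buffer_entry_limit(kernel_complexity, max_entries=16):
        # kernel_complexity: e.g. an estimate of cycles per workgroup.
        if kernel_complexity < 100:       # simple kernel: buffer deeply
            return max_entries
        if kernel_complexity < 1000:      # moderate kernel: halve the cap
            return max_entries // 2
        return max(1, max_entries // 8)   # complex kernel: minimal buffering

    print(buffer_entry_limit(50))    # prints 16
    print(buffer_entry_limit(500))   # prints 8
    print(buffer_entry_limit(5000))  # prints 2
    ```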

    Dynamic Buffering Control for Compute Work Distribution

    Publication No.: US20220083396A1

    Publication Date: 2022-03-17

    Application No.: US17021720

    Filing Date: 2020-09-15

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to dynamically adjusting buffering for distributing compute work in a graphics processor. In some embodiments, the graphics processor includes shader circuitry configured to process compute work from a compute kernel, multiple distributed workload parser circuits configured to send compute work to the shader circuitry, primary workload parser circuitry configured to send, via a communications fabric, compute work from the compute kernel to the distributed workload parser circuits, and buffer circuitry configured to buffer compute work received by one or more of the distributed workload parser circuits from the primary workload parser circuitry. In some embodiments, the graphics processor is configured to dynamically adjust a limit on the number of entries used in the buffer circuitry based on information indicating complexity of the compute kernel. This may advantageously maintain launch rates while reducing or avoiding workload imbalances, in some embodiments.

    Completion signaling techniques in distributed processor

    Publication No.: US11250538B2

    Publication Date: 2022-02-15

    Application No.: US16812724

    Filing Date: 2020-03-09

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to tracking compute workgroup completions in a distributed processor. In some embodiments, an apparatus includes a plurality of shader processors configured to perform operations for compute workgroups included in compute kernels, a master workload parser circuit, a plurality of distributed workload parser circuits, and a communications fabric connected to the plurality of distributed workload parser circuits and the master workload parser circuit. In some embodiments, a distributed workload parser circuit is configured to maintain, for each of a set of the shader processors, a data structure that specifies a count of workgroup completions for one or more kernels processed by the shader processor, determine, for the set of shader processors based on counts of workgroup completions for a first kernel, an aggregate count of completions to report for the first kernel, send the aggregate count to the master workload parser circuit over the communications fabric, and adjust the data structures to reflect counts included in the aggregate count.
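    The completion-tracking flow above can be sketched as follows, under illustrative assumptions (class and method names are invented): a distributed parser keeps a per-shader table of workgroup completions per kernel, folds the counts for one kernel into a single aggregate reported to the master parser, and adjusts the tables to reflect what was reported.

    ```python
    from collections import defaultdict

    # Hypothetical model of a distributed workload parser's completion tables.
    class DistributedParser:
        def __init__(self, num_shaders):
            # One table per shader processor: kernel -> completion count.
            self.tables = [defaultdict(int) for _ in range(num_shaders)]

        def record_completion(self, shader, kernel):
            self.tables[shader][kernel] += 1

        def report(self, kernel):
            # Aggregate count of completions for this kernel across the
            # shader set (the value sent over the communications fabric)...
            aggregate = sum(t[kernel] for t in self.tables)
            # ...then adjust the tables to reflect the reported counts.
            for t in self.tables:
                t[kernel] = 0
            return aggregate

    parser = DistributedParser(num_shaders=2)
    parser.record_completion(0, "kernelA")
    parser.record_completion(1, "kernelA")
    parser.record_completion(1, "kernelA")
    print(parser.report("kernelA"))  # prints 3
    print(parser.report("kernelA"))  # prints 0 (tables were adjusted)
    ```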

    Low Latency Fetch Circuitry for Compute Kernels

    Publication No.: US20210026638A1

    Publication Date: 2021-01-28

    Application No.: US17065761

    Filing Date: 2020-10-08

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to fetching items from a compute command stream that includes compute kernels. In some embodiments, stream fetch circuitry sequentially pre-fetches items from the stream and stores them in a buffer. In some embodiments, fetch parse circuitry iterates through items in the buffer using a fetch parse pointer to detect indirect-data-access items and/or redirect items in the buffer. The fetch parse circuitry may send detected indirect data accesses to indirect-fetch circuitry, which may buffer requests. In some embodiments, execute parse circuitry iterates through items in the buffer using an execute parse pointer (e.g., which may trail the fetch parse pointer) and outputs both item data from the buffer and indirect-fetch results from indirect-fetch circuitry for execution. In various embodiments, the disclosed techniques may reduce fetch latency for compute kernels.
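    The two-pointer parse scheme can be sketched with a small, hypothetical simulation (item formats and names are illustrative assumptions): a fetch parse pointer scans ahead issuing indirect fetches early, and a trailing execute parse pointer emits direct items from the buffer and indirect results from the indirect-fetch stage, in stream order.

    ```python
    # Hypothetical two-pointer parse over a pre-fetched item buffer.
    def parse_stream(buffer_items, indirect_data):
        pending = []  # indirect-fetch results queued by the fetch parse pass
        out = []      # items emitted by the execute parse pass
        fetch_ptr = 0

        # Fetch parse pointer: scan ahead, issuing indirect fetches early
        # so their results are ready by the time execution reaches them.
        while fetch_ptr < len(buffer_items):
            kind, payload = buffer_items[fetch_ptr]
            if kind == "indirect":
                pending.append(indirect_data[payload])
            fetch_ptr += 1

        # Execute parse pointer trails behind: emit direct items from the
        # buffer and indirect results from the indirect stage, in order.
        results = iter(pending)
        for kind, payload in buffer_items:
            out.append(next(results) if kind == "indirect" else payload)
        return out

    items = [("direct", "kernelA"), ("indirect", "ptr0"), ("direct", "kernelB")]
    print(parse_stream(items, {"ptr0": "kernelX"}))
    # prints ['kernelA', 'kernelX', 'kernelB']
    ```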
