Apparatus implementing instructions that impose pipeline interdependencies
    11.
    Granted invention patent (in force)

    Publication number: US09183611B2

    Publication date: 2015-11-10

    Application number: US13935299

    Filing date: 2013-07-03

    Applicant: Apple Inc.

    CPC classification number: G06T1/20 G06F9/3838 G06F9/3851 G06F9/3867 G06F9/3885

    Abstract: Techniques are disclosed relating to implementation of gradient-type graphics instructions. In one embodiment, an apparatus includes first and second execution pipelines and a register file. In this embodiment, the register file is coupled to the first and second execution pipelines and configured to store operands for the first and second execution pipelines. In this embodiment, the apparatus is configured to determine that a graphics instruction imposes a dependency between the first and second pipeline. In response to this determination, the apparatus is configured to read a plurality of operands from the register file including an operand assigned to the second execution pipeline and to select the operand assigned to the second execution pipeline as an input operand for the first execution pipeline. The apparatus may be configured such that operands assigned to the second execution pipeline are accessible by the first execution pipeline only via the register file and not from other locations.
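
The operand selection described in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions: the 4-lane quad layout, the horizontal-difference gradient, and the names `read_quad_operands` and `gradient_x` are hypothetical and do not come from the patent. The one property taken from the abstract is that a lane obtains its neighbor's operand only by reading the shared register file.

```python
def read_quad_operands(register_file, reg):
    """Read the named register for every lane of a 4-lane quad (assumed layout)."""
    return [register_file[lane][reg] for lane in range(4)]


def gradient_x(register_file, reg):
    """Hypothetical horizontal gradient over a 2x2 quad.

    Lanes 0 and 1 form the top row, lanes 2 and 3 the bottom row (assumption).
    Each lane selects operands assigned to its row neighbor from the values
    read out of the register file, never from the neighbor pipeline directly.
    """
    operands = read_quad_operands(register_file, reg)
    results = []
    for lane in range(4):
        left = operands[lane & ~1]   # even lane of this row
        right = operands[lane | 1]   # odd lane of this row
        results.append(right - left)
    return results


quad = [{"r0": 1.0}, {"r0": 4.0}, {"r0": 10.0}, {"r0": 12.0}]
print(gradient_x(quad, "r0"))  # [3.0, 3.0, 2.0, 2.0]
```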

    Multi-stage thread scheduling
    12.
    Granted invention patent

    Publication number: US12190151B2

    Publication date: 2025-01-07

    Application number: US18054376

    Filing date: 2022-11-10

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to multi-stage thread scheduling. In some embodiments, processor circuitry includes multiple channel pipelines for multiple channels and multiple execution pipelines shared by the channel pipelines and configured to perform different types of operations provided by the channel pipelines. First scheduler circuitry may arbitrate among threads to assign threads to channels. Second scheduler circuitry may arbitrate among channels to assign an operation from a given channel to a given execution pipeline. The execution pipelines may provide backpressure information to the first scheduler circuitry based on execution status and the first scheduler circuitry may adjust priority of a thread for assignment to a channel based on the backpressure information. Disclosed techniques may reduce channel conflicts and starvation for execution resources.
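
The two arbitration stages and the backpressure feedback can be sketched in software as follows. Everything concrete here is an assumption made for illustration: the `Thread` record, the fixed `penalty`, the tie-breaking rules, and the function names are hypothetical, and the abstract does not say how priority is adjusted.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    tid: int
    priority: int
    next_op: str  # type of execution pipeline the thread's next operation needs


def assign_thread_to_channel(threads, backpressure, penalty=10):
    """First stage (sketch): pick the thread to assign to a free channel.

    A thread whose next operation targets a pipeline that reports
    backpressure has its priority lowered by an assumed fixed penalty.
    """
    def effective(thread):
        stalled = backpressure.get(thread.next_op, False)
        return thread.priority - (penalty if stalled else 0)

    # Ties go to the lower thread id (arbitrary choice for this sketch).
    return max(threads, key=lambda t: (effective(t), -t.tid))


def assign_op_to_pipeline(channel_ops, pipeline_type):
    """Second stage (sketch): pick the channel whose pending operation goes
    to the given execution pipeline; the lowest channel index wins here."""
    for channel, op in sorted(channel_ops.items()):
        if op == pipeline_type:
            return channel
    return None


threads = [Thread(0, 5, "fp32"), Thread(1, 4, "int")]
print(assign_thread_to_channel(threads, {"fp32": True}).tid)  # 1
print(assign_op_to_pipeline({0: "int", 1: "fp32"}, "fp32"))   # 1
```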

    Routing circuitry for permutation of single-instruction multiple-data operands

    Publication number: US11294672B2

    Publication date: 2022-04-05

    Application number: US16548812

    Filing date: 2019-08-22

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to routing circuitry configured to perform permute operations for operands of threads in a single-instruction multiple-data group. In some embodiments, an apparatus includes hierarchical operand routing circuitry configured to route operands between a set of single-instruction multiple-data (SIMD) pipelines based on a permute instruction. In some embodiments, the routing circuitry includes a first level and a second level. The first level may include a set of multiple crossbar circuits each configured to receive operands from a respective subset of the pipelines and output one or more of the received operands on multiple output lines based on the permute instruction, where the crossbar circuits support full permutation within a respective subset. A second level may be configured to select an operand from a previous level for each of the pipelines, and may select from among only a portion of output operands from the previous level to provide an operand for a respective pipeline.

    Thread-group-scoped gate instruction

    Publication number: US11204774B1

    Publication date: 2021-12-21

    Application number: US17008518

    Filing date: 2020-08-31

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to a thread-group-scoped gate instruction. In some embodiments, graphics processor circuitry is configured to execute, for multiple SIMD groups of a thread group, a graphics program that includes a gate instruction. During execution of the gate instruction for a first SIMD group, the processor accesses state information to determine that a threshold number of other SIMD groups in the thread group have not yet executed the gate instruction. Based on the determination, the processor executes a particular set of instructions of the graphics program for the first SIMD group (that is not executed by one or more other SIMD groups that reach the gate instruction after the first SIMD group). For example, the particular set of instructions may be a utility program that performs one or more operations for the entire thread group but is only executed by a subset of the SIMD groups.
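
The gate behavior can be modeled with a per-thread-group counter. This is a minimal sketch, assuming the state information is a simple arrival count; the class name `ThreadGroupGate` and its `gate` method are hypothetical, and real hardware would need the check and the update to happen atomically.

```python
class ThreadGroupGate:
    """Sketch of thread-group-scoped gate state (hypothetical model)."""

    def __init__(self, threshold=1):
        self.threshold = threshold  # how many SIMD groups may pass the gate
        self.arrived = 0            # SIMD groups that have executed the gate

    def gate(self):
        """Return True if the calling SIMD group should run the gated code.

        A group passes only while fewer than `threshold` other groups
        have already executed the gate instruction.
        """
        passes = self.arrived < self.threshold
        self.arrived += 1
        return passes


gate = ThreadGroupGate(threshold=1)
for simd_group in range(4):
    if gate.gate():
        print(f"SIMD group {simd_group} runs the utility code")
    else:
        print(f"SIMD group {simd_group} skips it")
```

With a threshold of 1, only the first SIMD group to reach the gate runs the utility code on behalf of the whole thread group.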

    Hardware Resource Allocation System for Allocating Resources to Threads

    Publication number: US20210248006A1

    Publication date: 2021-08-12

    Application number: US17240406

    Filing date: 2021-04-26

    Applicant: Apple Inc.

    Abstract: In various embodiments, a resource allocation management circuit may allocate a plurality of different types of hardware resources (e.g., different types of registers) to a plurality of threads. The different types of hardware resources may correspond to a plurality of hardware resource allocation circuits. The resource allocation management circuit may track allocation of the hardware resources to the threads using state identification values of the threads. In response to determining that fewer than a respective requested number of one or more types of the hardware resources are available, the resource allocation management circuit may identify one or more threads for deallocation. As a result, the hardware resource allocation system may allocate hardware resources to threads more efficiently (e.g., may deallocate hardware resources allocated to fewer threads), as compared to a hardware resource allocation system that does not track allocation of hardware resources to threads using state identification values.

    Graphics hardware driven pause for quality of service adjustment

    Publication number: US10795730B2

    Publication date: 2020-10-06

    Application number: US16145573

    Filing date: 2018-09-28

    Applicant: Apple Inc.

    Abstract: In general, embodiments are disclosed for tracking and allocating graphics processor hardware resources. More particularly, a graphics hardware resource allocation system is able to generate a priority list for a plurality of data masters for a graphics processor based on a comparison between current utilizations for the data masters and target utilizations for the data masters. The graphics hardware resource allocation system designates, based on the priority list, a first data master with a higher priority to submit work to the graphics processor compared to a second data master. The graphics hardware resource allocation system determines a stall counter value for the data master and generates a notification to pause work for the second data master based on the stall counter value.
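
One way to picture the priority list and the pause decision is the sketch below. It rests on assumptions the abstract does not confirm: data masters are ranked by how far they fall below their target utilization, and a pause is signaled once a stall counter reaches a fixed limit. The names `build_priority_list`, `should_pause`, and `stall_limit` are hypothetical.

```python
def build_priority_list(current, target):
    """Rank data masters, most under-served first (assumed policy).

    `current` and `target` map a data master name to a utilization
    fraction between 0 and 1.
    """
    return sorted(current, key=lambda m: target[m] - current[m], reverse=True)


def should_pause(stall_counter, stall_limit):
    """Signal a pause once the stall counter reaches an assumed limit."""
    return stall_counter >= stall_limit


current = {"vertex": 0.2, "pixel": 0.6, "compute": 0.1}
target = {"vertex": 0.3, "pixel": 0.4, "compute": 0.3}
print(build_priority_list(current, target))  # ['compute', 'vertex', 'pixel']
print(should_pause(stall_counter=8, stall_limit=8))  # True
```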

    Data alignment and formatting for graphics processing unit

    Publication number: US10769746B2

    Publication date: 2020-09-08

    Application number: US14496934

    Filing date: 2014-09-25

    Applicant: Apple Inc.

    Abstract: A data queuing and format apparatus is disclosed. A first selection circuit may be configured to selectively couple a first subset of data to a first plurality of data lines dependent upon control information, and a second selection circuit may be configured to selectively couple a second subset of data to a second plurality of data lines dependent upon the control information. A storage array may include multiple storage units, and each storage unit may be configured to receive data from one or more data lines of either the first or second plurality of data lines dependent upon the control information.

    Resource Synchronization for Graphics Processing

    Publication number: US20180182154A1

    Publication date: 2018-06-28

    Application number: US15388985

    Filing date: 2016-12-22

    Applicant: Apple Inc.

    CPC classification number: G06T15/005

    Abstract: Techniques are disclosed relating to synchronizing access to pixel resources. Examples of pixel resources include color attachments, a stencil buffer, and a depth buffer. In some embodiments, hardware registers are used to track status of assigned pixel resources and pixel wait and pixel release instructions are used to synchronize access to the pixel resources. In some embodiments, other accesses to the pixel resources may occur out of program order. Relative to tracking and ordering pass groups, this weak ordering and explicit synchronization may improve performance and reduce power consumption. Disclosed techniques may also facilitate coordination between fragment rendering threads and auxiliary mid-render compute tasks.
