Hints for shared store pipeline and multi-rate targets

    Publication No.: US10452401B2

    Publication Date: 2019-10-22

    Application No.: US15463511

    Filing Date: 2017-03-20

    Applicant: Apple Inc.

    Inventor: Robert D. Kenney

    Abstract: Techniques are disclosed relating to selecting store instructions for dispatch to a shared pipeline. In some embodiments, the shared pipeline processes instructions for different target clients with different data rate capabilities. Therefore, in some embodiments, the pipeline is configured to generate state information that is based on a determined amount of work in the pipeline that targets at least one slower target. In some embodiments, the state information indicates whether the amount of work is above a threshold for the particular target. In some embodiments, scheduling circuitry is configured to select instructions for dispatch to the pipeline based on the state information. For example, the scheduling circuitry may refrain from selecting instructions with a slower target when the slower target is above its threshold amount of work in the pipeline. In some embodiments, the shared pipeline is a store pipeline configured to execute store instructions that target memories with different data rate capabilities.
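
    To make the threshold-based dispatch decision concrete, here is a minimal behavioral sketch in Python. It is an illustration under stated assumptions, not the patented circuit; the class, field, and threshold names are hypothetical.

        # Behavioral sketch (assumption): the scheduler holds back store
        # instructions whose target memory already has too much outstanding
        # work in the shared store pipeline.
        from collections import defaultdict

        class SharedStorePipeline:
            def __init__(self, thresholds):
                # thresholds: max in-flight work per slower target
                self.thresholds = thresholds
                self.in_flight = defaultdict(int)

            def over_threshold(self, target):
                # State information exported to the scheduler for slower targets.
                limit = self.thresholds.get(target)
                return limit is not None and self.in_flight[target] >= limit

            def dispatch(self, store):
                self.in_flight[store["target"]] += 1

            def retire(self, target):
                self.in_flight[target] -= 1

        def select_store(ready_stores, pipeline):
            # Skip stores whose (slower) target is already above its threshold.
            for store in ready_stores:
                if not pipeline.over_threshold(store["target"]):
                    return store
            return None  # nothing eligible this cycle

        pipe = SharedStorePipeline(thresholds={"slow_mem": 2})
        queue = [{"id": 0, "target": "slow_mem"}, {"id": 1, "target": "fast_mem"}]
        chosen = select_store(queue, pipe)
        if chosen is not None:
            pipe.dispatch(chosen)
        print(chosen)  # {'id': 0, 'target': 'slow_mem'}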

    SIMD Operand Permutation with Selection from among Multiple Registers

    Publication No.: US20230325196A1

    Publication Date: 2023-10-12

    Application No.: US18299452

    Filing Date: 2023-04-12

    Applicant: Apple Inc.

    CPC classification number: G06F9/3887 G06T1/60 G06T1/20 G06F9/30098

    Abstract: Techniques are disclosed relating to operand routing among SIMD pipelines. In some embodiments, an apparatus includes a set of multiple hardware pipelines configured to execute a single-instruction multiple-data (SIMD) instruction for multiple threads in parallel, wherein the instruction specifies first and second architectural registers. In some embodiments, the pipelines include execution circuitry configured to perform operations using one or more pipeline stages of the pipeline. In some embodiments, the pipelines include routing circuitry configured to select, based on the instruction, a first input operand for the execution circuitry from among: a value from the first architectural register from thread-specific storage for another pipeline and a value from the second architectural register from thread-specific storage for a thread assigned to another pipeline. In some embodiments, the routing circuitry may support a shift and fill instruction that facilitates storage of an arbitrary portion of a graphics frame in one or more registers.
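
    The shift-and-fill behavior can be pictured with a short Python sketch. This is a lane-level illustration under assumed semantics (shift amount, fill direction), not the actual routing circuitry.

        # Behavioral sketch (assumption): a "shift and fill" style permutation
        # over a SIMD group. Each lane i reads register r1 from lane i + shift;
        # lanes that would read past the end of the group are filled from r2.
        def shift_and_fill(r1_per_lane, r2_per_lane, shift):
            width = len(r1_per_lane)
            out = []
            for lane in range(width):
                src = lane + shift
                if src < width:
                    out.append(r1_per_lane[src])          # routed from another lane's r1
                else:
                    out.append(r2_per_lane[src - width])  # filled from r2
            return out

        # Example: a row of pixels split across two registers; shifting by 3
        # selects a window that straddles the register boundary.
        r1 = list(range(0, 8))    # pixels 0..7
        r2 = list(range(8, 16))   # pixels 8..15
        print(shift_and_fill(r1, r2, 3))  # pixels 3..10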

    Datapath circuitry for math operations using SIMD pipelines

    Publication No.: US11256518B2

    Publication Date: 2022-02-22

    Application No.: US16597625

    Filing Date: 2019-10-09

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to sharing operands among SIMD threads for a larger arithmetic operation. In some embodiments, a set of multiple hardware pipelines is configured to execute single-instruction multiple-data (SIMD) instructions for multiple threads in parallel, where ones of the hardware pipelines include execution circuitry configured to perform floating-point operations using one or more pipeline stages of the pipeline and first routing circuitry configured to select, from among thread-specific operands stored for the hardware pipeline and from one or more other pipelines in the set, a first input operand for an operation by the execution circuitry. In some embodiments, a device is configured to perform a mathematical operation on source input data structures stored across thread-specific storage for the set of hardware pipelines, by executing multiple SIMD floating-point operations using the execution circuitry and the first routing circuitry. This may improve performance and reduce power consumption for matrix multiply and reduction operations, for example.
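
    As an illustration of how operand routing can let a SIMD group cooperate on a matrix multiply, the following Python sketch models one possible data layout. The column-wise layout and broadcast order are assumptions, not the device's actual microarchitecture.

        # Behavioral sketch (assumption): lane j holds column j of A and column j
        # of B in its thread-specific storage; routing lets every lane read the
        # column of A held by lane k, so column j of C is accumulated locally.
        def simd_matmul(a_cols, b_cols):
            n = len(a_cols)                      # lanes == matrix dimension
            c_cols = [[0.0] * n for _ in range(n)]
            for k in range(n):
                broadcast = a_cols[k]            # routed from lane k to all lanes
                for j in range(n):               # each lane j works in parallel in HW
                    for i in range(n):
                        c_cols[j][i] += broadcast[i] * b_cols[j][k]
            return c_cols

        a_cols = [[1, 3], [2, 4]]   # A = [[1, 2], [3, 4]] stored column-wise
        b_cols = [[5, 7], [6, 8]]   # B = [[5, 6], [7, 8]] stored column-wise
        print(simd_matmul(a_cols, b_cols))  # columns of A @ B: [[19, 43], [22, 50]]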

    SIMD Operand Permutation with Selection from among Multiple Registers

    Publication No.: US20210406031A1

    Publication Date: 2021-12-30

    Application No.: US17470682

    Filing Date: 2021-09-09

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to operand routing among SIMD pipelines. In some embodiments, an apparatus includes a set of multiple hardware pipelines configured to execute a single-instruction multiple-data (SIMD) instruction for multiple threads in parallel, wherein the instruction specifies first and second architectural registers. In some embodiments, the pipelines include execution circuitry configured to perform operations using one or more pipeline stages of the pipeline. In some embodiments, the pipelines include routing circuitry configured to select, based on the instruction, a first input operand for the execution circuitry from among: a value from the first architectural register from thread-specific storage for another pipeline and a value from the second architectural register from thread-specific storage for a thread assigned to another pipeline. In some embodiments, the routing circuitry may support a shift and fill instruction that facilitates storage of an arbitrary portion of a graphics frame in one or more registers.
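
    Since this publication shares its abstract with the related application above, the sketch below illustrates a different facet: the per-lane selection between the two specified architectural registers. The control encoding is hypothetical.

        # Behavioral sketch (assumption): for each lane, an instruction-derived
        # control picks either register r1 of a source lane or register r2 of a
        # (possibly different) source lane as the first input operand.
        def route_operands(r1, r2, controls):
            # controls: list of (which_register, source_lane) pairs, one per lane.
            selected = []
            for which, src in controls:
                selected.append(r1[src] if which == "r1" else r2[src])
            return selected

        r1 = [10, 11, 12, 13]
        r2 = [20, 21, 22, 23]
        # Lanes 0-2 take r1 from their right-hand neighbour; lane 3 takes r2[0].
        controls = [("r1", 1), ("r1", 2), ("r1", 3), ("r2", 0)]
        print(route_operands(r1, r2, controls))  # [11, 12, 13, 20]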

    Multi-channel Data Path Circuitry
    Type: Invention Application

    Publication No.: US20210349725A1

    Publication Date: 2021-11-11

    Application No.: US16870330

    Filing Date: 2020-05-08

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to sharing datapath circuitry among multiple SIMD groups. In some embodiments, pipeline circuitry is configured to perform operations specified by instructions of first and second assigned SIMD groups. The pipeline circuitry may include first and second front-end circuitry configured to decode instructions of the respective SIMD groups. The pipeline circuitry may include shared execution circuitry configured to perform operations specified by the first and second assigned SIMD groups and arbitration circuitry configured to select an instruction from among at least the first and second front-end circuitry for assignment to the shared execution circuitry in a current cycle. The arbitration circuitry may select an instruction based on one or more of: stall counts, whether available instructions are being speculatively executed, whether ones of available instructions target a particular portion of the shared execution circuitry, numbers of execution cycles, and SIMD group ages.
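
    One way to picture the arbitration criteria listed above is as a scoring function over the candidate instructions from each front end; the weights in the Python sketch below are purely illustrative assumptions.

        # Behavioral sketch (assumption): arbitration between two SIMD-group
        # front ends that share one execution pipeline, scored on the criteria
        # named in the abstract.
        from dataclasses import dataclass

        @dataclass
        class Candidate:
            front_end: int
            stall_count: int        # cycles the front end has been stalled
            speculative: bool       # instruction is being speculatively executed
            wants_busy_unit: bool   # targets a busy part of the shared circuitry
            exec_cycles: int        # expected occupancy of the shared pipeline
            group_age: int          # age of the SIMD group (older = higher)

        def arbitrate(candidates):
            def score(c):
                return (c.stall_count * 4
                        - (2 if c.speculative else 0)
                        - (3 if c.wants_busy_unit else 0)
                        - c.exec_cycles
                        + c.group_age)
            return max(candidates, key=score)

        a = Candidate(front_end=0, stall_count=3, speculative=False,
                      wants_busy_unit=False, exec_cycles=2, group_age=5)
        b = Candidate(front_end=1, stall_count=0, speculative=True,
                      wants_busy_unit=True, exec_cycles=4, group_age=9)
        print(arbitrate([a, b]).front_end)  # 0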

    GPU task scheduling
    Type: Invention Grant

    Publication No.: US10902545B2

    Publication Date: 2021-01-26

    Application No.: US14574041

    Filing Date: 2014-12-17

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to scheduling tasks for graphics processing. In one embodiment, a graphics unit is configured to render a frame of graphics data using a plurality of pass groups and the frame of graphics data includes a plurality of frame portions. In this embodiment, the graphics unit includes scheduling circuitry configured to receive a plurality of tasks, maintain pass group information for each of the plurality of tasks, and maintain relative age information for the plurality of frame portions. In this embodiment, the scheduling circuitry is configured to select a task for execution based on the pass group information and the age information. In some embodiments, the scheduling circuitry is configured to select tasks from an oldest frame portion and current pass group before selecting other tasks. This scheduling approach may result in efficient execution of various different types of graphics workloads.
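
    A compact way to express the "current pass group, oldest frame portion first" policy is a priority key over the ready tasks, as in the Python sketch below; the task fields and tie-breaking order are assumptions.

        # Behavioral sketch (assumption): prefer tasks in the current pass group,
        # and among those prefer the oldest frame portion.
        def select_task(tasks, portion_age, current_pass_group):
            # portion_age: frame portion id -> relative age (higher = older)
            def key(task):
                in_current_group = task["pass_group"] == current_pass_group
                return (in_current_group, portion_age[task["portion"]])
            return max(tasks, key=key) if tasks else None

        tasks = [
            {"id": "t0", "portion": 0, "pass_group": 1},
            {"id": "t1", "portion": 1, "pass_group": 2},
            {"id": "t2", "portion": 1, "pass_group": 1},
        ]
        portion_age = {0: 1, 1: 3}   # portion 1 is the oldest
        print(select_task(tasks, portion_age, current_pass_group=1)["id"])  # t2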

    Techniques for ALU sharing between threads

    Publication No.: US10699366B1

    Publication Date: 2020-06-30

    Application No.: US16057794

    Filing Date: 2018-08-07

    Applicant: Apple Inc.

    Inventor: Robert D. Kenney

    Abstract: Techniques are disclosed relating to sharing an arithmetic logic unit (ALU) between multiple threads. In some embodiments, the threads also have dedicated ALUs for other types of operations. In some embodiments, arbitration circuitry is configured to receive operations to be performed by the shared arithmetic logic unit from the set of threads and issue the received operations to the shared arithmetic logic unit. In some embodiments, the arbitration circuitry is configured to switch to a different one of the set of threads for each instruction issued to the shared arithmetic logic unit. In some embodiments, the shared ALU is configured to perform 32-bit operations and the dedicated ALUs are configured to perform the same operations using 16-bit precision. In some embodiments, the shared ALU is shared between two threads and is physically located adjacent to other datapath circuitry for the two threads.
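
    The per-instruction thread switching can be modeled as a small round-robin arbiter. The sketch below is a behavioral model under assumptions (two threads, a simple issue queue per thread), not the hardware design.

        # Behavioral sketch (assumption): two threads with private 16-bit ALUs
        # share one 32-bit ALU; the arbiter switches threads after each issue.
        from collections import deque

        class SharedAluArbiter:
            def __init__(self, num_threads=2):
                self.queues = [deque() for _ in range(num_threads)]
                self.next_thread = 0

            def submit(self, thread, op):
                self.queues[thread].append(op)

            def issue(self):
                # Round-robin: start from next_thread, take the first thread with work.
                n = len(self.queues)
                for i in range(n):
                    t = (self.next_thread + i) % n
                    if self.queues[t]:
                        self.next_thread = (t + 1) % n   # switch threads per issue
                        return t, self.queues[t].popleft()
                return None

        arb = SharedAluArbiter()
        arb.submit(0, "fmul32 r0, r1, r2")
        arb.submit(0, "fadd32 r3, r0, r4")
        arb.submit(1, "fmul32 r5, r6, r7")
        print([arb.issue() for _ in range(3)])
        # [(0, 'fmul32 r0, r1, r2'), (1, 'fmul32 r5, r6, r7'), (0, 'fadd32 r3, r0, r4')]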

    Power saving with dynamic pulse insertion

    Publication No.: US10270434B2

    Publication Date: 2019-04-23

    Application No.: US15046926

    Filing Date: 2016-02-18

    Applicant: Apple Inc.

    Abstract: A method and apparatus for saving power in integrated circuits is disclosed. An IC includes functional circuit blocks that are not placed into a sleep mode when idle. A power management circuit may monitor the activity levels of the functional circuit blocks not placed into a sleep mode. When the power management circuit detects that the activity level of one of these non-sleep functional circuit blocks is less than a predefined threshold, it reduces the frequency of the clock signal provided to that block by scheduling only one pulse for every N pulses of the full-frequency clock signal. The remaining N−1 pulses of the clock signal may be inhibited. If a high-priority transaction inbound for the functional circuit block is detected, an inserted pulse of the clock signal may be provided to the block irrespective of when the most recent regular pulse was provided.
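
    The 1-of-N pulse gating with an inserted pulse for high-priority traffic can be sketched as follows; the cycle numbering and trigger mechanism are illustrative assumptions.

        # Behavioral sketch (assumption): when activity is below threshold, only
        # one of every N clock pulses is forwarded; a high-priority inbound
        # transaction forces an extra pulse regardless of the regular schedule.
        def clock_pulses(cycles, n, low_activity, high_priority_at=()):
            high = set(high_priority_at)
            forwarded = []
            for cycle in range(cycles):
                regular = (not low_activity) or (cycle % n == 0)  # 1 of N when idle
                inserted = cycle in high                          # forced pulse
                if regular or inserted:
                    forwarded.append(cycle)
            return forwarded

        # Low activity, N = 4: pulses at 0, 4, 8, plus an inserted pulse at 6.
        print(clock_pulses(cycles=12, n=4, low_activity=True, high_priority_at=[6]))
        # [0, 4, 6, 8]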

    DATA ALIGNMENT AND FORMATTING FOR GRAPHICS PROCESSING UNIT
    Type: Invention Application
    Status: Pending (Published)

    Publication No.: US20160093014A1

    Publication Date: 2016-03-31

    Application No.: US14496934

    Filing Date: 2014-09-25

    Applicant: Apple Inc.

    Abstract: A data queuing and format apparatus is disclosed. A first selection circuit may be configured to selectively couple a first subset of data to a first plurality of data lines dependent upon control information, and a second selection circuit may be configured to selectively couple a second subset of data to a second plurality of data lines dependent upon the control information. A storage array may include multiple storage units, and each storage unit may be configured to receive data from one or more data lines of either the first or second plurality of data lines dependent upon the control information.
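
    A behavioral reading of the two selection stages and the storage array might look like the Python sketch below; the control-information encoding is hypothetical.

        # Behavioral sketch (assumption): two selection stages steer subsets of an
        # incoming data word onto two groups of "data lines", and each storage
        # unit captures from whichever lines the control information names.
        def align_and_store(data, control):
            # control["first"]/["second"]: data indices routed to each line group
            # control["store"]: per storage unit, (line_group, line_index) to latch
            first_lines = [data[i] for i in control["first"]]
            second_lines = [data[i] for i in control["second"]]
            groups = {"first": first_lines, "second": second_lines}
            return [groups[g][i] for g, i in control["store"]]

        data = ["a", "b", "c", "d"]
        control = {
            "first": [2, 3],    # route c, d onto the first line group
            "second": [0, 1],   # route a, b onto the second line group
            "store": [("first", 0), ("first", 1), ("second", 0), ("second", 1)],
        }
        print(align_and_store(data, control))  # ['c', 'd', 'a', 'b'] (realigned)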
