-
公开(公告)号:US10452401B2
公开(公告)日:2019-10-22
申请号:US15463511
申请日:2017-03-20
Applicant: Apple Inc.
Inventor: Robert D. Kenney
Abstract: Techniques are disclosed relating to selecting store instructions for dispatch to a shared pipeline. In some embodiments, the shared pipeline processes instructions for different target clients with different data rate capabilities. Therefore, in some embodiments, the pipeline is configured to generate state information that is based on a determined amount of work in the pipeline that targets at least one slower target. In some embodiments, the state information indicates whether the amount of work is above a threshold for the particular target. In some embodiments, scheduling circuitry is configured to select instructions for dispatch to the pipeline based on the state information. For example, the scheduling circuitry may refrain from selecting instructions with a slower target when the slower target is above its threshold amount of work in the pipeline. In some embodiments, the shared pipeline is a store pipeline configured to execute store instructions that target memories with different data rate capabilities.
-
公开(公告)号:US20230325196A1
公开(公告)日:2023-10-12
申请号:US18299452
申请日:2023-04-12
Applicant: Apple Inc.
Inventor: Christopher A. Burns , Liang-Kai Wang , Robert D. Kenney , Terence M. Potter
CPC classification number: G06F9/3887 , G06T1/60 , G06T1/20 , G06F9/30098
Abstract: Techniques are disclosed relating to operand routing among SIMD pipelines. In some embodiments, an apparatus includes a set of multiple hardware pipelines configured to execute a single-instruction multiple-data (SIMD) instruction for multiple threads in parallel, wherein the instruction specifies first and second architectural registers. In some embodiments, the pipelines include execution circuitry configured to perform operations using one or more pipeline stages of the pipeline. In some embodiments, the pipelines include routing circuitry configured to select, based on the instruction, a first input operand for the execution circuitry from among: a value from the first architectural register from thread-specific storage for another pipeline and a value from the second architectural register from thread-specific storage for a thread assigned to another pipeline. In some embodiments, the routing circuitry may support a shift and fill instruction that facilitates storage of an arbitrary portion of a graphics frame in one or more registers.
-
公开(公告)号:US11256518B2
公开(公告)日:2022-02-22
申请号:US16597625
申请日:2019-10-09
Applicant: Apple Inc.
Inventor: Liang-Kai Wang , Robert D. Kenney , Terence M. Potter , Vinod Reddy Nalamalapu , Sivayya V. Ayinala
Abstract: Techniques are disclosed relating to sharing operands among SIMD threads for a larger arithmetic operation. In some embodiments, a set of multiple hardware pipelines is configured to execute single-instruction multiple-data (SIMD) instructions for multiple threads in parallel, where ones of the hardware pipelines include execution circuitry configured to perform floating-point operations using one or more pipeline stages of the pipeline and first routing circuitry configured to select, from among thread-specific operands stored for the hardware pipeline and from one or more other pipelines in the set, a first input operand for an operation by the execution circuitry. In some embodiments, a device is configured to perform a mathematical operation on source input data structures stored across thread-specific storage for the set of hardware pipelines, by executing multiple SIMD floating-point operations using the execution circuitry and the first routing circuitry. This may improve performance and reduce power consumption for matrix multiply and reduction operations, for example.
-
公开(公告)号:US20210406031A1
公开(公告)日:2021-12-30
申请号:US17470682
申请日:2021-09-09
Applicant: Apple Inc.
Inventor: Christopher A. Burns , Liang-Kai Wang , Robert D. Kenney , Terence M. Potter
Abstract: Techniques are disclosed relating to operand routing among SIMD pipelines. In some embodiments, an apparatus includes a set of multiple hardware pipelines configured to execute a single-instruction multiple-data (SIMD) instruction for multiple threads in parallel, wherein the instruction specifies first and second architectural registers. In some embodiments, the pipelines include execution circuitry configured to perform operations using one or more pipeline stages of the pipeline. In some embodiments, the pipelines include routing circuitry configured to select, based on the instruction, a first input operand for the execution circuitry from among: a value from the first architectural register from thread-specific storage for another pipeline and a value from the second architectural register from thread-specific storage for a thread assigned to another pipeline. In some embodiments, the routing circuitry may support a shift and fill instruction that facilitates storage of an arbitrary portion of a graphics frame in one or more registers.
-
公开(公告)号:US20210349725A1
公开(公告)日:2021-11-11
申请号:US16870330
申请日:2020-05-08
Applicant: Apple Inc.
Inventor: Robert D. Kenney , Jason N. Dale
Abstract: Techniques are disclosed relating to sharing datapath circuitry among multiple SIMD groups. In some embodiments, pipeline circuitry is configured to perform operations specified by instructions of first and second assigned SIMD groups. The pipeline circuitry may include first and second front-end circuitry configured to decode instructions of the respective SIMD groups. The pipeline circuitry may include shared execution circuitry configured to perform operations specified by the first and second assigned SIMD groups and arbitration circuitry configured to select an instruction from among at least the first and second front-end circuitry for assignment to the shared execution circuitry in a current cycle. The arbitration circuitry may select an instruction based on one or more of: stall counts, whether available instructions are being speculatively executed, whether ones of available instructions target a particular portion of the shared execution circuitry, numbers of execution cycles, and SIMD group ages.
-
公开(公告)号:US10902545B2
公开(公告)日:2021-01-26
申请号:US14574041
申请日:2014-12-17
Applicant: Apple Inc.
Inventor: Robert D. Kenney , Benjiman L. Goodman , Terence M. Potter
Abstract: Techniques are disclosed relating to scheduling tasks for graphics processing. In one embodiment, a graphics unit is configured to render a frame of graphics data using a plurality of pass groups and the frame of graphics data includes a plurality of frame portions. In this embodiment, the graphics unit includes scheduling circuitry configured to receive a plurality of tasks, maintain pass group information for each of the plurality of tasks, and maintain relative age information for the plurality of frame portions. In this embodiment, the scheduling circuitry is configured to select a task for execution based on the pass group information and the age information. In some embodiments, the scheduling circuitry is configured to select tasks from an oldest frame portion and current pass group before selecting other tasks. This scheduling approach may result in efficient execution of various different types of graphics workloads.
-
公开(公告)号:US10699366B1
公开(公告)日:2020-06-30
申请号:US16057794
申请日:2018-08-07
Applicant: Apple Inc.
Inventor: Robert D. Kenney
Abstract: Techniques are disclosed relating to sharing an arithmetic logic unit (ALU) between multiple threads. In some embodiments, the threads also have dedicated ALUs for other types of operations. In some embodiments, arbitration circuitry is configured to receive operations to be performed by the shared arithmetic logic unit from the set of threads and issue the received operations to the shared arithmetic logic unit. In some embodiments, the arbitration circuitry is configured to switch to a different one of the set of threads for each instruction issued to the shared arithmetic logic unit. In some embodiments, the shared ALU is configured to perform 32-bit operations and the dedicated ALUs are configured to perform the same operations using 16-bit precision. In some embodiments, the shared ALU is shared between two threads and is physically located adjacent to other datapath circuitry for the two threads.
-
公开(公告)号:US10270434B2
公开(公告)日:2019-04-23
申请号:US15046926
申请日:2016-02-18
Applicant: Apple Inc.
Inventor: James Wang , Benjiman L. Goodman , Liang-Kai Wang , Robert D. Kenney
IPC: H03K3/00 , H03K5/00 , G06F1/3228 , G06F1/324 , G06F1/329
Abstract: A method and apparatus for saving power in integrated circuits is disclosed. An IC includes functional circuit blocks which are not placed into a sleep mode when idle. A power management circuit may monitor the activity levels of the functional circuit blocks not placed into a sleep mode. When the power management circuit detects that an activity level of one of the non-sleep functional circuit blocks is less than a predefined threshold, it reduce the frequency of a clock signal provided thereto by scheduling only one pulse of a clock signal for every N pulses of the full frequency clock signal. The remaining N−1 pulses of the clock signal may be inhibited. If a high priority transaction inbound for the functional circuit block is detected, an inserted pulse of the clock signal may be provided to the functional unit irrespective of when a most recent regular pulse was provided.
-
29.
公开(公告)号:US20160093014A1
公开(公告)日:2016-03-31
申请号:US14496934
申请日:2014-09-25
Applicant: Apple Inc.
Inventor: Liang Xia , Robert D. Kenney , Benjiman L. Goodman , Terence M. Potter
Abstract: A data queuing and format apparatus is disclosed. A first selection circuit may be configured to selectively couple a first subset of data to a first plurality of data lines dependent upon control information, and a second selection circuit may be configured to selectively couple a second subset of data to a second plurality of data lines dependent upon the control information. A storage array may include multiple storage units, and each storage unit may be configured to receive data from one or more data lines of either the first or second plurality of data lines dependent upon the control information.
Abstract translation: 公开了一种数据排队和格式化装置。 第一选择电路可以被配置为选择性地将数据的第一子集耦合到取决于控制信息的第一多个数据线,并且第二选择电路可以被配置为选择性地将第二数据子集耦合到第二多个数据线 取决于控制信息。 存储阵列可以包括多个存储单元,并且每个存储单元可以被配置为根据控制信息从第一或第二多个数据线的一个或多个数据线接收数据。
-
-
-
-
-
-
-
-