SIMD operand permutation with selection from among multiple registers

    Publication No.: US11645084B2

    Publication date: 2023-05-09

    Application No.: US17470682

    Filing date: 2021-09-09

    Applicant: Apple Inc.

    CPC classification number: G06F9/3887 G06F9/30098 G06T1/20 G06T1/60

    Abstract: Techniques are disclosed relating to operand routing among SIMD pipelines. In some embodiments, an apparatus includes a set of multiple hardware pipelines configured to execute a single-instruction multiple-data (SIMD) instruction for multiple threads in parallel, wherein the instruction specifies first and second architectural registers. In some embodiments, the pipelines include execution circuitry configured to perform operations using one or more pipeline stages of the pipeline. In some embodiments, the pipelines include routing circuitry configured to select, based on the instruction, a first input operand for the execution circuitry from among: a value from the first architectural register from thread-specific storage for another pipeline and a value from the second architectural register from thread-specific storage for a thread assigned to another pipeline. In some embodiments, the routing circuitry may support a shift and fill instruction that facilitates storage of an arbitrary portion of a graphics frame in one or more registers.
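
The shift and fill behavior mentioned in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions: the function name `shift_and_fill`, the list-per-register representation (one entry per SIMD lane), and the exact semantics (lane i reads the first register from lane i + shift, and lanes that run off the end are filled from the second register) are illustrative choices, not details taken from the patent.

```python
def shift_and_fill(reg_a, reg_b, shift):
    """Model one possible shift-and-fill across SIMD lanes (assumed semantics).

    reg_a and reg_b hold one value per lane for two architectural registers.
    Lane i receives reg_a[i + shift] when that lane exists; otherwise it is
    filled with reg_b[i + shift - n], a value owned by another lane's thread.
    """
    n = len(reg_a)
    if len(reg_b) != n:
        raise ValueError("both registers must have one value per lane")
    if not 0 <= shift <= n:
        raise ValueError("shift must be between 0 and the lane count")
    result = []
    for lane in range(n):
        src = lane + shift
        if src < n:
            result.append(reg_a[src])      # operand from the first register
        else:
            result.append(reg_b[src - n])  # fill from the second register
    return result


window = shift_and_fill([0, 1, 2, 3], [10, 11, 12, 13], 1)
```

In this model, two registers that hold adjacent runs of pixels behave like a sliding window, which is one way an arbitrary portion of a frame could end up in a register.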

    Hints for Shared Store Pipeline and Multi-Rate Targets

    Publication No.: US20180267804A1

    Publication date: 2018-09-20

    Application No.: US15463511

    Filing date: 2017-03-20

    Applicant: Apple Inc.

    Inventor: Robert D. Kenney

    Abstract: Techniques are disclosed relating to selecting store instructions for dispatch to a shared pipeline. In some embodiments, the shared pipeline processes instructions for different target clients with different data rate capabilities. Therefore, in some embodiments, the pipeline is configured to generate state information that is based on a determined amount of work in the pipeline that targets at least one slower target. In some embodiments, the state information indicates whether the amount of work is above a threshold for the particular target. In some embodiments, scheduling circuitry is configured to select instructions for dispatch to the pipeline based on the state information. For example, the scheduling circuitry may refrain from selecting instructions with a slower target when the slower target is above its threshold amount of work in the pipeline. In some embodiments, the shared pipeline is a store pipeline configured to execute store instructions that target memories with different data rate capabilities.
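
A rough software analogue of the threshold hint described above is sketched here. The class `SharedStorePipeline`, the function `select_store`, the target names, and the oldest-first selection order are all hypothetical; the abstract only states that the scheduler may refrain from selecting stores whose slower target is above its threshold.

```python
from collections import Counter


class SharedStorePipeline:
    """Tracks in-flight stores per target and exposes a per-target hint."""

    def __init__(self, thresholds):
        # thresholds: target name -> in-flight count above which the hint is set
        self.thresholds = dict(thresholds)
        self.in_flight = Counter()

    def dispatch(self, target):
        self.in_flight[target] += 1

    def complete(self, target):
        self.in_flight[target] -= 1

    def hints(self):
        # True means "work for this target is above its threshold"
        return {t: self.in_flight[t] > limit for t, limit in self.thresholds.items()}


def select_store(pending, pipeline):
    """Return the oldest pending (name, target) store whose target is not over threshold."""
    over = pipeline.hints()
    for store in pending:
        _name, target = store
        if not over.get(target, False):  # targets without a threshold are never held back
            return store
    return None


pipe = SharedStorePipeline({"slow_mem": 1})
pipe.dispatch("slow_mem")
pipe.dispatch("slow_mem")
pending = [("s0", "slow_mem"), ("s1", "fast_mem")]
first_choice = select_store(pending, pipe)
```

Here the store to the faster target is chosen even though it is younger, because the slower target already has more than its threshold of work in the pipeline.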

    Power Saving with Dynamic Pulse Insertion
    Patent application

    Publication No.: US20170244391A1

    Publication date: 2017-08-24

    Application No.: US15046926

    Filing date: 2016-02-18

    Applicant: Apple Inc.

    Abstract: A method and apparatus for saving power in integrated circuits is disclosed. An IC includes functional circuit blocks which are not placed into a sleep mode when idle. A power management circuit may monitor the activity levels of the functional circuit blocks not placed into a sleep mode. When the power management circuit detects that an activity level of one of the non-sleep functional circuit blocks is less than a predefined threshold, it reduces the frequency of a clock signal provided thereto by scheduling only one pulse of a clock signal for every N pulses of the full frequency clock signal. The remaining N−1 pulses of the clock signal may be inhibited. If a high priority transaction inbound for the functional circuit block is detected, an inserted pulse of the clock signal may be provided to the functional unit irrespective of when a most recent regular pulse was provided.
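
The one-in-N pulse schedule with inserted pulses can be modeled cycle by cycle. The sketch below is an assumption-laden illustration: the function `gated_clock`, its parameters, and the choice to leave the regular schedule unchanged after an inserted pulse are not taken from the patent.

```python
def gated_clock(num_cycles, n, high_priority_cycles=()):
    """Return a 0/1 list saying which full-rate clock pulses reach the block.

    One regular pulse is passed for every n full-rate pulses; the other
    n - 1 are inhibited. A pulse is also inserted on any cycle where a
    high-priority transaction arrives, regardless of the regular schedule.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    urgent = set(high_priority_cycles)
    pulses = []
    for cycle in range(num_cycles):
        regular = cycle % n == 0   # scheduled pulse
        inserted = cycle in urgent  # extra pulse for urgent traffic
        pulses.append(1 if regular or inserted else 0)
    return pulses


idle_pattern = gated_clock(8, 4)
urgent_pattern = gated_clock(8, 4, high_priority_cycles=[2])
```

With N = 4 the idle block sees a quarter of the pulses, and the urgent transaction at cycle 2 gets its own pulse without waiting for cycle 4.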

    PESSIMISTIC DEPENDENCY HANDLING
    Patent application (status: under examination, published)

    Publication No.: US20160246598A1

    Publication date: 2016-08-25

    Application No.: US14629464

    Filing date: 2015-02-23

    Applicant: Apple Inc.

    CPC classification number: G06F9/3838 G06F9/3834

    Abstract: Techniques are disclosed relating to handling dependencies between instructions. In one embodiment, an apparatus includes decode circuitry and dependency circuitry. In this embodiment, the decode circuitry is configured to receive an instruction that specifies a destination location and determine a first storage region that includes the destination location. In this embodiment, the storage region is one of a plurality of different storage regions accessible by instructions processed by the apparatus. In this embodiment, the dependency circuitry is configured to stall the instruction until one or more older instructions that specify source locations in the first storage region have read their source locations. The disclosed techniques may be described as “pessimistic” dependency handling, which may, in some instances, maintain performance while limiting complexity, power consumption, and area of dependency logic.

    Routing circuitry for permutation of single-instruction multiple-data operands

    Publication No.: US11294672B2

    Publication date: 2022-04-05

    Application No.: US16548812

    Filing date: 2019-08-22

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to routing circuitry configured to perform permute operations for operands of threads in a single-instruction multiple-data group. In some embodiments, an apparatus includes hierarchical operand routing circuitry configured to route operands between a set of single-instruction multiple-data (SIMD) pipelines based on a permute instruction. In some embodiments, the routing circuitry includes a first level and a second level. The first level may include a set of multiple crossbar circuits each configured to receive operands from a respective subset of the pipelines and output one or more of the received operands on multiple output lines based on the permute instruction, where the crossbar circuits support full permutation within a respective subset. A second level may be configured to select an operand from a previous level for each of the pipelines, and may select from among only a portion of output operands from the previous level to provide an operand for a respective pipeline.
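
The two-level structure can be sketched in software as follows. Everything specific here is an assumption made for illustration: the function `route`, the fixed `group_size`, and in particular the rule that a destination lane may only read output line `dst % group_size` of each first-level crossbar. The abstract only says that the second level selects from a portion of the previous level's outputs; it does not give this wiring.

```python
def route(operands, src_for_dst, group_size):
    """Model a two-level permute network (assumed wiring, see lead-in).

    operands[i] is the operand held by pipeline i. src_for_dst[d] is the
    pipeline whose operand destination d should receive.
    """
    n = len(operands)
    if n % group_size or len(src_for_dst) != n:
        raise ValueError("lane count must match and be a multiple of group_size")
    if any(not 0 <= src < n for src in src_for_dst):
        raise ValueError("source lane out of range")
    num_groups = n // group_size

    # Level 1: one crossbar per source group. Any input of the group can be
    # driven onto any of its output lines (full permutation within the group).
    driver = [[None] * group_size for _ in range(num_groups)]
    for dst, src in enumerate(src_for_dst):
        crossbar, line = src // group_size, dst % group_size
        if driver[crossbar][line] not in (None, src):
            raise ValueError(
                f"unsupported pattern: line {line} of crossbar {crossbar} "
                "would need two different operands"
            )
        driver[crossbar][line] = src

    # Level 2: each destination selects among only one line per crossbar.
    result = []
    for dst, src in enumerate(src_for_dst):
        crossbar, line = src // group_size, dst % group_size
        result.append(operands[driver[crossbar][line]])
    return result


lanes = list("abcdefgh")
rotate_by_one = [(d + 1) % 8 for d in range(8)]
rotated = route(lanes, rotate_by_one, group_size=4)
```

Under this wiring a rotate or a broadcast routes cleanly, while some arbitrary permutations are rejected. That trade-off, fewer wires in exchange for a restricted set of patterns, is the general motivation for a hierarchical network over a single full crossbar.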

    Data alignment and formatting for graphics processing unit

    Publication No.: US10769746B2

    Publication date: 2020-09-08

    Application No.: US14496934

    Filing date: 2014-09-25

    Applicant: Apple Inc.

    Abstract: A data queuing and format apparatus is disclosed. A first selection circuit may be configured to selectively couple a first subset of data to a first plurality of data lines dependent upon control information, and a second selection circuit may be configured to selectively couple a second subset of data to a second plurality of data lines dependent upon the control information. A storage array may include multiple storage units, and each storage unit may be configured to receive data from one or more data lines of either the first or second plurality of data lines dependent upon the control information.

    Pessimistic dependency handling based on storage regions

    Publication No.: US10114650B2

    Publication date: 2018-10-30

    Application No.: US14629464

    Filing date: 2015-02-23

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to handling dependencies between instructions. In one embodiment, an apparatus includes decode circuitry and dependency circuitry. In this embodiment, the decode circuitry is configured to receive an instruction that specifies a destination location and determine a first storage region that includes the destination location. In this embodiment, the storage region is one of a plurality of different storage regions accessible by instructions processed by the apparatus. In this embodiment, the dependency circuitry is configured to stall the instruction until one or more older instructions that specify source locations in the first storage region have read their source locations. The disclosed techniques may be described as “pessimistic” dependency handling, which may, in some instances, maintain performance while limiting complexity, power consumption, and area of dependency logic.
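
The region-based stall rule can be sketched as a small scoreboard. The class `RegionScoreboard`, its method names, and the mapping of a location to a region by integer division are hypothetical; the abstract does not say how regions are defined or tracked.

```python
class RegionScoreboard:
    """Counts outstanding source reads per storage region (assumed model)."""

    def __init__(self, region_size):
        self.region_size = region_size
        self.pending_reads = {}  # region index -> reads not yet performed

    def region_of(self, location):
        return location // self.region_size

    def issue(self, sources):
        # An older instruction is issued; its source reads are now outstanding.
        for location in sources:
            region = self.region_of(location)
            self.pending_reads[region] = self.pending_reads.get(region, 0) + 1

    def sources_read(self, sources):
        # The older instruction has read its sources.
        for location in sources:
            self.pending_reads[self.region_of(location)] -= 1

    def must_stall(self, destination):
        # Pessimistic: stall on any outstanding read in the destination's
        # region, even if the exact locations differ.
        return self.pending_reads.get(self.region_of(destination), 0) > 0


board = RegionScoreboard(region_size=8)
board.issue(sources=[3])
stall_same_region = board.must_stall(destination=5)
stall_other_region = board.must_stall(destination=9)
```

Location 5 is never read by the older instruction, yet the write to it stalls because it shares a region with location 3. Tracking one counter per region instead of one per location is where the savings in dependency logic would come from.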

    Pipeline dependency resolution
    Granted patent (status: in force)

    Publication No.: US09519944B2

    Publication date: 2016-12-13

    Application No.: US14475119

    Filing date: 2014-09-02

    Applicant: Apple Inc.

    CPC classification number: G06T1/20 G06F9/30 G06F9/3838 G06F9/3877

    Abstract: Techniques are disclosed relating to dependency resolution among processor pipelines. In one embodiment, an apparatus includes a first special-purpose pipeline configured to execute, in parallel, a first type of graphics instruction for a group of graphics elements and a second special-purpose pipeline configured to execute, in parallel, a second type of graphics instruction for the group of graphics elements. In this embodiment, the apparatus is configured, in response to dispatch of an instruction of the second type, to mark a particular instruction of the first type with information indicative of the dispatched instruction. In this embodiment, the particular instruction and the dispatched instruction correspond to the same group of graphics elements. In this embodiment, the apparatus is configured to stall performance of the dispatched instruction until the first special-purpose pipeline has completed execution of the marked particular instruction. Exemplary instruction types include interpolate and sample instructions.
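
The marking scheme can be modeled with a short tracker. This is a sketch only: the class `DependencyTracker`, the method names, and the choice to mark the youngest in-flight first-type instruction for the same group are assumptions, and the model presumes the first pipeline completes instructions for a group in order.

```python
class DependencyTracker:
    """Stalls second-type instructions behind a marked first-type instruction."""

    def __init__(self):
        self.first_in_flight = []  # first-type instructions in dispatch order
        self.stalled = set()       # second-type instruction ids still waiting

    def dispatch_first(self, instr_id, group):
        self.first_in_flight.append({"id": instr_id, "group": group, "marks": []})

    def dispatch_second(self, instr_id, group):
        # Mark the youngest in-flight first-type instruction for this group
        # with the id of the newly dispatched instruction.
        for entry in reversed(self.first_in_flight):
            if entry["group"] == group:
                entry["marks"].append(instr_id)
                self.stalled.add(instr_id)
                return True
        return False  # nothing to wait for

    def complete_first(self, instr_id):
        for index, entry in enumerate(self.first_in_flight):
            if entry["id"] == instr_id:
                # Completion of the marked instruction releases its waiters.
                self.stalled.difference_update(entry["marks"])
                del self.first_in_flight[index]
                return
        raise KeyError(instr_id)

    def is_stalled(self, instr_id):
        return instr_id in self.stalled


tracker = DependencyTracker()
tracker.dispatch_first("interp0", group="quad0")
tracker.dispatch_first("interp1", group="quad1")
tracker.dispatch_second("sample0", group="quad0")
stalled_at_dispatch = tracker.is_stalled("sample0")
```

The sample instruction waits only on the interpolate instruction for its own group of graphics elements; completing work for an unrelated group does not release it.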

    PIPELINE DEPENDENCY RESOLUTION
    Patent application (status: in force)

    Publication No.: US20160063662A1

    Publication date: 2016-03-03

    Application No.: US14475119

    Filing date: 2014-09-02

    Applicant: Apple Inc.

    CPC classification number: G06T1/20 G06F9/30 G06F9/3838 G06F9/3877

    Abstract: Techniques are disclosed relating to dependency resolution among processor pipelines. In one embodiment, an apparatus includes a first special-purpose pipeline configured to execute, in parallel, a first type of graphics instruction for a group of graphics elements and a second special-purpose pipeline configured to execute, in parallel, a second type of graphics instruction for the group of graphics elements. In this embodiment, the apparatus is configured, in response to dispatch of an instruction of the second type, to mark a particular instruction of the first type with information indicative of the dispatched instruction. In this embodiment, the particular instruction and the dispatched instruction correspond to the same group of graphics elements. In this embodiment, the apparatus is configured to stall performance of the dispatched instruction until the first special-purpose pipeline has completed execution of the marked particular instruction. Exemplary instruction types include interpolate and sample instructions.

    GPU TASK SCHEDULING
    Patent application (status: under examination, published)

    Publication No.: US20160055610A1

    Publication date: 2016-02-25

    Application No.: US14574041

    Filing date: 2014-12-17

    Applicant: Apple Inc.

    CPC classification number: G06T1/20 G06F3/14 G09G5/001 G09G5/363

    Abstract: Techniques are disclosed relating to scheduling tasks for graphics processing. In one embodiment, a graphics unit is configured to render a frame of graphics data using a plurality of pass groups and the frame of graphics data includes a plurality of frame portions. In this embodiment, the graphics unit includes scheduling circuitry configured to receive a plurality of tasks, maintain pass group information for each of the plurality of tasks, and maintain relative age information for the plurality of frame portions. In this embodiment, the scheduling circuitry is configured to select a task for execution based on the pass group information and the age information. In some embodiments, the scheduling circuitry is configured to select tasks from an oldest frame portion and current pass group before selecting other tasks. This scheduling approach may result in efficient execution of various different types of graphics workloads.
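
The selection policy can be sketched as a priority function over pending tasks. The function `select_task`, the tuple layout, the age ranks, and the tie-breaking order (current pass group first, then oldest frame portion) are assumptions; the abstract states only that tasks from the oldest frame portion and current pass group are selected before other tasks.

```python
def select_task(tasks, current_pass_group, portion_age):
    """Pick the next task to run, or None if there are no tasks.

    tasks: list of (name, pass_group, frame_portion) tuples.
    portion_age: frame portion -> age rank, where a lower rank means older.
    """
    if not tasks:
        return None

    def priority(task):
        _name, pass_group, portion = task
        # False sorts before True, so tasks in the current pass group win;
        # among those, the oldest frame portion wins.
        return (pass_group != current_pass_group, portion_age[portion])

    return min(tasks, key=priority)


age = {"tileA": 0, "tileB": 1}
tasks = [("t0", 1, "tileB"), ("t1", 0, "tileB"), ("t2", 0, "tileA")]
chosen = select_task(tasks, current_pass_group=0, portion_age=age)
```

Task `t2` is chosen because it belongs to both the current pass group and the oldest frame portion.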
