On-demand memory allocation
    71.
    发明授权

    公开(公告)号:US12265474B2

    公开(公告)日:2025-04-01

    申请号:US18490588

    申请日:2023-10-19

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to dynamically allocating and mapping private memory for requesting circuitry. Disclosed circuitry may receive a private address and translate the private address to a virtual address (which an MMU may then translate to physical address to actually access a storage element). In some embodiments, private memory allocation circuitry is configured to generate page table information and map private memory pages for requests if the page table information is not already setup. In various embodiments, this may advantageously allow dynamic private memory allocation, e.g., to efficiently allocate memory for graphics shaders with different types of workloads. Disclosed caching techniques for page table information may improve performance relative to traditional techniques. Further, disclosed embodiments may facilitate memory consolidation across a device such as a graphics processor.

    SIMD operand permutation with selection from among multiple registers

    公开(公告)号:US12008377B2

    公开(公告)日:2024-06-11

    申请号:US18299452

    申请日:2023-04-12

    Applicant: Apple Inc.

    CPC classification number: G06F9/3887 G06F9/30098 G06T1/20 G06T1/60

    Abstract: Techniques are disclosed relating to operand routing among SIMD pipelines. In some embodiments, an apparatus includes a set of multiple hardware pipelines configured to execute a single-instruction multiple-data (SIMD) instruction for multiple threads in parallel, wherein the instruction specifies first and second architectural registers. In some embodiments, the pipelines include execution circuitry configured to perform operations using one or more pipeline stages of the pipeline. In some embodiments, the pipelines include routing circuitry configured to select, based on the instruction, a first input operand for the execution circuitry from among: a value from the first architectural register from thread-specific storage for another pipeline and a value from the second architectural register from thread-specific storage for a thread assigned to another pipeline. In some embodiments, the routing circuitry may support a shift and fill instruction that facilitates storage of an arbitrary portion of a graphics frame in one or more registers.

    Cache footprint management
    73.
    发明授权

    公开(公告)号:US11947462B1

    公开(公告)日:2024-04-02

    申请号:US17653418

    申请日:2022-03-03

    Applicant: Apple Inc.

    CPC classification number: G06F12/0875 G06F2212/60

    Abstract: Techniques are disclosed relating to cache footprint management. In some embodiments, execution circuitry is configured to perform operations for instructions from multiple threads in parallel. Cache circuitry may store information operated on by threads executed by the execution circuitry. Scheduling circuitry may arbitrate among threads to schedule threads for execution by the execution circuitry. Tracking circuitry may determine one or more performance metrics for the cache circuitry. Control circuitry may, based on the one or more performance metrics meeting a threshold, reduce a limit on a number of threads considered for arbitration by the scheduling circuitry, to control a footprint of information stored by the cache circuitry. Disclosed techniques may advantageously reduce or avoid cache thrashing for certain processor workloads.

    Tiled processor communication fabric

    公开(公告)号:US11941742B2

    公开(公告)日:2024-03-26

    申请号:US17808392

    申请日:2022-06-23

    Applicant: Apple Inc.

    CPC classification number: G06T15/005 G06F9/3887

    Abstract: Techniques are disclosed relating to processor communications fabrics. In some embodiments, a processor includes multiple client circuitry and fabric circuitry that includes at least first and second instances of a tile. The tile may include: client inputs configured to interface with client circuits, tile inputs configured to interface with one or more other tile instances, and communication resources assignable to the client inputs and tile inputs. The communications resources may include: multiple internal links, client outputs configured to interface with client circuits, and tile outputs configured to interface with one or more other tile instances. Control circuitry may, in a given cycle, assign communication resources of a given tile instance to at least a portion of the client inputs and tile inputs for a next cycle, based on priority information. The control circuitry may update priority information based on assignment results over multiple cycles.

    SIMD operand permutation with selection from among multiple registers

    公开(公告)号:US11126439B2

    公开(公告)日:2021-09-21

    申请号:US16686060

    申请日:2019-11-15

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to operand routing among SIMD pipelines. In some embodiments, an apparatus includes a set of multiple hardware pipelines configured to execute a single-instruction multiple-data (SIMD) instruction for multiple threads in parallel, wherein the instruction specifies first and second architectural registers. In some embodiments, the pipelines include execution circuitry configured to perform operations using one or more pipeline stages of the pipeline. In some embodiments, the pipelines include routing circuitry configured to select, based on the instruction, a first input operand for the execution circuitry from among: a value from the first architectural register from thread-specific storage for another pipeline and a value from the second architectural register from thread-specific storage for a thread assigned to another pipeline. In some embodiments, the routing circuitry may support a shift and fill instruction that facilitates storage of an arbitrary portion of a graphics frame in one or more registers.

    On-demand Memory Allocation
    77.
    发明申请

    公开(公告)号:US20210271606A1

    公开(公告)日:2021-09-02

    申请号:US16804128

    申请日:2020-02-28

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to dynamically allocating and mapping private memory for requesting circuitry. Disclosed circuitry may receive a private address and translate the private address to a virtual address (which an MMU may then translate to physical address to actually access a storage element). In some embodiments, private memory allocation circuitry is configured to generate page table information and map private memory pages for requests if the page table information is not already setup. In various embodiments, this may advantageously allow dynamic private memory allocation, e.g., to efficiently allocate memory for graphics shaders with different types of workloads. Disclosed caching techniques for page table information may improve performance relative to traditional techniques. Further, disclosed embodiments may facilitate memory consolidation across a device such as a graphics processor.

    Datapath Circuitry for Math Operations using SIMD Pipelines

    公开(公告)号:US20210109761A1

    公开(公告)日:2021-04-15

    申请号:US16597625

    申请日:2019-10-09

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to sharing operands among SIMD threads for a larger arithmetic operation. In some embodiments, a set of multiple hardware pipelines is configured to execute single-instruction multiple-data (SIMD) instructions for multiple threads in parallel, where ones of the hardware pipelines include execution circuitry configured to perform floating-point operations using one or more pipeline stages of the pipeline and first routing circuitry configured to select, from among thread-specific operands stored for the hardware pipeline and from one or more other pipelines in the set, a first input operand for an operation by the execution circuitry. In some embodiments, a device is configured to perform a mathematical operation on source input data structures stored across thread-specific storage for the set of hardware pipelines, by executing multiple SIMD floating-point operations using the execution circuitry and the first routing circuitry. This may improve performance and reduce power consumption for matrix multiply and reduction operations, for example.

    Memory allocation techniques for graphics shader

    公开(公告)号:US10699368B1

    公开(公告)日:2020-06-30

    申请号:US15690954

    申请日:2017-08-30

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to memory allocation in a graphics shader. In some embodiments, a memory for storing input data for operations by the shader is shared for multiple different types of tasks (e.g., pixel shading tasks and compute tasks). In some embodiments, a graphics device is configured to separately process different portions (e.g., tiles) of a frame of graphics data. In some embodiments, the graphics device is configured to dynamically adjust the number of frame portions processed in parallel based on allocation information, where the allocation information is determined based on requests for other types of tasks. This may prevent pixel shading tasks from stalling other tasks for extended periods and may allow dynamic adjustments memory allocation mid-render.

    Pipelined allocation for operand cache

    公开(公告)号:US10678548B2

    公开(公告)日:2020-06-09

    申请号:US16112614

    申请日:2018-08-24

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to controlling an operand cache in a pipelined fashion. An operand cache may cache operands fetched from the register file or generated by previous instructions to improve performance and/or reduce power consumption. In some embodiments, instructions are pipelined and separate tag information is maintained to indicate allocation of an operand cache entry and ownership of the operand cache entry. In some embodiments, this may allow an operand to remain in the operand cache (and potentially be retrieved or modified) during an interval between allocation of the entry for another operand and ownership of the entry by the other operand. This may improve operand cache efficiency by allowing the entry to be used while to retrieving the other operand from the register file, for example.

Patent Agency Ranking