On-demand memory allocation
    Granted invention patent

    Publication No.: US11829298B2

    Publication date: 2023-11-28

    Application No.: US16804128

    Filing date: 2020-02-28

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to dynamically allocating and mapping private memory for requesting circuitry. Disclosed circuitry may receive a private address and translate the private address to a virtual address (which an MMU may then translate to a physical address to actually access a storage element). In some embodiments, private memory allocation circuitry is configured to generate page table information and map private memory pages for requests if the page table information is not already set up. In various embodiments, this may advantageously allow dynamic private memory allocation, e.g., to efficiently allocate memory for graphics shaders with different types of workloads. Disclosed caching techniques for page table information may improve performance relative to traditional techniques. Further, disclosed embodiments may facilitate memory consolidation across a device such as a graphics processor.
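    The allocate-on-first-touch idea in this abstract can be illustrated with a small software model. This is a hypothetical sketch, not Apple's circuitry: PAGE_SIZE, the free-page list, and the dict standing in for page table information are assumptions, and neither the MMU's virtual-to-physical step nor the page-table caching is modeled.

```python
PAGE_SIZE = 4096  # assumed private page size (hypothetical)


class PrivateMemoryAllocator:
    """Hypothetical model: private address -> virtual address, mapping pages on demand."""

    def __init__(self, free_virtual_pages):
        # Virtual page base addresses that may back private memory.
        self.free_pages = list(free_virtual_pages)
        # Stand-in for page table information: private page number -> virtual page base.
        self.page_table = {}
        self.allocations = 0

    def translate(self, private_addr):
        """Return the virtual address for private_addr, allocating its page if needed."""
        page_number, offset = divmod(private_addr, PAGE_SIZE)
        if page_number not in self.page_table:
            # Page table information is not set up yet: allocate and map a page now.
            if not self.free_pages:
                raise MemoryError("no free pages available for private memory")
            self.page_table[page_number] = self.free_pages.pop(0)
            self.allocations += 1
        return self.page_table[page_number] + offset
```

    For example, with `PrivateMemoryAllocator([0x10000, 0x20000])`, translating `0x0010` maps private page 0 to `0x10000`, translating `0x1004` maps private page 1 to `0x20000`, and a later access to `0x0020` reuses the existing mapping, so only pages that are actually touched consume memory.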

    Private memory management using utility thread

    Publication No.: US11714759B2

    Publication date: 2023-08-01

    Application No.: US16995450

    Filing date: 2020-08-17

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to private memory management using a mapping thread, which may be persistent. In some embodiments, a graphics processor is configured to generate a pool of private memory pages for a set of graphics work that includes multiple threads. The processor may maintain a translation table configured to map private memory addresses to virtual addresses based on identifiers of the threads. The processor may execute a mapping thread to receive a request to allocate a private memory page for a requesting thread, select a private memory page from the pool in response to the request, and map the selected page in the translation table for the requesting thread. The processor may then execute one or more instructions of the requesting thread to access a private memory space, wherein the execution includes translation of a private memory address to a virtual address based on the mapped page in the translation table. The mapping thread may be a persistent thread for which resources are allocated for an entirety of a time interval over which the set of graphics work is executed.
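    The request/map/translate flow with a persistent mapping thread can be sketched in software as follows. This is a hypothetical model, not the patented GPU design: a Python `threading.Thread` stands in for the mapping thread, a `queue.Queue` carries allocation requests, and the translation table is keyed by (thread identifier, private page). PAGE_SIZE and all class and method names are assumptions.

```python
import queue
import threading

PAGE_SIZE = 4096  # assumed private page size (hypothetical)


class PrivateMemoryManager:
    """Hypothetical model: a persistent mapping thread serves page-allocation requests."""

    def __init__(self, pool_pages):
        self.pool = list(pool_pages)   # pool of private pages generated up front
        self.table = {}                # (thread_id, private page) -> virtual page base
        self.lock = threading.Lock()
        self.requests = queue.Queue()
        # The mapping thread lives for the whole time the graphics work executes.
        self.mapper = threading.Thread(target=self._mapping_loop, daemon=True)
        self.mapper.start()

    def _mapping_loop(self):
        while True:
            item = self.requests.get()
            if item is None:           # sentinel: the set of graphics work has finished
                break
            thread_id, page, done = item
            with self.lock:
                # Select a page from the pool and map it for the requesting thread.
                if (thread_id, page) not in self.table and self.pool:
                    self.table[(thread_id, page)] = self.pool.pop(0)
            done.set()                 # wake the requesting thread

    def access(self, thread_id, private_addr):
        """Translate a requesting thread's private address to a virtual address."""
        page, offset = divmod(private_addr, PAGE_SIZE)
        with self.lock:
            base = self.table.get((thread_id, page))
        if base is None:
            done = threading.Event()
            self.requests.put((thread_id, page, done))
            done.wait()                # requester stalls until the mapping thread responds
            with self.lock:
                base = self.table.get((thread_id, page))
            if base is None:
                raise MemoryError("private page pool exhausted")
        return base + offset

    def finish(self):
        self.requests.put(None)
        self.mapper.join()
```

    Because the table is keyed by thread identifier, two threads that use the same private address receive different virtual pages, which is what makes the memory private to each thread.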

    Instruction-level context switch in SIMD processor

    Publication No.: US11360780B2

    Publication date: 2022-06-14

    Application No.: US16749618

    Filing date: 2020-01-22

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to context switching in a SIMD processor. In some embodiments, an apparatus includes pipeline circuitry configured to execute graphics instructions included in threads of a group of single-instruction multiple-data (SIMD) threads in a thread group. In some embodiments, context switch circuitry is configured to atomically: save, for the SIMD group, a program counter and information that indicates whether threads in the SIMD group are active using one or more context switch registers, set all threads to an active state for the SIMD group, and branch to handler code for the SIMD group. In some embodiments, the pipeline circuitry is configured to execute the handler code to save context information for the SIMD group and subsequently execute threads of another thread group. Disclosed techniques may allow instruction-level context switching even when some SIMD threads are non-active.
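    The atomic sequence described above (save the program counter and active mask, set all threads active, branch to the handler) can be modeled roughly as below. This is a hypothetical sketch: `HANDLER_PC`, the `SimdGroup` fields, and the helper functions are assumptions. It shows why activating every thread matters: the handler's per-thread save only stores values for active threads, so it would otherwise miss the registers of threads that were inactive at the switch point.

```python
from dataclasses import dataclass, field

HANDLER_PC = 0xF000  # hypothetical address of the context switch handler


@dataclass
class SimdGroup:
    pc: int
    active: list                   # per-thread active flags
    regs: list                     # one register value per thread (simplified)
    csr_pc: int = 0                # context switch register: saved program counter
    csr_active: list = field(default_factory=list)  # context switch register: saved mask


def begin_context_switch(g: SimdGroup) -> None:
    """Model of the atomic step: save PC and active mask, activate all threads, branch."""
    g.csr_pc = g.pc
    g.csr_active = list(g.active)
    g.active = [True] * len(g.active)
    g.pc = HANDLER_PC


def handler_save(g: SimdGroup) -> dict:
    """Handler code: a per-thread store only writes for active threads."""
    return {
        "pc": g.csr_pc,
        "active": list(g.csr_active),
        "regs": [r for r, a in zip(g.regs, g.active) if a],
    }


def restore(saved: dict) -> SimdGroup:
    """Resume the SIMD group later with its original PC and active mask."""
    return SimdGroup(pc=saved["pc"], active=list(saved["active"]), regs=list(saved["regs"]))
```

    With `SimdGroup(pc=0x120, active=[True, False, True, False], regs=[1, 2, 3, 4])`, the handler saves all four register values, and `restore` brings back the original PC and the two inactive threads.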

    Graphics Memory Space for Shader Core

    Publication No.: US20220148249A1

    Publication date: 2022-05-12

    Application No.: US17103462

    Filing date: 2020-11-24

    Applicant: Apple Inc.

    Abstract: Disclosed techniques relate to memory space management for graphics processing. In some embodiments, first and second graphics cores are configured to execute instructions for multiple threadgroups. In some embodiments, the threadgroups include a first threadgroup with multiple single-instruction multiple-data (SIMD) groups configured to execute a first shader program and a second threadgroup with multiple SIMD groups configured to execute a second, different shader program. Control circuitry may be configured to provide access to data stored in memory circuitry according to a shader memory space. The shader memory space may be accessible to threadgroups executed by the first graphics shader core, including the first and second threadgroups, but not to threadgroups executed by the second graphics shader core. Disclosed techniques may reduce latency, increase bandwidth available to the shader, reduce coherency cost, or any combination thereof.
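    The access rule for the shader memory space (shared across threadgroups on one core, even when they run different shader programs, but invisible to another core) can be expressed as a simple check. This is a hypothetical sketch: `ShaderCore`, `Threadgroup`, and the dict used as backing storage are assumptions, and no latency, bandwidth, or coherency behavior is modeled.

```python
class ShaderCore:
    """Hypothetical model: each shader core owns its own shader memory space."""

    def __init__(self, core_id):
        self.core_id = core_id
        self._memory = {}          # stand-in for storage backing this core's space

    def _check(self, tg):
        # Only threadgroups executing on this core may use its shader memory space.
        if tg.core is not self:
            raise PermissionError(
                f"threadgroup {tg.name} runs on core {tg.core.core_id}, not core {self.core_id}"
            )

    def store(self, tg, addr, value):
        self._check(tg)
        self._memory[addr] = value

    def load(self, tg, addr):
        self._check(tg)
        return self._memory.get(addr, 0)


class Threadgroup:
    def __init__(self, name, shader_program, core):
        self.name = name
        self.shader_program = shader_program
        self.core = core
```

    For instance, a threadgroup running `shader_x` and another running `shader_y` on core 0 can exchange data through core 0's space, while a threadgroup on core 1 gets a `PermissionError`. Keeping the space core-local is what would let hardware avoid cross-core coherency traffic.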

    Register file arbitration
    Granted invention patent

    Publication No.: US11080055B2

    Publication date: 2021-08-03

    Application No.: US16548797

    Filing date: 2019-08-22

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to arbitration among register file accesses. In some embodiments, an apparatus includes a register file configured to store operands for multiple client circuits and arbitration circuitry configured to select from among multiple received requests to access the register file. In some embodiments, the apparatus includes first interface circuitry configured to provide access requests from a first client circuit to the arbitration circuitry and supplemental interface circuitry configured to receive unsuccessful requests from the first client circuit and provide the received unsuccessful requests to the arbitration circuitry. The supplemental interface circuitry may provide additional catch-up bandwidth to clients that lose arbitration, which may result in fairness during bandwidth shortages.
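    The effect of a supplemental interface can be illustrated with a cycle-level toy model. This is a hypothetical sketch, not the patented arbiter: it assumes a single register file port, a round-robin arbiter over three inputs (client B, client A's primary interface, and client A's supplemental interface), and it does not preserve ordering between a client's requests.

```python
from collections import deque


class RoundRobinArbiter:
    def __init__(self, inputs):
        self.inputs = inputs       # arbitration inputs in round-robin order
        self.next = 0

    def pick(self, ready, ports):
        """Grant up to `ports` ready inputs, starting at the round-robin pointer."""
        granted = []
        n = len(self.inputs)
        for i in range(n):
            name = self.inputs[(self.next + i) % n]
            if name in ready and len(granted) < ports:
                granted.append(name)
        if granted:
            self.next = (self.inputs.index(granted[-1]) + 1) % n
        return granted


def simulate(client_a_reqs, client_b_reqs, cycles, ports=1):
    """Return register file requests in the order they are granted."""
    a_primary, b_primary = deque(client_a_reqs), deque(client_b_reqs)
    a_supplemental = deque()       # holds client A's requests that lost arbitration
    arb = RoundRobinArbiter(["B", "A", "A_supp"])
    served = []
    for _ in range(cycles):
        heads = {}
        if b_primary:
            heads["B"] = b_primary
        if a_primary:
            heads["A"] = a_primary
        if a_supplemental:
            heads["A_supp"] = a_supplemental
        granted = arb.pick(heads, ports)
        for name in granted:
            served.append(heads[name].popleft())
        # A losing request moves to the supplemental interface, so the primary
        # interface can present a new request next cycle.
        if "A" in heads and "A" not in granted:
            a_supplemental.append(a_primary.popleft())
    return served
```

    With three requests from each client over six cycles, `simulate` grants `["b0", "a1", "a0", "b1", "a2", "b2"]`: client A wins three of the first five grants because its lost request competes again through the extra input, which is the catch-up bandwidth the abstract refers to.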

    Instruction-level Context Switch in SIMD Processor

    Publication No.: US20210224072A1

    Publication date: 2021-07-22

    Application No.: US16749618

    Filing date: 2020-01-22

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to context switching in a SIMD processor. In some embodiments, an apparatus includes pipeline circuitry configured to execute graphics instructions included in threads of a group of single-instruction multiple-data (SIMD) threads in a thread group. In some embodiments, context switch circuitry is configured to atomically: save, for the SIMD group, a program counter and information that indicates whether threads in the SIMD group are active using one or more context switch registers, set all threads to an active state for the SIMD group, and branch to handler code for the SIMD group. In some embodiments, the pipeline circuitry is configured to execute the handler code to save context information for the SIMD group and subsequently execute threads of another thread group. Disclosed techniques may allow instruction-level context switching even when some SIMD threads are non-active.

    SIMD Operand Permutation with Selection from among Multiple Registers

    Publication No.: US20210149679A1

    Publication date: 2021-05-20

    Application No.: US16686060

    Filing date: 2019-11-15

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to operand routing among SIMD pipelines. In some embodiments, an apparatus includes a set of multiple hardware pipelines configured to execute a single-instruction multiple-data (SIMD) instruction for multiple threads in parallel, wherein the instruction specifies first and second architectural registers. In some embodiments, the pipelines include execution circuitry configured to perform operations using one or more pipeline stages of the pipeline. In some embodiments, the pipelines include routing circuitry configured to select, based on the instruction, a first input operand for the execution circuitry from among: a value from the first architectural register from thread-specific storage for another pipeline and a value from the second architectural register from thread-specific storage for a thread assigned to another pipeline. In some embodiments, the routing circuitry may support a shift and fill instruction that facilitates storage of an arbitrary portion of a graphics frame in one or more registers.
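    A shift-and-fill permute that selects each lane's operand from one of two architectural registers can be modeled as indexing into the concatenation of both registers. This is a hypothetical sketch of the operation's semantics only; `shift_and_fill` and its parameters are assumptions, and the routing hardware is not modeled.

```python
def shift_and_fill(reg_a, reg_b, shift):
    """
    Hypothetical model of a SIMD shift-and-fill across lanes.
    Lane i takes lane (i + shift) of reg_a when that lane exists; the vacated
    upper lanes are filled from the low lanes of reg_b, i.e. (reg_a + reg_b)[i + shift].
    """
    n = len(reg_a)
    if len(reg_b) != n or not 0 <= shift <= n:
        raise ValueError("registers must have equal width and 0 <= shift <= width")
    out = []
    for lane in range(n):
        src = lane + shift
        if src < n:
            out.append(reg_a[src])      # first register, read from another thread's lane
        else:
            out.append(reg_b[src - n])  # second register, read from another thread's lane
    return out
```

    If two registers hold eight consecutive pixels of a frame row, `shift_and_fill([0, 1, 2, 3], [4, 5, 6, 7], 1)` yields `[1, 2, 3, 4]`, a window starting one pixel later. Repeating this with different shifts is one way an arbitrary portion of a frame could be gathered into registers.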

    Hardware resource allocation system for allocating resources to threads

    Publication No.: US10990445B2

    Publication date: 2021-04-27

    Application No.: US15669445

    Filing date: 2017-08-04

    Applicant: Apple Inc.

    Abstract: In various embodiments, a resource allocation management circuit may allocate a plurality of different types of hardware resources (e.g., different types of registers) to a plurality of threads. The different types of hardware resources may correspond to a plurality of hardware resource allocation circuits. The resource allocation management circuit may track allocation of the hardware resources to the threads using state identification values of the threads. In response to determining that fewer than a respective requested number of one or more types of the hardware resources are available, the resource allocation management circuit may identify one or more threads for deallocation. As a result, the hardware resource allocation system may allocate hardware resources to threads more efficiently (e.g., may deallocate hardware resources allocated to fewer threads), as compared to a hardware resource allocation system that does not track allocation of hardware resources to threads using state identification values.
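    Tracking several resource types per thread state identifier, and picking threads to deallocate when a request cannot be met, can be sketched as follows. This is a hypothetical model: the resource type names, the dict-based bookkeeping, and the evict-the-largest-holder policy are assumptions for illustration, not the selection policy claimed in the patent.

```python
class ResourceAllocationManager:
    """Hypothetical model: per-type pools with allocations tracked by thread state ID."""

    def __init__(self, capacities):
        self.capacity = dict(capacities)                # resource type -> total units
        self.allocated = {t: {} for t in capacities}    # type -> {state_id: units}

    def available(self, rtype):
        return self.capacity[rtype] - sum(self.allocated[rtype].values())

    def deallocate(self, state_id):
        # A state ID identifies everything a thread holds, across all resource types.
        for per_thread in self.allocated.values():
            per_thread.pop(state_id, None)

    def request(self, state_id, needs):
        """Allocate `needs` ({type: units}); return state IDs chosen for deallocation."""
        victims = []
        for rtype, units in needs.items():
            if units > self.capacity[rtype]:
                raise ValueError(f"request for {rtype} exceeds total capacity")
            while self.available(rtype) < units:
                holders = self.allocated[rtype]
                # Assumed policy: deallocate the thread holding the most of this type.
                victim = max(holders, key=holders.get)
                self.deallocate(victim)
                victims.append(victim)
        for rtype, units in needs.items():
            held = self.allocated[rtype].get(state_id, 0)
            self.allocated[rtype][state_id] = held + units
        return victims
```

    Deallocating by state ID frees all of a thread's resources of every type in one step, which is the bookkeeping advantage the abstract attributes to tracking allocations with state identification values.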

    Routing Circuitry for Permutation of Single-Instruction Multiple-Data Operands

    Publication No.: US20210055931A1

    Publication date: 2021-02-25

    Application No.: US16548812

    Filing date: 2019-08-22

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to routing circuitry configured to perform permute operations for operands of threads in a single-instruction multiple-data group. In some embodiments, an apparatus includes hierarchical operand routing circuitry configured to route operands between a set of single-instruction multiple-data (SIMD) pipelines based on a permute instruction. In some embodiments, the routing circuitry includes a first level and a second level. The first level may include a set of multiple crossbar circuits each configured to receive operands from a respective subset of the pipelines and output one or more of the received operands on multiple output lines based on the permute instruction, where the crossbar circuits support full permutation within a respective subset. A second level may be configured to select an operand from a previous level for each of the pipelines, and may select from among only a portion of output operands from the previous level to provide an operand for a respective pipeline.
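    The two-level structure can be illustrated with a small routing model. This is a hypothetical sketch: it assumes each first-level crossbar has one output line per lane position within a subset, and that each pipeline's second-level selector sees only the matching line of each crossbar. Under that assumption, permutes within a subset, rotations, and broadcasts route in one pass, while some cross-subset patterns conflict and are rejected.

```python
def hierarchical_permute(operands, perm, group_size):
    """
    Hypothetical two-level model of SIMD permute routing.
    perm[i] is the source pipeline whose operand destination pipeline i receives.
    """
    n = len(operands)
    if n % group_size or len(perm) != n:
        raise ValueError("lane count must be a multiple of group_size and match perm")
    groups = n // group_size

    # Level 1: crossbar g receives operands from pipelines in its subset and can drive
    # any of them onto any of its group_size output lines (full in-subset permutation).
    lines = [[None] * group_size for _ in range(groups)]  # crossbar -> line -> source lane
    for dest, src in enumerate(perm):
        xbar, line = src // group_size, dest % group_size
        if lines[xbar][line] not in (None, src):
            raise ValueError(f"not routable in one pass (crossbar {xbar}, line {line})")
        lines[xbar][line] = src

    # Level 2: pipeline `dest` selects only among line (dest % group_size) of each
    # crossbar, i.e. `groups` candidates instead of all n operands.
    out = []
    for dest, src in enumerate(perm):
        xbar = src // group_size
        out.append(operands[lines[xbar][dest % group_size]])
    return out
```

    For eight pipelines in subsets of four, a rotation by one lane, `perm = [1, 2, 3, 4, 5, 6, 7, 0]`, routes without conflict, whereas a pattern in which pipelines 0 and 4 need different operands from the same crossbar line is rejected. That conflict is the cost of the second level selecting from only a portion of the first level's outputs.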

    Register File Arbitration
    Invention application

    Publication No.: US20210055929A1

    Publication date: 2021-02-25

    Application No.: US16548797

    Filing date: 2019-08-22

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to arbitration among register file accesses. In some embodiments, an apparatus includes a register file configured to store operands for multiple client circuits and arbitration circuitry configured to select from among multiple received requests to access the register file. In some embodiments, the apparatus includes first interface circuitry configured to provide access requests from a first client circuit to the arbitration circuitry and supplemental interface circuitry configured to receive unsuccessful requests from the first client circuit and provide the received unsuccessful requests to the arbitration circuitry. The supplemental interface circuitry may provide additional catch-up bandwidth to clients that lose arbitration, which may result in fairness during bandwidth shortages.