Cache control to preserve register data

    公开(公告)号:US12182037B2

    公开(公告)日:2024-12-31

    申请号:US18173500

    申请日:2023-02-23

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to eviction control for cache lines that store register data. In some embodiments, memory hierarchy circuitry is configured to provide memory backing for register operand data in one or more cache circuits. Lock circuitry may control a first set of lock indicators for a set of registers for a first thread, including to assert one or more lock indicators for registers that are indicated, by decode circuitry, as being utilized by decoded instructions of the first thread. The lock circuitry may preserve register operand data in the one or more cache circuits, including to prevent eviction of a given cache line from a cache circuit based on an asserted lock indicator. The lock circuitry may clear the first set of lock indicators in response to a reset event. Disclosed techniques may advantageously retain relevant register information in the cache with limited control circuit area.

    Multi-stage Thread Scheduling
    2.
    发明公开

    公开(公告)号:US20240095065A1

    公开(公告)日:2024-03-21

    申请号:US18054376

    申请日:2022-11-10

    Applicant: Apple Inc.

    CPC classification number: G06F9/4881 G06F9/485

    Abstract: Techniques are disclosed relating to multi-stage thread scheduling. In some embodiments, processor circuitry includes multiple channel pipelines for multiple channels and multiple execution pipelines shared by the channel pipelines and configured to perform different types of operations provided by the channel pipelines. First scheduler circuitry may arbitrate among threads to assign threads to channels. Second scheduler circuitry may arbitrate among channels to assign an operation from a given channel to a given execution pipeline. The execution pipelines may provide backpressure information to the first scheduler circuitry based on execution status and the first scheduler circuitry may adjust priority of a thread for assignment to a channel based on the backpressure information. Disclosed techniques may reduce channel conflicts and starvation for execution resources.

    Graphics memory space for shader core

    公开(公告)号:US11521343B2

    公开(公告)日:2022-12-06

    申请号:US17103462

    申请日:2020-11-24

    Applicant: Apple Inc.

    Abstract: Disclosed techniques relate to memory space management for graphics processing. In some embodiments, first and second graphics cores are configured to execute instructions for multiple threadgroups. In some embodiments, the threads groups include a first threadgroup with multiple single-instruction multiple-data (SIMD) groups configured to execute a first shader program and a second threadgroup with multiple SIMD groups configured to execute a second, different shader program. Control circuitry may be configured to provide access to data stored in memory circuitry according to a shader memory space. The shader memory space may be accessible to threadgroups executed by the first graphics shader core, including the first and second threadgroups, but is not accessible to threadgroups executed by the second graphics shader core. Disclosed techniques may reduce latency, increase bandwidth available to the shader, reduce coherency cost, or any combination thereof.

    On-demand Memory Allocation
    4.
    发明公开

    公开(公告)号:US20240045808A1

    公开(公告)日:2024-02-08

    申请号:US18490588

    申请日:2023-10-19

    Applicant: Apple Inc.

    CPC classification number: G06F12/1018 G06F12/084 G06F30/392

    Abstract: Techniques are disclosed relating to dynamically allocating and mapping private memory for requesting circuitry. Disclosed circuitry may receive a private address and translate the private address to a virtual address (which an MMU may then translate to physical address to actually access a storage element). In some embodiments, private memory allocation circuitry is configured to generate page table information and map private memory pages for requests if the page table information is not already setup. In various embodiments, this may advantageously allow dynamic private memory allocation, e.g., to efficiently allocate memory for graphics shaders with different types of workloads. Disclosed caching techniques for page table information may improve performance relative to traditional techniques. Further, disclosed embodiments may facilitate memory consolidation across a device such as a graphics processor.

    Cache Control to Preserve Register Data

    公开(公告)号:US20250094357A1

    公开(公告)日:2025-03-20

    申请号:US18962158

    申请日:2024-11-27

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to eviction control for cache lines that store register data. In some embodiments, memory hierarchy circuitry is configured to provide memory backing for register operand data in one or more cache circuits. Lock circuitry may control a first set of lock indicators for a set of registers for a first thread, including to assert one or more lock indicators for registers that are indicated, by decode circuitry, as being utilized by decoded instructions of the first thread. The lock circuitry may preserve register operand data in the one or more cache circuits, including to prevent eviction of a given cache line from a cache circuit based on an asserted lock indicator. The lock circuitry may clear the first set of lock indicators in response to a reset event. Disclosed techniques may advantageously retain relevant register information in the cache with limited control circuit area.

    Multi-stage thread scheduling
    6.
    发明授权

    公开(公告)号:US12190151B2

    公开(公告)日:2025-01-07

    申请号:US18054376

    申请日:2022-11-10

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to multi-stage thread scheduling. In some embodiments, processor circuitry includes multiple channel pipelines for multiple channels and multiple execution pipelines shared by the channel pipelines and configured to perform different types of operations provided by the channel pipelines. First scheduler circuitry may arbitrate among threads to assign threads to channels. Second scheduler circuitry may arbitrate among channels to assign an operation from a given channel to a given execution pipeline. The execution pipelines may provide backpressure information to the first scheduler circuitry based on execution status and the first scheduler circuitry may adjust priority of a thread for assignment to a channel based on the backpressure information. Disclosed techniques may reduce channel conflicts and starvation for execution resources.

    Cache Control to Preserve Register Data
    7.
    发明公开

    公开(公告)号:US20240289282A1

    公开(公告)日:2024-08-29

    申请号:US18173500

    申请日:2023-02-23

    Applicant: Apple Inc.

    CPC classification number: G06F9/30079 G06F9/30047 G06F9/30145

    Abstract: Techniques are disclosed relating to eviction control for cache lines that store register data. In some embodiments, memory hierarchy circuitry is configured to provide memory backing for register operand data in one or more cache circuits. Lock circuitry may control a first set of lock indicators for a set of registers for a first thread, including to assert one or more lock indicators for registers that are indicated, by decode circuitry, as being utilized by decoded instructions of the first thread. The lock circuitry may preserve register operand data in the one or more cache circuits, including to prevent eviction of a given cache line from a cache circuit based on an asserted lock indicator. The lock circuitry may clear the first set of lock indicators in response to a reset event. Disclosed techniques may advantageously retain relevant register information in the cache with limited control circuit area.

    Cache footprint management
    8.
    发明授权

    公开(公告)号:US11947462B1

    公开(公告)日:2024-04-02

    申请号:US17653418

    申请日:2022-03-03

    Applicant: Apple Inc.

    CPC classification number: G06F12/0875 G06F2212/60

    Abstract: Techniques are disclosed relating to cache footprint management. In some embodiments, execution circuitry is configured to perform operations for instructions from multiple threads in parallel. Cache circuitry may store information operated on by threads executed by the execution circuitry. Scheduling circuitry may arbitrate among threads to schedule threads for execution by the execution circuitry. Tracking circuitry may determine one or more performance metrics for the cache circuitry. Control circuitry may, based on the one or more performance metrics meeting a threshold, reduce a limit on a number of threads considered for arbitration by the scheduling circuitry, to control a footprint of information stored by the cache circuitry. Disclosed techniques may advantageously reduce or avoid cache thrashing for certain processor workloads.

    Tiled processor communication fabric

    公开(公告)号:US11941742B2

    公开(公告)日:2024-03-26

    申请号:US17808392

    申请日:2022-06-23

    Applicant: Apple Inc.

    CPC classification number: G06T15/005 G06F9/3887

    Abstract: Techniques are disclosed relating to processor communications fabrics. In some embodiments, a processor includes multiple client circuitry and fabric circuitry that includes at least first and second instances of a tile. The tile may include: client inputs configured to interface with client circuits, tile inputs configured to interface with one or more other tile instances, and communication resources assignable to the client inputs and tile inputs. The communications resources may include: multiple internal links, client outputs configured to interface with client circuits, and tile outputs configured to interface with one or more other tile instances. Control circuitry may, in a given cycle, assign communication resources of a given tile instance to at least a portion of the client inputs and tile inputs for a next cycle, based on priority information. The control circuitry may update priority information based on assignment results over multiple cycles.

Patent Agency Ranking