Thread Channel Deactivation based on Instruction Cache Misses

    Publication Number: US20240095031A1

    Publication Date: 2024-03-21

    Application Number: US18054380

    Application Date: 2022-11-10

    Applicant: Apple Inc.

    CPC classification number: G06F9/30079 G06F9/30047 G06F9/3009 G06F9/30145

    Abstract: Techniques are disclosed relating to instruction scheduling in the context of instruction cache misses. In some embodiments, first-stage scheduler circuitry is configured to assign threads to channels and second-stage scheduler circuitry is configured to assign an operation from a given channel to a given execution pipeline based on decode of an operation for that channel. In some embodiments, thread replacement circuitry is configured to, in response to an instruction cache miss for an operation of a first thread assigned to a first channel, deactivate the first thread from the first channel.
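    The two-stage scheme in the abstract can be illustrated with a short software sketch. This is a minimal analogy, not the patented circuitry; the Thread class, schedule_cycle function, and the icache set are all hypothetical names. On an instruction cache miss, the thread is deactivated from its channel so the slot can be reassigned rather than idling.

        from collections import deque

        class Thread:
            def __init__(self, tid, pcs):
                self.tid = tid
                self.pcs = deque(pcs)  # program counters of pending operations

        def schedule_cycle(channels, ready, icache):
            # First-stage scheduler: assign ready threads to empty channels.
            for i, th in enumerate(channels):
                if th is None and ready:
                    channels[i] = ready.popleft()
            # Second-stage scheduler: decode one operation per channel and
            # issue it to an execution pipeline, unless its fetch misses.
            for i, th in enumerate(channels):
                if th is None or not th.pcs:
                    continue
                pc = th.pcs[0]
                if pc not in icache:
                    ready.append(th)    # deactivated thread waits its turn again
                    channels[i] = None  # channel is freed for other work
                    icache.add(pc)      # model the miss being serviced
                else:
                    th.pcs.popleft()
                    print(f"issue thread {th.tid} op at pc={pc:#x}")

        channels = [None, None]
        ready = deque([Thread(0, [0x10, 0x14]), Thread(1, [0x80])])
        icache = {0x10, 0x14}  # 0x80 will miss on first fetch
        for _ in range(3):
            schedule_cycle(channels, ready, icache)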

    Power Saving with Dynamic Pulse Insertion
    Invention Application

    Publication Number: US20170244391A1

    Publication Date: 2017-08-24

    Application Number: US15046926

    Application Date: 2016-02-18

    Applicant: Apple Inc.

    Abstract: A method and apparatus for saving power in integrated circuits is disclosed. An IC includes functional circuit blocks that are not placed into a sleep mode when idle. A power management circuit may monitor the activity levels of these functional circuit blocks. When the power management circuit detects that the activity level of one of the non-sleep functional circuit blocks is less than a predefined threshold, it reduces the frequency of the clock signal provided thereto by scheduling only one pulse for every N pulses of the full-frequency clock signal. The remaining N−1 pulses of the clock signal may be inhibited. If a high-priority transaction inbound to the functional circuit block is detected, an inserted pulse of the clock signal may be provided to the functional circuit block irrespective of when the most recent regular pulse was provided.
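    A toy model of the pulse schedule described above; gated_clock and its parameters are illustrative names, and the single-cycle granularity is a simplifying assumption.

        def gated_clock(num_cycles, n, high_priority_cycles=frozenset()):
            # Idle mode: pass one pulse in every n full-rate pulses and
            # inhibit the remaining n-1. A high-priority transaction causes
            # an extra pulse to be inserted regardless of that schedule.
            for cycle in range(num_cycles):
                regular = (cycle % n == 0)
                inserted = cycle in high_priority_cycles and not regular
                if regular or inserted:
                    yield cycle, "inserted" if inserted else "regular"

        for cycle, kind in gated_clock(12, n=4, high_priority_cycles={6}):
            print(f"pulse at cycle {cycle} ({kind})")

    Cycles 0, 4, and 8 carry the regular divided-down pulses; cycle 6 carries the pulse inserted for the high-priority transaction.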

    Fence enforcement techniques based on stall characteristics

    Publication Number: US11954492B1

    Publication Date: 2024-04-09

    Application Number: US18054401

    Application Date: 2022-11-10

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to channel stalls or deactivations based on the latency of prior operations. In some embodiments, a processor includes a plurality of channel pipelines for a plurality of channels and a plurality of execution pipelines shared by the channel pipelines and configured to perform different types of operations provided by the channel pipelines. First scheduler circuitry may assign threads to channels and second scheduler circuitry may assign an operation from a given channel to a given execution pipeline based on decode of an operation for that channel. Dependency circuitry may, for a first operation that depends on a prior operation that uses one of the execution pipelines, determine, based on status information for the prior operation from the one of the execution pipelines, whether to stall the first operation or to deactivate a thread that includes the first operation from its assigned channel.
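    The stall-or-deactivate decision can be sketched as a simple policy over the latency status reported by the producing execution pipeline. The threshold value and all names here are hypothetical, not taken from the patent.

        STALL_THRESHOLD = 4  # hypothetical cutoff, in cycles

        def resolve_dependency(remaining_cycles):
            # Status from the execution pipeline running the prior operation:
            # a short remaining latency is cheaper to wait out in place, while
            # a long one justifies deactivating the thread from its channel
            # so the channel can host other work in the meantime.
            if remaining_cycles <= STALL_THRESHOLD:
                return "stall"
            return "deactivate"

        for latency in (1, 4, 20):
            print(f"{latency} cycles remaining -> {resolve_dependency(latency)}")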

    Private memory management using utility thread

    Publication Number: US11714759B2

    Publication Date: 2023-08-01

    Application Number: US16995450

    Application Date: 2020-08-17

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to private memory management using a mapping thread, which may be persistent. In some embodiments, a graphics processor is configured to generate a pool of private memory pages for a set of graphics work that includes multiple threads. The processor may maintain a translation table configured to map private memory addresses to virtual addresses based on identifiers of the threads. The processor may execute a mapping thread to receive a request to allocate a private memory page for a requesting thread, select a private memory page from the pool in response to the request, and map the selected page in the translation table for the requesting thread. The processor may then execute one or more instructions of the requesting thread to access a private memory space, wherein the execution includes translation of a private memory address to a virtual address based on the mapped page in the translation table. The mapping thread may be a persistent thread for which resources are allocated for an entirety of a time interval over which the set of graphics work is executed.
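    A minimal sketch of the pool-and-translation-table bookkeeping the abstract describes, with the mapping thread's work reduced to an allocate call; the page size, addresses, and all names are hypothetical.

        PAGE_SIZE = 0x1000

        class PrivateMemory:
            def __init__(self, num_pages):
                # Pool of backing virtual pages generated for the set of work.
                self.pool = [0x10_0000 + i * PAGE_SIZE for i in range(num_pages)]
                self.table = {}  # (thread id, private page) -> virtual page

            def allocate(self, tid, private_page):
                # Done by the mapping thread: pick a page from the pool and
                # record the mapping for the requesting thread.
                va = self.pool.pop()
                self.table[(tid, private_page)] = va
                return va

            def translate(self, tid, private_addr):
                # Done on each private-memory access by the requesting thread:
                # private address -> virtual address via the translation table.
                page, offset = divmod(private_addr, PAGE_SIZE)
                return self.table[(tid, page)] + offset

        mem = PrivateMemory(num_pages=4)
        mem.allocate(tid=7, private_page=0)
        print(hex(mem.translate(tid=7, private_addr=0x42)))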

    Instruction-level context switch in SIMD processor

    Publication Number: US11360780B2

    Publication Date: 2022-06-14

    Application Number: US16749618

    Application Date: 2020-01-22

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to context switching in a SIMD processor. In some embodiments, an apparatus includes pipeline circuitry configured to execute graphics instructions included in threads of a group of single-instruction multiple-data (SIMD) threads in a thread group. In some embodiments, context switch circuitry is configured to atomically: save, for the SIMD group, a program counter and information that indicates whether threads in the SIMD group are active using one or more context switch registers, set all threads to an active state for the SIMD group, and branch to handler code for the SIMD group. In some embodiments, the pipeline circuitry is configured to execute the handler code to save context information for the SIMD group and subsequently execute threads of another thread group. Disclosed techniques may allow instruction-level context switching even when some SIMD threads are non-active.
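    The atomic entry sequence can be paraphrased in a few lines. The 32-lane width, register names, and SimdGroup class are assumptions for illustration; in hardware the three steps happen as one indivisible action.

        LANES = 32  # assumed SIMD width

        class SimdGroup:
            def __init__(self, pc, active_mask):
                self.pc = pc
                self.active = active_mask  # one bit per SIMD thread

        def enter_context_switch(group, handler_pc, csw_regs):
            # Performed atomically: save the resume state in the context
            # switch registers, activate every lane so the handler executes
            # across the whole group, then branch to the handler code.
            csw_regs["saved_pc"] = group.pc
            csw_regs["saved_active"] = group.active
            group.active = (1 << LANES) - 1
            group.pc = handler_pc
            # The handler then saves full context for the group, after which
            # the pipeline can execute threads of another thread group.

        regs = {}
        g = SimdGroup(pc=0x1234, active_mask=0b1010)
        enter_context_switch(g, handler_pc=0x9000, csw_regs=regs)
        print(hex(g.pc), bin(regs["saved_active"]))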

    Instruction-level Context Switch in SIMD Processor

    Publication Number: US20210224072A1

    Publication Date: 2021-07-22

    Application Number: US16749618

    Application Date: 2020-01-22

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to context switching in a SIMD processor. In some embodiments, an apparatus includes pipeline circuitry configured to execute graphics instructions included in threads of a group of single-instruction multiple-data (SIMD) threads in a thread group. In some embodiments, context switch circuitry is configured to atomically: save, for the SIMD group, a program counter and information that indicates whether threads in the SIMD group are active using one or more context switch registers, set all threads to an active state for the SIMD group, and branch to handler code for the SIMD group. In some embodiments, the pipeline circuitry is configured to execute the handler code to save context information for the SIMD group and subsequently execute threads of another thread group. Disclosed techniques may allow instruction-level context switching even when some SIMD threads are non-active.

    Pipelining and Concurrency Techniques for Groups of Graphics Processing Work

    Publication Number: US20190244323A1

    Publication Date: 2019-08-08

    Application Number: US15887547

    Application Date: 2018-02-02

    Applicant: Apple Inc.

    CPC classification number: G06T1/20 G06F9/30101 G06F9/3867

    Abstract: Techniques are disclosed relating to processing groups of graphics work (which may be referred to as “kicks”) using a graphics processing pipeline. In some embodiments, a graphics processor includes multiple sets of configuration registers such that multiple kicks can be processed in the pipeline at the same time. In some embodiments, kicks are pipelined such that a subsequent kick ramps up use of hardware resources as a previous kick winds down. In some embodiments, the graphics processor may execute kicks concurrently and/or preemptively, e.g., based on a priority scheme. In some embodiments, the disclosed techniques may be used with pipelines that include front-end and back-end fixed-function circuitry as well as shared programmable resources such as shader cores. In various embodiments, the disclosed techniques may improve overall performance and/or reduce latency for high-priority graphics tasks.
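    A rough sketch of how multiple configuration-register sets allow kicks to overlap, with priority deciding admission order; the two-set count, the Kick fields, and all names are illustrative assumptions, not the patented design.

        class Kick:
            def __init__(self, name, priority):
                self.name, self.priority = name, priority

        class GraphicsPipeline:
            def __init__(self, num_register_sets=2):
                # One configuration-register set per in-flight kick lets a new
                # kick ramp up while the previous one is still winding down.
                self.slots = [None] * num_register_sets

            def admit(self, pending):
                for i in range(len(self.slots)):
                    if self.slots[i] is None and pending:
                        # Admit the highest-priority pending kick first.
                        best = max(pending, key=lambda k: k.priority)
                        pending.remove(best)
                        self.slots[i] = best
                        print(f"kick {best.name} ramping up in register set {i}")

            def retire(self, name):
                for i, slot in enumerate(self.slots):
                    if slot is not None and slot.name == name:
                        print(f"kick {name} wound down; register set {i} freed")
                        self.slots[i] = None

        p = GraphicsPipeline()
        pending = [Kick("A", 1), Kick("B", 3), Kick("C", 2)]
        p.admit(pending)   # B and C occupy the two register sets
        p.retire("B")
        p.admit(pending)   # A ramps up while C is still in flight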

    Thread channel deactivation based on instruction cache misses

    Publication Number: US12164927B2

    Publication Date: 2024-12-10

    Application Number: US18054380

    Application Date: 2022-11-10

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to instruction scheduling in the context of instruction cache misses. In some embodiments, first-stage scheduler circuitry is configured to assign threads to channels and second-stage scheduler circuitry is configured to assign an operation from a given channel to a given execution pipeline based on decode of an operation for that channel. In some embodiments, thread replacement circuitry is configured to, in response to an instruction cache miss for an operation of a first thread assigned to a first channel, deactivate the first thread from the first channel.

    Cache Control to Preserve Register Data
    Invention Publication

    Publication Number: US20240289282A1

    Publication Date: 2024-08-29

    Application Number: US18173500

    Application Date: 2023-02-23

    Applicant: Apple Inc.

    CPC classification number: G06F9/30079 G06F9/30047 G06F9/30145

    Abstract: Techniques are disclosed relating to eviction control for cache lines that store register data. In some embodiments, memory hierarchy circuitry is configured to provide memory backing for register operand data in one or more cache circuits. Lock circuitry may control a first set of lock indicators for a set of registers for a first thread, including to assert one or more lock indicators for registers that are indicated, by decode circuitry, as being utilized by decoded instructions of the first thread. The lock circuitry may preserve register operand data in the one or more cache circuits, including to prevent eviction of a given cache line from a cache circuit based on an asserted lock indicator. The lock circuitry may clear the first set of lock indicators in response to a reset event. Disclosed techniques may advantageously retain relevant register information in the cache with limited control circuit area.
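    The lock-indicator behavior can be modeled as a filter on the cache's victim selection; the tags, the reset trigger, and all names are hypothetical.

        class RegisterBackingCache:
            def __init__(self, lines):
                self.lines = dict(lines)  # tag -> register operand data
                self.locked = set()       # asserted lock indicators, by tag

            def lock(self, tag):
                # Asserted when decode indicates a register is utilized by a
                # decoded instruction of the thread.
                self.locked.add(tag)

            def clear_locks(self):
                # A reset event clears the thread's whole set of indicators.
                self.locked.clear()

            def evict_victim(self):
                # Eviction may only pick among unlocked lines, preserving
                # register operand data that in-flight instructions still need.
                candidates = [t for t in self.lines if t not in self.locked]
                if not candidates:
                    return None  # nothing evictable right now
                victim = candidates[0]
                del self.lines[victim]
                return victim

        cache = RegisterBackingCache({"r0": 1, "r1": 2})
        cache.lock("r0")
        print(cache.evict_victim())  # evicts r1; r0 is protected by its lock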

    Cache footprint management
    Invention Grant

    Publication Number: US11947462B1

    Publication Date: 2024-04-02

    Application Number: US17653418

    Application Date: 2022-03-03

    Applicant: Apple Inc.

    CPC classification number: G06F12/0875 G06F2212/60

    Abstract: Techniques are disclosed relating to cache footprint management. In some embodiments, execution circuitry is configured to perform operations for instructions from multiple threads in parallel. Cache circuitry may store information operated on by threads executed by the execution circuitry. Scheduling circuitry may arbitrate among threads to schedule threads for execution by the execution circuitry. Tracking circuitry may determine one or more performance metrics for the cache circuitry. Control circuitry may, based on the one or more performance metrics meeting a threshold, reduce a limit on a number of threads considered for arbitration by the scheduling circuitry, to control a footprint of information stored by the cache circuitry. Disclosed techniques may advantageously reduce or avoid cache thrashing for certain processor workloads.
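    A compact sketch of the feedback loop: when a cache performance metric crosses its threshold, the limit on threads eligible for arbitration is reduced. The miss-rate metric, threshold, and floor values are illustrative assumptions.

        def adjust_thread_limit(limit, miss_rate, threshold=0.3, floor=4):
            # Fewer threads in arbitration means fewer live working sets and
            # therefore a smaller footprint of information in the cache,
            # which can reduce or avoid thrashing for certain workloads.
            if miss_rate > threshold and limit > floor:
                return limit - 1
            return limit

        limit = 16
        for miss_rate in (0.10, 0.45, 0.50, 0.20):
            limit = adjust_thread_limit(limit, miss_rate)
            print(f"miss rate {miss_rate:.2f} -> thread limit {limit}")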
