-
Publication number: US12265474B2
Publication date: 2025-04-01
Application number: US18490588
Filing date: 2023-10-19
Applicant: Apple Inc.
Inventor: Justin A. Hensley , Karl D. Mann , Yoong Chert Foo , Terence M. Potter , Frank W. Liljeros , Ralph C. Taylor
IPC: G06F12/084 , G06F12/1018 , G06F12/1036 , G06F30/392
Abstract: Techniques are disclosed relating to dynamically allocating and mapping private memory for requesting circuitry. Disclosed circuitry may receive a private address and translate the private address to a virtual address (which an MMU may then translate to a physical address to actually access a storage element). In some embodiments, private memory allocation circuitry is configured to generate page table information and map private memory pages for requests if the page table information is not already set up. In various embodiments, this may advantageously allow dynamic private memory allocation, e.g., to efficiently allocate memory for graphics shaders with different types of workloads. Disclosed caching techniques for page table information may improve performance relative to traditional techniques. Further, disclosed embodiments may facilitate memory consolidation across a device such as a graphics processor.
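The private-to-virtual translation with on-demand mapping described in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions: the page size, the starting virtual page, the dictionary used as a page table, and the names `PrivateMemoryAllocator` and `translate` are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumed page size; the abstract does not specify one


@dataclass
class PrivateMemoryAllocator:
    """Toy model: maps (requester, private page) to a virtual page on demand."""

    next_virtual_page: int = 0x1000  # hypothetical start of the private region
    page_table: dict = field(default_factory=dict)

    def translate(self, requester_id: int, private_addr: int) -> int:
        """Return a virtual address for a requester's private address."""
        page, offset = divmod(private_addr, PAGE_SIZE)
        key = (requester_id, page)
        if key not in self.page_table:
            # Page table information is not set up yet, so map a page now.
            self.page_table[key] = self.next_virtual_page
            self.next_virtual_page += 1
        return self.page_table[key] * PAGE_SIZE + offset


alloc = PrivateMemoryAllocator()
first = alloc.translate(requester_id=0, private_addr=10)
second = alloc.translate(requester_id=0, private_addr=20)  # same page, no new mapping
other = alloc.translate(requester_id=1, private_addr=10)   # different requester, new page
```

In this model a page is mapped only when a requester first touches it, which is the sense in which the allocation is dynamic. A real MMU stage would then translate the returned virtual address to a physical one.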
-
Publication number: US12008377B2
Publication date: 2024-06-11
Application number: US18299452
Filing date: 2023-04-12
Applicant: Apple Inc.
Inventor: Christopher A. Burns , Liang-Kai Wang , Robert D. Kenney , Terence M. Potter
CPC classification number: G06F9/3887 , G06F9/30098 , G06T1/20 , G06T1/60
Abstract: Techniques are disclosed relating to operand routing among SIMD pipelines. In some embodiments, an apparatus includes a set of multiple hardware pipelines configured to execute a single-instruction multiple-data (SIMD) instruction for multiple threads in parallel, wherein the instruction specifies first and second architectural registers. In some embodiments, the pipelines include execution circuitry configured to perform operations using one or more pipeline stages of the pipeline. In some embodiments, the pipelines include routing circuitry configured to select, based on the instruction, a first input operand for the execution circuitry from among: a value from the first architectural register from thread-specific storage for another pipeline and a value from the second architectural register from thread-specific storage for a thread assigned to another pipeline. In some embodiments, the routing circuitry may support a shift and fill instruction that facilitates storage of an arbitrary portion of a graphics frame in one or more registers.
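One plausible reading of the shift and fill operation mentioned in this abstract is that each SIMD lane takes its operand from a neighboring lane's copy of the first register, and lanes that would read past the end are filled from the second register. The sketch below models that reading in software. The function name `shift_and_fill`, the list-per-register representation, and the exact lane semantics are assumptions made for illustration, not the patented design.

```python
def shift_and_fill(reg_a, reg_b, shift):
    """Model a lane-wise shift of reg_a, filling vacated lanes from reg_b.

    reg_a and reg_b hold one value per SIMD lane (assumed representation).
    Lane i reads reg_a[i + shift] when that lane exists, otherwise it reads
    reg_b[i + shift - n], where n is the number of lanes.
    """
    n = len(reg_a)
    if len(reg_b) != n:
        raise ValueError("both registers must have the same number of lanes")
    if not 0 <= shift <= n:
        raise ValueError("shift must be between 0 and the number of lanes")
    out = []
    for lane in range(n):
        src = lane + shift
        out.append(reg_a[src] if src < n else reg_b[src - n])
    return out


window = shift_and_fill([0, 1, 2, 3], [4, 5, 6, 7], shift=1)
```

Under these assumptions the two registers behave like one contiguous strip of pixels, and the shift amount selects which lane-wide window of that strip ends up in the result.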
-
Publication number: US11947462B1
Publication date: 2024-04-02
Application number: US17653418
Filing date: 2022-03-03
Applicant: Apple Inc.
Inventor: Yoong Chert Foo , Terence M. Potter , Donald R. DeSota , Benjiman L. Goodman , Aroun Demeure , Cheng Li , Winnie W. Yeung
IPC: G06F12/08 , G06F12/0875
CPC classification number: G06F12/0875 , G06F2212/60
Abstract: Techniques are disclosed relating to cache footprint management. In some embodiments, execution circuitry is configured to perform operations for instructions from multiple threads in parallel. Cache circuitry may store information operated on by threads executed by the execution circuitry. Scheduling circuitry may arbitrate among threads to schedule threads for execution by the execution circuitry. Tracking circuitry may determine one or more performance metrics for the cache circuitry. Control circuitry may, based on the one or more performance metrics meeting a threshold, reduce a limit on a number of threads considered for arbitration by the scheduling circuitry, to control a footprint of information stored by the cache circuitry. Disclosed techniques may advantageously reduce or avoid cache thrashing for certain processor workloads.
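The control policy in this abstract (lower the limit on threads considered for arbitration when a cache metric meets a threshold) can be sketched as a single function. The choice of miss rate as the metric, the default threshold, the step size, and the name `adjust_thread_limit` are all assumptions for illustration; the abstract does not specify them.

```python
def adjust_thread_limit(limit, miss_rate, *, threshold=0.5, min_limit=1, step=1):
    """Return the new cap on threads considered for arbitration.

    miss_rate stands in for "one or more performance metrics" (an assumption).
    When it meets the threshold, the cap is reduced so that fewer threads
    compete for cache capacity; otherwise the cap is left unchanged.
    """
    if miss_rate >= threshold:
        return max(min_limit, limit - step)
    return limit


limit = 8
for observed_miss_rate in (0.7, 0.6, 0.2):
    limit = adjust_thread_limit(limit, observed_miss_rate)
```

This sketch only reduces the limit, as the abstract describes. A practical controller would also need some rule for raising it again, which is outside what the abstract states.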
-
Publication number: US11941742B2
Publication date: 2024-03-26
Application number: US17808392
Filing date: 2022-06-23
Applicant: Apple Inc.
Inventor: Adam J. Smith , Sergio V. Tota , Christopher G. Martin , Yoong Chert Foo , Terence M. Potter , Max J. Batley
CPC classification number: G06T15/005 , G06F9/3887
Abstract: Techniques are disclosed relating to processor communications fabrics. In some embodiments, a processor includes multiple client circuits and fabric circuitry that includes at least first and second instances of a tile. The tile may include: client inputs configured to interface with client circuits, tile inputs configured to interface with one or more other tile instances, and communication resources assignable to the client inputs and tile inputs. The communication resources may include: multiple internal links, client outputs configured to interface with client circuits, and tile outputs configured to interface with one or more other tile instances. Control circuitry may, in a given cycle, assign communication resources of a given tile instance to at least a portion of the client inputs and tile inputs for a next cycle, based on priority information. The control circuitry may update priority information based on assignment results over multiple cycles.
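The priority-based assignment with per-cycle priority updates can be illustrated with a toy arbiter for a single communication resource. The aging rule used here (losers gain priority, the winner resets to zero) is an assumed policy chosen for simplicity, and the names `arbitrate`, `client0`, and `tile_in0` are hypothetical; the abstract does not say how priorities are updated.

```python
def arbitrate(requests, priority):
    """Grant one resource to the requesting input with the highest priority.

    requests: list of input names that want the resource this cycle.
    priority: dict mapping input name to an integer (higher wins); updated
    in place so that inputs that lose become more likely to win later.
    Ties go to the input listed first in requests.
    """
    if not requests:
        return None
    winner = max(requests, key=lambda name: priority[name])
    for name in requests:
        if name != winner:
            priority[name] += 1  # age the losers
    priority[winner] = 0  # the winner starts over
    return winner


priority = {"client0": 0, "tile_in0": 0}
grants = [arbitrate(["client0", "tile_in0"], priority) for _ in range(3)]
```

With two inputs that request every cycle, this policy alternates grants between them, which is one simple way assignment results over multiple cycles can feed back into priority.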
-
Publication number: US20240095176A1
Publication date: 2024-03-21
Application number: US18054388
Filing date: 2022-11-10
Applicant: Apple Inc.
Inventor: Benjiman L. Goodman , Yoong Chert Foo , Karl D. Mann , Terence M. Potter , Frank W. Liljeros , Jeffrey T. Brady
IPC: G06F12/0891 , G06F12/0811
CPC classification number: G06F12/0891 , G06F12/0811 , G06F2212/6042
Abstract: Techniques are disclosed relating to thread preemption in the context of memory-backed registers. In some embodiments, a memory hierarchy includes one or more cache levels and one or more memory circuits. Execution circuitry may operate on operands in architectural registers to execute instructions of threads, where data for the architectural registers is stored and backed by the memory hierarchy. Control circuitry may, in response to a context switch indication for a given thread: flush and invalidate a set of architectural register data from a first cache level and store memory page information (e.g., a page catalog base address) associated with the set of architectural register data.
-
Publication number: US11126439B2
Publication date: 2021-09-21
Application number: US16686060
Filing date: 2019-11-15
Applicant: Apple Inc.
Inventor: Christopher A. Burns , Liang-Kai Wang , Robert D. Kenney , Terence M. Potter
Abstract: Techniques are disclosed relating to operand routing among SIMD pipelines. In some embodiments, an apparatus includes a set of multiple hardware pipelines configured to execute a single-instruction multiple-data (SIMD) instruction for multiple threads in parallel, wherein the instruction specifies first and second architectural registers. In some embodiments, the pipelines include execution circuitry configured to perform operations using one or more pipeline stages of the pipeline. In some embodiments, the pipelines include routing circuitry configured to select, based on the instruction, a first input operand for the execution circuitry from among: a value from the first architectural register from thread-specific storage for another pipeline and a value from the second architectural register from thread-specific storage for a thread assigned to another pipeline. In some embodiments, the routing circuitry may support a shift and fill instruction that facilitates storage of an arbitrary portion of a graphics frame in one or more registers.
-
Publication number: US20210271606A1
Publication date: 2021-09-02
Application number: US16804128
Filing date: 2020-02-28
Applicant: Apple Inc.
Inventor: Justin A. Hensley , Karl D. Mann , Yoong Chert Foo , Terence M. Potter , Frank W. Liljeros , Ralph C. Taylor
IPC: G06F12/1018 , G06F12/084
Abstract: Techniques are disclosed relating to dynamically allocating and mapping private memory for requesting circuitry. Disclosed circuitry may receive a private address and translate the private address to a virtual address (which an MMU may then translate to a physical address to actually access a storage element). In some embodiments, private memory allocation circuitry is configured to generate page table information and map private memory pages for requests if the page table information is not already set up. In various embodiments, this may advantageously allow dynamic private memory allocation, e.g., to efficiently allocate memory for graphics shaders with different types of workloads. Disclosed caching techniques for page table information may improve performance relative to traditional techniques. Further, disclosed embodiments may facilitate memory consolidation across a device such as a graphics processor.
-
Publication number: US20210109761A1
Publication date: 2021-04-15
Application number: US16597625
Filing date: 2019-10-09
Applicant: Apple Inc.
Inventor: Liang-Kai Wang , Robert D. Kenney , Terence M. Potter , Vinod Reddy Nalamalapu , Sivayya V. Ayinala
Abstract: Techniques are disclosed relating to sharing operands among SIMD threads for a larger arithmetic operation. In some embodiments, a set of multiple hardware pipelines is configured to execute single-instruction multiple-data (SIMD) instructions for multiple threads in parallel, where ones of the hardware pipelines include execution circuitry configured to perform floating-point operations using one or more pipeline stages of the pipeline and first routing circuitry configured to select, from among thread-specific operands stored for the hardware pipeline and from one or more other pipelines in the set, a first input operand for an operation by the execution circuitry. In some embodiments, a device is configured to perform a mathematical operation on source input data structures stored across thread-specific storage for the set of hardware pipelines, by executing multiple SIMD floating-point operations using the execution circuitry and the first routing circuitry. This may improve performance and reduce power consumption for matrix multiply and reduction operations, for example.
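A reduction across SIMD lanes is one of the operations this abstract names as benefiting from operand sharing. The sketch below models a lane-wise sum in which every step has each lane add an operand routed from a partner lane. The butterfly (XOR-stride) exchange pattern and the name `butterfly_sum` are assumptions chosen for illustration; the abstract does not state which routing pattern is used.

```python
def butterfly_sum(lanes):
    """Sum per-lane values so that every lane ends up holding the total.

    lanes: one floating-point value per SIMD lane; the lane count must be a
    power of two. Each pass models one SIMD add in which lane i receives an
    operand routed from lane i XOR stride.
    """
    n = len(lanes)
    if n == 0 or n & (n - 1):
        raise ValueError("lane count must be a nonzero power of two")
    values = [float(v) for v in lanes]
    stride = 1
    while stride < n:
        values = [values[i] + values[i ^ stride] for i in range(n)]
        stride *= 2
    return values


totals = butterfly_sum([1.0, 2.0, 3.0, 4.0])
```

The reduction finishes in log2(n) SIMD adds, with operands moving directly between lanes at each step.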
-
Publication number: US10699368B1
Publication date: 2020-06-30
Application number: US15690954
Filing date: 2017-08-30
Applicant: Apple Inc.
Inventor: Christopher L. Spencer , Terence M. Potter , Dimitri Tan
Abstract: Techniques are disclosed relating to memory allocation in a graphics shader. In some embodiments, a memory for storing input data for operations by the shader is shared for multiple different types of tasks (e.g., pixel shading tasks and compute tasks). In some embodiments, a graphics device is configured to separately process different portions (e.g., tiles) of a frame of graphics data. In some embodiments, the graphics device is configured to dynamically adjust the number of frame portions processed in parallel based on allocation information, where the allocation information is determined based on requests for other types of tasks. This may prevent pixel shading tasks from stalling other tasks for extended periods and may allow dynamic adjustment of memory allocation mid-render.
-
Publication number: US10678548B2
Publication date: 2020-06-09
Application number: US16112614
Filing date: 2018-08-24
Applicant: Apple Inc.
Inventor: Robert D. Kenney , Terence M. Potter , Andrew M. Havlir , Sivayya V. Ayinala
Abstract: Techniques are disclosed relating to controlling an operand cache in a pipelined fashion. An operand cache may cache operands fetched from the register file or generated by previous instructions to improve performance and/or reduce power consumption. In some embodiments, instructions are pipelined and separate tag information is maintained to indicate allocation of an operand cache entry and ownership of the operand cache entry. In some embodiments, this may allow an operand to remain in the operand cache (and potentially be retrieved or modified) during an interval between allocation of the entry for another operand and ownership of the entry by the other operand. This may improve operand cache efficiency by allowing the entry to be used while retrieving the other operand from the register file, for example.
-