-
Publication No.: US20240095037A1
Publication date: 2024-03-21
Application No.: US18361244
Filing date: 2023-07-28
Applicant: Apple Inc.
Inventor: Brandon H. Dwiel, Andrew J. Beaumont-Smith, Eric J. Furbish, John D. Pape, Stephen G. Meier, Tyler J. Huberty
IPC: G06F9/38
CPC classification number: G06F9/3881, G06F9/382, G06F9/383, G06F9/3877
Abstract: A prefetcher for a coprocessor is disclosed. An apparatus includes a processor and a coprocessor that are configured to execute processor and coprocessor instructions, respectively. The processor and coprocessor instructions appear together in code sequences fetched by the processor, with the coprocessor instructions being provided to the coprocessor by the processor. The apparatus further includes a coprocessor prefetcher configured to monitor a code sequence fetched by the processor and, in response to identifying a presence of coprocessor instructions in the code sequence, capture the memory addresses, generated by the processor, of operand data for coprocessor instructions. The coprocessor is further configured to issue, for a cache memory accessible to the coprocessor, prefetches for data associated with the memory addresses prior to execution of the coprocessor instructions by the coprocessor.
-
Publication No.: US20240045680A1
Publication date: 2024-02-08
Application No.: US18453010
Filing date: 2023-08-21
Applicant: Apple Inc.
Inventor: Ran Aharon Chachick, Aditya Kesiraju, Andrew J. Beaumont-Smith, Jong-Suk Lee
CPC classification number: G06F9/30123, G06F15/80, G06F9/3877, G06F9/384, G06F9/3009
Abstract: A coprocessor with register renaming is disclosed. An apparatus includes a plurality of processors and a coprocessor respectively configured to execute processor instructions and coprocessor instructions. The coprocessor receives coprocessor instructions from ones of the processors. The coprocessor includes an array of processing elements and a result register set comprising storage elements respectively distributed within the array of processing elements. For a given member of the array of processing elements, a corresponding storage element is configured to store coprocessor instruction results generated by the given member. The result register set implements a plurality of contexts to store respective coprocessor states corresponding to coprocessor instructions received from different processors. Based on a determination that one of the contexts is inactive, the coprocessor is configured to store coprocessor instruction results corresponding to an active context within storage elements of the result register set corresponding to the inactive context.
-
Publication No.: US11768690B2
Publication date: 2023-09-26
Application No.: US17532072
Filing date: 2021-11-22
Applicant: Apple Inc.
Inventor: Aditya Kesiraju, Andrew J. Beaumont-Smith, Brian P. Lilly, James Vash, Jason M. Kassoff, Krishna C. Potnuru, Rajdeep L. Bhuyar, Ran A. Chachick, Tyler J. Huberty, Derek R. Kumar
CPC classification number: G06F9/3877, G06F9/3009, G06F9/3836, G06F9/3863, G06F9/3881, G06F9/4887, G06F11/3024, G06F9/3879
Abstract: A system may include a plurality of processors and a coprocessor. A plurality of coprocessor context priority registers corresponding to a plurality of contexts supported by the coprocessor may be included. The plurality of processors may use the plurality of contexts, and may program the coprocessor context priority register corresponding to a context with a value specifying a priority of the context relative to other contexts. An arbiter may arbitrate among instructions issued by the plurality of processors based on the priorities in the plurality of coprocessor context priority registers. In one embodiment, real-time threads may be assigned higher priorities than bulk processing tasks, improving bandwidth allocated to the real-time threads as compared to the bulk tasks.
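The arbitration described in this abstract can be illustrated with a minimal sketch. It assumes strict priority selection with ties going to the lowest context index; the abstract does not specify the arbitration policy, and the names `ContextQueue` and `arbitrate` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ContextQueue:
    # Value a processor would program into this context's priority register
    # (hypothetical model; higher number means higher priority).
    priority: int
    # Instructions issued for this context that are still waiting to be picked.
    pending: deque = field(default_factory=deque)


def arbitrate(contexts):
    """Return (context_index, instruction) for the winning context, or None.

    Assumption: strict priority, ties broken in favor of the lowest index.
    """
    ready = [
        (ctx.priority, -idx, idx)
        for idx, ctx in enumerate(contexts)
        if ctx.pending
    ]
    if not ready:
        return None
    _, _, winner = max(ready)
    return winner, contexts[winner].pending.popleft()


# Context 0 models a bulk task, context 1 a real-time thread.
contexts = [
    ContextQueue(priority=1, pending=deque(["bulk0", "bulk1"])),
    ContextQueue(priority=3, pending=deque(["rt0"])),
]
order = []
while (pick := arbitrate(contexts)) is not None:
    order.append(pick)
```

Strict priority is the simplest policy that favors real-time work, but it can starve low-priority contexts; a weighted scheme would be an equally plausible reading of the abstract.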
-
Publication No.: US11755333B2
Publication date: 2023-09-12
Application No.: US17643765
Filing date: 2021-12-10
Applicant: Apple Inc.
Inventor: Brandon H. Dwiel, Andrew J. Beaumont-Smith, Eric J. Furbish, John D. Pape, Stephen G. Meier, Tyler J. Huberty
IPC: G06F9/38
CPC classification number: G06F9/3881, G06F9/382, G06F9/383, G06F9/3877
Abstract: A prefetcher for a coprocessor is disclosed. An apparatus includes a processor and a coprocessor that are configured to execute processor and coprocessor instructions, respectively. The processor and coprocessor instructions appear together in code sequences fetched by the processor, with the coprocessor instructions being provided to the coprocessor by the processor. The apparatus further includes a coprocessor prefetcher configured to monitor a code sequence fetched by the processor and, in response to identifying a presence of coprocessor instructions in the code sequence, capture the memory addresses, generated by the processor, of operand data for coprocessor instructions. The coprocessor is further configured to issue, for a cache memory accessible to the coprocessor, prefetches for data associated with the memory addresses prior to execution of the coprocessor instructions by the coprocessor.
-
Publication No.: US20230095072A1
Publication date: 2023-03-30
Application No.: US17644016
Filing date: 2021-12-13
Applicant: Apple Inc.
Inventor: Ran Aharon Chachick, Aditya Kesiraju, Andrew J. Beaumont-Smith, Jong-Suk Lee
Abstract: A coprocessor with register renaming is disclosed. An apparatus includes a plurality of processors and a coprocessor respectively configured to execute processor instructions and coprocessor instructions. The coprocessor receives coprocessor instructions from ones of the processors. The coprocessor includes an array of processing elements and a result register set comprising storage elements respectively distributed within the array of processing elements. For a given member of the array of processing elements, a corresponding storage element is configured to store coprocessor instruction results generated by the given member. The result register set implements a plurality of contexts to store respective coprocessor states corresponding to coprocessor instructions received from different processors. Based on a determination that one of the contexts is inactive, the coprocessor is configured to store coprocessor instruction results corresponding to an active context within storage elements of the result register set corresponding to the inactive context.
-
Publication No.: US20230061419A1
Publication date: 2023-03-02
Application No.: US17538939
Filing date: 2021-11-30
Applicant: Apple Inc.
Inventor: Andrew J. Beaumont-Smith, Sandeep Gupta, Krishna C. Potnuru, Matthias Knoth
Abstract: An apparatus includes a plurality of processor circuits, a cache memory circuit, and a trace control circuit. The trace control circuit may be configured, in response to activation of a mode to record information indicative of program execution of at least one processor circuit of the plurality of processor circuits, to monitor memory requests transmitted between ones of the plurality of processor circuits and the cache memory circuit, and then to select a particular memory request of monitored memory requests using an arbitration algorithm. The trace control circuit may be further configured to allocate space in a trace buffer to the particular memory request, and to store, in the trace buffer, information associated with the particular memory request.
-
Publication No.: US11429555B2
Publication date: 2022-08-30
Application No.: US16286170
Filing date: 2019-02-26
Applicant: Apple Inc.
Inventor: Aditya Kesiraju, Andrew J. Beaumont-Smith, Boris S. Alvarez-Heredia, Srikanth Balasubramanian
Abstract: In an embodiment, a coprocessor may include a bypass indication which identifies execution circuitry that is not used by a given processor instruction, and thus may be bypassed. The corresponding circuitry may be disabled during execution, preventing evaluation when the output of the circuitry will not be used for the instruction. In another embodiment, the coprocessor may implement a grid of processing elements in rows and columns, where a given coprocessor instruction may specify an operation that causes up to all of the processing elements to operate on vectors of input operands to produce results. Implementations of the coprocessor may implement a portion of the processing elements. The coprocessor control circuitry may be designed to operate with the full grid or partial grid, reissuing instructions in the partial grid case to perform the requested operation. In still another embodiment, the coprocessor may be able to fuse vector mode operations.
-
Publication No.: US11210104B1
Publication date: 2021-12-28
Application No.: US17018963
Filing date: 2020-09-11
Applicant: Apple Inc.
Inventor: Aditya Kesiraju, Andrew J. Beaumont-Smith, Brian P. Lilly, James Vash, Jason M. Kassoff, Krishna C. Potnuru, Rajdeep L. Bhuyar, Ran A. Chachick, Tyler J. Huberty, Derek R. Kumar
Abstract: A system may include a plurality of processors and a coprocessor. A plurality of coprocessor context priority registers corresponding to a plurality of contexts supported by the coprocessor may be included. The plurality of processors may use the plurality of contexts, and may program the coprocessor context priority register corresponding to a context with a value specifying a priority of the context relative to other contexts. An arbiter may arbitrate among instructions issued by the plurality of processors based on the priorities in the plurality of coprocessor context priority registers. In one embodiment, real-time threads may be assigned higher priorities than bulk processing tasks, improving bandwidth allocated to the real-time threads as compared to the bulk tasks.
-
Publication No.: US20180074824A1
Publication date: 2018-03-15
Application No.: US15264002
Filing date: 2016-09-13
Applicant: Apple Inc.
Inventor: Ali Sazegari, Eric Bainville, Jeffry E. Gonion, Gerard R. Williams, III, Andrew J. Beaumont-Smith
CPC classification number: G06F9/30101, G06F9/3001, G06F9/30036, G06F9/30043, G06F9/3802, G06F9/3867, G06F9/3877, G06F9/3893
Abstract: In an embodiment, an outer product engine is configured to perform outer product operations. The outer product engine may perform numerous multiplication operations in parallel on input vectors, in an embodiment, generating a resulting outer product matrix. In an embodiment, the outer product engine may be configured to accumulate results in a result matrix, performing fused multiply add (FMA) operations to produce the outer product elements (multiply) and to accumulate the outer product elements with previous elements from the result matrix memory (add). A processor may fetch outer product instructions, and may transmit the instructions to the outer product engine when the instructions become non-speculative in an embodiment. The processor may be configured to retire the outer product instructions responsive to transmitting them to the outer product engine.
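The accumulate step described in this abstract can be sketched in software as follows. This is an illustration of the arithmetic only, not of the hardware: the function name `outer_product_accumulate` is hypothetical, the loop is sequential where the engine multiplies in parallel, and with floating-point inputs Python rounds the multiply and the add separately, unlike a true fused multiply-add.

```python
def outer_product_accumulate(x, y, result):
    """Add the outer product of vectors x and y into result, in place.

    result must have len(x) rows and len(y) columns. Each element is
    updated as result[i][j] += x[i] * y[j].
    """
    for i, xi in enumerate(x):
        row = result[i]
        for j, yj in enumerate(y):
            row[j] += xi * yj
    return result


# Accumulate the same outer product twice into a 2x3 result matrix.
x = [1, 2]
y = [3, 4, 5]
result = [[0, 0, 0], [0, 0, 0]]
outer_product_accumulate(x, y, result)
first_pass = [row[:] for row in result]
outer_product_accumulate(x, y, result)
```

Keeping the result matrix as accumulator state across calls mirrors the abstract's description of adding new outer product elements to previous elements of the result matrix.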
-