Computation engine with extract instructions to minimize memory access

    Publication number: US10831488B1

    Publication date: 2020-11-10

    Application number: US16105783

    Application date: 2018-08-20

    Applicant: Apple Inc.

    Abstract: In an embodiment, a computation engine may offload work from a processor (e.g. a CPU) and efficiently perform computations such as those used in LSTM and other workloads at high performance. In an embodiment, the computation engine may perform computations on input vectors from input memories in the computation engine, and may accumulate results in an output memory within the computation engine. The input memories may be loaded with initial vector data from memory, incurring the memory latency that may be associated with reading the operands. Compute instructions may be performed on the operands, generating results in an output memory. One or more extract instructions may be supported to move data from the output memory to the input memory, permitting additional computation on the data in the output memory without moving the results to main memory.
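The data flow in this abstract can be illustrated with a minimal Python sketch. It assumes a toy engine with two input memories and one accumulating output memory. The class `ToyEngine`, its methods, and the `main_memory_accesses` counter are hypothetical names chosen for illustration and do not come from the patent.

```python
class ToyEngine:
    """Toy model: input memories x and y, plus an accumulating output memory z."""

    def __init__(self, width):
        self.x = [0.0] * width
        self.y = [0.0] * width
        self.z = [0.0] * width
        self.main_memory_accesses = 0  # hypothetical counter for illustration

    def load(self, target, values):
        """Load an input memory from main memory (pays the memory latency)."""
        if target not in ("x", "y"):
            raise ValueError("only input memories can be loaded")
        setattr(self, target, list(values))
        self.main_memory_accesses += 1

    def multiply_accumulate(self):
        """Compute instruction (assumed operation): z[i] += x[i] * y[i]."""
        self.z = [z + x * y for z, x, y in zip(self.z, self.x, self.y)]

    def extract(self, target):
        """Extract instruction: copy output memory z into an input memory.

        No main-memory access is counted, which is the point of the abstract.
        """
        if target not in ("x", "y"):
            raise ValueError("extract targets an input memory")
        setattr(self, target, list(self.z))


engine = ToyEngine(width=4)
engine.load("x", [1.0, 2.0, 3.0, 4.0])
engine.load("y", [2.0, 2.0, 2.0, 2.0])
engine.multiply_accumulate()   # z = [2, 4, 6, 8]
engine.extract("x")            # x now holds the previous results
engine.multiply_accumulate()   # z = [6, 12, 18, 24]
```

In this sketch the second computation reuses the first result while the main-memory access count stays at two, the two initial loads.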

    Coprocessors with Bypass Optimization, Variable Grid Architecture, and Fused Vector Operations

    Publication number: US20200272597A1

    Publication date: 2020-08-27

    Application number: US16286170

    Application date: 2019-02-26

    Applicant: Apple Inc.

    Abstract: In an embodiment, a coprocessor may include a bypass indication which identifies execution circuitry that is not used by a given processor instruction, and thus may be bypassed. The corresponding circuitry may be disabled during execution, preventing evaluation when the output of the circuitry will not be used for the instruction. In another embodiment, the coprocessor may implement a grid of processing elements in rows and columns, where a given coprocessor instruction may specify an operation that causes up to all of the processing elements to operate on vectors of input operands to produce results. Implementations of the coprocessor may implement a portion of the processing elements. The coprocessor control circuitry may be designed to operate with the full grid or partial grid, reissuing instructions in the partial grid case to perform the requested operation. In still another embodiment, the coprocessor may be able to fuse vector mode operations.
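The full-grid versus partial-grid behavior can be sketched as follows. This is a minimal Python illustration that assumes the requested operation is an outer-product accumulate and that a partial implementation provides only some rows of processing elements. The functions `issue` and `execute` and the parameter `rows_implemented` are hypothetical.

```python
def issue(z, x, y, row_start, rows_implemented):
    """One pass over the implemented processing elements.

    The implemented rows cover [row_start, row_start + rows_implemented).
    Assumed operation: z[i][j] += x[i] * y[j].
    """
    row_end = min(row_start + rows_implemented, len(x))
    for i in range(row_start, row_end):
        for j in range(len(y)):
            z[i][j] += x[i] * y[j]


def execute(x, y, rows_implemented):
    """Control logic: reissue the instruction until every row is covered."""
    z = [[0] * len(y) for _ in x]
    passes = 0
    for row_start in range(0, len(x), rows_implemented):
        issue(z, x, y, row_start, rows_implemented)
        passes += 1
    return z, passes


x = [1, 2, 3, 4]
y = [10, 20]
full, full_passes = execute(x, y, rows_implemented=4)           # full grid
partial, partial_passes = execute(x, y, rows_implemented=2)     # half grid
```

Both configurations produce the same result; the partial grid simply takes two passes instead of one.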

    Combining write buffer with dynamically adjustable flush metrics
    Type: Granted patent
    Status: In force

    Publication number: US08566528B2

    Publication date: 2013-10-22

    Application number: US13709649

    Application date: 2012-12-10

    Applicant: Apple Inc.

    CPC classification number: G06F12/0891 G06F12/0804

    Abstract: In an embodiment, a combining write buffer is configured to maintain one or more flush metrics to determine when to transmit write operations from buffer entries. The combining write buffer may be configured to dynamically modify the flush metrics in response to activity in the write buffer, modifying the conditions under which write operations are transmitted from the write buffer to the next lower level of memory. For example, in one implementation, the flush metrics may include categorizing write buffer entries as “collapsed.” A collapsed write buffer entry, and the collapsed write operations therein, may include at least one write operation that has overwritten data that was written by a previous write operation in the buffer entry. In another implementation, the combining write buffer may maintain the threshold of buffer fullness as a flush metric and may adjust it over time based on the actual buffer fullness.
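A minimal Python sketch of the "collapsed" classification follows. It assumes a fixed entry-count threshold and an eviction policy that flushes the oldest non-collapsed entry first; neither detail is stated in the abstract. Dynamic adjustment of the threshold is not modeled. The class `CombiningWriteBuffer` and all of its fields are hypothetical.

```python
from collections import OrderedDict


class CombiningWriteBuffer:
    def __init__(self, threshold, block_size=8):
        self.threshold = threshold    # flush metric: entries kept before flushing
        self.block_size = block_size
        # block number -> {"bytes": {offset: value}, "collapsed": bool}
        self.entries = OrderedDict()
        self.flushed = []             # (block, bytes) sent to the next memory level

    def write(self, address, value):
        block, offset = divmod(address, self.block_size)
        entry = self.entries.get(block)
        if entry is None:
            entry = {"bytes": {}, "collapsed": False}
            self.entries[block] = entry
        elif offset in entry["bytes"]:
            # This write overwrites data from an earlier buffered write.
            entry["collapsed"] = True
        entry["bytes"][offset] = value
        while len(self.entries) > self.threshold:
            self._flush_one()

    def _flush_one(self):
        # Assumed policy: prefer the oldest entry that is not collapsed,
        # since collapsed entries may be overwritten again soon.
        victim = next(
            (b for b, e in self.entries.items() if not e["collapsed"]), None
        )
        if victim is None:
            victim = next(iter(self.entries))
        self.flushed.append((victim, self.entries.pop(victim)["bytes"]))


buf = CombiningWriteBuffer(threshold=2)
buf.write(0, "a")    # block 0
buf.write(0, "b")    # same byte again, so block 0 becomes collapsed
buf.write(8, "c")    # block 1
buf.write(16, "d")   # block 2 exceeds the threshold; block 1 is flushed
```

Here the collapsed entry for block 0 stays in the buffer, and only the final value "b" would ever reach the next level of memory.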


    Debug Trace of Cache Memory Requests
    Type: Published patent application

    Publication number: US20230418724A1

    Publication date: 2023-12-28

    Application number: US18344170

    Application date: 2023-06-29

    Applicant: Apple Inc.

    CPC classification number: G06F11/348 G06F11/3037 G06F12/0223 G06F2212/1008

    Abstract: An apparatus includes a plurality of processor circuits, a cache memory circuit, and a trace control circuit. The trace control circuit may be configured, in response to activation of a mode to record information indicative of program execution of at least one processor circuit of the plurality of processor circuits, to monitor memory requests transmitted between ones of the plurality of processor circuits and the cache memory circuit, and then to select a particular memory request of monitored memory requests using an arbitration algorithm. The trace control circuit may be further configured to allocate space in a trace buffer to the particular memory request, and to store, in the trace buffer, information associated with the particular memory request.
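The monitor, arbitrate, allocate, and store steps can be sketched in Python. The abstract does not name the arbitration algorithm, so round-robin is used here as an assumption. The class `TraceControl`, its fields, and the request strings are hypothetical.

```python
from collections import deque


class TraceControl:
    def __init__(self, num_processors, buffer_slots):
        self.queues = [deque() for _ in range(num_processors)]
        self.next_index = 0           # round-robin pointer (assumed algorithm)
        self.buffer_slots = buffer_slots
        self.trace_buffer = []
        self.enabled = False          # trace mode is off until activated

    def observe(self, processor, request):
        """Monitor a memory request; ignored unless trace mode is active."""
        if self.enabled:
            self.queues[processor].append(request)

    def step(self):
        """Select one monitored request and store it. Returns True if stored."""
        if len(self.trace_buffer) >= self.buffer_slots:
            return False              # no space left to allocate
        count = len(self.queues)
        for k in range(count):
            p = (self.next_index + k) % count
            if self.queues[p]:
                self.trace_buffer.append((p, self.queues[p].popleft()))
                self.next_index = (p + 1) % count
                return True
        return False                  # nothing pending


trace = TraceControl(num_processors=3, buffer_slots=3)
trace.observe(0, "read 0x100")    # dropped: trace mode not yet active
trace.enabled = True
trace.observe(0, "read 0x140")
trace.observe(0, "write 0x180")
trace.observe(2, "read 0x200")
while trace.step():
    pass
```

With round-robin selection, the request from processor 2 is recorded between the two requests from processor 0 rather than after them.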

    Debug trace of cache memory requests

    Publication number: US11740993B2

    Publication date: 2023-08-29

    Application number: US17538939

    Application date: 2021-11-30

    Applicant: Apple Inc.

    CPC classification number: G06F11/348 G06F11/3037 G06F12/0223 G06F2212/1008

    Abstract: An apparatus includes a plurality of processor circuits, a cache memory circuit, and a trace control circuit. The trace control circuit may be configured, in response to activation of a mode to record information indicative of program execution of at least one processor circuit of the plurality of processor circuits, to monitor memory requests transmitted between ones of the plurality of processor circuits and the cache memory circuit, and then to select a particular memory request of monitored memory requests using an arbitration algorithm. The trace control circuit may be further configured to allocate space in a trace buffer to the particular memory request, and to store, in the trace buffer, information associated with the particular memory request.

    Coprocessors with Bypass Optimization, Variable Grid Architecture, and Fused Vector Operations

    Publication number: US20220358082A1

    Publication date: 2022-11-10

    Application number: US17869620

    Application date: 2022-07-20

    Applicant: Apple Inc.

    Abstract: In an embodiment, a coprocessor may include a bypass indication which identifies execution circuitry that is not used by a given processor instruction, and thus may be bypassed. The corresponding circuitry may be disabled during execution, preventing evaluation when the output of the circuitry will not be used for the instruction. In another embodiment, the coprocessor may implement a grid of processing elements in rows and columns, where a given coprocessor instruction may specify an operation that causes up to all of the processing elements to operate on vectors of input operands to produce results. Implementations of the coprocessor may implement a portion of the processing elements. The coprocessor control circuitry may be designed to operate with the full grid or partial grid, reissuing instructions in the partial grid case to perform the requested operation. In still another embodiment, the coprocessor may be able to fuse vector mode operations.

    Coprocessor Synchronizing Instruction Suppression

    Publication number: US20220214887A1

    Publication date: 2022-07-07

    Application number: US17668869

    Application date: 2022-02-10

    Applicant: Apple Inc.

    Abstract: An instruction set architecture including instructions for a processor and instructions for a coprocessor may include synchronizing instructions that may be used to begin and end instruction sequences that include coprocessor instructions (coprocessor sequences). If a terminating synchronizing instruction is followed by an initial synchronizing instruction and the pair are detected in the coprocessor concurrently, the coprocessor may suppress execution of the pair of instructions.
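The suppression rule can be sketched as a filter over an instruction stream. This minimal Python example assumes the whole list is visible to the coprocessor at once, so any terminating instruction immediately followed by an initial one counts as detected concurrently. The mnemonics `BEGIN` and `END` and the function `suppress_sync_pairs` are hypothetical.

```python
def suppress_sync_pairs(instructions):
    """Drop each END that is immediately followed by BEGIN, along with that BEGIN."""
    kept = []
    i = 0
    while i < len(instructions):
        is_pair = (
            instructions[i] == "END"
            and i + 1 < len(instructions)
            and instructions[i + 1] == "BEGIN"
        )
        if is_pair:
            i += 2          # suppress both synchronizing instructions
            continue
        kept.append(instructions[i])
        i += 1
    return kept


stream = ["BEGIN", "mul", "END", "BEGIN", "add", "END"]
result = suppress_sync_pairs(stream)
```

The two back-to-back coprocessor sequences merge into one, which avoids executing the synchronizing pair between them.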

    Coprocessor synchronizing instruction suppression

    Publication number: US11249766B1

    Publication date: 2022-02-15

    Application number: US17077654

    Application date: 2020-10-22

    Applicant: Apple Inc.

    Abstract: An instruction set architecture including instructions for a processor and instructions for a coprocessor may include synchronizing instructions that may be used to begin and end instruction sequences that include coprocessor instructions (coprocessor sequences). If a terminating synchronizing instruction is followed by an initial synchronizing instruction and the pair are detected in the coprocessor concurrently, the coprocessor may suppress execution of the pair of instructions.

    Combining Write Buffer with Dynamically Adjustable Flush Metrics
    Type: Patent application
    Status: In force

    Publication number: US20130103906A1

    Publication date: 2013-04-25

    Application number: US13709649

    Application date: 2012-12-10

    Applicant: Apple Inc.

    CPC classification number: G06F12/0891 G06F12/0804

    Abstract: In an embodiment, a combining write buffer is configured to maintain one or more flush metrics to determine when to transmit write operations from buffer entries. The combining write buffer may be configured to dynamically modify the flush metrics in response to activity in the write buffer, modifying the conditions under which write operations are transmitted from the write buffer to the next lower level of memory. For example, in one implementation, the flush metrics may include categorizing write buffer entries as “collapsed.” A collapsed write buffer entry, and the collapsed write operations therein, may include at least one write operation that has overwritten data that was written by a previous write operation in the buffer entry. In another implementation, the combining write buffer may maintain the threshold of buffer fullness as a flush metric and may adjust it over time based on the actual buffer fullness.


    Coprocessor prefetcher
    Type: Granted patent

    Publication number: US12050918B2

    Publication date: 2024-07-30

    Application number: US18361244

    Application date: 2023-07-28

    Applicant: Apple Inc.

    CPC classification number: G06F9/3881 G06F9/382 G06F9/383 G06F9/3877

    Abstract: A prefetcher for a coprocessor is disclosed. An apparatus includes a processor and a coprocessor that are configured to execute processor and coprocessor instructions, respectively. The processor and coprocessor instructions appear together in code sequences fetched by the processor, with the coprocessor instructions being provided to the coprocessor by the processor. The apparatus further includes a coprocessor prefetcher configured to monitor a code sequence fetched by the processor and, in response to identifying a presence of coprocessor instructions in the code sequence, capture the memory addresses, generated by the processor, of operand data for coprocessor instructions. The coprocessor is further configured to issue, for a cache memory accessible to the coprocessor, prefetches for data associated with the memory addresses prior to execution of the coprocessor instructions by the coprocessor.
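The effect of capturing operand addresses ahead of execution can be sketched with a simplified two-phase Python model. It assumes each instruction is a `(unit, operand_address)` tuple, that the cache is an unbounded set, and that all prefetches complete before the coprocessor starts executing. The function `run_coprocessor` and the sample addresses are hypothetical.

```python
def run_coprocessor(code_sequence, prefetch_enabled):
    """Return (hits, misses) for coprocessor operand accesses."""
    cache = set()

    # Phase 1: the prefetcher monitors the fetched code sequence and
    # captures operand addresses of coprocessor instructions.
    if prefetch_enabled:
        for unit, address in code_sequence:
            if unit == "coprocessor":
                cache.add(address)   # prefetch issued before execution

    # Phase 2: the coprocessor executes its instructions.
    hits = misses = 0
    for unit, address in code_sequence:
        if unit != "coprocessor":
            continue
        if address in cache:
            hits += 1
        else:
            misses += 1
            cache.add(address)       # demand fill on a miss
    return hits, misses


code_sequence = [
    ("processor", None),
    ("coprocessor", 0x1000),
    ("coprocessor", 0x2000),
    ("processor", None),
    ("coprocessor", 0x1000),
]
with_prefetch = run_coprocessor(code_sequence, prefetch_enabled=True)
without_prefetch = run_coprocessor(code_sequence, prefetch_enabled=False)
```

In this toy model the prefetcher turns the two cold misses into hits; a real design would also have to account for timing and cache capacity.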
