Coprocessor operation bundling
    22.
    发明授权

    公开(公告)号:US12242855B2

    公开(公告)日:2025-03-04

    申请号:US18361212

    申请日:2023-07-28

    Applicant: Apple Inc.

    Abstract: In an embodiment, a processor includes a buffer in an interface unit. The buffer may be used to accumulate coprocessor instructions to be transmitted to a coprocessor. In an embodiment, the processor issues the coprocessor instructions to the buffer when ready to be issued to the coprocessor. The interface unit may accumulate the coprocessor instructions in the buffer, generating a bundle of instructions. The bundle may be closed based on various predetermined conditions and then the bundle may be transmitted to the coprocessor. If a sequence of coprocessor instructions appears consecutively in a program, the rate at which the instructions are provided to the coprocessor (on average) at least matches the rate at which the coprocessor consumes the instructions, in an embodiment.

    Coprocessors with bypass optimization, variable grid architecture, and fused vector operations

    公开(公告)号:US12135681B2

    公开(公告)日:2024-11-05

    申请号:US17869617

    申请日:2022-07-20

    Applicant: Apple Inc.

    Abstract: In an embodiment, a coprocessor may include a bypass indication which identifies execution circuitry that is not used by a given processor instruction, and thus may be bypassed. The corresponding circuitry may be disabled during execution, preventing evaluation when the output of the circuitry will not be used for the instruction. In another embodiment, the coprocessor may implement a grid of processing elements in rows and columns, where a given coprocessor instruction may specify an operation that causes up to all of the processing elements to operate on vectors of input operands to produce results. Implementations of the coprocessor may implement a portion of the processing elements. The coprocessor control circuitry may be designed to operate with the full grid or partial grid, reissuing instructions in the partial grid case to perform the requested operation. In still another embodiment, the coprocessor may be able to fuse vector mode operations.

    Coprocessor Operation Bundling
    24.
    发明公开

    公开(公告)号:US20240036870A1

    公开(公告)日:2024-02-01

    申请号:US18361212

    申请日:2023-07-28

    Applicant: Apple Inc.

    Abstract: In an embodiment, a processor includes a buffer in an interface unit. The buffer may be used to accumulate coprocessor instructions to be transmitted to a coprocessor. In an embodiment, the processor issues the coprocessor instructions to the buffer when ready to be issued to the coprocessor. The interface unit may accumulate the coprocessor instructions in the buffer, generating a bundle of instructions. The bundle may be closed based on various predetermined conditions and then the bundle may be transmitted to the coprocessor. If a sequence of coprocessor instructions appears consecutively in a program, the rate at which the instructions are provided to the coprocessor (on average) at least matches the rate at which the coprocessor consumes the instructions, in an embodiment.

    Coprocessor register renaming using registers associated with an inactive context to store results from an active context

    公开(公告)号:US11775301B2

    公开(公告)日:2023-10-03

    申请号:US17644016

    申请日:2021-12-13

    Applicant: Apple Inc.

    Abstract: A coprocessor with register renaming is disclosed. An apparatus includes a plurality of processors and a coprocessor respectively configured to execute processor instructions and coprocessor instructions. The coprocessor receives coprocessor instructions from ones of the processors. The coprocessor includes an array of processing elements and a result register set comprising storage elements respectively distributed within the array of processing elements. For a given member of the array of processing elements, a corresponding storage element is configured to store coprocessor instruction results generated by the given member. The result register set implements a plurality of contexts to store respective coprocessor states corresponding to coprocessor instructions received from different processors. Based on a determination that one of the contexts is inactive, the coprocessor is configured to store coprocessor instruction results corresponding to an active context within storage elements of the result register set corresponding to the inactive context.

    Coprocessors with Bypass Optimization, Variable Grid Architecture, and Fused Vector Operations

    公开(公告)号:US20220350776A1

    公开(公告)日:2022-11-03

    申请号:US17869617

    申请日:2022-07-20

    Applicant: Apple Inc.

    Abstract: In an embodiment, a coprocessor may include a bypass indication which identifies execution circuitry that is not used by a given processor instruction, and thus may be bypassed. The corresponding circuitry may be disabled during execution, preventing evaluation when the output of the circuitry will not be used for the instruction. In another embodiment, the coprocessor may implement a grid of processing elements in rows and columns, where a given coprocessor instruction may specify an operation that causes up to all of the processing elements to operate on vectors of input operands to produce results. Implementations of the coprocessor may implement a portion of the processing elements. The coprocessor control circuitry may be designed to operate with the full grid or partial grid, reissuing instructions in the partial grid case to perform the requested operation. In still another embodiment, the coprocessor may be able to fuse vector mode operations.

    Processor with multiple load queues including a queue to manage ordering and a queue to manage replay

    公开(公告)号:US10970077B2

    公开(公告)日:2021-04-06

    申请号:US16437739

    申请日:2019-06-11

    Applicant: Apple Inc.

    Abstract: In an embodiment, a processor includes a load/store unit that executes load/store operations. The load/store unit may implement a two-level load queue. One of the load queues, referred to as a load retirement queue (LRQ), may track load operations from initial execution to retirement. Ordering constraints may be enforced using the LRQ. The other load queue, referred to as a load execution queue (LEQ), may track loads from initial execution to forwarding of data. Replay may be managed by the LEQ. In an embodiment, the LEQ may be smaller than the LRQ, which may permit the management of replay while still meeting timing requirements. Additionally, the larger LRQ may permit more load operations to be pending (not retired) in the processor, widening the window for out of order execution and supporting potentially higher processor performance.

    Coprocessor Memory Ordering Table
    29.
    发明申请

    公开(公告)号:US20200371812A1

    公开(公告)日:2020-11-26

    申请号:US16991858

    申请日:2020-08-12

    Applicant: Apple Inc.

    Abstract: In an embodiment, at least one CPU processor and at least one coprocessor are included in a system. The CPU processor may issue operations to the coprocessor to perform, including load/store operations. The CPU processor may generate the addresses that are accessed by the coprocessor load/store operations, as well as executing its own CPU load/store operations. The CPU processor may include a memory ordering table configured to track at least one memory region within which there are outstanding coprocessor load/store memory operations that have not yet completed. The CPU processor may delay CPU load/store operations until the outstanding coprocessor load/store operations are complete. In this fashion, the proper ordering of CPU load/store operations and coprocessor load/store operations may be maintained.

    Coprocessor with Distributed Register
    30.
    发明申请

    公开(公告)号:US20200272467A1

    公开(公告)日:2020-08-27

    申请号:US16286213

    申请日:2019-02-26

    Applicant: Apple Inc.

    Abstract: In an embodiment, a coprocessor includes multiple processing elements arranged in a grid of one or more rows and one or more columns. A given processing element includes an arithmetic/logic unit (ALU) circuit configured to perform an ALU operation specified by an instruction executable by the coprocessor, wherein the ALU circuit is configured to produce a result. The given processing element further comprises a first memory coupled to the execute circuit. The first memory is configured to store results generated by the given processing element. The first memory includes a portion of a result memory implemented by the coprocessor, wherein locations in the result memory are specifiable as destination operands of instructions executable by the coprocessor. The portion of the result memory implemented by the first memory is the portion of the result memory that the given processing element is capable of updating.

Patent Agency Ranking