Poison Mechanisms for Deferred Invalidates

    公开(公告)号:US20230017473A1

    公开(公告)日:2023-01-19

    申请号:US17373814

    申请日:2021-07-13

    Applicant: APPLE INC.

    Abstract: An apparatus includes multiple processors including respective cache memories, the cache memories configured to cache cache-entries for use by the processors. At least a processor among the processors includes cache management logic that is configured to (i) receive, from one or more of the other processors, cache-invalidation commands that request invalidation of specified cache-entries in the cache memory of the processor (ii) mark the specified cache-entries as intended for invalidation but defer actual invalidation of the specified cache-entries, and (iii) upon detecting a synchronization event associated with the cache-invalidation commands, invalidate the cache-entries that were marked as intended for invalidation.

    Load instruction fusion
    2.
    发明授权

    公开(公告)号:US12008369B1

    公开(公告)日:2024-06-11

    申请号:US17652501

    申请日:2022-02-25

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed that relate to executing fused instructions. A processor may include a decoder circuit and a load/store circuit. The decoder circuit may detect a load/store instruction to load a value from a memory and detect a non-load/store instruction that depends on the value to be loaded. The decoder circuit may fuse the load/store instruction and the non-load/store instruction such that one or more operations that the non-load/store instruction is defined to perform are to be executed within the load/store circuit. The load/store circuit may receive an indication of the fused load/store and non-load/store instructions and then execute one or more operations of the load/store instruction and the one or more operations of the non-load/store instruction using a circuit included in the load/store circuit.

    Translation Lookaside Buffer Striping for Efficient Invalidation Operations

    公开(公告)号:US20220066947A1

    公开(公告)日:2022-03-03

    申请号:US17008457

    申请日:2020-08-31

    Applicant: Apple Inc.

    Abstract: Systems, apparatuses, and methods for implementing translation lookaside buffer (TLB) striping to enable efficient invalidation operations are described. TLB sizes are growing in width (more features in a given page table entry) and depth (to cover larger memory footprints). A striping scheme is proposed to enable an efficient and high performance method for performing TLB maintenance operations in the face of this growth. Accordingly, a TLB stores first attribute data in a striped manner across a plurality of arrays. The striped manner allows different entries to be searched simultaneously in response to receiving an invalidation request which identifies a particular attribute of a group to be invalidated. Upon receiving an invalidation request, the TLB generates a plurality of indices with an offset between each index and walks through the plurality of arrays by incrementing each index and simultaneously checking the first attribute data in corresponding entries.

    Coprocessor Prefetcher
    4.
    发明申请

    公开(公告)号:US20250094174A1

    公开(公告)日:2025-03-20

    申请号:US18783937

    申请日:2024-07-25

    Applicant: Apple Inc.

    Abstract: A prefetcher for a coprocessor is disclosed. An apparatus includes a processor and a coprocessor that are configured to execute processor and coprocessor instructions, respectively. The processor and coprocessor instructions appear together in code sequences fetched by the processor, with the coprocessor instructions being provided to the coprocessor by the processor. The apparatus further includes a coprocessor prefetcher configured to monitor a code sequence fetched by the processor and, in response to identifying a presence of coprocessor instructions in the code sequence, capture the memory addresses, generated by the processor, of operand data for coprocessor instructions. The coprocessor is further configured to issue, for a cache memory accessible to the coprocessor, prefetches for data associated with the memory addresses prior to execution of the coprocessor instructions by the coprocessor.

    Reducing translation lookaside buffer searches for splintered pages

    公开(公告)号:US12079140B2

    公开(公告)日:2024-09-03

    申请号:US18189982

    申请日:2023-03-24

    Applicant: Apple Inc.

    Abstract: Systems, apparatuses, and methods for performing efficient translation lookaside buffer (TLB) invalidation operations for splintered pages are described. When a TLB receives an invalidation request for a specified translation context, and the invalidation request maps to an entry with a relatively large page size, the TLB does not know if there are multiple translation entries stored in the TLB for smaller splintered pages of the relatively large page. The TLB tracks whether or not splintered pages for each translation context have been installed. If a TLB invalidate (TLBI) request is received, and splintered pages have not been installed, no searches are needed for splintered pages. To refresh the sticky bits, whenever a full TLB search is performed, the TLB rescans for splintered pages for other translation contexts. If no splintered pages are found, the sticky bit can be cleared and the number of full TLBI searches is reduced.

    Stack pointer instruction buffer for zero-cycle loads

    公开(公告)号:US11900118B1

    公开(公告)日:2024-02-13

    申请号:US17817866

    申请日:2022-08-05

    Applicant: Apple Inc.

    CPC classification number: G06F9/3814 G06F9/30134 G06F9/3826 G06F9/3838

    Abstract: An apparatus includes a rescue buffer circuit, a store queue circuit, and a control circuit. The rescue buffer circuit may be configured to retain address information related to store instructions. The store queue circuit may be configured to buffer dependency information related to a particular store instruction until the particular store instruction is released to be executed. The control circuit may be configured to cause a subset of the dependency information for the particular store instruction to be written to the rescue buffer circuit. The rescue buffer circuit may be configured to retain the subset after the dependency information has been released from the store queue circuit, and to perform a subsequent load instruction corresponding to a memory location associated with the particular store instruction using the subset of the dependency information from the rescue buffer circuit.

    Coprocessor Prefetcher
    7.
    发明申请

    公开(公告)号:US20230092898A1

    公开(公告)日:2023-03-23

    申请号:US17643765

    申请日:2021-12-10

    Applicant: Apple Inc.

    Abstract: A prefetcher for a coprocessor is disclosed. An apparatus includes a processor and a coprocessor that are configured to execute processor and coprocessor instructions, respectively. The processor and coprocessor instructions appear together in code sequences fetched by the processor, with the coprocessor instructions being provided to the coprocessor by the processor. The apparatus further includes a coprocessor prefetcher configured to monitor a code sequence fetched by the processor and, in response to identifying a presence of coprocessor instructions in the code sequence, capture the memory addresses, generated by the processor, of operand data for coprocessor instructions. The coprocessor is further configured to issue, for a cache memory accessible to the coprocessor, prefetches for data associated with the memory addresses prior to execution of the coprocessor instructions by the coprocessor.

    Load Instruction Fusion
    8.
    发明公开

    公开(公告)号:US20240329988A1

    公开(公告)日:2024-10-03

    申请号:US18739070

    申请日:2024-06-10

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed that relate to executing fused instructions. A processor may include a decoder circuit and a load/store circuit. The decoder circuit may detect a load/store instruction to load a value from a memory and detect a non-load/store instruction that depends on the value to be loaded. The decoder circuit may fuse the load/store instruction and the non-load/store instruction such that one or more operations that the non-load/store instruction is defined to perform are to be executed within the load/store circuit. The load/store circuit may receive an indication of the fused load/store and non-load/store instructions and then execute one or more operations of the load/store instruction and the one or more operations of the non-load/store instruction using a circuit included in the load/store circuit.

    Coprocessor prefetcher
    9.
    发明授权

    公开(公告)号:US12050918B2

    公开(公告)日:2024-07-30

    申请号:US18361244

    申请日:2023-07-28

    Applicant: Apple Inc.

    CPC classification number: G06F9/3881 G06F9/382 G06F9/383 G06F9/3877

    Abstract: A prefetcher for a coprocessor is disclosed. An apparatus includes a processor and a coprocessor that are configured to execute processor and coprocessor instructions, respectively. The processor and coprocessor instructions appear together in code sequences fetched by the processor, with the coprocessor instructions being provided to the coprocessor by the processor. The apparatus further includes a coprocessor prefetcher configured to monitor a code sequence fetched by the processor and, in response to identifying a presence of coprocessor instructions in the code sequence, capture the memory addresses, generated by the processor, of operand data for coprocessor instructions. The coprocessor is further configured to issue, for a cache memory accessible to the coprocessor, prefetches for data associated with the memory addresses prior to execution of the coprocessor instructions by the coprocessor.

    Coprocessor Prefetcher
    10.
    发明公开

    公开(公告)号:US20240095037A1

    公开(公告)日:2024-03-21

    申请号:US18361244

    申请日:2023-07-28

    Applicant: Apple Inc.

    CPC classification number: G06F9/3881 G06F9/382 G06F9/383 G06F9/3877

    Abstract: A prefetcher for a coprocessor is disclosed. An apparatus includes a processor and a coprocessor that are configured to execute processor and coprocessor instructions, respectively. The processor and coprocessor instructions appear together in code sequences fetched by the processor, with the coprocessor instructions being provided to the coprocessor by the processor. The apparatus further includes a coprocessor prefetcher configured to monitor a code sequence fetched by the processor and, in response to identifying a presence of coprocessor instructions in the code sequence, capture the memory addresses, generated by the processor, of operand data for coprocessor instructions. The coprocessor is further configured to issue, for a cache memory accessible to the coprocessor, prefetches for data associated with the memory addresses prior to execution of the coprocessor instructions by the coprocessor.

Patent Agency Ranking