Decoupling Atomicity from Operation Size

    Publication No.: US20210397555A1

    Publication date: 2021-12-23

    Application No.: US16907740

    Filing date: 2020-06-22

    Applicant: Apple Inc.

    Abstract: In an embodiment, a processor implements a different atomicity size (for memory consistency order) than the operation size. More particularly, the processor may implement a smaller atomicity size than the operation size. For example, for multiple register loads, the atomicity size may be the register size. In another example, the vector element size may be the atomicity size for vector load instructions. In yet another example, multiple contiguous vector elements, but fewer than all the vector elements in a vector register, may be the atomicity size for vector load instructions.
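
The multiple-register-load case in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions: the 8-byte register size, the `load_pair` function, and the `interleaved_store` hook (standing in for another agent's store that becomes visible between the two register reads) are hypothetical names chosen for illustration, not details from the patent.

```python
REG_BYTES = 8  # assumed register size; this is the atomicity size in the model


def load_pair(memory, addr, interleaved_store=None):
    """Model a 16-byte load-pair whose atomic unit is one 8-byte register.

    interleaved_store is a hypothetical hook: it stands in for another
    agent's store becoming visible between the two register reads.
    """
    first = int.from_bytes(memory[addr:addr + REG_BYTES], "little")
    if interleaved_store is not None:
        interleaved_store(memory)
    second = int.from_bytes(
        memory[addr + REG_BYTES:addr + 2 * REG_BYTES], "little"
    )
    return first, second


def store_all_ones(memory):
    """Another agent overwrites all 16 bytes of the operand."""
    memory[0:16] = b"\xff" * 16


ALL_ONES = (1 << 64) - 1
memory = bytearray(16)
first, second = load_pair(memory, 0, store_all_ones)
# Each register holds entirely old or entirely new data,
# but the 16-byte operation as a whole observes a mix.
print(hex(first), hex(second))
```

In this model the operation size is 16 bytes while the guarantee applies only per register, which is the decoupling the abstract describes.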

    Prefetch circuit with global quality factor to reduce aggressiveness in low power modes

    Publication No.: US10331567B1

    Publication date: 2019-06-25

    Application No.: US15435910

    Filing date: 2017-02-17

    Applicant: Apple Inc.

    Abstract: A prefetch circuit may include a memory, each entry of which may store an address and other prefetch data used to generate prefetch requests. For each entry, there may be at least one “quality factor” (QF) that may control prefetch request generation for that entry. A global quality factor (GQF) may control generation of prefetch requests across the plurality of entries. The prefetch circuit may include one or more additional prefetch mechanisms. For example, a stride-based prefetch circuit may be included that may generate prefetch requests for strided access patterns having strides larger than a certain stride size. Another example is a spatial memory streaming (SMS)-based mechanism in which prefetch data from multiple evictions from the memory in the prefetch circuit is captured and used for SMS prefetching based on how well the prefetch data appears to match a spatial memory streaming pattern.
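
One way to picture the interaction between per-entry quality factors and the global quality factor is as two credit pools. The abstract does not say how the factors are represented, so the credit-counter scheme, the `cost` and `line_size` parameters, and the class names below are all assumptions made for this sketch.

```python
class PrefetchEntry:
    """One entry of the prefetch memory: an address plus its own QF."""

    def __init__(self, address, qf):
        self.address = address
        self.qf = qf  # assumed: per-entry credit count


class PrefetchCircuit:
    def __init__(self, gqf, cost=1, line_size=64):
        self.gqf = gqf  # assumed: credit count shared by all entries
        self.cost = cost
        self.line_size = line_size

    def generate(self, entries):
        """Issue a request only when the entry QF and the GQF both allow it."""
        requests = []
        for entry in entries:
            if entry.qf >= self.cost and self.gqf >= self.cost:
                entry.qf -= self.cost
                self.gqf -= self.cost
                requests.append(entry.address)
                entry.address += self.line_size
        return requests


def make_entries():
    return [
        PrefetchEntry(0x1000, 2),
        PrefetchEntry(0x2000, 0),
        PrefetchEntry(0x3000, 1),
    ]


normal = PrefetchCircuit(gqf=4).generate(make_entries())
low_power = PrefetchCircuit(gqf=1).generate(make_entries())
```

Under these assumptions, a smaller global budget throttles every entry at once, which is how a low power mode could reduce prefetch aggressiveness without touching the per-entry state.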

    Instruction fusion
    Type: Granted invention patent

    Publication No.: US12217060B1

    Publication date: 2025-02-04

    Application No.: US18176457

    Filing date: 2023-02-28

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed that relate to executing pairs of instructions. A processor may include fusion detector circuitry configured to detect a pair of fetched instructions and fuse the pair of fetched instructions into a fused instruction operation, and execution circuitry coupled to the fusion detector circuitry and configured to execute the fused instruction operation. In some embodiments the pair of instructions is executable to generate a remainder of a division operation. In some embodiments the pair of instructions is executable to compare two operands and perform a write operation based on the comparison. In some embodiments the pair of instructions is executable to perform an operation and apply a mask bit sequence to the result. The fusion detector circuitry may also be configured to obtain first and second portions of a constant value from first and second instructions and store the first and second portions in a destination register.
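
The constant-building case at the end of the abstract can be sketched as a pass over a fetched instruction stream. The mnemonics `mov_lo`, `mov_hi`, and `mov_const`, the 16-bit split, and the tuple encoding are hypothetical; the patent abstract does not name an instruction set or an encoding.

```python
def fuse_pairs(instrs):
    """Fuse adjacent mov_lo/mov_hi pairs that target the same register.

    Assumed semantics: mov_lo writes the low 16 bits of a constant and
    mov_hi inserts the high 16 bits into the same destination register.
    """
    fused, i = [], 0
    while i < len(instrs):
        cur = instrs[i]
        nxt = instrs[i + 1] if i + 1 < len(instrs) else None
        if (
            nxt is not None
            and cur[0] == "mov_lo"
            and nxt[0] == "mov_hi"
            and cur[1] == nxt[1]
        ):
            # Combine both portions into one operation on the destination.
            fused.append(("mov_const", cur[1], (nxt[2] << 16) | cur[2]))
            i += 2
        else:
            fused.append(cur)
            i += 1
    return fused


program = [
    ("mov_lo", "r1", 0x5678),
    ("mov_hi", "r1", 0x1234),
    ("add", "r2", "r1"),
]
result = fuse_pairs(program)
```

Here `result` contains a single `mov_const` carrying `0x12345678`, followed by the unfused `add`.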

    Decoupling Atomicity from Operation Size
    Type: Published invention application

    Publication No.: US20240248844A1

    Publication date: 2024-07-25

    Application No.: US18587289

    Filing date: 2024-02-26

    Applicant: Apple Inc.

    Abstract: In an embodiment, a processor implements a different atomicity size (for memory consistency order) than the operation size. More particularly, the processor may implement a smaller atomicity size than the operation size. For example, for multiple register loads, the atomicity size may be the register size. In another example, the vector element size may be the atomicity size for vector load instructions. In yet another example, multiple contiguous vector elements, but fewer than all the vector elements in a vector register, may be the atomicity size for vector load instructions.

    Load instruction fusion
    Type: Granted invention patent

    Publication No.: US12008369B1

    Publication date: 2024-06-11

    Application No.: US17652501

    Filing date: 2022-02-25

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed that relate to executing fused instructions. A processor may include a decoder circuit and a load/store circuit. The decoder circuit may detect a load/store instruction to load a value from a memory and detect a non-load/store instruction that depends on the value to be loaded. The decoder circuit may fuse the load/store instruction and the non-load/store instruction such that one or more operations that the non-load/store instruction is defined to perform are to be executed within the load/store circuit. The load/store circuit may receive an indication of the fused load/store and non-load/store instructions and then execute one or more operations of the load/store instruction and the one or more operations of the non-load/store instruction using a circuit included in the load/store circuit.
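
A rough software analogy of the decode-then-execute flow follows. It assumes a toy instruction format in which a `load` is followed by an `addi` that reads the loaded register; the opcode names, the fused `load_addi` operation, and the `LoadStoreUnit` class are illustrative inventions, not the patent's terminology.

```python
def decode(instrs):
    """Fuse a load with the next instruction when it depends on the load."""
    uops, i = [], 0
    while i < len(instrs):
        cur = instrs[i]
        nxt = instrs[i + 1] if i + 1 < len(instrs) else None
        if (
            nxt is not None
            and cur[0] == "load"
            and nxt[0] == "addi"
            and nxt[2] == cur[1]  # addi source is the load destination
        ):
            _, load_rd, addr = cur
            _, op_rd, _, imm = nxt
            uops.append(("load_addi", load_rd, addr, op_rd, imm))
            i += 2
        else:
            uops.append(cur)
            i += 1
    return uops


class LoadStoreUnit:
    """Executes loads, plus the dependent add when the two were fused."""

    def execute(self, uop, regs, memory):
        if uop[0] == "load":
            _, rd, addr = uop
            regs[rd] = memory[addr]
        elif uop[0] == "load_addi":
            _, load_rd, addr, op_rd, imm = uop
            value = memory[addr]
            regs[load_rd] = value
            regs[op_rd] = value + imm  # the non-load work done in this unit
        else:
            raise ValueError("not a load/store operation")


program = [("load", "r1", 0x40), ("addi", "r2", "r1", 5)]
uops = decode(program)
regs, memory = {}, {0x40: 10}
lsu = LoadStoreUnit()
for uop in uops:
    lsu.execute(uop, regs, memory)
```

The point of the sketch is that the dependent add never leaves the load/store unit, so the loaded value does not have to travel to a separate execution unit first.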

    Stack pointer instruction buffer for zero-cycle loads

    Publication No.: US11900118B1

    Publication date: 2024-02-13

    Application No.: US17817866

    Filing date: 2022-08-05

    Applicant: Apple Inc.

    CPC classification number: G06F9/3814 G06F9/30134 G06F9/3826 G06F9/3838

    Abstract: An apparatus includes a rescue buffer circuit, a store queue circuit, and a control circuit. The rescue buffer circuit may be configured to retain address information related to store instructions. The store queue circuit may be configured to buffer dependency information related to a particular store instruction until the particular store instruction is released to be executed. The control circuit may be configured to cause a subset of the dependency information for the particular store instruction to be written to the rescue buffer circuit. The rescue buffer circuit may be configured to retain the subset after the dependency information has been released from the store queue circuit, and to perform a subsequent load instruction corresponding to a memory location associated with the particular store instruction using the subset of the dependency information from the rescue buffer circuit.
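
The relationship between the store queue and the rescue buffer can be modeled with two dictionaries. Which fields make up the retained "subset" is not stated in the abstract, so keeping only the source register tag, keyed by address, is an assumption of this sketch, as are the class and method names.

```python
class StoreTracker:
    def __init__(self):
        self.store_queue = {}    # store id -> full dependency info
        self.rescue_buffer = {}  # address -> retained subset

    def dispatch_store(self, store_id, address, data_reg, age):
        """Buffer the store and copy a subset into the rescue buffer."""
        self.store_queue[store_id] = {
            "address": address,
            "data_reg": data_reg,
            "age": age,
        }
        # Control step (assumed subset): keep only the data register tag.
        self.rescue_buffer[address] = {"data_reg": data_reg}

    def release_store(self, store_id):
        """The store leaves the queue; the rescue buffer entry is retained."""
        self.store_queue.pop(store_id)

    def source_for_load(self, address):
        """Return the register a later load of this address can reuse."""
        entry = self.rescue_buffer.get(address)
        return entry["data_reg"] if entry else None


tracker = StoreTracker()
tracker.dispatch_store(store_id=0, address=("sp", 16), data_reg="p7", age=3)
tracker.release_store(0)
forwarded = tracker.source_for_load(("sp", 16))
missing = tracker.source_for_load(("sp", 32))
```

After the store is released the queue is empty, yet the later load can still be linked to register `p7` through the rescue buffer, which is the behavior the abstract attributes to retaining the subset.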

    Load Instruction Fusion
    Type: Published invention application

    Publication No.: US20240329988A1

    Publication date: 2024-10-03

    Application No.: US18739070

    Filing date: 2024-06-10

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed that relate to executing fused instructions. A processor may include a decoder circuit and a load/store circuit. The decoder circuit may detect a load/store instruction to load a value from a memory and detect a non-load/store instruction that depends on the value to be loaded. The decoder circuit may fuse the load/store instruction and the non-load/store instruction such that one or more operations that the non-load/store instruction is defined to perform are to be executed within the load/store circuit. The load/store circuit may receive an indication of the fused load/store and non-load/store instructions and then execute one or more operations of the load/store instruction and the one or more operations of the non-load/store instruction using a circuit included in the load/store circuit.
