-
Publication No.: US11914511B2
Publication Date: 2024-02-27
Application No.: US16907740
Filing Date: 2020-06-22
Applicant: Apple Inc.
Inventor: Francesco Spadini, Gideon Levinsky, Mridul Agarwal
IPC: G06F12/0804, G06F9/30, G06F9/38
CPC classification number: G06F12/0804, G06F9/30043, G06F9/3826, G06F9/3834, G06F2212/601
Abstract: In an embodiment, a processor implements a different atomicity size (for memory consistency order) than the operation size. More particularly, the processor may implement a smaller atomicity size than the operation size. For example, for multiple register loads, the atomicity size may be the register size. In another example, the vector element size may be the atomicity size for vector load instructions. In yet another example, multiple contiguous vector elements, but fewer than all the vector elements in a vector register, may be the atomicity size for vector load instructions.
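What a smaller atomicity size means can be shown with a toy model. This is a sketch under stated assumptions, not Apple's circuit. One memory operation is split into atomicity units, and a conflicting store from another agent affects only the units it overlaps. The function names `split_into_atomic_units` and `units_hit_by_store` are hypothetical, and the model ignores alignment, caches, and replay mechanics.

```python
from typing import List, Tuple

# (start address, size in bytes) of one unit that must appear atomic to other observers
Chunk = Tuple[int, int]


def split_into_atomic_units(addr: int, op_size: int, atom_size: int) -> List[Chunk]:
    """Split one memory operation of op_size bytes into atom_size-byte units."""
    if atom_size <= 0 or op_size % atom_size != 0:
        raise ValueError("op_size must be a positive multiple of atom_size")
    return [(addr + off, atom_size) for off in range(0, op_size, atom_size)]


def units_hit_by_store(units: List[Chunk], st_addr: int, st_size: int) -> List[Chunk]:
    """Return the units whose byte range overlaps a conflicting store."""
    st_end = st_addr + st_size
    return [(a, s) for (a, s) in units if a < st_end and st_addr < a + s]


# A 32-byte vector load of eight 4-byte elements at 0x1000.
whole_op = split_into_atomic_units(0x1000, 32, 32)  # atomicity size == operation size
per_elem = split_into_atomic_units(0x1000, 32, 4)   # atomicity size == element size
pairs = split_into_atomic_units(0x1000, 32, 8)      # two contiguous elements per unit
ldp = split_into_atomic_units(0x2000, 16, 8)        # load of two 8-byte registers

print(units_hit_by_store(whole_op, 0x1008, 4))  # [(4096, 32)]
print(units_hit_by_store(per_elem, 0x1008, 4))  # [(4104, 4)]
print(units_hit_by_store(pairs, 0x1008, 4))     # [(4104, 8)]
print(ldp)                                      # [(8192, 8), (8200, 8)]
```

When the atomicity size equals the operation size, any overlapping store hits the single unit that covers the whole load. With element-sized units, only the overlapped element is involved. The model only illustrates the granularity. It does not show how the processor enforces ordering.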
-
Publication No.: US20210397555A1
Publication Date: 2021-12-23
Application No.: US16907740
Filing Date: 2020-06-22
Applicant: Apple Inc.
Inventor: Francesco Spadini, Gideon Levinsky, Mridul Agarwal
IPC: G06F12/0804, G06F9/30
Abstract: In an embodiment, a processor implements a different atomicity size (for memory consistency order) than the operation size. More particularly, the processor may implement a smaller atomicity size than the operation size. For example, for multiple register loads, the atomicity size may be the register size. In another example, the vector element size may be the atomicity size for vector load instructions. In yet another example, multiple contiguous vector elements, but fewer than all the vector elements in a vector register, may be the atomicity size for vector load instructions.
-
Publication No.: US10331567B1
Publication Date: 2019-06-25
Application No.: US15435910
Filing Date: 2017-02-17
Applicant: Apple Inc.
Inventor: Stephan G. Meier, Tyler J. Huberty, Nikhil Gupta, Francesco Spadini, Gideon Levinsky
IPC: G06F12/08, G06F12/0862, G06F12/12
Abstract: A prefetch circuit may include a memory, each entry of which may store an address and other prefetch data used to generate prefetch requests. For each entry, there may be at least one “quality factor” (QF) that may control prefetch request generation for that entry. A global quality factor (GQF) may control generation of prefetch requests across the plurality of entries. The prefetch circuit may include one or more additional prefetch mechanisms. For example, a stride-based prefetch circuit may be included that may generate prefetch requests for strided access patterns having strides larger than a certain stride size. Another example is a spatial memory streaming (SMS)-based mechanism in which prefetch data from multiple evictions from the memory in the prefetch circuit is captured and used for SMS prefetching based on how well the prefetch data appears to match a spatial memory streaming pattern.
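One way to picture the quality factors is as a credit scheme. The sketch below assumes something the abstract does not state: each entry's QF and the GQF are counters of prefetch credits. Issuing a prefetch spends one credit from both counters, and a demand access that hits a prefetched line returns one. `PrefetchEntry`, `QfPrefetcher`, the fixed `stride` field, and the caps are hypothetical. The stride-based and SMS mechanisms are not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PrefetchEntry:
    base: int           # address stored in the entry
    stride: int         # hypothetical: fixed stride used to form prefetch addresses
    qf: int             # per-entry quality factor, modeled as prefetch credits
    next_step: int = 1  # which stride multiple to prefetch next


class QfPrefetcher:
    def __init__(self, gqf: int, qf_max: int = 8):
        self.gqf = gqf      # global quality factor shared by all entries
        self.gqf_max = gqf
        self.qf_max = qf_max
        self.entries: List[PrefetchEntry] = []

    def generate(self) -> List[int]:
        """Issue prefetch addresses while both the entry QF and the GQF allow it."""
        requests = []
        for e in self.entries:
            while e.qf > 0 and self.gqf > 0:
                requests.append(e.base + e.next_step * e.stride)
                e.next_step += 1
                e.qf -= 1
                self.gqf -= 1
        return requests

    def useful_prefetch(self, e: PrefetchEntry) -> None:
        """A demand access hit a line this entry prefetched: return one credit."""
        e.qf = min(e.qf + 1, self.qf_max)
        self.gqf = min(self.gqf + 1, self.gqf_max)


pf = QfPrefetcher(gqf=3)
a = PrefetchEntry(base=0x1000, stride=64, qf=2)
b = PrefetchEntry(base=0x8000, stride=64, qf=2)
pf.entries = [a, b]
first = pf.generate()   # the GQF runs out before entry b can spend its second credit
pf.useful_prefetch(a)
second = pf.generate()  # only entry a earned a credit back
```

In this model, each QF limits a single entry, and the shared GQF limits the total number of prefetches across all entries. The abstract does not specify the actual update rules.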
-
Publication No.: US12217060B1
Publication Date: 2025-02-04
Application No.: US18176457
Filing Date: 2023-02-28
Applicant: Apple Inc.
Inventor: Francesco Spadini, Skanda K. Srinivasa, Reena Panda, Brian T. Mokrzycki, Haoyan Jia, Zhaoxiang Jin
IPC: G06F9/30
Abstract: Techniques are disclosed that relate to executing pairs of instructions. A processor may include fusion detector circuitry configured to detect a pair of fetched instructions and fuse the pair of fetched instructions into a fused instruction operation, and execution circuitry coupled to the fusion detector circuitry and configured to execute the fused instruction operation. In some embodiments the pair of instructions is executable to generate a remainder of a division operation. In some embodiments the pair of instructions is executable to compare two operands and perform a write operation based on the comparison. In some embodiments the pair of instructions is executable to perform an operation and apply a mask bit sequence to the result. The fusion detector circuitry may also be configured to obtain first and second portions of a constant value from first and second instructions and store the first and second portions in a destination register.
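The abstract does not name an instruction set. If AArch64-style mnemonics are assumed, two of the described pairs would be `sdiv`/`udiv` followed by `msub`, which yields a remainder, and `movz` followed by `movk`, which builds a constant from two 16-bit halves. A decoder-side check might be sketched as follows. `Insn`, `FusedOp`, `try_fuse`, and `execute` are hypothetical, and register width and overflow are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Insn:
    op: str           # mnemonic, e.g. "sdiv", "msub", "movz", "movk"
    dst: str
    srcs: Tuple = ()  # register names, or immediates for movz/movk


@dataclass(frozen=True)
class FusedOp:
    kind: str
    dsts: Tuple       # every architectural register the pair writes
    srcs: Tuple


def try_fuse(first: Insn, second: Insn) -> Optional[FusedOp]:
    """Return a fused operation if the adjacent pair matches a known idiom."""
    # Remainder idiom: q = a / b, then msub r, q, b, a computes r = a - q * b.
    if first.op in ("sdiv", "udiv") and second.op == "msub":
        a, b = first.srcs
        q = first.dst
        if q not in (a, b) and second.srcs in ((q, b, a), (b, q, a)):
            kind = "srem" if first.op == "sdiv" else "urem"
            return FusedOp(kind, (q, second.dst), (a, b))
    # Constant idiom: movz writes bits 0..15 (shift 0 assumed), movk inserts bits 16..31.
    if first.op == "movz" and second.op == "movk" and first.dst == second.dst:
        (lo,) = first.srcs
        hi, shift = second.srcs
        if shift == 16:
            return FusedOp("movimm", (first.dst,), ((hi << 16) | lo,))
    return None


def execute(f: FusedOp, regs: Dict[str, int]) -> Dict[str, int]:
    """Execute a fused op as one operation (no register-width wraparound modeled)."""
    out = dict(regs)
    if f.kind in ("srem", "urem"):  # urem operands are assumed non-negative here
        a, b = regs[f.srcs[0]], regs[f.srcs[1]]
        # Truncating division; AArch64 integer division by zero yields 0.
        q = 0 if b == 0 else (abs(a) // abs(b)) * (1 if (a < 0) == (b < 0) else -1)
        out[f.dsts[0]] = q
        out[f.dsts[1]] = a - q * b
    elif f.kind == "movimm":
        out[f.dsts[0]] = f.srcs[0]
    return out


rem = try_fuse(Insn("sdiv", "x2", ("x0", "x1")), Insn("msub", "x3", ("x2", "x1", "x0")))
print(execute(rem, {"x0": -7, "x1": 2}))  # {'x0': -7, 'x1': 2, 'x2': -3, 'x3': -1}
imm = try_fuse(Insn("movz", "x5", (0x5678,)), Insn("movk", "x5", (0x1234, 16)))
print(hex(execute(imm, {})["x5"]))        # 0x12345678
```

The fused remainder op still writes the quotient register, because the first instruction architecturally produces it. The check also rejects pairs where the division overwrites one of its own sources, since `msub` would then read the wrong value.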
-
Publication No.: US20240248844A1
Publication Date: 2024-07-25
Application No.: US18587289
Filing Date: 2024-02-26
Applicant: Apple Inc.
Inventor: Francesco Spadini, Gideon Levinsky, Mridul Agarwal
IPC: G06F12/0804, G06F9/30, G06F9/38
CPC classification number: G06F12/0804, G06F9/30043, G06F9/3826, G06F9/3834, G06F2212/601
Abstract: In an embodiment, a processor implements a different atomicity size (for memory consistency order) than the operation size. More particularly, the processor may implement a smaller atomicity size than the operation size. For example, for multiple register loads, the atomicity size may be the register size. In another example, the vector element size may be the atomicity size for vector load instructions. In yet another example, multiple contiguous vector elements, but fewer than all the vector elements in a vector register, may be the atomicity size for vector load instructions.
-
Publication No.: US12008369B1
Publication Date: 2024-06-11
Application No.: US17652501
Filing Date: 2022-02-25
Applicant: Apple Inc.
Inventor: John D. Pape, Skanda K. Srinivasa, Francesco Spadini, Brian T. Mokrzycki
CPC classification number: G06F9/30043, G06F9/3001, G06F9/30058, G06F9/3016, G06F9/30185, G06F9/3838, G06F9/3858, G06F9/3861
Abstract: Techniques are disclosed that relate to executing fused instructions. A processor may include a decoder circuit and a load/store circuit. The decoder circuit may detect a load/store instruction to load a value from a memory and detect a non-load/store instruction that depends on the value to be loaded. The decoder circuit may fuse the load/store instruction and the non-load/store instruction such that one or more operations that the non-load/store instruction is defined to perform are to be executed within the load/store circuit. The load/store circuit may receive an indication of the fused load/store and non-load/store instructions and then execute one or more operations of the load/store instruction and the one or more operations of the non-load/store instruction using a circuit included in the load/store circuit.
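The decode-time check and the combined execution might be modeled as follows. The model makes a hypothetical assumption: the load/store circuit contains a small ALU that supports `add`, `sub`, and `and` with an immediate. The abstract does not list which operations qualify, and the class and function names are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class Load:
    dst: str
    base: str
    offset: int = 0


@dataclass(frozen=True)
class AluImm:
    op: str
    dst: str
    src: str
    imm: int


# Hypothetical subset of operations the load/store circuit's ALU can perform.
LSU_OPS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda v, i: v + i,
    "sub": lambda v, i: v - i,
    "and": lambda v, i: v & i,
}


@dataclass(frozen=True)
class FusedLoadOp:
    load: Load
    alu: AluImm


def decode_pair(first, second) -> Optional[FusedLoadOp]:
    """Fuse a load with the next instruction if that instruction consumes the loaded value."""
    if (isinstance(first, Load) and isinstance(second, AluImm)
            and second.src == first.dst and second.op in LSU_OPS):
        return FusedLoadOp(first, second)
    return None


def lsu_execute(f: FusedLoadOp, regs: Dict[str, int], mem: Dict[int, int]) -> Dict[str, int]:
    """Perform the load and the dependent operation inside the load/store unit."""
    out = dict(regs)
    value = mem[regs[f.load.base] + f.load.offset]
    out[f.load.dst] = value                               # the load result is still written
    out[f.alu.dst] = LSU_OPS[f.alu.op](value, f.alu.imm)  # then the fused ALU result
    return out


fused = decode_pair(Load("x1", "x0", 8), AluImm("add", "x2", "x1", 4))
print(lsu_execute(fused, {"x0": 0x100}, {0x108: 38}))  # {'x0': 256, 'x1': 38, 'x2': 42}
```

Writing the load's destination before the ALU result preserves program order, even when the ALU operation overwrites the same register.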
-
Publication No.: US11900118B1
Publication Date: 2024-02-13
Application No.: US17817866
Filing Date: 2022-08-05
Applicant: Apple Inc.
Inventor: John D. Pape, Francesco Spadini, Zhaoxiang Jin
CPC classification number: G06F9/3814, G06F9/30134, G06F9/3826, G06F9/3838
Abstract: An apparatus includes a rescue buffer circuit, a store queue circuit, and a control circuit. The rescue buffer circuit may be configured to retain address information related to store instructions. The store queue circuit may be configured to buffer dependency information related to a particular store instruction until the particular store instruction is released to be executed. The control circuit may be configured to cause a subset of the dependency information for the particular store instruction to be written to the rescue buffer circuit. The rescue buffer circuit may be configured to retain the subset after the dependency information has been released from the store queue circuit, and to perform a subsequent load instruction corresponding to a memory location associated with the particular store instruction using the subset of the dependency information from the rescue buffer circuit.
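The abstract does not say what the retained subset contains. The sketch below assumes it holds the store's address and an identifier. These values stay in a small FIFO after the full entry leaves the store queue, so a later load to the same address can still find the store it depends on. `StoreQueueEntry`, `StoreTracking`, and the capacity are hypothetical, and addresses are compared exactly, with no partial overlap.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Optional, Tuple


@dataclass
class StoreQueueEntry:
    store_id: int
    addr: int
    data_tag: int  # hypothetical: tag of the register that supplies the store data


class StoreTracking:
    def __init__(self, rescue_capacity: int = 4):
        self.store_queue: Dict[int, StoreQueueEntry] = {}  # insertion order = program order
        # Rescue buffer: bounded FIFO of (addr, store_id); the oldest pair drops when full.
        self.rescue: Deque[Tuple[int, int]] = deque(maxlen=rescue_capacity)

    def allocate(self, entry: StoreQueueEntry) -> None:
        self.store_queue[entry.store_id] = entry

    def release(self, store_id: int) -> None:
        """Release the store; the control logic keeps only a subset of its information."""
        entry = self.store_queue.pop(store_id)
        self.rescue.append((entry.addr, entry.store_id))

    def lookup_for_load(self, addr: int) -> Optional[Tuple[str, int]]:
        """Find the youngest matching store, checking the store queue before the rescue buffer."""
        for entry in reversed(list(self.store_queue.values())):
            if entry.addr == addr:
                return ("store_queue", entry.store_id)
        for a, sid in reversed(self.rescue):
            if a == addr:
                return ("rescue_buffer", sid)
        return None


st = StoreTracking(rescue_capacity=2)
st.allocate(StoreQueueEntry(store_id=1, addr=0x40, data_tag=7))
st.allocate(StoreQueueEntry(store_id=2, addr=0x80, data_tag=9))
print(st.lookup_for_load(0x40))  # ('store_queue', 1)
st.release(1)
print(st.lookup_for_load(0x40))  # ('rescue_buffer', 1)
```

Keeping only a few fields per released store is what keeps such a buffer small compared with the store queue. The fields the real design retains are not given in the abstract.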
-
Publication No.: US20240329988A1
Publication Date: 2024-10-03
Application No.: US18739070
Filing Date: 2024-06-10
Applicant: Apple Inc.
Inventor: John D. Pape, Skanda K. Srinivasa, Francesco Spadini, Brian T. Mokrzycki
CPC classification number: G06F9/30043, G06F9/3001, G06F9/30058, G06F9/3016, G06F9/30185, G06F9/3838, G06F9/3858, G06F9/3861
Abstract: Techniques are disclosed that relate to executing fused instructions. A processor may include a decoder circuit and a load/store circuit. The decoder circuit may detect a load/store instruction to load a value from a memory and detect a non-load/store instruction that depends on the value to be loaded. The decoder circuit may fuse the load/store instruction and the non-load/store instruction such that one or more operations that the non-load/store instruction is defined to perform are to be executed within the load/store circuit. The load/store circuit may receive an indication of the fused load/store and non-load/store instructions and then execute one or more operations of the load/store instruction and the one or more operations of the non-load/store instruction using a circuit included in the load/store circuit.