-
Publication No.: US11416254B2
Publication Date: 2022-08-16
Application No.: US16705023
Application Date: 2019-12-05
Applicant: Apple Inc.
Inventor: Deepankar Duggal, Kulin N. Kothari, Conrado Blasco, Muawya M. Al-Otoom
Abstract: Systems, apparatuses, and methods for implementing zero cycle load bypass operations are described. A system includes a processor with at least a decode unit, control logic, mapper, and free list. When a load operation is detected, the control logic determines if the load operation qualifies to be converted to a zero cycle load bypass operation. Conditions for qualifying include the load operation being in the same decode group as an older store operation to the same address. Qualifying load operations are converted to zero cycle load bypass operations. A lookup of the free list is prevented for a zero cycle load bypass operation and a destination operand of the load is renamed with a same physical register identifier used for a source operand of the store. Also, the data of the store is bypassed to the load.
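The rename-time behaviour described above can be illustrated with a minimal Python model. It assumes each memory op carries a symbolic address expression that can be compared at decode time, and that the address registers are not redefined between the store and the load. `Op`, `Renamer`, and `rename_group` are hypothetical names, not taken from the patent.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Op:
    kind: str                   # "store" or "load"
    addr: str                   # symbolic address, e.g. "[x1+8]" (assumed comparable at decode)
    src: Optional[int] = None   # store: architectural register holding the data
    dst: Optional[int] = None   # load: architectural destination register


class Renamer:
    def __init__(self, num_arch: int, num_phys: int) -> None:
        # Start with arch reg r -> phys reg r; the remaining phys regs are free.
        self.map: Dict[int, int] = {r: r for r in range(num_arch)}
        self.free_list: List[int] = list(range(num_arch, num_phys))
        self.free_list_lookups = 0

    def rename_group(self, group: List[Op]) -> List[Tuple[int, int, bool]]:
        """Rename one decode group; returns (op index, phys dest, bypassed) per load."""
        older_stores: Dict[str, int] = {}   # address -> phys reg of the store's data
        loads = []
        for i, op in enumerate(group):
            if op.kind == "store":
                older_stores[op.addr] = self.map[op.src]
            elif op.kind == "load":
                if op.addr in older_stores:
                    # Qualifies as a zero cycle load bypass: reuse the store's source
                    # phys reg and skip the free-list lookup entirely.
                    phys, bypassed = older_stores[op.addr], True
                else:
                    phys, bypassed = self.free_list.pop(0), False
                    self.free_list_lookups += 1
                self.map[op.dst] = phys
                loads.append((i, phys, bypassed))
        return loads
```

Because the load's destination and the store's source now share one physical register, a design like this also needs some form of reference tracking before that register can be returned to the free list.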
-
Publication No.: US09632791B2
Publication Date: 2017-04-25
Application No.: US14160242
Application Date: 2014-01-21
Applicant: Apple Inc.
Inventor: Muawya M. Al-Otoom, Ian D. Kountanis, Ronald P. Hall, Michael L. Karm
IPC: G06F12/08, G06F9/38, G06F12/0862
CPC classification number: G06F9/3844, G06F9/3808, G06F9/381, G06F9/3867, G06F12/0862, Y02D10/13
Abstract: Techniques are disclosed relating to a cache for patterns of instructions. In some embodiments, an apparatus includes an instruction cache and is configured to detect a pattern of execution of instructions by an instruction processing pipeline. The pattern of execution may involve execution of only instructions in a particular group of instructions. The instructions may include multiple backward control transfers and/or a control transfer instruction that is taken in one iteration of the pattern and not taken in another iteration of the pattern. The apparatus may be configured to store the instructions in the instruction cache and fetch and execute the instructions from the instruction cache. The apparatus may include a branch predictor dedicated to predicting the direction of control transfer instructions for the instruction cache. Various embodiments may reduce power consumption associated with instruction processing.
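The pattern-detection step can be sketched in Python as follows, assuming fixed 4-byte instructions and a hypothetical rule: a pattern is recognised once execution stays inside one address window (bounded by a taken backward branch) for a set number of iterations, and the window fits in the pattern cache. `PatternDetector`, `capacity`, and `required_iterations` are illustrative names, not the patent's.

```python
from typing import Optional, Tuple


class PatternDetector:
    """Hypothetical model: detect when execution stays inside one small code window."""

    def __init__(self, capacity: int, required_iterations: int, insn_bytes: int = 4) -> None:
        self.capacity = capacity                  # instructions the pattern cache can hold
        self.required = required_iterations       # iterations seen before switching over
        self.insn_bytes = insn_bytes              # fixed-width encoding assumed
        self.window: Optional[Tuple[int, int]] = None
        self.iterations = 0
        self.active = False                       # True => fetch from the pattern cache

    def _reset(self) -> None:
        self.window, self.iterations, self.active = None, 0, False

    def observe(self, pc: int, backward_target: Optional[int] = None) -> bool:
        """Feed one executed instruction; backward_target is set for a taken backward branch."""
        if self.window is not None and not (self.window[0] <= pc <= self.window[1]):
            self._reset()   # left the window: the pattern is broken
        if backward_target is not None and backward_target < pc:
            start = backward_target if self.window is None else min(self.window[0], backward_target)
            end = pc if self.window is None else max(self.window[1], pc)
            if (end - start) // self.insn_bytes + 1 > self.capacity:
                self._reset()   # too large for the pattern cache
                return False
            if (start, end) != self.window:
                self.window, self.iterations = (start, end), 0   # new or widened pattern
            if backward_target == start:
                # Only the branch back to the window start closes an iteration; inner
                # backward branches and forward branches (taken or not) are tolerated.
                self.iterations += 1
                self.active = self.iterations >= self.required
        return self.active
```

Because only the window bounds are checked, a forward branch that is taken in one iteration and not taken in the next does not break the pattern, matching the behaviour the abstract allows.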
-
Publication No.: US20230023860A1
Publication Date: 2023-01-26
Application No.: US17382123
Application Date: 2021-07-21
Applicant: Apple Inc.
Inventor: Douglas C. Holman, Ian D. Kountanis, Amit Kumar, Muawya M. Al-Otoom
Abstract: Techniques are disclosed relating to signature-based instruction prefetching. In some embodiments, processor pipeline circuitry executes a computer program that includes control transfer instructions, such that the execution follows a taken path through the computer program. First signature prefetch table circuitry indicates prefetch addresses for signatures generated using a first signature generation technique, and second signature prefetch table circuitry indicates prefetch addresses for signatures generated using a second, different signature generation technique. In response to a prefetch training event, signature prefetch circuitry determines a first signature according to the first technique and a second signature according to the second technique, and selects one, but not both, of the first and second signature prefetch tables to train using the first signature or the second signature.
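The two-table training step can be sketched in Python under two assumptions the abstract leaves open: the two "signature generation techniques" are modelled as folds over a long and a short window of recent taken-branch targets, and the selection policy (train the short-signature table unless its existing entry conflicts with the new miss address) is hypothetical. `fold` and `DualSignaturePrefetcher` are illustrative names.

```python
from collections import deque
from typing import Dict, Iterable, Optional, Tuple


def fold(addrs: Iterable[int], bits: int = 12) -> int:
    """Fold a sequence of branch-target addresses into a small signature."""
    sig = 0
    for a in addrs:
        sig = ((sig << 3) ^ (sig >> (bits - 3)) ^ (a >> 2)) & ((1 << bits) - 1)
    return sig


class DualSignaturePrefetcher:
    def __init__(self, long_len: int = 4, short_len: int = 1) -> None:
        self.history = deque(maxlen=long_len)   # recent taken-branch targets (the taken path)
        self.short_len = short_len
        # Table 0: long-path signatures; table 1: short-path signatures.
        self.tables: Tuple[Dict[int, int], Dict[int, int]] = ({}, {})

    def on_taken_branch(self, target: int) -> None:
        self.history.append(target)

    def signatures(self) -> Tuple[int, int]:
        path = list(self.history)
        return fold(path), fold(path[-self.short_len:])

    def predict(self) -> Optional[int]:
        s_long, s_short = self.signatures()
        # Prefer the more specific long-path table, fall back to the short one.
        return self.tables[0].get(s_long, self.tables[1].get(s_short))

    def train(self, miss_addr: int) -> int:
        """Prefetch training event (e.g. an instruction cache miss); trains exactly one table."""
        s_long, s_short = self.signatures()
        existing = self.tables[1].get(s_short)
        if existing is None or existing == miss_addr:
            chosen = 1   # the short signature is unambiguous so far
        else:
            chosen = 0   # short signature conflicts: disambiguate with the longer path
        self.tables[chosen][(s_long, s_short)[chosen]] = miss_addr
        return chosen
```

Training only one table per event keeps the long-signature table for paths that the short signature cannot tell apart.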
-
Publication No.: US11379240B2
Publication Date: 2022-07-05
Application No.: US16778939
Application Date: 2020-01-31
Applicant: Apple Inc.
Inventor: Muawya M. Al-Otoom, Ian D. Kountanis, Conrado Blasco, Haoyan Jia, Amit Kumar
IPC: G06F9/38
Abstract: In an embodiment, an indirect branch predictor generates indirect branch predictions based on one or more register values. The register values may be the contents of registers on which the indirect branch instruction is directly or indirectly dependent for generating the branch target address, for example. In an embodiment, at least one of the registers may be a source for a load instruction, and the indirect branch may be dependent (directly or indirectly) on the target of the load. In an embodiment, the indirect branch predictor may be one of at least two indirect branch predictors in a processor. The other indirect branch predictor may be based on a fetch address, or PC, associated with the indirect branch instruction. The other indirect branch predictor may generate a first predicted target address, and the indirect branch predictor may generate a second predicted target address for the same indirect branch instruction.
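A minimal Python sketch of the two-predictor arrangement, assuming the relevant register value (for example, the base register of the load that feeds the branch) is available at prediction time, and assuming a hypothetical policy of trusting the register-value prediction whenever it hits. `DualIndirectPredictor` and its methods are illustrative names.

```python
from typing import Dict, Optional, Tuple


class DualIndirectPredictor:
    """PC-indexed predictor plus a predictor indexed by PC and a register value."""

    def __init__(self) -> None:
        self.by_pc: Dict[int, int] = {}                  # fetch address -> last target
        self.by_value: Dict[Tuple[int, int], int] = {}   # (fetch address, reg value) -> target

    def predict(self, pc: int, reg_value: int) -> Tuple[Optional[int], Optional[int], Optional[int]]:
        first = self.by_pc.get(pc)                      # first predicted target (PC-based)
        second = self.by_value.get((pc, reg_value))     # second predicted target (value-based)
        # Assumed policy: use the register-value prediction when it hits.
        final = second if second is not None else first
        return first, second, final

    def update(self, pc: int, reg_value: int, actual_target: int) -> None:
        self.by_pc[pc] = actual_target
        self.by_value[(pc, reg_value)] = actual_target
```

The PC-based table alone would keep predicting the most recent target; keying on the register value lets one indirect branch map different data values to different targets.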
-
Publication No.: US20210064376A1
Publication Date: 2021-03-04
Application No.: US16551208
Application Date: 2019-08-26
Applicant: Apple Inc.
Inventor: Deepankar Duggal, Conrado Blasco, Muawya M. Al-Otoom, Richard F. Russo
IPC: G06F9/38
Abstract: Systems, apparatuses, and methods for implementing a physical register last reference scheme are described. A system includes a processor with a mapper, history file, and freelist. When an entry in the mapper is updated with a new architectural register-to-physical register mapping, the processor creates a new history file entry for the given instruction that caused the update. The processor also searches the mapper to determine if the old physical register that was previously stored in the mapper entry is referenced by any other mapper entries. If there are no other mapper entries that reference this old physical register, then a last reference indicator is stored in the new history file entry. When the given instruction retires, the processor checks the last reference indicator in the history file entry to determine whether the old physical register can be returned to the freelist of available physical registers.
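The mapper, history file, and freelist interaction can be modelled in a few lines of Python. This sketch ignores misprediction recovery, uses a linear scan in place of the mapper search, and models physical-register sharing (for example, move elimination) with a hypothetical `shared_phys` argument. `RenameState` and its methods are illustrative names.

```python
from typing import List, Optional


class RenameState:
    def __init__(self, num_arch: int, num_phys: int) -> None:
        self.mapper: List[int] = list(range(num_arch))           # arch reg -> phys reg
        self.freelist: List[int] = list(range(num_arch, num_phys))
        self.history: List[dict] = []                            # oldest entry first

    def rename_dest(self, arch: int, shared_phys: Optional[int] = None) -> int:
        """Give `arch` a new mapping; shared_phys models reuse of an existing phys reg."""
        old = self.mapper[arch]
        phys = self.freelist.pop(0) if shared_phys is None else shared_phys
        self.mapper[arch] = phys
        # Search the mapper: does any other entry still reference the old phys reg?
        last_ref = old not in self.mapper
        self.history.append({"arch": arch, "old": old, "last_ref": last_ref})
        return phys

    def retire_oldest(self) -> None:
        entry = self.history.pop(0)
        if entry["last_ref"]:
            self.freelist.append(entry["old"])   # no other mapping uses it: reclaim
```

When two architectural registers share a physical register, only the instruction that removes the final mapping carries the last-reference indicator, so the register is freed exactly once.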
-
Publication No.: US10719327B1
Publication Date: 2020-07-21
Application No.: US14716449
Application Date: 2015-05-19
Applicant: Apple Inc.
Inventor: Muawya M. Al-Otoom, Ian D. Kountanis, Conrado Blasco
Abstract: In some embodiments, a branch prediction unit includes a plurality of branch prediction circuits and selection logic. At least two of the branch prediction circuits are configured, based on an address of a branch instruction and different sets of history information, to provide a corresponding branch prediction for the branch instruction. At least one storage element of the at least two branch prediction circuits is set associative. The selection logic is configured to select a particular branch prediction output by one of the branch prediction circuits as a current branch prediction output of the branch prediction unit. In some instances, the branch prediction unit may be less likely to replace branch prediction information, as compared to a different branch prediction unit that does not include a set associative storage element. In some embodiments, this arrangement may lead to increased performance of the branch prediction unit.
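A Python sketch of several history-indexed tables with set-associative storage follows. The abstract does not state the selection rule, so this assumes a TAGE-style rule (the longest-history table that hits provides the prediction, with a PC-indexed fallback) and a simple allocate-on-mispredict policy. Class names, hash, sizes, and history lengths are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class SetAssocTable:
    """One tagged prediction table: num_sets sets x ways entries, LRU replacement."""

    def __init__(self, history_len: int, num_sets: int = 16, ways: int = 2) -> None:
        self.history_len = history_len
        self.num_sets, self.ways = num_sets, ways
        self.sets: List[List[List[int]]] = [[] for _ in range(num_sets)]  # [tag, counter], MRU first

    def _index_tag(self, pc: int, ghist: int) -> Tuple[int, int]:
        h = ghist & ((1 << self.history_len) - 1)
        key = (pc >> 2) ^ h ^ (h << 5)
        return key % self.num_sets, (key // self.num_sets) & 0xFFF

    def lookup(self, pc: int, ghist: int) -> Optional[List[int]]:
        idx, tag = self._index_tag(pc, ghist)
        for way, entry in enumerate(self.sets[idx]):
            if entry[0] == tag:
                self.sets[idx].insert(0, self.sets[idx].pop(way))   # mark MRU
                return entry
        return None

    def allocate(self, pc: int, ghist: int, taken: bool) -> None:
        idx, tag = self._index_tag(pc, ghist)
        if len(self.sets[idx]) == self.ways:
            self.sets[idx].pop()                              # evict only the LRU way
        self.sets[idx].insert(0, [tag, 2 if taken else 1])    # weak 2-bit counter


class BranchPredictionUnit:
    def __init__(self, history_lengths=(4, 8, 16)) -> None:
        self.tables = [SetAssocTable(h) for h in history_lengths]   # shortest to longest
        self.base: Dict[int, int] = {}                               # pc -> 2-bit counter

    def predict(self, pc: int, ghist: int):
        # Selection logic: the longest-history table that hits provides the prediction.
        for table in reversed(self.tables):
            entry = table.lookup(pc, ghist)
            if entry is not None:
                return entry[1] >= 2, table
        return self.base.get(pc, 1) >= 2, None

    def update(self, pc: int, ghist: int, taken: bool) -> None:
        pred, provider = self.predict(pc, ghist)
        if provider is not None:
            entry = provider.lookup(pc, ghist)
            entry[1] = min(3, entry[1] + 1) if taken else max(0, entry[1] - 1)
        else:
            c = self.base.get(pc, 1)
            self.base[pc] = min(3, c + 1) if taken else max(0, c - 1)
        if pred != taken:
            # On a mispredict, allocate in the next table with more history.
            nxt = 0 if provider is None else self.tables.index(provider) + 1
            if nxt < len(self.tables):
                self.tables[nxt].allocate(pc, ghist, taken)
```

With two ways per set, a new allocation displaces only the least recently used entry in its set rather than whatever entry happens to share a direct-mapped slot, which is the replacement behaviour the abstract credits to set associativity.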
-
Publication No.: US12236244B1
Publication Date: 2025-02-25
Application No.: US17810253
Application Date: 2022-06-30
Applicant: Apple Inc.
Inventor: Wei-Han Lien, Muawya M. Al-Otoom, Ian D. Kountanis, Niket K. Choudhary, Pruthivi Vuyyuru
IPC: G06F9/38
Abstract: A multi-degree branch predictor is disclosed. A processing circuit includes an instruction fetch circuit configured to fetch branch instructions, and a branch prediction circuit having a plurality of prediction subcircuits. The prediction subcircuits are configured to store different amounts of branch history data relative to one another, and to receive an indication of a given branch instruction in a particular clock cycle. The prediction subcircuits implement a common branch prediction scheme to output, in different clock cycles, corresponding predictions for the given branch instruction using the different amounts of branch history data, and cause instruction fetches to be performed by the instruction fetch circuit. The prediction subcircuits are also configured so that, in subsequent clock cycles, a prediction subcircuit with more branch history data overrides instruction fetches caused by a prediction subcircuit with comparatively less branch history data when it makes a contrary prediction.
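The override behaviour can be sketched in Python as a timeline of fetch redirects for one branch. This assumes the common scheme is a table of 2-bit counters keyed by PC and masked history, and that a subcircuit using more history takes more cycles to respond. `SubPredictor`, `fetch_redirects`, and the latencies are hypothetical.

```python
from typing import Dict, List, Tuple


class SubPredictor:
    """One prediction subcircuit: shared 2-bit-counter scheme, its own history amount."""

    def __init__(self, history_bits: int, latency: int) -> None:
        self.history_bits = history_bits   # how much global branch history this subcircuit uses
        self.latency = latency             # cycles after the branch is seen until it predicts
        self.counters: Dict[Tuple[int, int], int] = {}

    def _key(self, pc: int, history: int) -> Tuple[int, int]:
        return pc, history & ((1 << self.history_bits) - 1)

    def predict(self, pc: int, history: int) -> bool:
        return self.counters.get(self._key(pc, history), 1) >= 2

    def train(self, pc: int, history: int, taken: bool) -> None:
        key = self._key(pc, history)
        c = self.counters.get(key, 1)
        self.counters[key] = min(3, c + 1) if taken else max(0, c - 1)


def fetch_redirects(subs: List[SubPredictor], pc: int, history: int) -> List[Tuple[int, int, bool]]:
    """Return (cycle, history_bits, taken) for each prediction that steers fetch."""
    events: List[Tuple[int, int, bool]] = []
    current = None
    for sub in sorted(subs, key=lambda s: s.latency):   # less history => ready sooner
        taken = sub.predict(pc, history)
        if taken != current:
            # The first prediction starts fetch; a later contrary one overrides it.
            events.append((sub.latency, sub.history_bits, taken))
            current = taken
    return events
```

For example, with a 2-bit-history subcircuit answering in cycle 1 and an 8-bit-history one in cycle 3, a branch whose outcome depends on older history is first fetched down the wrong path and then redirected in cycle 3.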
-
Publication No.: US20250021338A1
Publication Date: 2025-01-16
Application No.: US18352326
Application Date: 2023-07-14
Applicant: Apple Inc.
Inventor: Muawya M. Al-Otoom, Niket K. Choudhary, Pruthivi Vuyyuru
IPC: G06F9/38
Abstract: Disclosed techniques relate to next fetch predictor circuitry configured to operate in conjunction with a trace cache. The trace cache circuitry may identify and store traces of instructions based on predicted directions of one or more control transfer instructions. Trace next fetch predictor circuitry may predict a next fetch address based on a current fetch address for a current cycle, which may include predicting a next fetch address following execution of a first trace stored in the trace cache circuitry. The first trace may include multiple fetch groups and multiple control transfer instructions. Arbitration circuitry may select from among multiple predictors and the trace next fetch predictor may have priority in response to a trace cache hit. Disclosed techniques may advantageously improve overall fetch bandwidth in the context of trace cache hits.
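The arbitration and bandwidth argument can be sketched in Python, assuming one next-fetch prediction per cycle, one fetch group per regular prediction, and a trace that covers several fetch groups. `NextFetchArbiter`, its tables, and the 32-byte group size are hypothetical.

```python
from typing import Dict, List, Tuple


class NextFetchArbiter:
    def __init__(self, group_bytes: int = 32) -> None:
        self.group_bytes = group_bytes
        self.trace_cache: Dict[int, List[int]] = {}   # trace start -> fetch groups it covers
        self.trace_nfp: Dict[int, int] = {}            # trace start -> fetch addr after the trace
        self.nfp: Dict[int, int] = {}                  # fetch addr -> next fetch addr (one group)

    def next_fetch(self, addr: int) -> Tuple[int, str]:
        # The trace next fetch predictor wins arbitration on a trace cache hit.
        if addr in self.trace_cache and addr in self.trace_nfp:
            return self.trace_nfp[addr], "trace"
        if addr in self.nfp:
            return self.nfp[addr], "nfp"
        return addr + self.group_bytes, "sequential"

    def fetch_cycles(self, start: int, stop: int, limit: int = 100) -> int:
        """Count predictions (one per cycle) needed to go from start to stop."""
        addr, cycles = start, 0
        while addr != stop and cycles < limit:
            addr, _ = self.next_fetch(addr)
            cycles += 1
        return cycles
```

In this model, a three-group path that takes three regular predictions is covered by a single trace prediction once the trace is cached.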
-
Publication No.: US20250021333A1
Publication Date: 2025-01-16
Application No.: US18352323
Application Date: 2023-07-14
Applicant: Apple Inc.
Inventor: Ilhyun Kim, Niket K. Choudhary, Muawya M. Al-Otoom, Pruthivi Vuyyuru, Ronald P. Hall
Abstract: Disclosed techniques relate to trace caches. Trace cache circuitry may identify traces that satisfy one or more criteria. Generally, internal branches of a trace should satisfy a threshold bias level in a particular direction. To achieve this goal, the processor may initially assume that branches meet the threshold, track their usefulness in the trace context over time, and prevent inclusion of branches that fall below a usefulness threshold (which indicates that those branches are not sufficiently biased). Branches that do not meet the threshold may be added to a Bloom filter, for example. Usefulness may be tracked during trace training, when valid in a trace cache, or both.
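Usefulness tracking with a Bloom filter of excluded branches can be sketched in Python. It assumes a saturating counter per branch that starts at 0 (the "assume biased" starting point), a hypothetical threshold of -2, and SHA-256-derived bit positions for the filter. `BloomFilter` and `TraceBranchFilter` are illustrative names.

```python
import hashlib
from typing import Dict, List


class BloomFilter:
    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits, self.num_hashes = num_bits, num_hashes
        self.bits = 0   # Python int used as a bit vector

    def _positions(self, key: int) -> List[int]:
        digest = hashlib.sha256(key.to_bytes(8, "little")).digest()
        return [int.from_bytes(digest[4 * i:4 * i + 4], "little") % self.num_bits
                for i in range(self.num_hashes)]

    def add(self, key: int) -> None:
        for p in self._positions(key):
            self.bits |= 1 << p

    def __contains__(self, key: int) -> bool:
        # No false negatives; false positives are possible.
        return all((self.bits >> p) & 1 for p in self._positions(key))


class TraceBranchFilter:
    def __init__(self, threshold: int = -2, lo: int = -4, hi: int = 3) -> None:
        self.usefulness: Dict[int, int] = {}   # branch pc -> saturating counter
        self.threshold, self.lo, self.hi = threshold, lo, hi
        self.excluded = BloomFilter()

    def record(self, pc: int, matched_trace_direction: bool) -> None:
        """Call when a branch inside a trace resolves (during training or trace-cache use)."""
        c = self.usefulness.get(pc, 0)   # start at 0: assume biased until shown otherwise
        c = min(self.hi, c + 1) if matched_trace_direction else max(self.lo, c - 1)
        self.usefulness[pc] = c
        if c < self.threshold:
            self.excluded.add(pc)        # not biased enough: keep out of future traces

    def may_include(self, pc: int) -> bool:
        return pc not in self.excluded
```

A Bloom filter never forgets an excluded branch, but it may occasionally exclude a well-biased branch through a false positive; that errs on the side of shorter, more reliable traces.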
-
Publication No.: US11200062B2
Publication Date: 2021-12-14
Application No.: US16551208
Application Date: 2019-08-26
Applicant: Apple Inc.
Inventor: Deepankar Duggal, Conrado Blasco, Muawya M. Al-Otoom, Richard F. Russo
IPC: G06F9/38
Abstract: Systems, apparatuses, and methods for implementing a physical register last reference scheme are described. A system includes a processor with a mapper, history file, and freelist. When an entry in the mapper is updated with a new architectural register-to-physical register mapping, the processor creates a new history file entry for the given instruction that caused the update. The processor also searches the mapper to determine if the old physical register that was previously stored in the mapper entry is referenced by any other mapper entries. If there are no other mapper entries that reference this old physical register, then a last reference indicator is stored in the new history file entry. When the given instruction retires, the processor checks the last reference indicator in the history file entry to determine whether the old physical register can be returned to the freelist of available physical registers.
-