Multi-level dispatch for a superscalar processor
    2.
    Type: Granted invention patent
    Status: In force

    Publication number: US09336003B2

    Publication date: 2016-05-10

    Application number: US13749999

    Filing date: 2013-01-25

    Applicant: Apple Inc.

    CPC classification number: G06F9/3836 G06F9/30145 G06F9/4881 G06F9/4887

    Abstract: In an embodiment, a processor includes a multi-level dispatch circuit configured to supply operations for execution by multiple parallel execution pipelines. The multi-level dispatch circuit may include multiple dispatch buffers, each of which is coupled to multiple reservation stations. Each reservation station may be coupled to a respective execution pipeline and may be configured to schedule instruction operations (ops) for execution in the respective execution pipeline. The sets of reservation stations coupled to each dispatch buffer may be non-overlapping. Thus, if a given op is to be executed in a given execution pipeline, the op may be sent to the dispatch buffer which is coupled to the reservation station that provides ops to the given execution pipeline.
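
The routing rule in this abstract can be sketched in a few lines of Python. This is only an illustrative model, not the patented circuit: the class name, the buffer, station, and pipeline labels, and the queue-based buffering are all assumptions made for the example. It shows an op being steered to the one dispatch buffer whose reservation stations feed the target pipeline, and it rejects topologies in which the station sets overlap.

```python
from collections import deque


class MultiLevelDispatch:
    """Toy model: dispatch buffers feed disjoint sets of reservation stations."""

    def __init__(self, buffer_to_stations, station_to_pipeline):
        # All names and the topology passed in are hypothetical.
        self.buffers = {buf: deque() for buf in buffer_to_stations}
        self.stations = {rs: deque() for rs in station_to_pipeline}
        self.route = {}  # pipeline -> (dispatch buffer, reservation station)
        for buf, stations in buffer_to_stations.items():
            for rs in stations:
                pipe = station_to_pipeline[rs]
                if pipe in self.route:
                    # A station listed under two buffers would break the
                    # non-overlapping property described in the abstract.
                    raise ValueError(f"overlapping route to pipeline {pipe}")
                self.route[pipe] = (buf, rs)

    def dispatch(self, op, pipeline):
        """First level: place the op in the buffer that reaches the pipeline."""
        buf, _ = self.route[pipeline]
        self.buffers[buf].append((op, pipeline))
        return buf

    def drain(self, buf):
        """Second level: move buffered ops into their reservation stations."""
        while self.buffers[buf]:
            op, pipeline = self.buffers[buf].popleft()
            _, rs = self.route[pipeline]
            self.stations[rs].append(op)
```

Because each pipeline is reachable from exactly one buffer, the first-level decision needs only a lookup on the target pipeline.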

    Next fetch predictor return address stack
    3.
    Type: Granted invention patent
    Status: In force

    Publication number: US09405544B2

    Publication date: 2016-08-02

    Application number: US13893898

    Filing date: 2013-05-14

    Applicant: Apple Inc.

    CPC classification number: G06F9/3806 G06F9/30054 G06F9/382 G06F9/3848

    Abstract: A system and method for efficient branch prediction. A processor includes a next fetch predictor to generate a fast branch prediction for branch instructions at an early pipeline stage. The processor also includes a main return address stack (RAS) at a later pipeline stage for predicting the target of return instructions. When a return instruction is encountered, the prediction from the next fetch predictor is replaced by the top of the main RAS. If there are any recent call or return instructions in flight toward the main RAS, then a separate prediction is generated by a mini-RAS.
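
A rough Python sketch of the selection logic follows. The abstract does not say how the mini-RAS forms its prediction, so the replay of in-flight calls and returns shown here is an assumption, as are the function name and the data layout.

```python
def predict_return_target(nfp_prediction, main_ras, in_flight):
    """Pick a target for a return instruction.

    main_ras:  list of return addresses, last element is the top.
    in_flight: ("call", return_addr) / ("ret", None) events that have not
               reached the main RAS yet (hypothetical representation).
    """
    if not in_flight:
        # No pending updates: the top of the main RAS overrides the
        # next fetch predictor.
        return main_ras[-1] if main_ras else nfp_prediction

    # Assumed mini-RAS behavior: replay the pending events on a small stack.
    mini = []
    popped_from_main = 0
    for kind, addr in in_flight:
        if kind == "call":
            mini.append(addr)
        elif mini:
            mini.pop()
        else:
            popped_from_main += 1

    if mini:
        return mini[-1]
    idx = len(main_ras) - 1 - popped_from_main
    return main_ras[idx] if idx >= 0 else nfp_prediction
```

With no events in flight the function returns the top of the main RAS; a pending call makes its return address the prediction instead.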

    IT INSTRUCTION PRE-DECODE
    4.
    Type: Invention patent application
    Status: In force

    Publication number: US20140244976A1

    Publication date: 2014-08-28

    Application number: US13774093

    Filing date: 2013-02-22

    Applicant: APPLE INC.

    Abstract: Various techniques for processing and pre-decoding branches within an IT instruction block. Instructions are fetched and cached in an instruction cache, and pre-decode bits are generated to indicate the presence of an IT instruction and the likely boundaries of the IT instruction block. If an unconditional branch is detected within the likely boundaries of an IT instruction block, the unconditional branch is treated as if it were a conditional branch. The unconditional branch is sent to the branch direction predictor and the predictor generates a branch direction prediction for the unconditional branch.
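
The pre-decode rule can be illustrated with a small Python sketch. The instruction encoding is invented for the example: each instruction is a dict with an `op` field, and an IT instruction carries a `count` of the following instructions it covers (up to four in the Thumb instruction set). The function returns, per instruction, whether it would be sent to the branch direction predictor.

```python
def predecode(instrs):
    """Return one bool per instruction: send it to the direction predictor?

    Hypothetical ops: "it" (with "count"), "b" (unconditional branch),
    "bcc" (conditional branch); anything else is a non-branch.
    """
    remaining = 0  # instructions still inside the likely IT block
    to_predictor = []
    for ins in instrs:
        in_it_block = remaining > 0
        if in_it_block:
            remaining -= 1
        if ins["op"] == "it":
            remaining = ins["count"]
            to_predictor.append(False)
        elif ins["op"] == "bcc":
            to_predictor.append(True)
        elif ins["op"] == "b":
            # Inside an IT block the unconditional branch is treated as
            # conditional, so it also gets a direction prediction.
            to_predictor.append(in_it_block)
        else:
            to_predictor.append(False)
    return to_predictor
```

The same `b` opcode is predicted only while it falls within the likely boundaries of the IT block.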

    Usefulness Indication For Indirect Branch Prediction Training
    5.
    Type: Invention patent application
    Status: In force

    Publication number: US20140195789A1

    Publication date: 2014-07-10

    Application number: US13735694

    Filing date: 2013-01-07

    Applicant: APPLE INC.

    CPC classification number: G06F9/3844 G06F9/30072 G06F9/3806 G06F9/3848

    Abstract: A circuit for implementing a branch target buffer. The branch target buffer may include a memory that stores a plurality of entries. Each entry may include a tag value, a target value, and a prediction accuracy value. A received index value corresponding to an indirect branch instruction may be used to select one of entries of the plurality of entries, and a received tag value may then be compared to the tag value of the selected entries in the memory. An entry in the memory may be selected in response to a determination that the received tag does not match the tag value of compared entries. The selected entry may be allocated to the indirect branch instruction dependent upon the prediction accuracy values of the plurality of entries.
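
One way to picture the allocation step is the Python sketch below. The abstract says only that allocation depends on the prediction accuracy values; choosing the entry with the lowest accuracy as the victim, the 2-bit saturating counter, and the set-associative layout are assumptions of this example.

```python
class BranchTargetBuffer:
    """Toy set-associative BTB with a per-entry accuracy counter."""

    def __init__(self, num_sets=4, ways=2):
        self.sets = [[None] * ways for _ in range(num_sets)]

    def lookup(self, index, tag):
        """Return the predicted target, or None on a tag miss."""
        for entry in self.sets[index]:
            if entry is not None and entry["tag"] == tag:
                return entry["target"]
        return None

    def train(self, index, tag, actual_target):
        entries = self.sets[index]
        for entry in entries:
            if entry is not None and entry["tag"] == tag:
                if entry["target"] == actual_target:
                    entry["accuracy"] = min(entry["accuracy"] + 1, 3)
                else:
                    entry["accuracy"] = max(entry["accuracy"] - 1, 0)
                    entry["target"] = actual_target
                return
        # Tag miss: allocate over an empty way, else the least accurate entry
        # (assumed policy).
        victim = min(
            range(len(entries)),
            key=lambda w: -1 if entries[w] is None else entries[w]["accuracy"],
        )
        entries[victim] = {"tag": tag, "target": actual_target, "accuracy": 1}
```

Under this policy an entry that keeps predicting correctly survives a conflicting allocation, while a rarely useful one is replaced first.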

    Arithmetic branch fusion
    6.
    Type: Granted invention patent

    Publication number: US09672037B2

    Publication date: 2017-06-06

    Application number: US13747977

    Filing date: 2013-01-23

    Applicant: Apple Inc.

    Abstract: A processor and method for fusing together an arithmetic instruction and a branch instruction. The processor includes an instruction fetch unit configured to fetch instructions. The processor may also include an instruction decode unit that may be configured to decode the fetched instructions into micro-operations for execution by an execution unit. The decode unit may be configured to detect an occurrence of an arithmetic instruction followed by a branch instruction in program order, wherein the branch instruction, upon execution, changes a program flow of control dependent upon a result of execution of the arithmetic instruction. In addition, the processor may further be configured to fuse together the arithmetic instruction and the branch instruction such that a single micro-operation is formed. The single micro-operation includes execution information based upon both the arithmetic instruction and the branch instruction.
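
The decode-time pairing can be sketched in Python as follows. The opcode names, the `sets_flags` field, and the micro-op dictionaries are hypothetical; the sketch treats "the branch depends on the arithmetic result" as "the arithmetic instruction sets flags and a conditional branch follows immediately", which is a simplification.

```python
ARITHMETIC_OPS = {"add", "sub", "and", "orr"}  # illustrative set only


def decode(instrs):
    """Turn instructions into micro-ops, fusing arithmetic + branch pairs."""
    uops = []
    i = 0
    while i < len(instrs):
        cur = instrs[i]
        nxt = instrs[i + 1] if i + 1 < len(instrs) else None
        fusible = (
            cur["op"] in ARITHMETIC_OPS
            and cur.get("sets_flags", False)
            and nxt is not None
            and nxt["op"] == "bcc"
        )
        if fusible:
            # One micro-op carries execution info from both instructions.
            uops.append({"kind": "fused", "arith": cur, "branch": nxt})
            i += 2
        else:
            uops.append({"kind": "single", "instr": cur})
            i += 1
    return uops
```

A flag-setting subtract followed by a conditional branch becomes one micro-op; an add that does not set flags stays separate from the branch after it.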

    RDA checkpoint optimization
    7.
    Type: Granted invention patent
    Status: In force

    Publication number: US09311084B2

    Publication date: 2016-04-12

    Application number: US13955847

    Filing date: 2013-07-31

    Applicant: Apple Inc.

    CPC classification number: G06F9/30032 G06F9/3838 G06F9/384 G06F9/3863

    Abstract: A system and method for efficiently performing microarchitectural checkpointing. A register rename unit within a processor determines whether a physical register number qualifies to have duplicate mappings. Information for maintenance of the duplicate mappings is stored in a register duplicate array (RDA). To reduce the penalty for misspeculation or exception recovery, control logic in the processor supports multiple checkpoints. The RDA is one of multiple data structures to have checkpoint copies of state. The RDA utilizes a content addressable memory (CAM) to store physical register numbers. The duplicate counts for both the current state and the checkpoint copies for a given physical register number are updated when instructions utilizing the given physical register number are retired. To reduce on-die real estate and power consumption, a single CAM entry stores the physical register number and the other fields are stored in separate storage elements.
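
The bookkeeping described here can be modeled loosely in Python. Dictionary keys stand in for the CAM match on a physical register number, and the per-checkpoint dictionaries stand in for the separate storage elements. The method names and the decrement-on-retire rule are assumptions made for this sketch.

```python
class RegisterDuplicateArray:
    """Toy RDA: duplicate counts for the current state and each checkpoint."""

    def __init__(self, num_checkpoints=2):
        self.current = {}  # physical register number -> duplicate count
        self.checkpoints = [dict() for _ in range(num_checkpoints)]

    def add_duplicate(self, preg):
        """Record one more mapping onto an already mapped register."""
        self.current[preg] = self.current.get(preg, 0) + 1

    def take_checkpoint(self, slot):
        self.checkpoints[slot] = dict(self.current)

    def retire(self, preg):
        """Update the current count and every checkpoint copy together."""
        for counts in (self.current, *self.checkpoints):
            if counts.get(preg, 0) > 0:
                counts[preg] -= 1

    def restore(self, slot):
        """Recover from misspeculation by reloading a checkpoint copy."""
        self.current = dict(self.checkpoints[slot])

    def count(self, preg):
        return self.current.get(preg, 0)
```

Updating the checkpoint copies at retirement keeps a later restore from reviving a mapping that has already been released.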

    Mechanism for reducing cache power consumption using cache way prediction
    8.
    Type: Granted invention patent
    Status: In force

    Publication number: US09311098B2

    Publication date: 2016-04-12

    Application number: US13888551

    Filing date: 2013-05-07

    Applicant: Apple Inc.

    Abstract: A mechanism for reducing power consumption of a cache memory of a processor includes a processor with a cache memory that stores instruction information for one or more instruction fetch groups fetched from a system memory. The cache memory may include a number of ways that are each independently controllable. The processor also includes a way prediction unit. The way prediction unit may enable, in a next execution cycle, a given way within which instruction information corresponding to a target of a next branch instruction is stored in response to a branch taken prediction for the next branch instruction. The way prediction unit may also, in response to the branch taken prediction for the next branch instruction, enable, one at a time, each corresponding way within which instruction information corresponding to respective sequential instruction fetch groups that follow the next branch instruction is stored.
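
The per-cycle enable pattern can be sketched in Python. The function name, the argument layout, and the fallback of enabling every way when no taken prediction is available are assumptions of this example rather than details from the abstract.

```python
def way_enable_schedule(num_ways, predicted_taken, target_way, sequential_ways):
    """Return one enable mask per cycle (True means the way is powered).

    target_way:      way predicted to hold the branch target's fetch group.
    sequential_ways: ways predicted for the fetch groups that follow it.
    """
    if not predicted_taken:
        # Assumed fallback: without a prediction, enable all ways.
        return [[True] * num_ways]
    schedule = []
    for way in [target_way, *sequential_ways]:
        mask = [False] * num_ways
        mask[way] = True  # exactly one way is enabled per cycle
        schedule.append(mask)
    return schedule
```

For a four-way cache with the target in way 2 and the next two fetch groups in ways 0 and 3, the schedule enables ways 2, 0, and 3 on successive cycles and leaves the others off.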

    Mechanism for Reducing Cache Power Consumption Using Cache Way Prediction
    9.
    Type: Invention patent application
    Status: In force

    Publication number: US20140337605A1

    Publication date: 2014-11-13

    Application number: US13888551

    Filing date: 2013-05-07

    Applicant: APPLE INC.

    Abstract: A mechanism for reducing power consumption of a cache memory of a processor includes a processor with a cache memory that stores instruction information for one or more instruction fetch groups fetched from a system memory. The cache memory may include a number of ways that are each independently controllable. The processor also includes a way prediction unit. The way prediction unit may enable, in a next execution cycle, a given way within which instruction information corresponding to a target of a next branch instruction is stored in response to a branch taken prediction for the next branch instruction. The way prediction unit may also, in response to the branch taken prediction for the next branch instruction, enable, one at a time, each corresponding way within which instruction information corresponding to respective sequential instruction fetch groups that follow the next branch instruction is stored.

    Usefulness indication for indirect branch prediction training
    10.
    Type: Granted invention patent
    Status: In force

    Publication number: US09311100B2

    Publication date: 2016-04-12

    Application number: US13735694

    Filing date: 2013-01-07

    Applicant: Apple Inc.

    CPC classification number: G06F9/3844 G06F9/30072 G06F9/3806 G06F9/3848

    Abstract: A circuit for implementing a branch target buffer. The branch target buffer may include a memory that stores a plurality of entries. Each entry may include a tag value, a target value, and a prediction accuracy value. A received index value corresponding to an indirect branch instruction may be used to select one of entries of the plurality of entries, and a received tag value may then be compared to the tag value of the selected entries in the memory. An entry in the memory may be selected in response to a determination that the received tag does not match the tag value of compared entries. The selected entry may be allocated to the indirect branch instruction dependent upon the prediction accuracy values of the plurality of entries.
