Unified prefetch circuit for multi-level caches

    Publication No.: US10180905B1

    Publication Date: 2019-01-15

    Application No.: US15093213

    Filing Date: 2016-04-07

    Applicant: Apple Inc.

    Abstract: In an embodiment, a processor may implement an access map-pattern match (AMPM)-based prefetch circuit for a multi-level cache system. The access patterns that are matched to the access maps may include prefetches for different cache levels. Centralizing the generation of prefetches into one prefetch circuit may provide better observability and controllability of prefetching at various levels of the cache hierarchy, in an embodiment. Prefetches at different levels may be controlled individually based on the accuracy of those prefetches, in an embodiment. Additionally, in some embodiments, access patterns that are longer than a given threshold may have the granularity of the prefetches changed so that more data is prefetched and the prefetches occur farther in advance.
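As a rough illustration of how one AMPM-style circuit can emit prefetches for two cache levels and gate each level on its own accuracy, here is a minimal Python software model. It is a sketch under stated assumptions, not Apple's design: the region size, the stride-based pattern check, the near-for-L1/far-for-L2 split, the `min_accuracy` threshold, and all names are hypothetical, and the granularity change for long patterns is not modeled.

```python
from dataclasses import dataclass, field

REGION_LINES = 64  # cache lines tracked per access map (assumed)
MAX_STRIDE = 4     # largest stride the pattern matcher checks (assumed)


@dataclass
class LevelStats:
    issued: int = 0
    useful: int = 0

    def accuracy(self) -> float:
        # With no history yet, treat the level as accurate so it can start prefetching.
        return 1.0 if self.issued == 0 else self.useful / self.issued


@dataclass
class AmpmPrefetcher:
    access_map: list = field(default_factory=lambda: [False] * REGION_LINES)
    stats: dict = field(default_factory=lambda: {"L1": LevelStats(), "L2": LevelStats()})
    min_accuracy: float = 0.5  # per-level enable threshold (assumed)

    def access(self, line: int) -> list:
        """Record a demand access to `line` and return (level, line) prefetch requests."""
        self.access_map[line] = True
        requests = []
        for stride in range(1, MAX_STRIDE + 1):
            prev, prev2 = line - stride, line - 2 * stride
            # Pattern match: the two previous lines along this stride were accessed.
            if prev2 < 0 or not (self.access_map[prev] and self.access_map[prev2]):
                continue
            # One matched pattern yields prefetches for both levels: near for L1, far for L2.
            for level, target in (("L1", line + stride), ("L2", line + 2 * stride)):
                if (target < REGION_LINES and not self.access_map[target]
                        and self.stats[level].accuracy() >= self.min_accuracy):
                    requests.append((level, target))
                    self.stats[level].issued += 1
            break  # use the smallest matching stride only
        return requests

    def record_useful(self, level: str) -> None:
        """Call when a demand access hits a line that was prefetched for `level`."""
        self.stats[level].useful += 1
```

Because both levels are fed by a single pattern match, the per-level accuracy counters live in one place, which is the observability and controllability argument the abstract makes.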

    Processor Including Multiple Dissimilar Processor Cores that Implement Different Portions of Instruction Set Architecture
    33.
    Patent application
    Legal status: Active

    Publication No.: US20160147290A1

    Publication Date: 2016-05-26

    Application No.: US14548912

    Filing Date: 2014-11-20

    Applicant: Apple Inc.

    Abstract: In an embodiment, an integrated circuit may include one or more processors. Each processor may include multiple processor cores, and each core has a different design/implementation and performance level. For example, a core may be implemented for high performance, and another core may be implemented at a lower maximum performance, but may be optimized for efficiency. Additionally, in some embodiments, some features of the instruction set architecture implemented by the processor may be implemented in only one of the cores that make up the processor. If such a feature is invoked by a code sequence while a different core is active, the processor may swap cores to the core that implements the feature. Alternatively, an exception may be taken and an exception handler may be executed to identify the feature and activate the corresponding core.

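The two alternatives in the abstract (a hardware core swap versus an exception handler that activates the right core) can be sketched as follows. This is a hypothetical model: the feature names `base` and `vector`, the core names, and the `auto_swap` switch are assumptions for illustration, and the transfer of architected state between cores is only noted in a comment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Core:
    name: str
    features: frozenset  # ISA features this core implements


class FeatureNotImplemented(Exception):
    """Models the exception taken when the active core lacks a feature."""

    def __init__(self, feature: str):
        super().__init__(feature)
        self.feature = feature


class Processor:
    def __init__(self, cores, auto_swap: bool = True):
        self.cores = list(cores)
        self.active = self.cores[0]
        self.auto_swap = auto_swap  # True: hardware swap; False: trap to a handler
        self.swaps = 0

    def activate_core_for(self, feature: str) -> None:
        for core in self.cores:
            if feature in core.features:
                # A real design would transfer the architected state here.
                self.active = core
                self.swaps += 1
                return
        raise ValueError(f"no core implements {feature!r}")

    def execute(self, feature: str) -> str:
        """Execute an instruction needing `feature`; return the core that ran it."""
        if feature not in self.active.features:
            if not self.auto_swap:
                raise FeatureNotImplemented(feature)
            self.activate_core_for(feature)
        return self.active.name


def execute_with_handler(cpu: Processor, feature: str) -> str:
    """Exception path: the handler identifies the feature, activates the core, retries."""
    try:
        return cpu.execute(feature)
    except FeatureNotImplemented as exc:
        cpu.activate_core_for(exc.feature)
        return cpu.execute(feature)
```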

    Processor Including Multiple Dissimilar Processor Cores
    34.
    Patent application
    Legal status: Active

    Publication No.: US20160147289A1

    Publication Date: 2016-05-26

    Application No.: US14548872

    Filing Date: 2014-11-20

    Applicant: Apple Inc.

    Abstract: In an embodiment, an integrated circuit may include one or more processors. Each processor may include multiple processor cores, and each core has a different design/implementation and performance level. For example, a core may be implemented for high performance, but may have a higher minimum voltage at which it operates correctly. Another core may be implemented at a lower maximum performance, but may be optimized for efficiency and may operate correctly at a lower minimum voltage. The processor may support multiple processor states (PStates). Each PState may specify an operating point and may be mapped to one of the processor cores. During operation, one of the cores is active: the core to which the current PState is mapped. If a new PState is selected and is mapped to a different core, the processor may automatically context switch the processor state to the newly-selected core and may begin execution on that core.

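A minimal model of the PState-to-core mapping, assuming a hypothetical four-entry PState table (the voltages and frequencies are illustrative, not from the filing) and treating each core's register file as a dictionary whose contents are moved when a PState change crosses cores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PState:
    voltage_mv: int
    freq_mhz: int
    core: str  # the core this operating point is mapped to


# Hypothetical PState table; the numbers are illustrative only.
PSTATES = [
    PState(600, 400, "efficiency"),
    PState(700, 1000, "efficiency"),
    PState(850, 1800, "performance"),
    PState(1000, 2400, "performance"),
]


class Processor:
    def __init__(self, pstate: int = 0):
        self.pstate = pstate
        # Architected register state held by each core; only the active core's copy is live.
        self.regs = {p.core: {} for p in PSTATES}
        self.context_switches = 0

    @property
    def active_core(self) -> str:
        return PSTATES[self.pstate].core

    def write_reg(self, reg: str, value: int) -> None:
        self.regs[self.active_core][reg] = value

    def read_reg(self, reg: str) -> int:
        return self.regs[self.active_core][reg]

    def set_pstate(self, new_pstate: int) -> None:
        old_core, new_core = self.active_core, PSTATES[new_pstate].core
        if new_core != old_core:
            # Automatic context switch: move the state to the newly selected core.
            self.regs[new_core] = dict(self.regs[old_core])
            self.regs[old_core].clear()
            self.context_switches += 1
        self.pstate = new_pstate
```

A change between PStates mapped to the same core only changes the operating point; only a cross-core change pays for the context switch.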

    Branch Predictor for Wide Issue, Arbitrarily Aligned Fetch
    36.
    Patent application
    Legal status: Pending (published)

    Publication No.: US20160048395A1

    Publication Date: 2016-02-18

    Application No.: US14923947

    Filing Date: 2015-10-27

    Applicant: Apple Inc.

    Abstract: In an embodiment, a processor may be configured to fetch a group of N instruction bytes (a “fetch group”) from an instruction cache, even if the fetch group crosses a cache line boundary. A branch predictor may be configured to produce branch predictions for up to M branches in the fetch group, where M is a maximum number of branches that may be included in the fetch group. In an embodiment, a branch direction predictor may be updated responsive to a misprediction and also responsive to the branch prediction being within a threshold of transitioning between predictions. To avoid a lookup to determine if the threshold update is to be performed, the branch predictor may detect the threshold update during prediction, and may transmit an indication with the branch.

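One plausible reading of the threshold update, sketched in Python under assumptions: each predictor entry is a 3-bit saturating counter, "within a threshold of transitioning" is taken to mean the counter sits one step from the taken/not-taken boundary, and the hint computed at prediction time travels with the branch so the resolve step needs no extra table read. The fetch-group and multiple-branches-per-group aspects are not modeled, and all names are hypothetical.

```python
from dataclasses import dataclass

COUNTER_MAX = 7      # 3-bit saturating counters (assumed)
TAKEN_THRESHOLD = 4  # predict taken when counter >= 4
NEAR_MARGIN = 1      # "within a threshold of transitioning" (assumed margin)


@dataclass
class Prediction:
    index: int
    taken: bool
    update_hint: bool  # computed at predict time and carried with the branch


class DirectionPredictor:
    def __init__(self, entries: int = 16):
        self.counters = [TAKEN_THRESHOLD] * entries  # start weakly taken
        self.writes = 0  # table writes performed at branch resolution

    def predict(self, pc: int) -> Prediction:
        index = pc % len(self.counters)
        c = self.counters[index]
        near = TAKEN_THRESHOLD - NEAR_MARGIN <= c < TAKEN_THRESHOLD + NEAR_MARGIN
        return Prediction(index, c >= TAKEN_THRESHOLD, near)

    def resolve(self, pred: Prediction, actual_taken: bool) -> None:
        # Update only on a misprediction or when the hint says the counter was near a flip;
        # the hint means no extra table lookup is needed to make this decision.
        if pred.taken == actual_taken and not pred.update_hint:
            return
        c = self.counters[pred.index]
        self.counters[pred.index] = min(c + 1, COUNTER_MAX) if actual_taken else max(c - 1, 0)
        self.writes += 1
```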

    Least Recently Used Mechanism for Cache Line Eviction from a Cache Memory
    37.
    Patent application
    Legal status: Active

    Publication No.: US20150026404A1

    Publication Date: 2015-01-22

    Application No.: US13946327

    Filing Date: 2013-07-19

    Applicant: Apple Inc.

    Abstract: A mechanism for evicting a cache line from a cache memory first selects for eviction the least recently used cache line of a group of invalid cache lines. If all cache lines are valid, it selects for eviction the least recently used cache line of the group of cache lines that are not also stored in a higher-level cache memory, such as the L1 cache. Lastly, if all cache lines are valid and there are no non-inclusive cache lines, it selects for eviction the least recently used cache line stored in the cache memory.

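The three-tier victim selection described above maps directly onto a priority search. This sketch assumes each way carries a valid bit, a hypothetical `in_upper_cache` flag recording whether the line is also held in a higher-level cache such as L1, and an integer timestamp standing in for the hardware LRU state.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Way:
    valid: bool
    in_upper_cache: bool  # line is also present in a higher-level cache (e.g. L1)
    last_used: int        # larger value = more recently used


def select_victim(ways: Sequence[Way]) -> int:
    """Return the index of the way to evict."""
    def lru(candidates):
        return min(candidates, key=lambda i: ways[i].last_used)

    # 1) Least recently used among invalid lines.
    invalid = [i for i, w in enumerate(ways) if not w.valid]
    if invalid:
        return lru(invalid)
    # 2) Least recently used among lines not also held in the higher-level cache.
    non_inclusive = [i for i, w in enumerate(ways) if not w.in_upper_cache]
    if non_inclusive:
        return lru(non_inclusive)
    # 3) Plain LRU over the whole set.
    return lru(range(len(ways)))
```

A likely motivation, not stated in the abstract, is that evicting a line still held in L1 can force extra work (such as invalidating the L1 copy) or evict data the core is actively using.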

    MULTI-CORE PROCESSOR INSTRUCTION THROTTLING
    38.
    Patent application
    Legal status: Active

    Publication No.: US20140317425A1

    Publication Date: 2014-10-23

    Application No.: US13864723

    Filing Date: 2013-04-17

    Applicant: APPLE INC.

    Abstract: An apparatus for performing instruction throttling for a multi-processor system is disclosed. The apparatus may include a power estimation circuit, a table, a comparator, and a finite state machine. The power estimation circuit may be configured to receive information on high power instructions issued to a first processor and a second processor, and generate a power estimate dependent upon the received information. The table may be configured to store one or more pre-determined power threshold values, and the comparator may be configured to compare the power estimate with at least one of the pre-determined power threshold values. The finite state machine may be configured to adjust the throttle level of the first and second processors dependent upon the result of the comparison.

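The estimate, compare, and adjust loop can be modeled as below. The weight per high-power instruction, the two entries of the threshold table, the number of throttle levels, and the issue-limit policy are hypothetical values chosen for illustration; the abstract names the components but not their parameters.

```python
# Hypothetical constants; the filing does not give these values.
HIGH_POWER_WEIGHT = 3                         # estimated cost units per high-power instruction
THRESHOLD_TABLE = {"raise": 40, "lower": 20}  # pre-determined power thresholds
MAX_THROTTLE = 3
BASE_ISSUE_LIMIT = 8                          # high-power issues allowed per window, unthrottled


class ThrottleFSM:
    """One throttle level shared by both processors."""

    def __init__(self):
        self.level = 0  # 0 = unthrottled

    def step(self, hp_issued_cpu0: int, hp_issued_cpu1: int) -> tuple:
        # Power estimation: combine high-power issue counts from both processors.
        estimate = HIGH_POWER_WEIGHT * (hp_issued_cpu0 + hp_issued_cpu1)
        # Comparator against the threshold table, then the state transition.
        if estimate > THRESHOLD_TABLE["raise"] and self.level < MAX_THROTTLE:
            self.level += 1
        elif estimate < THRESHOLD_TABLE["lower"] and self.level > 0:
            self.level -= 1
        return estimate, self.level

    def issue_limit(self) -> int:
        """High-power instructions each processor may issue in the next window."""
        return BASE_ISSUE_LIMIT >> self.level
```

Using separate raise and lower thresholds gives the state machine hysteresis, so the throttle level does not oscillate when the estimate hovers near a single threshold.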

    IT INSTRUCTION PRE-DECODE
    39.
    Patent application
    Legal status: Active

    Publication No.: US20140244976A1

    Publication Date: 2014-08-28

    Application No.: US13774093

    Filing Date: 2013-02-22

    Applicant: APPLE INC.

    Abstract: Various techniques are disclosed for processing and pre-decoding branches within an IT instruction block. Instructions are fetched and cached in an instruction cache, and pre-decode bits are generated to indicate the presence of an IT instruction and the likely boundaries of the IT instruction block. If an unconditional branch is detected within the likely boundaries of an IT instruction block, the unconditional branch is treated as if it were a conditional branch. The unconditional branch is sent to the branch direction predictor and the predictor generates a branch direction prediction for the unconditional branch.

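IT is the ARM Thumb-2 If-Then instruction, which makes up to four following instructions conditional. The sketch below models the pre-decode step over a list of placeholder mnemonics (`it`, `b` for an unconditional branch, `bcond` for a conditional one; these strings are assumptions, not real encodings) and conservatively treats the four instructions after an IT as its likely block.

```python
from dataclasses import dataclass

IT_MAX_BLOCK = 4  # an IT instruction predicates at most the next four instructions


@dataclass
class PreDecoded:
    op: str
    in_it_block: bool              # inside the *likely* IT block boundaries
    use_direction_predictor: bool  # send to the branch direction predictor


def predecode(ops):
    """Pre-decode a list of placeholder mnemonics such as "it", "b", "bcond"."""
    out = []
    remaining = 0  # instructions still covered by the most recent IT
    for op in ops:
        in_block = remaining > 0
        if in_block:
            remaining -= 1
        if op == "it":
            # Conservatively assume the maximum block length; the real length is in the IT mask.
            remaining = IT_MAX_BLOCK
        # An unconditional branch inside an IT block executes conditionally,
        # so it is treated like a conditional branch and given a direction prediction.
        use_dp = op == "bcond" or (op == "b" and in_block)
        out.append(PreDecoded(op, in_block, use_dp))
    return out
```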

    Usefulness Indication For Indirect Branch Prediction Training
    40.
    Patent application
    Legal status: Active

    Publication No.: US20140195789A1

    Publication Date: 2014-07-10

    Application No.: US13735694

    Filing Date: 2013-01-07

    Applicant: APPLE INC.

    CPC classification number: G06F9/3844 G06F9/30072 G06F9/3806 G06F9/3848

    Abstract: A circuit for implementing a branch target buffer is disclosed. The branch target buffer may include a memory that stores a plurality of entries. Each entry may include a tag value, a target value, and a prediction accuracy value. A received index value corresponding to an indirect branch instruction may be used to select entries from the plurality of entries, and a received tag value may then be compared to the tag values of the selected entries in the memory. An entry in the memory may be selected in response to a determination that the received tag does not match the tag values of the compared entries. The selected entry may be allocated to the indirect branch instruction dependent upon the prediction accuracy values of the plurality of entries.

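A small software model of the accuracy-guided allocation, assuming a set-associative table in which the received index picks a set and the received tag is compared against each way. The 2-bit accuracy counter, the retraining rule, and the class and method names are hypothetical; how the index and tag are derived (for example from the PC and branch history) is left to the caller.

```python
from dataclasses import dataclass
from typing import Optional

USEFUL_MAX = 3  # 2-bit prediction accuracy ("usefulness") counter (assumed width)


@dataclass
class Entry:
    tag: int = -1  # -1 marks an empty entry
    target: int = 0
    useful: int = 0


class IndirectBTB:
    def __init__(self, sets: int = 64, ways: int = 4):
        self.sets = [[Entry() for _ in range(ways)] for _ in range(sets)]

    def predict(self, index: int, tag: int) -> Optional[int]:
        for entry in self.sets[index]:
            if entry.tag == tag:
                return entry.target
        return None

    def update(self, index: int, tag: int, actual_target: int) -> None:
        ways = self.sets[index]
        for entry in ways:
            if entry.tag == tag:
                if entry.target == actual_target:
                    entry.useful = min(entry.useful + 1, USEFUL_MAX)
                elif entry.useful > 0:
                    entry.useful -= 1             # wrong target: lose confidence first
                else:
                    entry.target = actual_target  # low confidence: retrain the target
                return
        # Tag miss: allocate the entry whose accuracy value is lowest.
        victim = min(ways, key=lambda e: e.useful)
        victim.tag, victim.target, victim.useful = tag, actual_target, 0
```

Allocating over the least accurate entry protects targets that have been predicting well from being displaced by a newly seen indirect branch.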
