Prefetch circuit with global quality factor to reduce aggressiveness in low power modes

    Publication Number: US10331567B1

    Publication Date: 2019-06-25

    Application Number: US15435910

    Application Date: 2017-02-17

    Applicant: Apple Inc.

    Abstract: A prefetch circuit may include a memory, each entry of which may store an address and other prefetch data used to generate prefetch requests. For each entry, there may be at least one “quality factor” (QF) that may control prefetch request generation for that entry. A global quality factor (GQF) may control generation of prefetch requests across the plurality of entries. The prefetch circuit may include one or more additional prefetch mechanisms. For example, a stride-based prefetch circuit may be included that may generate prefetch requests for strided access patterns having strides larger than a certain stride size. Another example is a spatial memory streaming (SMS)-based mechanism in which prefetch data from multiple evictions from the memory in the prefetch circuit is captured and used for SMS prefetching based on how well the prefetch data appears to match a spatial memory streaming pattern.
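    The interplay of per-entry and global quality factors can be sketched as below. This is an illustrative model only, not the patented circuit: the class names, credit values, and the halving policy for low-power mode are assumptions.

    ```python
    class PrefetchEntry:
        """One tracked access stream; qf is its per-entry quality factor."""
        def __init__(self, addr, qf=8):
            self.addr = addr
            self.qf = qf

    class Prefetcher:
        QF_COST = 1  # credits consumed per issued prefetch request

        def __init__(self, gqf=16):
            self.gqf = gqf  # global quality factor shared across entries

        def enter_low_power(self):
            # Shrink the global credit pool to reduce aggressiveness
            # across every entry at once.
            self.gqf //= 2

        def try_issue(self, entry):
            # A prefetch issues only if both the entry's own QF and the
            # global QF have enough credit; both are charged on issue.
            if entry.qf >= self.QF_COST and self.gqf >= self.QF_COST:
                entry.qf -= self.QF_COST
                self.gqf -= self.QF_COST
                return True
            return False
    ```

    The point of the split is that each entry self-limits on its own accuracy, while the global factor caps total prefetch traffic regardless of how many entries are individually confident.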

    Access map-pattern match based prefetch unit for a processor (Invention Grant, In Force)

    Publication Number: US09015422B2

    Publication Date: 2015-04-21

    Application Number: US13942780

    Application Date: 2013-07-16

    Applicant: Apple Inc.

    CPC classification number: G06F12/0862 G06F2212/6026 Y02D10/13

    Abstract: In an embodiment, a processor may implement an access map-pattern match (AMPM)-based prefetcher in which patterns may include wild cards for some cache blocks. The wild card may match any access for the corresponding cache block (e.g. no access, demand access, prefetch, successful prefetch, etc.). Furthermore, patterns with irregular strides and/or irregular access patterns may be included in the matching patterns and may be detected for prefetch generation. In an embodiment, the AMPM prefetcher may implement a chained access map for large streaming prefetches. If a stream is detected, the AMPM prefetcher may allocate a pair of map entries for the stream and may reuse the pair for subsequent access map regions within the stream. In some embodiments, a quality factor may be associated with each access map and may control the rate of prefetch generation.
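    The wildcard idea can be sketched as follows; the per-block state encoding ('A' accessed, '-' no access, 'P' prefetched) and the '*' wildcard symbol are illustrative assumptions, not the patented encoding.

    ```python
    ACCESS, NO_ACCESS, PREFETCHED = 'A', '-', 'P'  # assumed per-block states

    def matches(access_map, pattern):
        """True if every pattern element is a wildcard '*' or equals the
        corresponding per-cache-block state in the access map."""
        return len(access_map) == len(pattern) and all(
            p == '*' or p == m for p, m in zip(pattern, access_map))
    ```

    For example, a stride-2 pattern such as `['A', '*', 'A', '*', 'A']` would match a map whose intervening blocks hold any state, which is what lets one pattern cover the "no access, demand access, prefetch, ..." cases the abstract lists.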

    GLOBAL MAINTENANCE COMMAND PROTOCOL IN A CACHE COHERENT SYSTEM (Invention Application, In Force)

    Publication Number: US20140317358A1

    Publication Date: 2014-10-23

    Application Number: US13864670

    Application Date: 2013-04-17

    Applicant: APPLE INC.

    Abstract: A system may include a command queue controller coupled to a number of clusters of cores, where each cluster includes a cache shared amongst the cores. An originating core of one of the clusters may detect a global maintenance command and send the global maintenance command to the command queue controller. The command queue controller may broadcast the global maintenance command to the clusters including the originating core's cluster. Each of the cores of the clusters may execute the global maintenance command. Each cluster may send an acknowledgement to the command queue controller upon completed execution of the global maintenance command by each core of the cluster. The command queue controller may also send, upon receiving an acknowledgement from each cluster, a final acknowledgement to the originating core's cluster.
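    The broadcast/acknowledge flow described above can be modeled as a sketch; the class and method names are assumptions for illustration, and the model collapses per-core execution into a simple count.

    ```python
    class Cluster:
        def __init__(self, n_cores):
            self.n_cores = n_cores
            self.final_ack = False

        def execute(self, cmd):
            # Every core in the cluster executes the global maintenance
            # command; the cluster acknowledges once all have completed.
            return all(True for _ in range(self.n_cores))

    class CommandQueueController:
        def __init__(self, clusters):
            self.clusters = clusters

        def handle(self, cmd, origin):
            # Broadcast to all clusters, including the originator's own,
            # then send the final ack back to the originating cluster
            # once every cluster has acknowledged.
            acks = [c.execute(cmd) for c in self.clusters]
            if all(acks):
                origin.final_ack = True
    ```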

    Prefetch circuit for a processor with pointer optimization

    Publication Number: US09971694B1

    Publication Date: 2018-05-15

    Application Number: US14748833

    Application Date: 2015-06-24

    Applicant: Apple Inc.

    CPC classification number: G06F12/0862 G06F9/383 G06F2212/602 G06F2212/6028

    Abstract: In an embodiment, a processor may implement an access map-pattern match (AMPM)-based prefetch circuit with features designed to improve prefetching accuracy and/or reduce power consumption. In an embodiment, the prefetch circuit may be configured to detect that pointer reads are occurring (e.g. “pointer chasing”). The prefetch circuit may be configured to increase the frequency at which prefetch requests are generated for an access map in which pointer read activity is detected, compared to the frequency at which prefetch requests would be generated if pointer read activity were not detected. In an embodiment, the prefetch circuit may also detect access maps that are store-only, and may reduce the frequency of prefetches for store-only access maps as compared to the frequency for load-only or load/store maps.
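    The rate-control idea reduces to adjusting how often prefetches are generated per map. A minimal sketch, with entirely illustrative scaling factors (the patent does not specify values):

    ```python
    def prefetch_interval(base_interval, pointer_reads, store_only):
        """Accesses between generated prefetches for one access map."""
        interval = base_interval
        if pointer_reads:
            interval //= 2   # prefetch more often when pointer chasing
        if store_only:
            interval *= 4    # prefetch less often for store-only maps
        return max(interval, 1)
    ```

    A smaller interval means more aggressive prefetching, so pointer-chasing maps get extra requests while store-only maps, which rarely benefit from prefetched data, are throttled.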

    Reducing latency for pointer chasing loads

    Publication Number: US09710268B2

    Publication Date: 2017-07-18

    Application Number: US14264789

    Application Date: 2014-04-29

    Applicant: Apple Inc.

    CPC classification number: G06F9/30043 G06F9/3826 G06F9/3834 G06F9/3861

    Abstract: Systems, methods, and apparatuses for reducing the load to load/store address latency in an out-of-order processor. When a producer load is detected in the processor pipeline, the processor predicts whether the producer load is going to hit in the store queue. If the producer load is predicted not to hit in the store queue, then a dependent load or store can be issued early. The result data of the producer load is then bypassed forward from the data cache directly to the address generation unit. This result data is then used to generate an address for the dependent load or store, reducing the latency of the dependent load or store by one clock cycle.
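    The timing benefit can be sketched as below. The latency numbers and function names are assumptions; the abstract only states that the bypass saves one clock cycle when the producer load is predicted not to hit in the store queue.

    ```python
    def dependent_issue_cycle(producer_issue, predict_sq_hit,
                              cache_latency=4):
        """Cycle at which the dependent load/store may issue, relative to
        the producer load's issue cycle (cache_latency is illustrative)."""
        normal = producer_issue + cache_latency
        if not predict_sq_hit:
            # Result bypassed from the data cache straight to the address
            # generation unit: the dependent issues one cycle early.
            return normal - 1
        return normal
    ```

    If the prediction is wrong (the producer actually hits in the store queue), the early-issued dependent would have to be replayed, which is the usual cost of this kind of speculation.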

    POINTER CHASING PREDICTION (Invention Application, In Force)

    Publication Number: US20140337581A1

    Publication Date: 2014-11-13

    Application Number: US13890716

    Application Date: 2013-05-09

    Applicant: APPLE INC.

    Inventor: Stephan G. Meier

    Abstract: A system and method for efficient scheduling of dependent load instructions. A processor includes both an execution core and a scheduler that issues instructions to the execution core. The execution core includes a load-store unit (LSU). The scheduler determines that a first condition is satisfied, wherein the first condition is that result data for a first load instruction is predicted eligible for LSU-internal forwarding. The scheduler determines that a second condition is satisfied, wherein the second condition is that a second load instruction, younger in program order than the first load instruction, is dependent on the first load instruction. In response to both the first condition and the second condition being satisfied, the scheduler can issue the second load instruction earlier than it otherwise would. The LSU internally forwards the received result data from the first load instruction to address generation logic for the second load instruction.

    Secondary prefetch circuit that reports coverage to a primary prefetch circuit to limit prefetching by primary prefetch circuit

    Publication Number: US11176045B2

    Publication Date: 2021-11-16

    Application Number: US16832893

    Application Date: 2020-03-27

    Applicant: Apple Inc.

    Abstract: In an embodiment, a processor includes a plurality of prefetch circuits configured to prefetch data into a data cache. A primary prefetch circuit may be configured to generate first prefetch requests in response to a demand access, and may be configured to invoke a second prefetch circuit in response to the demand access. The second prefetch circuit may implement a different prefetch mechanism than the first prefetch circuit. If the second prefetch circuit reaches a threshold confidence level in prefetching for the demand access, the second prefetch circuit may communicate an indication to the primary prefetch circuit. The primary prefetch circuit may reduce a number of prefetch requests generated for the demand access responsive to the communication from the second prefetch circuit.
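    The coverage handshake can be sketched as a pair of cooperating objects. The confidence threshold, the halving back-off, and the class names are illustrative assumptions; the abstract specifies only that the primary reduces its request count after the secondary reports sufficient confidence.

    ```python
    class SecondaryPrefetcher:
        CONFIDENCE_THRESHOLD = 4  # assumed value

        def __init__(self):
            self.confidence = 0

        def observe_hit(self):
            # Confidence grows as this prefetcher's predictions pan out;
            # returns True when coverage should be reported to the primary.
            self.confidence += 1
            return self.confidence >= self.CONFIDENCE_THRESHOLD

    class PrimaryPrefetcher:
        def __init__(self, degree=4):
            self.degree = degree  # prefetch requests per demand access

        def on_coverage_report(self):
            # Secondary is covering this stream well: back off.
            self.degree = max(self.degree // 2, 1)
    ```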

    Load/store dependency predictor optimization for replayed loads

    Publication Number: US10437595B1

    Publication Date: 2019-10-08

    Application Number: US15070435

    Application Date: 2016-03-15

    Applicant: Apple Inc.

    Abstract: Systems, apparatuses, and methods for optimizing a load-store dependency predictor (LSDP). When a younger load instruction is issued before an older store instruction and the younger load is dependent on the older store, the LSDP is trained on this ordering violation. A replay/flush indicator is stored in a corresponding entry in the LSDP to indicate whether the ordering violation resulted in a flush or replay. On subsequent executions, a dependency may be enforced for the load-store pair if a confidence counter is above a threshold, with the threshold varying based on the status of the replay/flush indicator. If a given load matches on multiple entries in the LSDP, and if at least one of the entries has a flush indicator, then the given load may be marked as a multimatch case and forced to wait to issue until all older stores have issued.
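    The replay/flush-dependent threshold can be sketched in a few lines. The threshold values are assumptions chosen to reflect the relative costs (a flush is far more expensive than a replay), not values from the patent.

    ```python
    FLUSH_THRESHOLD = 1   # enforce quickly: pipeline flushes are costly
    REPLAY_THRESHOLD = 3  # tolerate a few replays before enforcing

    def enforce_dependency(confidence, caused_flush):
        """Decide whether to make the load wait on the matching store,
        using a lower confidence bar for flush-trained entries."""
        threshold = FLUSH_THRESHOLD if caused_flush else REPLAY_THRESHOLD
        return confidence >= threshold
    ```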

    Content-directed prefetch circuit with quality filtering

    Publication Number: US09886385B1

    Publication Date: 2018-02-06

    Application Number: US15247421

    Application Date: 2016-08-25

    Applicant: Apple Inc.

    Abstract: In a content-directed prefetcher, a pointer detection circuit identifies a given memory pointer candidate within a data cache line fill from a lower level cache (LLC), where the LLC is at a lower level of a memory hierarchy relative to the data cache. A pointer filter circuit initiates a prefetch request to the LLC for the candidate dependent on determining that a given counter in a quality factor (QF) table satisfies a QF counter threshold value. The QF table is indexed dependent upon a program counter address and relative cache line offset of the candidate. Upon initiation of the prefetch request, the given counter is updated to reflect a prefetch cost. In response to determining that a subsequent data cache line fill arriving from the LLC corresponds to the prefetch request for the given memory pointer candidate, a particular counter of the QF table may be updated to reflect a successful prefetch credit.
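    The cost/credit bookkeeping described above can be sketched as follows. The cost, credit, and saturation values are illustrative assumptions, as is the use of a plain dictionary in place of a finite hardware table.

    ```python
    COST, CREDIT, MAX_QF = 2, 3, 15  # assumed values

    class QFTable:
        """Quality-factor counters indexed by (PC, cache-line offset)."""
        def __init__(self):
            self.counters = {}

        def try_prefetch(self, pc, offset):
            # Issue only if the counter has credit for the cost, and
            # charge the cost on issue.
            key = (pc, offset)
            qf = self.counters.setdefault(key, MAX_QF)
            if qf >= COST:
                self.counters[key] = qf - COST
                return True
            return False

        def on_useful_fill(self, pc, offset):
            # A fill matching an earlier prefetch earns a success credit,
            # saturating at MAX_QF.
            key = (pc, offset)
            self.counters[key] = min(self.counters.get(key, 0) + CREDIT,
                                     MAX_QF)
    ```

    Candidates whose prefetches never pay off drain their counter and are filtered out, while useful pointers keep earning credit and continue to prefetch.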
