Victim allocations in shared system cache

    Publication No.: US10963392B1

    Publication date: 2021-03-30

    Application No.: US16048645

    Filing date: 2018-07-30

    Applicant: Apple Inc.

    Abstract: A system and method for efficiently handling data selected for eviction in a computing system. In various embodiments, a computing system includes one or more processors, a system memory, and a victim cache. The cache controller of a particular cache in a cache memory subsystem includes an allocator for determining whether to allocate data evicted from the particular cache into the victim cache. The data fetched into the first cache includes data fetched to service miss requests, which includes demand requests and prefetch requests. To determine whether to allocate, the allocator determines whether a usefulness of data fetched into the particular cache exceeds a threshold. If so, the evicted data is stored in the victim cache. If not, the evicted data bypasses the victim cache. Data determined to be accessed by a processor is deemed to be of a higher usefulness.
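
The allocation decision described in this abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `VictimAllocator` class, the ratio-based usefulness metric (fetched lines later accessed divided by lines fetched), and the 0.5 threshold do not come from the patent, which only states that usefulness is compared against a threshold.

```python
from dataclasses import dataclass


@dataclass
class VictimAllocator:
    """Hypothetical allocator that tracks how much fetched data was used."""
    threshold: float = 0.5  # assumed cutoff; the abstract gives no value
    fills: int = 0          # lines fetched to service demand or prefetch misses
    used: int = 0           # fetched lines later accessed by a processor

    def record_fill(self) -> None:
        self.fills += 1

    def record_first_use(self) -> None:
        self.used += 1

    def usefulness(self) -> float:
        # Assumed metric: fraction of fetched lines that were accessed.
        return self.used / self.fills if self.fills else 0.0

    def should_allocate(self) -> bool:
        return self.usefulness() > self.threshold


def evict(line, allocator, victim_cache):
    """Store the evicted line in the victim cache, or bypass it."""
    if allocator.should_allocate():
        victim_cache.append(line)
        return "victim_cache"
    return "bypass"
```

With this metric, a cache whose fetched lines are mostly touched by the processor sends its evictions to the victim cache, while a cache filled mostly by unused prefetches bypasses it.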

    Cache dependency handling
    Granted patent

    Publication No.: US10127153B1

    Publication date: 2018-11-13

    Application No.: US14868245

    Filing date: 2015-09-28

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to managing data-request dependencies for a cache. In one embodiment, an integrated circuit is disclosed that includes a plurality of requesting agents and a cache. The cache is configured to receive read and write requests from the plurality of requesting agents including a first request and a second request. The cache is configured to detect that the first and second requests specify addresses that correspond to different portions of the same cache line, and to determine whether to delay processing one of the first and second requests based on whether the first and second requests are from the same requesting agent. In some embodiments, the cache is configured to service the first and second requests in parallel in response to determining that the first and second requests are from the same requesting agent.
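
A minimal sketch of the dependency check described in this abstract follows. The 64-byte line size, the `Request` fields, and the returned labels are assumptions for illustration, and requests are assumed not to cross a line boundary. Cases the abstract does not address (different lines, overlapping bytes) are reported as `"not_covered"` rather than guessed at.

```python
from dataclasses import dataclass

LINE_SIZE = 64  # assumed cache line size in bytes


@dataclass(frozen=True)
class Request:
    agent_id: int   # which requesting agent issued the request
    address: int    # byte address
    size: int       # bytes accessed; assumed to stay within one line


def same_line_different_portions(a: Request, b: Request) -> bool:
    """True if both requests hit one cache line but disjoint byte ranges."""
    if a.address // LINE_SIZE != b.address // LINE_SIZE:
        return False
    a_start, b_start = a.address % LINE_SIZE, b.address % LINE_SIZE
    return a_start + a.size <= b_start or b_start + b.size <= a_start


def schedule(first: Request, second: Request) -> str:
    """Decide how to handle two requests, per the same-agent rule."""
    if not same_line_different_portions(first, second):
        return "not_covered"
    if first.agent_id == second.agent_id:
        return "parallel"      # same agent: service both together
    return "delay_second"      # different agents: hold one request back
```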

    Arithmetic branch fusion
    Granted patent

    Publication No.: US09672037B2

    Publication date: 2017-06-06

    Application No.: US13747977

    Filing date: 2013-01-23

    Applicant: Apple Inc.

    IPC classification: G06F9/30 G06F9/38

    Abstract: A processor and method for fusing together an arithmetic instruction and a branch instruction. The processor includes an instruction fetch unit configured to fetch instructions. The processor may also include an instruction decode unit that may be configured to decode the fetched instructions into micro-operations for execution by an execution unit. The decode unit may be configured to detect an occurrence of an arithmetic instruction followed by a branch instruction in program order, wherein the branch instruction, upon execution, changes a program flow of control dependent upon a result of execution of the arithmetic instruction. In addition, the processor may further be configured to fuse together the arithmetic instruction and the branch instruction such that a single micro-operation is formed. The single micro-operation includes execution information based upon both the arithmetic instruction and the branch instruction.
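
The decode-time fusion described in this abstract can be sketched as a simple pass over an instruction list. The opcode sets, the `Instr` and `MicroOp` types, and the rule for detecting the dependency (the branch reads the register the arithmetic instruction writes) are illustrative assumptions, not details from the patent.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

ARITH = {"add", "sub", "and"}   # assumed arithmetic opcodes
BRANCH = {"cbz", "cbnz"}        # assumed conditional-branch opcodes


@dataclass(frozen=True)
class Instr:
    opcode: str
    dest: Optional[str] = None      # register written, if any
    srcs: Tuple[str, ...] = ()      # registers read
    target: Optional[str] = None    # branch target label, if any


@dataclass(frozen=True)
class MicroOp:
    kind: str
    parts: Tuple[Instr, ...]        # instructions this micro-op carries


def decode(instrs):
    """Decode in program order, fusing arithmetic+dependent-branch pairs."""
    uops = []
    i = 0
    while i < len(instrs):
        cur = instrs[i]
        nxt = instrs[i + 1] if i + 1 < len(instrs) else None
        fusible = (
            nxt is not None
            and cur.opcode in ARITH
            and nxt.opcode in BRANCH
            and cur.dest is not None
            and cur.dest in nxt.srcs   # branch depends on arithmetic result
        )
        if fusible:
            uops.append(MicroOp(f"fused_{cur.opcode}_{nxt.opcode}", (cur, nxt)))
            i += 2
        else:
            uops.append(MicroOp(cur.opcode, (cur,)))
            i += 1
    return uops
```

A typical loop tail such as a decrement followed by a branch-if-nonzero on the same register would decode to one fused micro-op carrying both instructions, while a branch on an unrelated register is left unfused.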

    Reducing latency for pointer chasing loads

    Publication No.: US09710268B2

    Publication date: 2017-07-18

    Application No.: US14264789

    Filing date: 2014-04-29

    Applicant: Apple Inc.

    IPC classification: G06F9/38 G06F9/30

    Abstract: Systems, methods, and apparatuses for reducing the load to load/store address latency in an out-of-order processor. When a producer load is detected in the processor pipeline, the processor predicts whether the producer load is going to hit in the store queue. If the producer load is predicted not to hit in the store queue, then a dependent load or store can be issued early. The result data of the producer load is then bypassed forward from the data cache directly to the address generation unit. This result data is then used to generate an address for the dependent load or store, reducing the latency of the dependent load or store by one clock cycle.
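
The early-issue decision in this abstract can be modeled with a small sketch. The predictor design (a 2-bit saturating counter per producer-load PC) and the 4-cycle baseline latency are assumptions chosen for illustration; the abstract specifies only that a store-queue hit is predicted and that early issue saves one clock cycle.

```python
BASE_LATENCY = 4  # assumed load-to-address latency in cycles


class StoreQueueHitPredictor:
    """Hypothetical per-PC 2-bit saturating counter predictor."""

    def __init__(self):
        self.counters = {}  # producer-load PC -> counter in 0..3

    def predict_hit(self, pc: int) -> bool:
        # Counter values 2 and 3 predict a store-queue hit.
        return self.counters.get(pc, 0) >= 2

    def train(self, pc: int, hit: bool) -> None:
        c = self.counters.get(pc, 0)
        self.counters[pc] = min(c + 1, 3) if hit else max(c - 1, 0)


def dependent_latency(predictor: StoreQueueHitPredictor, producer_pc: int) -> int:
    """Cycles until the dependent load/store has its address operand."""
    if predictor.predict_hit(producer_pc):
        return BASE_LATENCY        # wait for the normal result path
    # Predicted store-queue miss: issue the dependent early and bypass
    # data-cache result straight to the address generation unit.
    return BASE_LATENCY - 1
```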

    Usefulness indication for indirect branch prediction training
    Granted patent (in force)

    Publication No.: US09311100B2

    Publication date: 2016-04-12

    Application No.: US13735694

    Filing date: 2013-01-07

    Applicant: Apple Inc.

    IPC classification: G06F9/00 G06F9/38 G06F9/30

    Abstract: A circuit for implementing a branch target buffer. The branch target buffer may include a memory that stores a plurality of entries. Each entry may include a tag value, a target value, and a prediction accuracy value. A received index value corresponding to an indirect branch instruction may be used to select one of entries of the plurality of entries, and a received tag value may then be compared to the tag value of the selected entries in the memory. An entry in the memory may be selected in response to a determination that the received tag does not match the tag value of compared entries. The selected entry may be allocated to the indirect instruction branch dependent upon the prediction accuracy values of the plurality of entries.
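
A minimal sketch of a branch target buffer with accuracy-guided allocation follows. The set-associative layout, the saturating accuracy counter, and the policy of replacing the lowest-accuracy entry in the indexed set are assumptions made for illustration; the abstract says only that allocation depends on the entries' prediction accuracy values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    tag: Optional[int] = None   # None marks an unused entry
    target: int = 0
    accuracy: int = 0           # assumed saturating counter


class BranchTargetBuffer:
    def __init__(self, num_sets: int = 4, ways: int = 2, max_accuracy: int = 3):
        self.sets = [[Entry() for _ in range(ways)] for _ in range(num_sets)]
        self.max_accuracy = max_accuracy

    def lookup(self, index: int, tag: int) -> Optional[Entry]:
        """Compare the received tag against every entry in the indexed set."""
        for entry in self.sets[index]:
            if entry.tag == tag:
                return entry
        return None

    def update(self, index: int, tag: int, actual_target: int) -> Entry:
        """Train on a resolved indirect branch; allocate on a tag miss."""
        entry = self.lookup(index, tag)
        if entry is not None:
            if entry.target == actual_target:
                entry.accuracy = min(entry.accuracy + 1, self.max_accuracy)
            else:
                entry.accuracy = max(entry.accuracy - 1, 0)
                entry.target = actual_target
            return entry
        # Tag miss: assumed policy replaces the least accurate entry.
        victim = min(self.sets[index], key=lambda e: e.accuracy)
        victim.tag, victim.target, victim.accuracy = tag, actual_target, 0
        return victim
```

Under this policy an entry that has predicted correctly survives later allocations in its set, while entries that have not yet proven accurate are replaced first.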

    Arithmetic Branch Fusion
    Patent application (in force)

    Publication No.: US20140208073A1

    Publication date: 2014-07-24

    Application No.: US13747977

    Filing date: 2013-01-23

    Applicant: Apple Inc.

    IPC classification: G06F9/30

    Abstract: A processor and method for fusing together an arithmetic instruction and a branch instruction. The processor includes an instruction fetch unit configured to fetch instructions. The processor may also include an instruction decode unit that may be configured to decode the fetched instructions into micro-operations for execution by an execution unit. The decode unit may be configured to detect an occurrence of an arithmetic instruction followed by a branch instruction in program order, wherein the branch instruction, upon execution, changes a program flow of control dependent upon a result of execution of the arithmetic instruction. In addition, the processor may further be configured to fuse together the arithmetic instruction and the branch instruction such that a single micro-operation is formed. The single micro-operation includes execution information based upon both the arithmetic instruction and the branch instruction.
