Translation lookaside buffer invalidation by range

    Publication No.: US10725928B1

    Publication date: 2020-07-28

    Application No.: US16243901

    Filing date: 2019-01-09

    Applicant: Apple Inc.

    Abstract: A system and method for efficiently performing maintenance on a cache. In various embodiments, control logic in a cache controller or elsewhere receives an indication for invalidating a range of virtual-to-physical mappings in a given translation lookaside buffer (TLB). The logic determines a first latency to invalidate entries of the TLB based on a number of addresses in the range and a number of supported page sizes simultaneously stored in the TLB. The logic determines a second latency based on a number of entries in the TLB. If the first latency is greater, then the logic traverses through each TLB entry and invalidates TLB entries storing a virtual address within the range. If the first latency is smaller, then the logic traverses through each address in the range and invalidates TLB entries storing a virtual address within the range.
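The choice between the two traversal strategies in the abstract can be sketched as follows. This is a minimal illustration, not the patented logic: the `TlbEntry` type, the `invalidate_range` function, and both latency models (one lookup per address per page size, one step per TLB entry) are assumptions made for the example. It also assumes page sizes are powers of two and resolves a latency tie in favor of walking the addresses, which the abstract does not specify.

```python
from dataclasses import dataclass


@dataclass
class TlbEntry:
    vaddr: int          # page-aligned base virtual address (hypothetical field)
    page_size: int      # page size in bytes (hypothetical field)
    valid: bool = True


def invalidate_range(tlb, start, end, page_sizes):
    """Invalidate entries whose page overlaps [start, end); return the strategy used."""
    min_page = min(page_sizes)
    first_addr = start - start % min_page            # align down to a page boundary
    addresses = range(first_addr, end, min_page)

    # Assumed cost models: one lookup per address per supported page size,
    # versus one step per TLB entry.
    first_latency = len(addresses) * len(page_sizes)
    second_latency = len(tlb)

    if first_latency > second_latency:
        # Cheaper to walk every TLB entry and compare it against the range.
        for entry in tlb:
            if entry.valid and entry.vaddr < end and entry.vaddr + entry.page_size > start:
                entry.valid = False
        return "walk_entries"

    # Cheaper to walk every address in the range, once per page size.
    for addr in addresses:
        for size in page_sizes:
            base = addr - addr % size
            for entry in tlb:                        # stands in for an associative lookup
                if entry.valid and entry.page_size == size and entry.vaddr == base:
                    entry.valid = False
    return "walk_addresses"
```

With an eight-entry TLB and two supported page sizes, a two-page range costs four lookups under this model and is walked by address, while an eight-page range costs sixteen and is handled by scanning the entries instead.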

    Translation Lookaside Buffer Invalidation By Range

    Publication No.: US20200218663A1

    Publication date: 2020-07-09

    Application No.: US16243901

    Filing date: 2019-01-09

    Applicant: Apple Inc.

    Abstract: A system and method for efficiently performing maintenance on a cache. In various embodiments, control logic in a cache controller or elsewhere receives an indication for invalidating a range of virtual-to-physical mappings in a given translation lookaside buffer (TLB). The logic determines a first latency to invalidate entries of the TLB based on a number of addresses in the range and a number of supported page sizes simultaneously stored in the TLB. The logic determines a second latency based on a number of entries in the TLB. If the first latency is greater, then the logic traverses through each TLB entry and invalidates TLB entries storing a virtual address within the range. If the first latency is smaller, then the logic traverses through each address in the range and invalidates TLB entries storing a virtual address within the range.

    Load-store unit with banked queue

    Publication No.: US10133571B1

    Publication date: 2018-11-20

    Application No.: US15171369

    Filing date: 2016-06-02

    Applicant: Apple Inc.

    Abstract: A load-store unit having one or more banked queues is disclosed. In one embodiment, a load-store unit includes at least one queue that is subdivided into multiple banks. Although divided into multiple banks, the queue logically appears to software as a single queue. A first bank of the queue includes a first plurality of entries, with the second bank of the queue having a second plurality of entries, wherein each of the entries is arranged to store memory instructions. Each of the banks is associated with corresponding logic circuitry that controls one or more pointers for that bank. The pointer information may be exchanged between the logic circuits associated with the banks. Based on the pointer information that is exchanged, each bank may output (e.g., for retirement) one entry per cycle.

    Reducing latency for pointer chasing loads

    Publication No.: US09710268B2

    Publication date: 2017-07-18

    Application No.: US14264789

    Filing date: 2014-04-29

    Applicant: Apple Inc.

    CPC classification number: G06F9/30043 G06F9/3826 G06F9/3834 G06F9/3861

    Abstract: Systems, methods, and apparatuses for reducing the load to load/store address latency in an out-of-order processor. When a producer load is detected in the processor pipeline, the processor predicts whether the producer load is going to hit in the store queue. If the producer load is predicted not to hit in the store queue, then a dependent load or store can be issued early. The result data of the producer load is then bypassed forward from the data cache directly to the address generation unit. This result data is then used to generate an address for the dependent load or store, reducing the latency of the dependent load or store by one clock cycle.

    LOAD ORDERING IN A WEAKLY-ORDERED PROCESSOR
    5.
    Invention application (status: in force)

    Publication No.: US20140215191A1

    Publication date: 2014-07-31

    Application No.: US13750972

    Filing date: 2013-01-25

    Applicant: APPLE INC.

    CPC classification number: G06F9/30043 G06F9/3834

    Abstract: Techniques are disclosed relating to ordering of load instructions in a weakly-ordered memory model. In one embodiment, a processor includes a cache with multiple cache lines and a store queue configured to maintain status information associated with a store instruction that targets a location in one of the cache lines. In this embodiment, the processor is configured to set an indicator in the status information in response to migration of the targeted cache line. The indicator may be usable to sequence performance of load instructions that are younger than the store instruction. For example, the processor may be configured to wait, based on the indicator, to perform a younger load instruction that targets the same location as the store instruction until the store instruction is removed from the store queue. This may prevent forwarding of the value of the store instruction to the younger load and preserve load-load ordering.

    COMPLETING LOAD AND STORE INSTRUCTIONS IN A WEAKLY-ORDERED MEMORY MODEL
    6.
    Invention application (status: in force)

    Publication No.: US20140215190A1

    Publication date: 2014-07-31

    Application No.: US13750942

    Filing date: 2013-01-25

    Applicant: APPLE INC.

    Abstract: Techniques are disclosed relating to completion of load and store instructions in a weakly-ordered memory model. In one embodiment, a processor includes a load queue and a store queue and is configured to associate queue information with a load instruction in an instruction stream. In this embodiment, the queue information indicates a location of the load instruction in the load queue and one or more locations in the store queue that are associated with one or more store instructions that are older than the load instruction. The processor may determine, using the queue information, that the load instruction does not conflict with a store instruction in the store queue that is older than the load instruction. The processor may remove the load instruction from the load queue while the store instruction remains in the store queue. The queue information may include a wrap value for the load queue.
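The abstract mentions a wrap value but does not say how it is used. One common use in circular queues, shown here purely as an assumption, is to pair each queue index with a wrap bit that flips whenever the allocation pointer rolls over, so that relative age can be recovered from two positions. The `is_older` helper is hypothetical and only valid while no more than one queue's worth of entries is in flight.

```python
def is_older(a, b):
    """Return True if queue position a was allocated before position b.

    Each position is an (index, wrap) pair. The wrap bit toggles every time
    the allocation pointer rolls over the end of the circular queue.
    """
    (a_index, a_wrap), (b_index, b_wrap) = a, b
    if a_wrap == b_wrap:
        # Same pass around the queue: the lower index was allocated first.
        return a_index < b_index
    # Different passes: the entry that has not yet wrapped is the older one,
    # and it sits at the higher index.
    return a_index > b_index
```

In a four-entry queue, positions allocated in the order (2, 0), (3, 0), (0, 1), (1, 1) compare correctly even though the index itself goes back to zero partway through.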

    Load/store dependency predictor optimization for replayed loads

    Publication No.: US10437595B1

    Publication date: 2019-10-08

    Application No.: US15070435

    Filing date: 2016-03-15

    Applicant: Apple Inc.

    Abstract: Systems, apparatuses, and methods for optimizing a load-store dependency predictor (LSDP). When a younger load instruction is issued before an older store instruction and the younger load is dependent on the older store, the LSDP is trained on this ordering violation. A replay/flush indicator is stored in a corresponding entry in the LSDP to indicate whether the ordering violation resulted in a flush or replay. On subsequent executions, a dependency may be enforced for the load-store pair if a confidence counter is above a threshold, with the threshold varying based on the status of the replay/flush indicator. If a given load matches on multiple entries in the LSDP, and if at least one of the entries has a flush indicator, then the given load may be marked as a multimatch case and forced to wait to issue until all older stores have issued.
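The decision rule described in the abstract can be illustrated with a small sketch. Everything named here is hypothetical: the `LsdpEntry` fields, the two threshold constants, and the policy labels. In particular, the abstract says only that the threshold varies with the replay/flush indicator; using a lower threshold for flush-causing pairs is an assumption based on a flush being the costlier outcome.

```python
from dataclasses import dataclass

REPLAY_THRESHOLD = 2   # hypothetical value
FLUSH_THRESHOLD = 0    # hypothetical value; assumed lower because a flush costs more


@dataclass
class LsdpEntry:
    load_pc: int        # address of the load instruction
    store_pc: int       # address of the store it was found to depend on
    confidence: int     # confidence counter for this pair
    caused_flush: bool  # True if the ordering violation caused a flush, False for a replay


def enforce_dependency(entry):
    """Enforce the load-store dependency only if confidence exceeds the threshold."""
    threshold = FLUSH_THRESHOLD if entry.caused_flush else REPLAY_THRESHOLD
    return entry.confidence > threshold


def issue_policy(load_pc, table):
    """Pick how a load should issue, given the predictor table."""
    matches = [e for e in table if e.load_pc == load_pc]
    if len(matches) > 1 and any(e.caused_flush for e in matches):
        # Multimatch case: wait until all older stores have issued.
        return "wait_for_all_older_stores"
    if any(enforce_dependency(e) for e in matches):
        return "wait_for_matching_store"
    return "issue_freely"
```

A load with a single replay-only entry at confidence 1 issues freely under these thresholds, whereas the same confidence on a flush-marked entry is enough to make the load wait for its store.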

    ARCHITECTED STATE RETENTION
    8.
    Invention application

    Publication No.: US20180307297A1

    Publication date: 2018-10-25

    Application No.: US15496290

    Filing date: 2017-04-25

    Applicant: Apple Inc.

    Abstract: Systems, apparatuses, and methods for retaining architected state for relatively frequent switching between sleep and active operating states are described. A processor receives an indication to transition from an active state to a sleep state. The processor stores a copy of a first subset of the architected state information in on-die storage elements capable of retaining storage after power is turned off. The processor supports programmable input/output (PIO) access of particular stored information during the sleep state. When a wakeup event is detected, circuitry within the processor is powered up again. A boot sequence and recovery of architected state from off-chip memory are not performed. Rather than fetch from a memory location pointed to by a reset base address register, the processor instead fetches an instruction from a memory location pointed to by a restored program counter of the retained subset of the architected state information.

    Access permissions modification
    9.
    Granted invention patent

    Publication No.: US09852084B1

    Publication date: 2017-12-26

    Application No.: US15017427

    Filing date: 2016-02-05

    Applicant: Apple Inc.

    CPC classification number: G06F12/1483 G06F12/1009 G06F2212/1052

    Abstract: Systems, apparatuses, and methods for modifying access permissions in a processor. A processor may include one or more permissions registers for managing access permissions. A first permissions register may be utilized to override access permissions embedded in the page table data. A plurality of bits from the page table data may be utilized as an index into the first permissions register for the current privilege level. An attribute field may be retrieved from the first permissions register to determine the access permissions for a given memory request. A second permissions register may also be utilized to set the upper and lower boundary of a region in physical memory where the kernel is allowed to execute. A lock register may prevent any changes from being made to the second permissions register after the second permissions register has been initially programmed.
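A rough sketch of the two mechanisms in the abstract follows. The 4-bit attribute width, the register layout, and the names `lookup_attribute` and `KernelRegion` are all assumptions chosen for illustration; the abstract does not give field widths or encodings.

```python
ATTR_BITS = 4  # hypothetical width of one attribute field


def lookup_attribute(perm_register, pte_index):
    """Use bits taken from the page table entry as an index into the
    permissions register and return the selected attribute field."""
    shift = pte_index * ATTR_BITS
    return (perm_register >> shift) & ((1 << ATTR_BITS) - 1)


class KernelRegion:
    """Models the second permissions register together with its lock register."""

    def __init__(self):
        self.lower = 0
        self.upper = 0
        self.locked = False

    def program(self, lower, upper):
        # Once locked, the boundaries can no longer be changed.
        if self.locked:
            raise PermissionError("region register is locked")
        self.lower, self.upper = lower, upper

    def lock(self):
        self.locked = True

    def may_execute(self, paddr):
        # The kernel may execute only inside [lower, upper) in physical memory.
        return self.lower <= paddr < self.upper
```

In this model each privilege level would hold its own `perm_register` value, and the lock is one-way: nothing in the sketch clears it short of constructing a new object, which stands in for a reset.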

    REDUCING LATENCY FOR POINTER CHASING LOADS
    10.
    Invention application (status: in force)

    Publication No.: US20150309792A1

    Publication date: 2015-10-29

    Application No.: US14264789

    Filing date: 2014-04-29

    Applicant: Apple Inc.

    CPC classification number: G06F9/30043 G06F9/3826 G06F9/3834 G06F9/3861

    Abstract: Systems, methods, and apparatuses for reducing the load to load/store address latency in an out-of-order processor. When a producer load is detected in the processor pipeline, the processor predicts whether the producer load is going to hit in the store queue. If the producer load is predicted not to hit in the store queue, then a dependent load or store can be issued early. The result data of the producer load is then bypassed forward from the data cache directly to the address generation unit. This result data is then used to generate an address for the dependent load or store, reducing the latency of the dependent load or store by one clock cycle.
