DELAYING CACHE DATA ARRAY UPDATES
    Patent application (status: in force)

    Publication number: US20150149722A1

    Publication date: 2015-05-28

    Application number: US14089014

    Filing date: 2013-11-25

    Applicant: Apple Inc.

    CPC classification number: G06F12/0811 G06F12/0842 G06F12/0857 G06F12/0888

    Abstract: Systems, methods, and apparatuses for reducing writes to the data array of a cache. A cache hierarchy includes one or more L1 caches and an L2 cache inclusive of the L1 cache(s). When a request from the L1 cache misses in the L2 cache, the L2 cache sends a fill request to memory. When the fill data returns from memory, the L2 cache delays writing the fill data to its data array. Instead, this cache line is written to the L1 cache and a clean-evict bit corresponding to the cache line is set in the L1 cache. When the L1 cache evicts this cache line, the L1 cache will write back the cache line to the L2 cache even if the cache line has not been modified.
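
The clean-evict mechanism described in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions: the class names (`L1Line`, `L1Cache`, `L2Cache`), the `data_writes` counter, and the use of a tag set to stand in for inclusivity are all hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class L1Line:
    data: bytes
    dirty: bool = False
    clean_evict: bool = False  # set when the L2 skipped its data-array write


@dataclass
class L2Cache:
    tags: set = field(default_factory=set)          # lines the L2 tracks (inclusive)
    data_array: dict = field(default_factory=dict)  # lines whose data the L2 holds
    data_writes: int = 0                            # counts data-array writes

    def write_data(self, addr, data):
        self.data_array[addr] = data
        self.data_writes += 1


@dataclass
class L1Cache:
    l2: L2Cache
    lines: dict = field(default_factory=dict)

    def fill_from_memory(self, addr, data):
        # The L2 allocates a tag but delays writing the fill data.
        self.l2.tags.add(addr)
        self.lines[addr] = L1Line(data=data, clean_evict=True)

    def store(self, addr, data):
        line = self.lines[addr]
        line.data, line.dirty = data, True

    def evict(self, addr):
        line = self.lines.pop(addr)
        # Write back if modified, or if the L2 never received the data.
        if line.dirty or line.clean_evict:
            self.l2.write_data(addr, line.data)


l2 = L2Cache()
l1 = L1Cache(l2)
l1.fill_from_memory(0x40, b"old")
l1.store(0x40, b"new")
l1.evict(0x40)
print(l2.data_writes)  # one data-array write instead of fill plus writeback
```

In this model a line that is filled, modified, and then evicted costs the L2 a single data-array write, which is the saving the abstract is after.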

    PREFETCHING ACROSS PAGE BOUNDARIES IN HIERARCHICALLY CACHED PROCESSORS
    Patent application (status: in force)

    Publication number: US20140149632A1

    Publication date: 2014-05-29

    Application number: US13689696

    Filing date: 2012-11-29

    Applicant: Apple Inc.

    Abstract: Processors and methods for preventing lower level prefetch units from stalling at page boundaries. An upper level prefetch unit closest to the processor core issues a preemptive request for a translation of the next page in a given prefetch stream. The upper level prefetch unit sends the translation to the lower level prefetch units prior to the lower level prefetch units reaching the end of the current page for the given prefetch stream. When the lower level prefetch units reach the boundary of the current page, instead of stopping, these prefetch units can continue to prefetch by jumping to the next physical page number provided in the translation.
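
A minimal sketch of the page-crossing behavior follows. It assumes a forward, line-by-line prefetch stream, a 4 KiB page, and a 64-byte line; the class `LowerLevelPrefetcher` and its methods are hypothetical names chosen for illustration, not the patent's design.

```python
PAGE_SIZE = 4096  # assumed page size in bytes
LINE_SIZE = 64    # assumed cache line size in bytes


class LowerLevelPrefetcher:
    def __init__(self, phys_addr):
        self.next_addr = phys_addr
        self.next_page_ppn = None  # translation supplied by the upper-level unit

    def receive_translation(self, ppn):
        # The upper-level prefetch unit sends the next physical page number
        # before this unit reaches the end of the current page.
        self.next_page_ppn = ppn

    def step(self):
        """Return the next physical address to prefetch, or None if stalled."""
        if self.next_addr is None:
            return None
        addr = self.next_addr
        candidate = addr + LINE_SIZE
        if candidate // PAGE_SIZE == addr // PAGE_SIZE:
            self.next_addr = candidate
        elif self.next_page_ppn is not None:
            # Page boundary reached: jump to the translated physical page.
            self.next_addr = self.next_page_ppn * PAGE_SIZE
            self.next_page_ppn = None
        else:
            # No translation available, so the stream stops at the boundary.
            self.next_addr = None
        return addr


pf = LowerLevelPrefetcher(5 * PAGE_SIZE + PAGE_SIZE - 2 * LINE_SIZE)
pf.receive_translation(9)
print([hex(pf.step()) for _ in range(3)])
```

Without the call to `receive_translation`, the same stream would return `None` once it ran off the end of page 5, which is the stall the abstract aims to avoid.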

    Coprocessors with Bypass Optimization, Variable Grid Architecture, and Fused Vector Operations

    Publication number: US20220358082A1

    Publication date: 2022-11-10

    Application number: US17869620

    Filing date: 2022-07-20

    Applicant: Apple Inc.

    Abstract: In an embodiment, a coprocessor may include a bypass indication which identifies execution circuitry that is not used by a given processor instruction, and thus may be bypassed. The corresponding circuitry may be disabled during execution, preventing evaluation when the output of the circuitry will not be used for the instruction. In another embodiment, the coprocessor may implement a grid of processing elements in rows and columns, where a given coprocessor instruction may specify an operation that causes up to all of the processing elements to operate on vectors of input operands to produce results. Implementations of the coprocessor may implement a portion of the processing elements. The coprocessor control circuitry may be designed to operate with the full grid or partial grid, reissuing instructions in the partial grid case to perform the requested operation. In still another embodiment, the coprocessor may be able to fuse vector mode operations.
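
The partial-grid idea can be sketched in software. The abstract does not say which operation the processing elements perform or how the grid is partitioned, so the element-wise product `x[i] * y[j]` and the tile-by-tile reissue order below are assumptions, and `grid_op` is a hypothetical name.

```python
def grid_op(x, y, impl_rows, impl_cols):
    """Compute result[i][j] = x[i] * y[j] on a grid of len(x) by len(y) elements.

    impl_rows and impl_cols give the size of the grid portion that is
    physically implemented. When it is smaller than the full grid, the
    operation is reissued once per tile until every element is covered.
    Returns the result and the number of issues that were needed.
    """
    rows, cols = len(x), len(y)
    result = [[0] * cols for _ in range(rows)]
    issues = 0
    for r0 in range(0, rows, impl_rows):
        for c0 in range(0, cols, impl_cols):
            issues += 1  # one (re)issue of the instruction for this tile
            for i in range(r0, min(r0 + impl_rows, rows)):
                for j in range(c0, min(c0 + impl_cols, cols)):
                    result[i][j] = x[i] * y[j]
    return result, issues


full, full_issues = grid_op([1, 2, 3, 4], [5, 6, 7, 8], impl_rows=4, impl_cols=4)
half, half_issues = grid_op([1, 2, 3, 4], [5, 6, 7, 8], impl_rows=2, impl_cols=4)
print(full == half, full_issues, half_issues)
```

The point of the sketch is that software sees the same result either way; only the number of issues differs between the full and partial grid.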

    Unified address translation
    Granted patent

    Publication number: US11221962B2

    Publication date: 2022-01-11

    Application number: US16874997

    Filing date: 2020-05-15

    Applicant: Apple Inc.

    Abstract: A system and method for efficiently transferring address mappings and data access permissions corresponding to the address mappings. A computing system includes at least one processor and memory for storing a page table. In response to receiving a memory access operation comprising a first address, an address translation unit is configured to identify a data access permission based on a permission index corresponding to the first address, and access data stored in a memory location of the memory identified by a second address in a manner defined by the identified data access permission. The address translation unit is configured to access a table to identify the data access permission, and is configured to determine the permission index and the second address based on the first address. A single permission index may correspond to different permissions for different entities within the system.
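
The permission-index lookup can be sketched as follows. The table contents, the entity names `cpu` and `gpu`, the 4 KiB page size, and the function `translate` are all hypothetical; the sketch only shows how one index can resolve to different permissions per entity.

```python
PAGE_SHIFT = 12  # assumed 4 KiB pages

# Hypothetical page table: virtual page number -> (physical page number, permission index)
PAGE_TABLE = {0x10: (0x80, 1), 0x11: (0x81, 2)}

# Hypothetical per-entity permission tables: permission index -> allowed accesses
PERMISSION_TABLES = {
    "cpu": {1: "rw", 2: "r"},
    "gpu": {1: "r", 2: ""},
}


def translate(entity, vaddr, access):
    """Translate vaddr for the given entity; access is "r" or "w"."""
    vpn = vaddr >> PAGE_SHIFT
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    # The first address yields both the second address and the permission index.
    ppn, perm_index = PAGE_TABLE[vpn]
    # The same index is looked up in the table belonging to the requester.
    allowed = PERMISSION_TABLES[entity][perm_index]
    if access not in allowed:
        raise PermissionError(f"{entity} may not perform {access!r} on {hex(vaddr)}")
    return (ppn << PAGE_SHIFT) | offset


print(hex(translate("cpu", 0x10004, "w")))
```

Here permission index 1 grants read and write access to the `cpu` entity but only read access to the `gpu` entity, so `translate("gpu", 0x10004, "w")` raises `PermissionError`.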

    Architected state retention for a frequent operating state switching processor

    Publication number: US10990159B2

    Publication date: 2021-04-27

    Application number: US15496290

    Filing date: 2017-04-25

    Applicant: Apple Inc.

    Abstract: Systems, apparatuses, and methods for retaining architected state for relatively frequent switching between sleep and active operating states are described. A processor receives an indication to transition from an active state to a sleep state. The processor stores a copy of a first subset of the architected state information in on-die storage elements capable of retaining storage after power is turned off. The processor supports programmable input/output (PIO) access of particular stored information during the sleep state. When a wakeup event is detected, circuitry within the processor is powered up again. A boot sequence and recovery of architected state from off-chip memory are not performed. Rather than fetch from a memory location pointed to by a reset base address register, the processor instead fetches an instruction from a memory location pointed to by a restored program counter of the retained subset of the architected state information.

    Unified prefetch circuit for multi-level caches

    Publication number: US10180905B1

    Publication date: 2019-01-15

    Application number: US15093213

    Filing date: 2016-04-07

    Applicant: Apple Inc.

    Abstract: In an embodiment, a processor may implement an access map-pattern match (AMPM)-based prefetch circuit for a multi-level cache system. The access patterns that are matched to the access maps may include prefetches for different cache levels. Centralizing the generation of prefetches into one prefetch circuit may provide better observability and controllability of prefetching at various levels of the cache hierarchy, in an embodiment. Prefetches at different levels may be controlled individually based on the accuracy of those prefetches, in an embodiment. Additionally, in some embodiments, access patterns that are longer than a given threshold may have the granularity of the prefetches change so that more data is prefetched and the prefetches occur farther in advance.
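
As a rough illustration of access-map matching, the sketch below uses a simple stride check over a set of accessed line offsets. This is a swapped-in simplification, not the patent's circuit: the function name `ampm_prefetches`, the `max_stride` parameter, and the choice to send the nearer line to the L1 and the farther line to the L2 are all assumptions, and region bounds are ignored.

```python
def ampm_prefetches(access_map, pos, max_stride=4):
    """Return (cache_level, line_offset) prefetch candidates.

    access_map is the set of line offsets already accessed in a region and
    pos is the offset of the current access. A stride matches when the two
    previous lines along that stride were both accessed.
    """
    candidates = []
    for k in range(1, max_stride + 1):
        for stride in (k, -k):
            if pos - stride in access_map and pos - 2 * stride in access_map:
                candidates.append(("L1", pos + stride))      # near prefetch
                candidates.append(("L2", pos + 2 * stride))  # farther prefetch
    return candidates


print(ampm_prefetches({0, 1}, 2))
```

Because one function produces the candidates for both levels, a caller could throttle the `L1` and `L2` entries separately, which loosely mirrors the per-level control the abstract mentions.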

    Execution unit power management
    Granted patent

    Publication number: US10037073B1

    Publication date: 2018-07-31

    Application number: US15273925

    Filing date: 2016-09-23

    Applicant: Apple Inc.

    CPC classification number: G06F1/3287 G06F1/3206 G06F1/3228 G06F1/3243

    Abstract: A processor includes an instruction issue circuit, and high-utilization and low-utilization execution unit circuits coupled to execute instructions received from the instruction issue circuit. On average, utilization of the low-utilization execution unit circuit is lower than utilization of the high-utilization execution unit circuit. The processor also includes a retention circuit coupled to a different power domain than the low-utilization execution unit circuit, and a power management circuit. The power management circuit may be configured to detect that inactivity of the low-utilization execution unit circuit satisfies a threshold inactivity level; upon detecting that the threshold inactivity level is satisfied, cause architecturally-visible state of the low-utilization execution unit circuit to be copied to the retention circuit; and subsequent to copying of the architecturally-visible state to the retention circuit, cause the low-utilization execution unit circuit to enter a power-off state, where the retention circuit retains stored data during the power-off state.
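
The ordering in this abstract (detect inactivity, copy state, then power off) can be sketched as a small state machine. The class `PowerManager`, the idle-cycle counter used as the inactivity measure, and the restore-on-wake step are assumptions; the abstract itself does not describe how the unit is woken.

```python
class PowerManager:
    def __init__(self, threshold_idle_cycles):
        self.threshold = threshold_idle_cycles
        self.idle_cycles = 0
        self.unit_state = {"v0": 0}  # architecturally visible state of the unit
        self.retention = None        # retention circuit in a separate power domain
        self.powered = True

    def tick(self, unit_busy):
        """Advance one cycle; unit_busy says whether the unit has work."""
        if unit_busy:
            if not self.powered:
                # Assumed wake path: restore state from the retention circuit.
                self.unit_state = dict(self.retention)
                self.powered = True
            self.idle_cycles = 0
            return
        self.idle_cycles += 1
        if self.powered and self.idle_cycles >= self.threshold:
            self.retention = dict(self.unit_state)  # copy state first
            self.unit_state = None                  # then power off, losing local state
            self.powered = False


pm = PowerManager(threshold_idle_cycles=3)
pm.unit_state["v0"] = 7
for _ in range(3):
    pm.tick(unit_busy=False)
print(pm.powered, pm.retention)
```

After three idle cycles the unit is powered off and its state survives only in `retention`; a later `pm.tick(unit_busy=True)` restores it in this model.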

    Coprocessors with bypass optimization, variable grid architecture, and fused vector operations

    Publication number: US12174785B2

    Publication date: 2024-12-24

    Application number: US17869620

    Filing date: 2022-07-20

    Applicant: Apple Inc.

    Abstract: In an embodiment, a coprocessor may include a bypass indication which identifies execution circuitry that is not used by a given processor instruction, and thus may be bypassed. The corresponding circuitry may be disabled during execution, preventing evaluation when the output of the circuitry will not be used for the instruction. In another embodiment, the coprocessor may implement a grid of processing elements in rows and columns, where a given coprocessor instruction may specify an operation that causes up to all of the processing elements to operate on vectors of input operands to produce results. Implementations of the coprocessor may implement a portion of the processing elements. The coprocessor control circuitry may be designed to operate with the full grid or partial grid, reissuing instructions in the partial grid case to perform the requested operation. In still another embodiment, the coprocessor may be able to fuse vector mode operations.