Load ordering in a weakly-ordered processor
    1.
    Type: Granted invention patent
    Status: Active

    Publication number: US09383995B2

    Publication date: 2016-07-05

    Application number: US13750972

    Filing date: 2013-01-25

    Applicant: Apple Inc.

    CPC classification number: G06F9/30043 G06F9/3834

    Abstract: Techniques are disclosed relating to ordering of load instructions in a weakly-ordered memory model. In one embodiment, a processor includes a cache with multiple cache lines and a store queue configured to maintain status information associated with a store instruction that targets a location in one of the cache lines. In this embodiment, the processor is configured to set an indicator in the status information in response to migration of the targeted cache line. The indicator may be usable to sequence performance of load instructions that are younger than the store instruction. For example, the processor may be configured to wait, based on the indicator, to perform a younger load instruction that targets the same location as the store instruction until the store instruction is removed from the store queue. This may prevent forwarding of the value of the store instruction to the younger load and preserve load-load ordering.
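
The behavior described in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the names `StoreEntry`, `StoreQueue`, `on_line_migration`, `try_load`, and the 64-byte `LINE_SIZE` are all hypothetical and do not come from the source.

```python
from dataclasses import dataclass

LINE_SIZE = 64  # assumed cache line size in bytes; not stated in the abstract


@dataclass
class StoreEntry:
    addr: int
    value: int
    line_migrated: bool = False  # the "indicator" kept in the status information


class StoreQueue:
    """Toy store queue: oldest entry first (hypothetical model)."""

    def __init__(self):
        self.entries = []

    def on_line_migration(self, addr):
        """Mark every queued store whose target line contains addr."""
        for entry in self.entries:
            if entry.addr // LINE_SIZE == addr // LINE_SIZE:
                entry.line_migrated = True

    def try_load(self, addr):
        """Decide how a younger load to addr is handled.

        Returns ("forward", value), ("wait", None), or ("cache", None).
        """
        for entry in reversed(self.entries):  # youngest matching store first
            if entry.addr == addr:
                if entry.line_migrated:
                    # Do not forward; wait until the store leaves the queue.
                    return ("wait", None)
                return ("forward", entry.value)
        return ("cache", None)  # no matching store: read the cache


sq = StoreQueue()
sq.entries.append(StoreEntry(addr=0x100, value=5))
before = sq.try_load(0x100)      # store-to-load forwarding is allowed
sq.on_line_migration(0x100)      # the targeted line migrates
during = sq.try_load(0x100)      # the load must now wait
sq.entries.pop(0)                # the store is removed from the queue
after = sq.try_load(0x100)       # the load reads the cache instead
```

In this model the load is simply retried until the store has drained, which is one way to read "wait ... until the store instruction is removed from the store queue".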

    Execution unit power management
    2.
    Type: Granted invention patent

    Publication number: US10037073B1

    Publication date: 2018-07-31

    Application number: US15273925

    Filing date: 2016-09-23

    Applicant: Apple Inc.

    CPC classification number: G06F1/3287 G06F1/3206 G06F1/3228 G06F1/3243

    Abstract: A processor includes an instruction issue circuit, and high-utilization and low-utilization execution unit circuits coupled to execute instructions received from the instruction issue unit. On average, utilization of the low-utilization execution unit circuit is lower than utilization of the high-utilization execution unit circuit. The processor also includes a retention circuit coupled to a different power domain than the low-utilization execution unit circuit, and a power management circuit. The power management circuit may be configured to detect that inactivity of the low-utilization execution unit circuit satisfies a threshold inactivity level; upon detecting that the threshold inactivity level is satisfied, cause architecturally-visible state of the low-utilization execution unit circuit to be copied to the retention circuit; and subsequent to copying of the architecturally-visible state to the retention circuit, cause the low-utilization execution unit circuit to enter a power-off state, where the retention circuit retains stored data during the power-off state.
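
The sequence in this abstract (detect inactivity, copy state to the retention circuit, then power off) can be sketched as a small state machine. The sketch assumes inactivity is counted in idle cycles and adds a restore-on-wake path that the abstract does not describe; `LowUtilUnit`, `PowerManager`, and `tick` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class LowUtilUnit:
    state: dict = field(default_factory=dict)  # architecturally visible state
    powered: bool = True
    idle_cycles: int = 0


@dataclass
class PowerManager:
    threshold: int  # assumed: inactivity threshold measured in idle cycles
    retention: dict = field(default_factory=dict)  # models a separate power domain

    def tick(self, unit: LowUtilUnit, busy: bool) -> None:
        if busy:
            if not unit.powered:
                # Assumed wake-up path; the abstract only covers power-off.
                unit.state = dict(self.retention)
                unit.powered = True
            unit.idle_cycles = 0
            return
        if not unit.powered:
            return
        unit.idle_cycles += 1
        if unit.idle_cycles >= self.threshold:
            self.retention = dict(unit.state)  # copy the state first
            unit.state = {}                    # contents are lost without power
            unit.powered = False               # then enter the power-off state


unit = LowUtilUnit(state={"v0": 7})
pm = PowerManager(threshold=3)
for _ in range(3):
    pm.tick(unit, busy=False)
powered_after_idle = unit.powered
retained = dict(pm.retention)
pm.tick(unit, busy=True)  # new work arrives for the unit
```

The ordering inside `tick` mirrors the abstract: the copy to retention happens before the unit is powered off.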

    CONCURRENT STORE AND LOAD OPERATIONS
    4.
    Type: Invention application
    Status: Active

    Publication number: US20150199272A1

    Publication date: 2015-07-16

    Application number: US14154122

    Filing date: 2014-01-13

    Applicant: Apple Inc.

    CPC classification number: G06F12/0815 G06F12/0844 G06F12/0891

    Abstract: Systems, processors, and methods for efficiently handling concurrent store and load operations within a processor. A processor comprises a load-store unit (LSU) with a banked level-one (L1) data cache. When a store operation is ready to write data to the L1 data cache, the store operation will skip the write to any banks that have a conflict with a concurrent load operation. A partial write of the store operation will be performed to those banks of the L1 data cache that do not have a conflict with a concurrent load operation. For every attempt to write the store operation, a corresponding store mask will be updated to indicate which portions of the store operation were successfully written to the L1 data cache.
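
The partial-write scheme in this abstract can be sketched in a few lines. This is an illustrative model only: the bank count, the `PendingStore` class, and the `attempt_write` function are assumptions, and the store mask is represented as a set of bank indices rather than a bit vector.

```python
from dataclasses import dataclass, field

NUM_BANKS = 4  # assumed bank count; the abstract does not give one


@dataclass
class PendingStore:
    """A store waiting to be written to a banked L1 data cache."""
    data: dict  # bank index -> value destined for that bank
    written: set = field(default_factory=set)  # store mask: banks already written

    def done(self) -> bool:
        return self.written == set(self.data)


def attempt_write(cache: list, store: PendingStore, load_banks: set) -> None:
    """Write each remaining bank unless a concurrent load is using it."""
    for bank, value in store.data.items():
        if bank in store.written or bank in load_banks:
            continue  # already written, or conflicts with a load this cycle
        cache[bank] = value
        store.written.add(bank)  # record the successful partial write


cache = [0] * NUM_BANKS
store = PendingStore(data={0: 10, 1: 11, 2: 12})
attempt_write(cache, store, load_banks={1})    # bank 1 is busy with a load
first_pass = (list(cache), store.done())
attempt_write(cache, store, load_banks=set())  # the retry writes only bank 1
second_pass = (list(cache), store.done())
```

Because the mask is updated on every attempt, a retry touches only the banks that were skipped earlier.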

    Lookahead Scheme for Prioritized Reads
    5.
    Type: Invention application
    Status: Under examination (published)

    Publication number: US20150161033A1

    Publication date: 2015-06-11

    Application number: US14624621

    Filing date: 2015-02-18

    Applicant: Apple Inc.

    Abstract: A circular queue implementing a scheme for prioritized reads is disclosed. In one embodiment, a circular queue (or buffer) includes a number of storage locations each configured to store a data value. A multiplexer tree is coupled between the storage locations and a read port. A priority circuit is configured to generate and provide selection signals to each multiplexer of the multiplexer tree, based on a priority scheme. Based on the states of the selection signals, one of the storage locations is coupled to the read port via the multiplexers of the multiplexer tree.
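
A software analogue of the multiplexer tree may make the structure easier to picture. The abstract does not say which priority scheme is used, so the sketch assumes one: the first valid slot at or after the read pointer wins. It also simplifies by sharing one select bit across all muxes in a tree level. `select_signals` and `mux_tree_read` are hypothetical names, and the queue size is assumed to be a power of two.

```python
def select_signals(valid, read_ptr):
    """Priority circuit (assumed scheme): first valid slot at or after read_ptr."""
    n = len(valid)
    levels = n.bit_length() - 1  # log2(n) for a power-of-two queue size
    for offset in range(n):
        idx = (read_ptr + offset) % n  # wrap around the circular queue
        if valid[idx]:
            # One select bit per tree level, leaf level first.
            return [(idx >> level) & 1 for level in range(levels)]
    return None  # no valid entry to read


def mux_tree_read(slots, selects):
    """Reduce the slots through levels of 2:1 muxes to a single read port."""
    layer = list(slots)
    for sel in selects:
        # Each 2:1 mux picks one input from a pair of neighbours.
        layer = [pair[sel] for pair in zip(layer[0::2], layer[1::2])]
    return layer[0]


slots = list("abcdefgh")
valid = [False, False, True, False, False, True, False, False]
read_a = mux_tree_read(slots, select_signals(valid, 4))  # slot 5 is chosen
read_b = mux_tree_read(slots, select_signals(valid, 6))  # wraps to slot 2
```

With eight slots the tree has three levels, so three select bits steer exactly one slot to the read port.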

    Completing load and store instructions in a weakly-ordered memory model
    6.
    Type: Granted invention patent
    Status: Active

    Publication number: US09535695B2

    Publication date: 2017-01-03

    Application number: US13750942

    Filing date: 2013-01-25

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to completion of load and store instructions in a weakly-ordered memory model. In one embodiment, a processor includes a load queue and a store queue and is configured to associate queue information with a load instruction in an instruction stream. In this embodiment, the queue information indicates a location of the load instruction in the load queue and one or more locations in the store queue that are associated with one or more store instructions that are older than the load instruction. The processor may determine, using the queue information, that the load instruction does not conflict with a store instruction in the store queue that is older than the load instruction. The processor may remove the load instruction from the load queue while the store instruction remains in the store queue. The queue information may include a wrap value for the load queue.
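
The abstract mentions a wrap value for the load queue without saying how it is used. One common use of such a value in circular queues is age comparison, sketched below as an assumption rather than as the patent's method. `QUEUE_SIZE`, `make_tag`, and `is_older` are hypothetical names, and the comparison is only valid while no more than `QUEUE_SIZE` entries are in flight.

```python
QUEUE_SIZE = 8  # assumed number of queue entries


def make_tag(alloc_count):
    """Build a (wrap_bit, index) tag from a running allocation count."""
    return ((alloc_count // QUEUE_SIZE) & 1, alloc_count % QUEUE_SIZE)


def is_older(a, b):
    """Return True if tag a was allocated before tag b."""
    wrap_a, idx_a = a
    wrap_b, idx_b = b
    if wrap_a == wrap_b:
        return idx_a < idx_b  # same pass around the queue
    return idx_a > idx_b      # b has wrapped past the end, a has not


early = make_tag(6)  # index 6 on the first pass
late = make_tag(9)   # index 1 after wrapping
```

Here `late` has the smaller index but is still recognized as younger, because its wrap bit differs from that of `early`.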

    Power switch ramp rate control using selectable daisy-chained connection of enable to power switches or daisy-chained flops providing enables
    7.
    Type: Granted invention patent
    Status: Active

    Publication number: US09564898B2

    Publication date: 2017-02-07

    Application number: US14622111

    Filing date: 2015-02-13

    Applicant: Apple Inc.

    CPC classification number: H03K19/00361 H03K19/0013 H03K19/0016

    Abstract: In an embodiment, an integrated circuit may include one or more power gated blocks and a power manager circuit. The power manager circuit may be configured to generate a block enable for each power gated block and a block enable clock. The power gated block may generate local block enables to various power switch segments in the power gated block. In particular, the power gated block may include a set of series-connected flops that receive the block enable from the power manager circuit. The power gated block may include a set of multiplexors (muxes) that provide the local block enables for each power switch segment. One input of the muxes is coupled to the block enable, and the other input is coupled to another enable propagated through one of the other power switch segments. Accordingly, the muxes may be controlled to select the propagated enables or the input block enable.

    Register file circuit design process

    Publication number: US09824171B2

    Publication date: 2017-11-21

    Application number: US14820223

    Filing date: 2015-08-06

    Applicant: Apple Inc.

    CPC classification number: G06F17/505 G06F17/5068

    Abstract: In some embodiments, a register file circuit design process includes instructing an automated integrated circuit design program to generate a register file circuit design, including providing a cell circuit design and instructing the automated integrated circuit design program to generate a selection design, a pre-decode design, and a data gating design. The cell circuit design describes a plurality of selection circuits that have a particular arrangement. The selection design describes a plurality of replica circuits that include respective pluralities of selection circuits having the particular arrangement. The pre-decode design describes a pre-decode circuit configured to identify a plurality of entries identified by a portion of a write instruction. The data gating design describes data gating circuits configured, in response to the pre-decode circuit not identifying respective entries, to disable data inputs to respective write selection circuits connected to the respective entries.

    REGISTER FILE CIRCUIT DESIGN PROCESS
    9.
    Type: Invention application
    Status: Active

    Publication number: US20170039299A1

    Publication date: 2017-02-09

    Application number: US14820223

    Filing date: 2015-08-06

    Applicant: Apple Inc.

    CPC classification number: G06F17/505 G06F17/5068

    Abstract: In some embodiments, a register file circuit design process includes instructing an automated integrated circuit design program to generate a register file circuit design, including providing a cell circuit design and instructing the automated integrated circuit design program to generate a selection design, a pre-decode design, and a data gating design. The cell circuit design describes a plurality of selection circuits that have a particular arrangement. The selection design describes a plurality of replica circuits that include respective pluralities of selection circuits having the particular arrangement. The pre-decode design describes a pre-decode circuit configured to identify a plurality of entries identified by a portion of a write instruction. The data gating design describes data gating circuits configured, in response to the pre-decode circuit not identifying respective entries, to disable data inputs to respective write selection circuits connected to the respective entries.

    Concurrent store and load operations
    10.
    Type: Granted invention patent
    Status: Active

    Publication number: US09448936B2

    Publication date: 2016-09-20

    Application number: US14154122

    Filing date: 2014-01-13

    Applicant: Apple Inc.

    CPC classification number: G06F12/0815 G06F12/0844 G06F12/0891

    Abstract: Systems, processors, and methods for efficiently handling concurrent store and load operations within a processor. A processor comprises a load-store unit (LSU) with a banked level-one (L1) data cache. When a store operation is ready to write data to the L1 data cache, the store operation will skip the write to any banks that have a conflict with a concurrent load operation. A partial write of the store operation will be performed to those banks of the L1 data cache that do not have a conflict with a concurrent load operation. For every attempt to write the store operation, a corresponding store mask will be updated to indicate which portions of the store operation were successfully written to the L1 data cache.
