PROVIDING LATE PHYSICAL REGISTER ALLOCATION AND EARLY PHYSICAL REGISTER RELEASE IN OUT-OF-ORDER PROCESSOR (OOP)-BASED DEVICES IMPLEMENTING A CHECKPOINT-BASED ARCHITECTURE

    Publication No.: WO2020061341A1

    Publication date: 2020-03-26

    Application No.: PCT/US2019/051972

    Application date: 2019-09-19

    Abstract: Providing late physical register allocation and early physical register release in out-of-order processor (OOP)-based devices implementing a checkpoint-based architecture is provided. In this regard, an OOP-based device provides a register management circuit that is configured to employ a combination of the checkpoint approach and the virtual register approach. The register management circuit includes a most recent table (MRT) for tracking mappings of logical register numbers (LRNs) to physical register numbers (PRNs), a physical register file (PRF) storing information for physical registers, a virtual register file (VRF) storing data for virtual registers, and a checkpoint queue for tracking active checkpoints (each of which is a snapshot of the MRT at a given time). The register management circuit applies checkpoint selection criteria for balancing the number of checkpoints, and implements late physical register allocation using virtual registers to provide an effectively larger physical register file and checkpoint-based early release of physical registers.
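
    The abstract names the structures but not how they interact. The Python sketch below is a minimal model built on our own assumptions, not the patented circuit. The MRT maps each LRN to a virtual register number (VRN). A PRN is taken from the free list only when a result is written back (late allocation). A PRN is returned early once its VRN is no longer named by the MRT or by any surviving checkpoint. The class and method names are hypothetical, and recovery to a checkpoint after a misprediction is omitted.

```python
from collections import deque


class RegisterManager:
    """Toy model of late PRN allocation with checkpoint-based early release.

    Hypothetical simplification: the MRT maps each LRN to a virtual register
    number (VRN); a PRN is bound to a VRN only at writeback.
    """

    def __init__(self, num_lrns, num_prns):
        self.mrt = list(range(num_lrns))                   # LRN -> VRN
        self.vrn_to_prn = {v: v for v in range(num_lrns)}  # initial committed state
        self.free_prns = deque(range(num_lrns, num_prns))
        self.next_vrn = num_lrns
        self.checkpoints = deque()                         # MRT snapshots, oldest first

    def rename_dest(self, lrn):
        """Rename a destination: consumes a VRN but no PRN yet."""
        vrn = self.next_vrn
        self.next_vrn += 1
        self.mrt[lrn] = vrn
        return vrn

    def take_checkpoint(self):
        """A checkpoint is a snapshot of the MRT."""
        self.checkpoints.append(list(self.mrt))

    def writeback(self, vrn):
        """Late allocation: bind a PRN only when the result is produced."""
        if not self.free_prns:
            return None                                    # producer must stall and retry
        prn = self.free_prns.popleft()
        self.vrn_to_prn[vrn] = prn
        return prn

    def release_oldest_checkpoint(self):
        """Assumed precondition: every instruction older than the next
        checkpoint (or every in-flight instruction, if none remains) has
        committed. Frees each PRN whose VRN is named neither by the MRT nor
        by any remaining checkpoint."""
        self.checkpoints.popleft()
        live = set(self.mrt)
        for snapshot in self.checkpoints:
            live.update(snapshot)
        for vrn in [v for v in self.vrn_to_prn if v not in live]:
            self.free_prns.append(self.vrn_to_prn.pop(vrn))
```

    Because renaming hands out VRNs rather than PRNs, the number of in-flight destinations is bounded by the VRN space, while the smaller PRF only has to hold values that have actually been produced and are still live.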

    METHOD AND APPARATUS FOR DYNAMIC CLOCK AND VOLTAGE SCALING IN A COMPUTER PROCESSOR BASED ON PROGRAM PHASE
    3.
    Invention application (status: pending, published)

    Publication No.: WO2017119991A1

    Publication date: 2017-07-13

    Application No.: PCT/US2016/066099

    Application date: 2016-12-12

    Abstract: The disclosure generally relates to dynamic clock and voltage scaling (DCVS) based on program phase. For example, during each program phase, a first hardware counter may count each cycle in which a dispatch stall occurs and the oldest instruction in the load queue is a last-level cache miss, a second hardware counter may count total cycles, and a third hardware counter may count committed instructions. Once the committed-instruction counter reaches a threshold value, a software/firmware mechanism may read the hardware counters and divide the value of the first hardware counter by the value of the second hardware counter to measure the stall fraction during the current program execution phase. The measured stall fraction can then be used to predict the stall fraction in the next program execution phase, so that optimal voltage and frequency settings can be applied in the next phase based on the predicted stall fraction.
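
    A minimal Python sketch of the measure-predict-apply loop. The abstract does not specify the predictor or how a frequency is chosen, so both are assumptions here. The next-phase stall fraction is predicted with an exponentially weighted average (alpha=1.0 reduces it to last-value prediction). Under a simple timing model in which non-stall cycles scale with 1/f and stall time is fixed, the governor picks the lowest frequency whose modeled slowdown stays within a tolerance. All names and parameters are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PhaseCounters:
    stall_cycles: int = 0  # counter 1: dispatch stall while oldest load-queue entry is an LLC miss
    total_cycles: int = 0  # counter 2: all cycles in the phase
    committed: int = 0     # counter 3: committed instructions


class DcvsGovernor:
    def __init__(self, freqs_mhz, threshold, max_slowdown=0.05, alpha=0.5):
        self.freqs = sorted(freqs_mhz)
        self.threshold = threshold        # committed instructions per phase
        self.max_slowdown = max_slowdown  # assumed tolerance vs. f_max
        self.alpha = alpha                # EWMA weight of the newest measurement
        self.predicted = 0.0
        self.freq = self.freqs[-1]

    def on_counters(self, c):
        """Return the frequency for the next phase, or None if the phase
        has not yet reached the committed-instruction threshold."""
        if c.committed < self.threshold or c.total_cycles == 0:
            return None
        measured = c.stall_cycles / c.total_cycles
        self.predicted = self.alpha * measured + (1 - self.alpha) * self.predicted
        self.freq = self._pick_freq(self.predicted)
        return self.freq

    def _pick_freq(self, s):
        # Assumed model: non-stall cycles scale with 1/f; stall time is fixed
        # wall-clock time, as measured at the current frequency.
        def time(f):
            return (1 - s) / f + s / self.freq

        budget = (1 + self.max_slowdown) * time(self.freqs[-1])
        for f in self.freqs:  # lowest acceptable frequency wins
            if time(f) <= budget:
                return f
        return self.freqs[-1]
```

    Under this model, with levels of 500, 1000 and 2000 MHz and a 5% budget, a phase that stalls on last-level cache misses for 98% of its cycles drops to 1000 MHz, while a subsequent compute-bound phase returns to 2000 MHz.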

    LINK STACK REPAIR OF ERRONEOUS SPECULATIVE UPDATE
    4.
    Invention application (status: pending, published)

    Publication No.: WO2009046326A1

    Publication date: 2009-04-09

    Application No.: PCT/US2008/078789

    Application date: 2008-10-03

    CPC classification number: G06F9/3842 G06F9/3806 G06F9/3861

    Abstract: Whenever a link address is written to the link stack, the prior value of the link stack entry is saved, so that it can be restored to the link stack if the link stack push operation was speculatively executed following a mispredicted branch. This condition is detected by maintaining a count of the total number of uncommitted link stack write instructions in the pipeline, and a count of the number of uncommitted link stack write instructions ahead of each branch instruction. When a branch is evaluated and determined to have been mispredicted, the count associated with it is compared to the total count. A discrepancy indicates that a link stack write instruction was speculatively issued into the pipeline after the mispredicted branch instruction and pushed a link address onto the link stack. The prior link address is then restored to the link stack from the link stack restore buffer.
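
    The repair condition can be modeled in a few lines of Python. This is an illustration under stated assumptions, not the patented logic. It keeps monotonically increasing issue and commit counts for link stack pushes, so the number of uncommitted pushes ahead of a branch is `issued_at_branch - committed`; a mismatch with the total uncommitted count is the discrepancy that identifies wrong-path pushes. It also assumes the top-of-stack pointer is checkpointed per branch, which the abstract does not mention. All names are hypothetical.

```python
class LinkStack:
    def __init__(self, depth=8):
        self.entries = [0] * depth
        self.tos = 0           # next free slot (circular)
        self.restore_buf = []  # (slot, prior value), one per uncommitted push, oldest first
        self.issued = 0        # pushes issued so far
        self.committed = 0     # pushes committed so far

    def uncommitted(self):
        return self.issued - self.committed

    def push(self, link_addr):
        slot = self.tos % len(self.entries)
        self.restore_buf.append((slot, self.entries[slot]))  # save the prior value
        self.entries[slot] = link_addr
        self.tos += 1
        self.issued += 1

    def pop(self):
        self.tos -= 1
        return self.entries[self.tos % len(self.entries)]

    def tag_branch(self):
        """Snapshot taken when a branch enters the pipeline."""
        return (self.issued, self.tos)

    def commit_push(self):
        self.restore_buf.pop(0)  # oldest push can no longer be undone
        self.committed += 1

    def repair(self, tag):
        """Called when the tagged branch is found to be mispredicted."""
        issued_at_branch, tos_at_branch = tag
        ahead = issued_at_branch - self.committed  # uncommitted pushes older than the branch
        wrong_path = self.uncommitted() - ahead    # the discrepancy
        for _ in range(wrong_path):
            slot, prior = self.restore_buf.pop()   # undo youngest first
            self.entries[slot] = prior
        self.issued -= wrong_path
        self.tos = tos_at_branch
```

    For example, if a wrong-path `pop` followed by a `push` overwrites the entry at the top of the stack, `repair` writes the saved prior value back, so the correct-path return still pops the original link address.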

    SEGMENTED PIPELINE FLUSHING FOR MISPREDICTED BRANCHES
    5.
    Invention application (status: pending, published)

    Publication No.: WO2008092045A1

    Publication date: 2008-07-31

    Application No.: PCT/US2008/051966

    Application date: 2008-01-24

    CPC classification number: G06F9/384 G06F9/3842 G06F9/3863 G06F9/3867

    Abstract: A processor pipeline is segmented into an upper portion (before instructions go out of program order) and one or more lower portions beyond the upper portion. The upper pipeline is flushed upon detecting that a branch instruction was mispredicted, minimizing the delay in fetching instructions from the correct branch target address. The lower pipelines may continue execution until the mispredicted branch instruction confirms, at which time all uncommitted instructions are flushed from the lower pipelines. Existing exception pipeline flushing mechanisms may be reused by adding a mispredicted branch identifier, which reduces the complexity and hardware cost of flushing the lower pipelines.
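
    The two-stage flush can be sketched as a toy Python model. Assumptions not stated in the abstract: instructions carry sequence numbers, the lower portion commits strictly oldest-first, dispatch from the upper portion into the lower portion stalls while a mispredicted branch is pending there, and "confirms" is modeled as the branch committing. All names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Insn:
    seq: int   # program-order sequence number
    name: str


class SegmentedPipeline:
    def __init__(self):
        self.upper = deque()        # in order, before instructions go out of order
        self.lower = []             # out-of-order window, uncommitted
        self.committed = []
        self.pending_branch = None  # mispredicted branch awaiting confirmation

    def detect_mispredict(self, branch, correct_path):
        self.upper.clear()                # flush only the upper portion now
        self.upper.extend(correct_path)   # refetch from the correct target at once
        self.pending_branch = branch

    def dispatch(self):
        # Assumption: no dispatch into the lower portion while a mispredicted
        # branch is still unconfirmed there.
        while self.upper and self.pending_branch is None:
            self.lower.append(self.upper.popleft())

    def commit_oldest(self):
        oldest = min(self.lower, key=lambda i: i.seq)
        self.lower.remove(oldest)
        self.committed.append(oldest)
        if oldest is self.pending_branch:
            # Branch confirms: everything still uncommitted is wrong-path.
            self.lower.clear()
            self.pending_branch = None
```

    The point of the split is visible in `detect_mispredict`: refetching starts immediately, while instructions older than the branch in the lower portion keep executing and commit normally.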

    PRE-DECODING VARIABLE LENGTH INSTRUCTIONS
    6.
    Invention application (status: pending, published)

    Publication No.: WO2007130798A1

    Publication date: 2007-11-15

    Application No.: PCT/US2007/067057

    Application date: 2007-04-20

    Abstract: A pre-decoder in a variable instruction length processor indicates properties of instructions in pre-decode bits stored with the instructions in an instruction cache. When all the encodings of pre-decode bits associated with instructions of one length are defined, a property of an instruction of that length may be indicated by altering the instruction to emulate an instruction of a different length, and encoding the property in the pre-decode bits associated with instructions of the different length. One example of a property that may be indicated in this way is an undefined instruction.
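
    The following Python sketch illustrates the encoding trick with an invented 2-bit pre-decode scheme: all four codes are assigned for 32-bit instructions, while code 3 is spare for 16-bit instructions and is used to mean "undefined". The length rule is the Thumb-2 one; the property names, the `classify` callback, and the choice to clear the top five bits of each halfword are illustrative assumptions rather than the patented encoding.

```python
# Hypothetical 2-bit pre-decode encodings; not taken from any real design.
PD32 = {"normal": 0, "branch": 1, "call": 2, "return": 3}      # every 32-bit code in use
PD16 = {"normal": 0, "branch": 1, "return": 2, "undefined": 3}  # code 3 is the spare


def is_32bit(halfword):
    # Thumb-2 length rule: a first halfword whose top five bits are
    # 0b11101, 0b11110 or 0b11111 starts a 32-bit instruction.
    return (halfword >> 11) in (0b11101, 0b11110, 0b11111)


def predecode(halfwords, classify):
    """Build instruction-cache entries (length, halfwords, pd_bits).

    `classify(insn)` is an assumed helper returning one of the property
    names above, or "undefined".
    """
    entries, i = [], 0
    while i < len(halfwords):
        if is_32bit(halfwords[i]):
            insn = tuple(halfwords[i:i + 2])
            kind = classify(insn)
            if kind == "undefined":
                # No 32-bit code is free, so alter both halfwords to look like
                # 16-bit instructions (clear the top five bits) and tag them
                # with the spare 16-bit "undefined" code.
                for hw in insn:
                    entries.append((16, (hw & 0x07FF,), PD16["undefined"]))
            else:
                entries.append((32, insn, PD32[kind]))
            i += 2
        else:
            insn = (halfwords[i],)
            entries.append((16, insn, PD16[classify(insn)]))
            i += 1
    return entries


def decode(entry):
    length, _, pd = entry
    if length == 16 and pd == PD16["undefined"]:
        return "undefined-instruction trap"
    return f"{length}-bit insn, pd={pd}"
```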

    HIERARCHICAL REGISTER FILE SYSTEM
    7.
    Invention application (status: pending, published)

    Publication No.: WO2017040087A1

    Publication date: 2017-03-09

    Application No.: PCT/US2016/048008

    Application date: 2016-08-22

    CPC classification number: G06F9/30105 G06F9/30138 G06F9/384 G06F9/3867

    Abstract: Systems and methods relate to a hierarchical register file system including a level 1 physical register file (L1 PRF) and a backing physical register file (PRF). A subset of the outputs of instructions executed in an instruction pipeline of a processor that are deemed to have a high likelihood of use by one or more future instructions is identified. This subset of instruction outputs is stored in the L1 PRF, while all instruction outputs are stored in the backing PRF.
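
    A minimal Python model of the two-level register file. It assumes an externally supplied reuse predictor keyed by instruction address and least-recently-used replacement in the L1 PRF; the abstract specifies neither. Every output is written to the backing PRF, and only outputs predicted to be reused are also written to the L1 PRF. Names are hypothetical.

```python
from collections import OrderedDict


class HierarchicalPRF:
    def __init__(self, l1_size, predict_reuse):
        self.backing = {}                   # PRN -> value, holds every output
        self.l1 = OrderedDict()             # PRN -> value, predicted-hot subset (LRU order)
        self.l1_size = l1_size
        self.predict_reuse = predict_reuse  # assumed predictor: pc -> bool
        self.l1_hits = self.l1_misses = 0

    def write(self, pc, prn, value):
        self.backing[prn] = value           # all outputs go to the backing PRF
        if self.predict_reuse(pc):
            self.l1[prn] = value
            self.l1.move_to_end(prn)
            if len(self.l1) > self.l1_size:
                self.l1.popitem(last=False)  # evict least recently used

    def read(self, prn):
        if prn in self.l1:
            self.l1_hits += 1
            self.l1.move_to_end(prn)
            return self.l1[prn]
        self.l1_misses += 1
        return self.backing[prn]            # slower path in hardware
```

    Because the backing PRF always holds every value, an L1 eviction never loses data; it only turns a later read into a slower backing-PRF access.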

    FREEING PHYSICAL REGISTERS IN A MICROPROCESSOR
    8.
    Invention application (status: pending, published)

    Publication No.: WO2015142435A1

    Publication date: 2015-09-24

    Application No.: PCT/US2015/014541

    Application date: 2015-02-05

    Abstract: Physical register scrubbing in computer microprocessors. Most instructions in a computer program produce an output value that is destined for one or more architected registers. These architected destination registers are renamed to physical registers in the processor pipeline to improve performance by exposing more instruction-level parallelism to the processor. In one aspect, a method comprises identifying, in a reorder buffer, a first instruction and a second instruction that write to the same architected destination register with no potential pipeline flushers between them, so that the physical register corresponding to the older of the two instructions can be freed.
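
    The identification step can be sketched as one oldest-to-youngest scan of the reorder buffer. In this hypothetical Python model, any instruction marked `may_flush` (a branch, a potentially faulting load, and so on) acts as a barrier, and, as a conservative assumption, a flush point is never itself paired with another writer. The sketch only identifies candidate registers; it assumes that consumers of the older value located between the two writers have read their operands before the register is actually freed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RobEntry:
    arch_dest: Optional[int]  # architected destination, None if no output
    phys_dest: Optional[int]  # renamed physical destination
    may_flush: bool = False   # potential pipeline flusher (branch, faulting load, ...)


def freeable_prns(rob):
    """rob is ordered oldest -> youngest. Returns the physical registers of
    older writers that a younger writer of the same architected register
    overwrites with no potential flusher in between."""
    freeable = []
    last_writer = {}  # arch reg -> most recent writer since the last flush point
    for e in rob:
        if e.may_flush:
            last_writer.clear()  # nothing may pair across (or with) a flush point
            continue
        if e.arch_dest is None:
            continue
        older = last_writer.get(e.arch_dest)
        if older is not None:
            freeable.append(older.phys_dest)
        last_writer[e.arch_dest] = e
    return freeable
```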

    EFFECTIVE USE OF A BHT IN PROCESSOR HAVING VARIABLE LENGTH INSTRUCTION SET EXECUTION MODES
    9.
    Invention application (status: pending, published)

    Publication No.: WO2008039975A1

    Publication date: 2008-04-03

    Application No.: PCT/US2007/079864

    Application date: 2007-09-28

    Abstract: In a processor executing instructions in at least a first instruction set execution mode having a first minimum instruction length and a second instruction set execution mode having a smaller, second minimum instruction length, line and counter index addresses are formed that access every counter in a branch history table (BHT), and reduce the number of index address bits that are multiplexed based on the current instruction set execution mode. In one embodiment, counters within a BHT line are arranged and indexed in such a manner that half of the BHT can be powered down for each access in one instruction set execution mode.
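
    One way to meet the indexing goal, sketched in Python with assumed sizes (16 lines of 4 counters) and a hypothetical mode flag. The line index and the high counter-select bit come from the same PC bits in both modes, so only the low counter-select bit is multiplexed. In the mode with a 2-byte minimum length it is `pc[1]`; in the mode with a 4-byte minimum length, where `pc[1]` is always zero and would leave half the counters unused, it is replaced by `pc[7]`, the bit just above the line index. This illustrates the idea only; it is not the patented arrangement and does not model the power-down optimization.

```python
LINE_BITS = 4  # assumed: 16 BHT lines, each holding 4 counters (2 select bits)


def bht_index(pc, short_mode):
    """Return (line, counter) for a fetch address.

    short_mode: True for the mode with a 2-byte minimum instruction length
    (e.g. Thumb), False for the mode with a 4-byte minimum length.
    """
    line = (pc >> 3) & ((1 << LINE_BITS) - 1)  # pc[6:3] in both modes
    hi = (pc >> 2) & 1                         # pc[2] in both modes
    if short_mode:
        lo = (pc >> 1) & 1                     # pc[1] varies for 16-bit code
    else:
        lo = (pc >> (3 + LINE_BITS)) & 1       # pc[7] replaces the always-zero pc[1]
    return line, (hi << 1) | lo
```

    With this choice, 128 bytes of 16-bit code or 256 bytes of 32-bit code each map onto all 64 counters exactly once.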
