PROCESSOR WITH SECOND JUMP EXECUTION UNIT FOR BRANCH MISPREDICTION
    11.
    Invention patent application
    Status: Under examination (published)

    Publication number: US20140195790A1

    Publication date: 2014-07-10

    Application number: US13994676

    Filing date: 2011-12-28

    Abstract: A secondary jump execution unit (JEU) is incorporated in a micro-processor to operate concurrently with a primary JEU, enabling the execution of simultaneous branch operations with possible detection of multiple branch mispredicts. When branch operations are executed on both JEUs in a same instruction cycle, mispredict processing for the secondary JEU is skidded into the primary JEU's dispatch pipeline such that the branch processing for the secondary JEU occurs after processing of the branch for the primary JEU and while the primary JEU is not processing a branch. Moreover, in cases when a nuke command is also received from a reorder buffer of the processor, the branch processing for the secondary JEU is further delayed to accommodate processing of the nuke on the primary JEU. Further embodiments support the promotion of the secondary JEU to have access to the mispredict mechanisms of the primary JEU in certain circumstances.
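The ordering described in this abstract can be sketched as a tiny arbitration function. This is a minimal illustration, not the patented logic: the function name, the event labels, and the assumption that each event occupies exactly one slot of the primary JEU's mispredict pipeline are all hypothetical. The abstract only states that the secondary JEU's branch is handled after the primary's, and later still when a nuke is pending.

```python
def schedule_mispredict_slots(primary_branch, secondary_branch, nuke=False):
    """Return the order in which events use the primary JEU's mispredict path.

    Assumption (not from the source): one event per slot, and a pending
    nuke is handled right after the primary branch.
    """
    slots = []
    if primary_branch:
        slots.append("primary_branch")    # primary JEU's branch goes first
    if nuke:
        slots.append("nuke")              # nuke from the reorder buffer
    if secondary_branch:
        slots.append("secondary_branch")  # skidded behind everything else
    return slots
```

Calling `schedule_mispredict_slots(True, True, nuke=True)` places the secondary branch in the last slot, which mirrors the "further delayed" behaviour the abstract mentions.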

    Scheduler Implementing Dependency Matrix Having Restricted Entries
    12.
    Invention patent application
    Status: Under examination (published)

    Publication number: US20140181476A1

    Publication date: 2014-06-26

    Application number: US13723684

    Filing date: 2012-12-21

    CPC classification number: G06F9/3838

    Abstract: A scheduler implementing a dependency matrix having restricted entries is disclosed. A processing device of the disclosure includes a decode unit to decode an instruction and a scheduler communicably coupled to the decode unit. In one embodiment, the scheduler is configured to receive the decoded instruction, determine that the decoded instruction qualifies for allocation as a restricted reservation station (RS) entry type in a dependency matrix maintained by the scheduler, identify RS entries in the dependency matrix that are free for allocation, allocate one of the identified free RS entries with information of the decoded instruction in the dependency matrix, and update a row of the dependency matrix corresponding to the claimed RS entry with source dependency information of the decoded instruction.
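The allocation steps in this abstract (check qualification, find free RS entries, claim one, write its dependency row) can be sketched as follows. This is a hedged illustration: the class and method names are hypothetical, and the idea that restricted entries are a fixed subset of RS indices is an assumption, since the abstract does not say what the restriction is. Each row is a bitmask in which bit j means "this entry waits on entry j".

```python
class DependencyMatrix:
    def __init__(self, num_entries, restricted_entries):
        self.n = num_entries
        # Assumption: restricted-type entries are a fixed subset of indices.
        self.restricted = set(restricted_entries)
        self.free = set(range(num_entries))
        self.rows = [0] * num_entries  # one dependency bitmask per RS entry

    def allocate(self, qualifies_restricted, producer_entries):
        """Claim a free RS entry and record its source dependencies."""
        if qualifies_restricted:
            pool = self.restricted
        else:
            pool = set(range(self.n)) - self.restricted
        candidates = sorted(self.free & pool)
        if not candidates:
            return None  # no free entry of the required type
        entry = candidates[0]
        self.free.discard(entry)
        mask = 0
        for producer in producer_entries:
            mask |= 1 << producer
        self.rows[entry] = mask  # update the row for the claimed entry
        return entry

    def ready(self, entry):
        """An allocated entry is ready once its row has no bits set."""
        return entry not in self.free and self.rows[entry] == 0

    def complete(self, entry):
        """Clear the finished entry's column in every row and free it."""
        for i in range(self.n):
            self.rows[i] &= ~(1 << entry)
        self.rows[entry] = 0
        self.free.add(entry)
```

A bitmask per row keeps the readiness check a single comparison against zero, which is the usual reason for a matrix-style scheduler.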

    METHOD, APPARATUS, AND SYSTEM FOR ENERGY EFFICIENCY AND ENERGY CONSERVATION INCLUDING DETECTING AND CONTROLLING CURRENT RAMPS IN PROCESSING CIRCUIT
    15.
    Invention patent application
    Status: In force

    Publication number: US20120221871A1

    Publication date: 2012-08-30

    Application number: US13340511

    Filing date: 2011-12-29

    CPC classification number: G06F1/3243 Y02D10/152

    Abstract: Some implementations provide techniques and arrangements for adjusting a rate at which operations are performed by a processor based on a comparison of a first indication of power consumed by the processor as a result of performing a first set of operations and a second indication of power consumed by the processor as a result of performing a second set of operations. The rate at which operations are performed by the processor may be adjusted when the comparison indicates that a difference between the first indication of power consumed by the processor and the second indication of power consumed by the processor is greater than a threshold value.
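The threshold comparison in this abstract can be sketched in a few lines. The function name, the units, the use of an absolute difference, and the choice to lower the rate by a fixed step are assumptions made for illustration; the abstract says only that the rate may be adjusted when the difference between the two power indications exceeds a threshold.

```python
def adjust_rate(current_rate, power_first, power_second, threshold,
                step=1, min_rate=1):
    """Return the new operation rate after comparing two power indications.

    Assumption: "adjust" means throttling by `step`, never below `min_rate`.
    """
    if abs(power_second - power_first) > threshold:
        return max(min_rate, current_rate - step)
    return current_rate  # ramp is within bounds, leave the rate alone
```

For example, `adjust_rate(4, 10.0, 18.0, threshold=5.0)` lowers the rate because the 8.0 difference exceeds the threshold, while a difference of 2.0 leaves it unchanged.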

    Method and apparatus for modulo scheduled loop execution in a processor architecture
    17.
    Granted invention patent
    Status: In force

    Publication number: US07725696B1

    Publication date: 2010-05-25

    Application number: US11867127

    Filing date: 2007-10-04

    Abstract: A processor method and apparatus that allows for the overlapped execution of multiple iterations of a loop while allowing the compiler to include only a single copy of the loop body in the code while automatically managing which iterations are active. Since the prologue and epilogue are implicitly created and maintained within the hardware in the invention, a significant reduction in code size can be achieved compared to software-only modulo scheduling. Furthermore, loops with iteration counts less than the number of concurrent iterations present in the kernel are also automatically handled. This hardware enhanced scheme achieves the same performance as the fully-specified standard method. Furthermore, the hardware reduces the power requirement as the entire fetch unit can be deactivated for a portion of the loop's execution. The basic design of the invention involves including a plurality of buffers for storing loop instructions, each of which is associated with an instruction decoder and its respective functional unit, in the dispatch stage of a processor. Control logic is used to receive loop setup parameters and to control the selective issue of instructions from the buffers to the functional units.
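The implicit prologue and epilogue can be illustrated with the standard stage-activation rule from modulo scheduling: on kernel pass t, stage s is active only while it still has an iteration to work on. This is a generic sketch, not the control logic of the patent; the function name and the boolean-table representation are assumptions.

```python
def active_stages(num_iterations, num_stages):
    """Return, per kernel pass, which pipeline stages are active.

    Pass t runs stage s on iteration (t - s), so the stage is active
    only when 0 <= t - s < num_iterations.
    """
    schedule = []
    for t in range(num_iterations + num_stages - 1):
        schedule.append(
            [0 <= t - s < num_iterations for s in range(num_stages)]
        )
    return schedule
```

With `active_stages(2, 3)` the loop has fewer iterations than kernel stages, yet the same rule still produces a valid ramp-up and drain, which matches the abstract's point that short loops need no special code.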

    Method and apparatus for modulo scheduled loop execution in a processor architecture
    18.
    Granted invention patent
    Status: In force

    Publication number: US07302557B1

    Publication date: 2007-11-27

    Application number: US09728441

    Filing date: 2000-12-01

    Abstract: A processor method and apparatus that allows for the overlapped execution of multiple iterations of a loop while allowing the compiler to include only a single copy of the loop body in the code while automatically managing which iterations are active. Since the prologue and epilogue are implicitly created and maintained within the hardware in the invention, a significant reduction in code size can be achieved compared to software-only modulo scheduling. Furthermore, loops with iteration counts less than the number of concurrent iterations present in the kernel are also automatically handled. This hardware enhanced scheme achieves the same performance as the fully-specified standard method. Furthermore, the hardware reduces the power requirement as the entire fetch unit can be deactivated for a portion of the loop's execution. The basic design of the invention involves including a plurality of buffers for storing loop instructions, each of which is associated with an instruction decoder and its respective functional unit, in the dispatch stage of a processor. Control logic is used to receive loop setup parameters and to control the selective issue of instructions from the buffers to the functional units.

    Minimizing bandwidth to track return targets by an instruction tracing system
    19.
    Granted invention patent
    Status: In force

    Publication number: US09442729B2

    Publication date: 2016-09-13

    Application number: US13890654

    Filing date: 2013-05-09

    Abstract: A processing device that minimizes the bandwidth needed to track return targets in an instruction tracing system is disclosed. A processing device of the disclosure includes an instruction fetch unit comprising a return stack buffer (RSB) to predict a target address of a return (RET) instruction corresponding to a call (CALL) instruction. The processing device further includes a retirement unit comprising an instruction tracing module to initiate instruction tracing for instructions executed by the processing device, determine whether the target address of the RET instruction was mispredicted, determine a value of a call depth counter (CDC) maintained by the instruction tracing module, and, when the target address of the RET instruction was not mispredicted and the value of the CDC is greater than zero, generate an indication that the RET instruction branches to the next linear instruction after the corresponding CALL instruction.
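The decision rule for a retiring RET can be sketched as below. The class name, the packet labels, and the rule that the CDC is incremented on each traced CALL and decremented on each RET are assumptions for illustration; the abstract specifies only the condition (no misprediction and CDC greater than zero) under which the short indication is generated.

```python
class ReturnTracer:
    def __init__(self):
        self.cdc = 0       # call depth counter (update rule is assumed)
        self.packets = []  # trace output, hypothetical packet format

    def on_call(self):
        self.cdc += 1

    def on_ret(self, target, mispredicted):
        if not mispredicted and self.cdc > 0:
            # A decoder can infer the target from the matching CALL,
            # so only a short indication is emitted.
            self.packets.append(("RET_COMPRESSED",))
        else:
            # Otherwise the full target address must be traced.
            self.packets.append(("RET_TARGET", target))
        if self.cdc > 0:
            self.cdc -= 1
```

The saving comes from the first branch: a one-element indication replaces a full target address whenever the trace decoder already saw the matching CALL.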

    Systems and methods for flag tracking in move elimination operations
    20.
    Granted invention patent
    Status: In force

    Publication number: US09292288B2

    Publication date: 2016-03-22

    Application number: US13861009

    Filing date: 2013-04-11

    Abstract: Systems and methods for flag tracking in data manipulation operations involving move elimination. An example processing system comprises a first data structure including a plurality of physical register values; a second data structure including a plurality of pointers referencing elements of the first data structure; a third data structure including a plurality of move elimination sets, each move elimination set comprising two or more bits representing two or more logical data registers, the third data structure further comprising at least one bit associated with each move elimination set, the at least one bit representing one or more logical flag registers; a fourth data structure including an identifier of a data register sharing an element of the first data structure with a flag register; and a move elimination logic configured to perform a move elimination operation.
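The four data structures named in this abstract can be sketched as a toy renamer. Everything beyond the four structures is an assumption made for illustration: the class and method names, the register names, the rule that a flag-writing instruction makes FLAGS share its destination's physical entry, and the omission of physical register reclamation. It is a sketch of the general idea of move elimination (aliasing rename pointers instead of copying), not the patented logic.

```python
class MoveEliminationRenamer:
    FLAGS = "FLAGS"

    def __init__(self, num_physical):
        self.prf = [None] * num_physical  # 1st: physical register values
        self.rat = {}                     # 2nd: logical name -> prf index
        self.me_sets = []                 # 3rd: {"regs": set, "flags": bool}
        self.flag_sharer = None           # 4th: data register sharing with FLAGS
        self._next_free = 0               # no reclamation in this sketch

    def _prune(self):
        # A set is only meaningful while at least two names share an entry.
        self.me_sets = [s for s in self.me_sets
                        if len(s["regs"]) + s["flags"] >= 2]

    def _detach(self, reg):
        survivors = set()
        for s in self.me_sets:
            if reg in s["regs"]:
                s["regs"].discard(reg)
                if s["flags"]:
                    survivors = s["regs"]
        if self.flag_sharer == reg:
            # Hand the sharing role to another member of the set, if any.
            self.flag_sharer = min(survivors) if survivors else None
        self._prune()

    def write(self, dst, value, writes_flags=False):
        self._detach(dst)
        idx = self._next_free
        self._next_free += 1
        self.prf[idx] = value
        self.rat[dst] = idx
        if writes_flags:
            for s in self.me_sets:
                s["flags"] = False  # FLAGS leaves its previous set
            self._prune()
            self.rat[self.FLAGS] = idx
            self.flag_sharer = dst

    def move(self, dst, src):
        if dst == src:
            return
        self._detach(dst)
        self.rat[dst] = self.rat[src]  # eliminate the move: alias the pointer
        for s in self.me_sets:
            if src in s["regs"]:
                s["regs"].add(dst)
                return
        self.me_sets.append({"regs": {src, dst},
                             "flags": self.flag_sharer == src})
```

The move never touches `prf`; only the pointer table and the sharing set change, which is the point of eliminating the operation at rename time.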
