Superscalar microprocessor having multi-pipe dispatch and execution unit
    1.
    Granted invention patent
    Status: No longer in force

    Publication No.: US07082517B2

    Publication date: 2006-07-25

    Application No.: US10435806

    Filing date: 2003-05-12

    IPC classification: G06F9/30 G06F15/00

    Abstract: In a computer system for use as a symmetrical multiprocessor, a superscalar microprocessor apparatus allows dispatching and executing multi-cycle and complex instructions. Some control signals are generated in the dispatch unit and dispatched with the instruction to the Fixed Point Unit (FXU). Multiple execution pipes correspond to the instruction dispatch ports, and the execution unit is a Fixed Point Unit (FXU) which contains three execution dataflow pipes (X, Y and Z) and one control pipe (R). The FXU logic then executes these instructions on the available FXU pipes. This results in optimum performance with little or no other complications. The presented technique places the flexibility of how these instructions will be executed in the FXU, where the actual execution takes place, instead of in the instruction decode or dispatch units or cracking by the compiler.


    Last iteration loop branch prediction upon counter threshold and resolution upon counter one
    2.
    Granted invention patent
    Status: No longer in force

    Publication No.: US07010676B2

    Publication date: 2006-03-07

    Application No.: US10436296

    Filing date: 2003-05-12

    IPC classification: G06F9/38

    CPC classification: G06F9/325 G06F9/3844

    Abstract: An embodiment of the invention is a processor for processing loop branch instructions. The processor includes an instruction unit for fetching and decoding instructions including at least one loop branch instruction. A branch prediction unit predicts target instructions to be fetched and decoded by the instruction unit in response to the loop branch instruction. An execution unit executes instructions from the instruction unit and maintains a counter indicating an iteration of a loop. The execution unit includes detection logic for detecting when the counter equals a threshold and notifies the branch prediction unit when the counter equals the threshold.

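The interaction between the execution unit's counter and the branch prediction unit can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the patented design: it assumes a count-down loop whose closing branch falls through when the counter is one (as the title suggests), it ignores pipeline latency, and the names `LoopBranchPredictor`, `notify`, and `simulate` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoopBranchPredictor:
    """Predicts the loop-closing branch: taken unless told the loop is ending."""
    remaining_taken: Optional[int] = None  # None until the execution unit notifies

    def notify(self, remaining_taken: int) -> None:
        # Hypothetical notification sent by the execution unit.
        self.remaining_taken = remaining_taken

    def predict(self) -> bool:
        if self.remaining_taken is None:
            return True        # default: a loop branch is usually taken
        if self.remaining_taken == 0:
            return False       # last iteration: predict fall-through
        self.remaining_taken -= 1
        return True


def simulate(initial_count: int, threshold: int = 2,
             use_notification: bool = True) -> int:
    """Run a count-down loop and return the number of mispredicted loop branches."""
    predictor = LoopBranchPredictor()
    counter = initial_count
    mispredictions = 0
    while counter >= 1:
        # Detection logic in the execution unit: the counter equals the threshold.
        if use_notification and counter == threshold:
            predictor.notify(remaining_taken=threshold - 1)
        predicted_taken = predictor.predict()
        actually_taken = counter > 1  # resolves not-taken when the counter is one
        mispredictions += int(predicted_taken != actually_taken)
        counter -= 1
    return mispredictions
```

In this model, `simulate(5, threshold=2)` reports no mispredictions, while `simulate(5, use_notification=False)` mispredicts the final, not-taken branch. In real hardware the threshold would presumably be chosen to cover the delay between execution and prediction, which this sketch does not model.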

    METHOD, SYSTEM, COMPUTER PROGRAM PRODUCT, AND HARDWARE PRODUCT FOR IMPLEMENTING RESULT FORWARDING BETWEEN DIFFERENTLY SIZED OPERANDS IN A SUPERSCALAR PROCESSOR
    3.
    Invention patent application
    Status: No longer in force

    Publication No.: US20090240922A1

    Publication date: 2009-09-24

    Application No.: US12051792

    Filing date: 2008-03-19

    IPC classification: G06F9/30

    Abstract: Result and operand forwarding is provided between differently sized operands in a superscalar processor by grouping a first set of instructions for operand forwarding, and grouping a second set of instructions for result forwarding, the first set of instructions comprising a first source instruction having a first operand and a first dependent instruction having a second operand, the first dependent instruction depending from the first source instruction; the second set of instructions comprising a second source instruction having a third operand and a second dependent instruction having a fourth operand, the second dependent instruction depending from the second source instruction; performing operand forwarding by forwarding the first operand, either whole or in part, as it is being read to the first dependent instruction prior to execution; performing result forwarding by forwarding a result of the second source instruction, either whole or in part, to the second dependent instruction, after execution; wherein the operand forwarding is performed by executing the first source instruction together with the first dependent instruction; and wherein the result forwarding is performed by executing the second source instruction together with the second dependent instruction.

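Forwarding "either whole or in part" between differently sized operands can be pictured as a merge followed by a truncation. The sketch below is an illustration only: it assumes that a narrower producer writes the low-order bits of a 64-bit register and that a narrower consumer reads the low-order bits, and the function name and parameters are hypothetical rather than taken from the patent.

```python
def forward_value(producer_value: int, producer_bits: int,
                  register_value: int, consumer_bits: int) -> int:
    """Value a dependent instruction sees when the producer's result is forwarded.

    Assumption: both producer and consumer operate on the low-order bits
    of the same register.
    """
    producer_mask = (1 << producer_bits) - 1
    consumer_mask = (1 << consumer_bits) - 1
    # Partial forwarding: the producer's bits replace the low-order part,
    # and any remaining high-order bits come from the old register contents.
    merged = (register_value & ~producer_mask) | (producer_value & producer_mask)
    # A narrower consumer only takes the part of the result it needs.
    return merged & consumer_mask
```

For example, a 32-bit result `0xDEADBEEF` forwarded to a 64-bit consumer is merged with the upper half of the old register value, while a 64-bit result forwarded to a 32-bit consumer is simply truncated.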

    Multi-pipe dispatch and execution of complex instructions in a superscalar processor
    4.
    Granted invention patent
    Status: In force

    Publication No.: US07085917B2

    Publication date: 2006-08-01

    Application No.: US10435983

    Filing date: 2003-05-12

    IPC classification: G06F9/30 G06F15/00

    Abstract: In a computer system, a method and apparatus for dispatching and executing multi-cycle and complex instructions. The method results in maximum performance for such instructions without impacting other areas in the processor such as decode, grouping or dispatch units. This invention allows multi-cycle and complex instructions to be dispatched to one port but executed in multiple execution pipes without cracking the instruction and without limiting it to a single execution pipe. Some control signals are generated in the dispatch unit and dispatched with the instruction to the Fixed Point Unit (FXU). The FXU logic then executes these instructions on the available FXU pipes. This method results in optimum performance with little or no other complications. The presented technique places the flexibility of how these instructions will be executed in the FXU, where the actual execution takes place, instead of in the instruction decode or dispatch units or cracking by the compiler.

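The idea of dispatching to one port while leaving pipe selection to the FXU can be sketched as follows. This is a loose illustration, not the patented logic: the pipe names X, Y and Z come from the related abstract above, but the `pipes_needed` control signal, the class, and the selection rule are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

PIPES = ("X", "Y", "Z")  # execution dataflow pipes of the FXU


@dataclass
class DispatchedInstruction:
    opcode: str
    port: int          # the single dispatch port the instruction arrives on
    pipes_needed: int  # hypothetical control signal generated in the dispatch unit


def fxu_select_pipes(instr: DispatchedInstruction,
                     busy: Set[str]) -> Optional[List[str]]:
    """Let the FXU, not the dispatcher, decide which pipes run the instruction.

    Returns the chosen pipes, or None if too few pipes are free this cycle.
    """
    free = [pipe for pipe in PIPES if pipe not in busy]
    if len(free) < instr.pipes_needed:
        return None  # not enough free pipes; the instruction waits
    return free[:instr.pipes_needed]
```

The point of the sketch is only that the instruction is never split ("cracked") before dispatch: it arrives whole on one port, and the execution unit spreads it over whichever pipes are available.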

    Operand and result forwarding between differently sized operands in a superscalar processor
    5.
    Granted invention patent
    Status: No longer in force

    Publication No.: US07921279B2

    Publication date: 2011-04-05

    Application No.: US12051792

    Filing date: 2008-03-19

    IPC classification: G06F9/30

    Abstract: Result and operand forwarding is provided between differently sized operands in a superscalar processor by grouping a first set of instructions for operand forwarding, and grouping a second set of instructions for result forwarding, the first set of instructions comprising a first source instruction having a first operand and a first dependent instruction having a second operand, the first dependent instruction depending from the first source instruction; the second set of instructions comprising a second source instruction having a third operand and a second dependent instruction having a fourth operand, the second dependent instruction depending from the second source instruction; performing operand forwarding by forwarding the first operand, either whole or in part, as it is being read to the first dependent instruction prior to execution; performing result forwarding by forwarding a result of the second source instruction, either whole or in part, to the second dependent instruction, after execution; wherein the operand forwarding is performed by executing the first source instruction together with the first dependent instruction; and wherein the result forwarding is performed by executing the second source instruction together with the second dependent instruction.


    Cache set replacement order based on temporal set recording
    7.
    Granted invention patent
    Status: In force

    Publication No.: US08806139B2

    Publication date: 2014-08-12

    Application No.: US13354894

    Filing date: 2012-01-20

    IPC classification: G06F12/12

    CPC classification: G06F12/0875 G06F12/126

    Abstract: A technique is provided for cache management. A processing circuit determines a miss count and a hit position field during a previous execution of an instruction requesting that a data element be stored in a cache. The miss count and the hit position field are stored for a data element corresponding to an instruction that requests storage of the data element. The processing circuit places the data element in a hierarchical order based on the miss count and/or the hit position field. The hit position field includes a hierarchical position related to the data element in the cache.

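A minimal model of history-guided placement in one cache set might look like the following. It is a sketch under assumptions, not the patented policy: the per-instruction history table, the `miss_threshold` parameter, and the rule "insert at the victim end after repeated misses" are all hypothetical, and the hit position is recorded here but not used in the placement decision.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class History:
    miss_count: int = 0    # consecutive misses caused by this instruction
    hit_position: int = 0  # position in the set at which its data was last hit


class CacheSet:
    def __init__(self, ways: int = 3, miss_threshold: int = 1) -> None:
        self.ways = ways
        self.miss_threshold = miss_threshold
        self.order: List[str] = []  # index 0 = most protected, last = next victim
        self.history: Dict[int, History] = {}  # keyed by instruction address

    def access(self, pc: int, tag: str) -> bool:
        """Access `tag` on behalf of the instruction at `pc`; return True on a hit."""
        hist = self.history.setdefault(pc, History())
        if tag in self.order:
            hist.hit_position = self.order.index(tag)
            hist.miss_count = 0
            self.order.remove(tag)
            self.order.insert(0, tag)  # promote on hit
            return True
        hist.miss_count += 1
        if len(self.order) == self.ways:
            self.order.pop()  # evict the entry at the victim end
        # Assumed policy: data from an instruction that keeps missing is placed
        # at the victim end, so it cannot push out data that is being reused.
        streaming = hist.miss_count > self.miss_threshold
        self.order.insert(len(self.order) if streaming else 0, tag)
        return False
```

With three ways, a reused element loaded by one instruction survives a stream of one-off elements loaded by another instruction, because the streaming elements replace each other at the victim end instead of displacing the reused one.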

    Modular binary multiplier for signed and unsigned operands of variable widths
    9.
    Granted invention patent
    Status: In force

    Publication No.: US07490121B2

    Publication date: 2009-02-10

    Application No.: US11749239

    Filing date: 2007-05-16

    IPC classification: G06F7/52

    Abstract: A method of implementing binary multiplication in a processing device includes obtaining a multiplicand and a multiplier from a storage device; in the event the multiplier is larger than a selected length, partitioning the multiplier into a plurality of multiplier subgroups; in the event the multiplicand is larger than a selected length, partitioning the multiplicand into a plurality of multiplicand subgroups and at least one of zeroing out unused bits of the multiplicand subgroup and sign-extending a smaller portion of the multiplicand subgroup; establishing a plurality of multiplicand multiples based on at least one of a selected multiplicand subgroup of the plurality of multiplicand subgroups and the multiplicand; selecting one or more of the multiplicand multiples of the plurality of multiplicand multiples based on each multiplier subgroup of the plurality of multiplier subgroups; and generating a first modular product based on the selected multiplicand multiples.

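The partition, select, and accumulate structure described above can be sketched for unsigned operands. This is a simplified illustration rather than the patented circuit: the 16-bit subgroup width, the 2-bit multiplier digits, and all function names are assumptions, and the signed case (sign extension of subgroups) is left out.

```python
from typing import List


def partition(value: int, total_bits: int, group_bits: int) -> List[int]:
    """Split an unsigned value into subgroups, least significant first."""
    mask = (1 << group_bits) - 1
    return [(value >> shift) & mask for shift in range(0, total_bits, group_bits)]


def modular_product(mcand_group: int, mplier_group: int,
                    group_bits: int = 16, digit_bits: int = 2) -> int:
    """Multiply two subgroups by selecting precomputed multiplicand multiples."""
    # Establish the multiplicand multiples 0x, 1x, 2x and 3x.
    multiples = [mcand_group * k for k in range(1 << digit_bits)]
    product = 0
    # Each multiplier digit selects one multiple, shifted to its weight.
    for i, digit in enumerate(partition(mplier_group, group_bits, digit_bits)):
        product += multiples[digit] << (i * digit_bits)
    return product


def multiply(multiplicand: int, multiplier: int,
             total_bits: int = 64, group_bits: int = 16) -> int:
    """Unsigned multiply assembled from subgroup-by-subgroup modular products."""
    result = 0
    for i, mc in enumerate(partition(multiplicand, total_bits, group_bits)):
        for j, mp in enumerate(partition(multiplier, total_bits, group_bits)):
            result += modular_product(mc, mp, group_bits) << ((i + j) * group_bits)
    return result
```

Operands narrower than `total_bits` simply produce zero-valued upper subgroups, which corresponds to zeroing out the unused bits; both operands must fit in `total_bits` for the result to be correct.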

    Fixed point unit pipeline allowing partial instruction execution during the instruction dispatch cycle
    10.
    Granted invention patent
    Status: No longer in force

    Publication No.: US06944753B2

    Publication date: 2005-09-13

    Application No.: US09832544

    Filing date: 2001-04-11

    IPC classification: G06F9/308 G06F7/00

    CPC classification: G06F7/764 G06F9/30018

    Abstract: A method for allowing a partial instruction to be executed in a fixed point unit pipeline during the instruction dispatch cycle moves the creation of a mask, used to select which bits of the operands participate in a future logical operation of the fixed point unit, back a cycle to the instruction dispatch stage of the fixed point unit. As an S/390 System improvement applicable to other computers, the mask is determined and created two cycles ahead of execution, or two cycles before the mask is actually used. Also, in the method used for moving the mask generation back by one cycle, mask generation overlaps the dispatch stage in the I-unit, and this provides a handshake between the I-unit and E-unit of the fixed point unit of the central processor unit of the computer system. The control setting selection process occurs in a predetermination cycle stage or E-1 (em1) stage for the mask generation and the register file read address. Speculative handshaking allows the E-1 stage to be created with no impact to the last stage of the I-unit, such that no additional logic is needed and cycle time is not jeopardized. Also, the E-1 stage of an instruction overlaps with the execution stages of previous instructions.

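The overlap between mask generation for one instruction and execution of the previous one can be modelled in a few lines. This is a toy sketch under assumptions, not the patented pipeline: it collapses the timing to a single stage of look-ahead, uses an 8-bit register width, numbers bits with bit 0 as the most significant, and treats unselected bits as preserved from the destination operand. The instruction tuple layout and all function names are hypothetical.

```python
import operator

OPS = {"and": operator.and_, "or": operator.or_, "xor": operator.xor}
WIDTH = 8  # small register width to keep the example readable


def make_mask(start_bit, end_bit, width=WIDTH):
    """Ones in bits start_bit..end_bit inclusive, with bit 0 as the most significant."""
    length = end_bit - start_bit + 1
    return ((1 << length) - 1) << (width - 1 - end_bit)


def em1_stage(instr):
    """E-1 stage: generate the mask and register read addresses ahead of execution."""
    op, dst, src, start_bit, end_bit = instr
    return {"op": OPS[op], "dst": dst, "src": src,
            "mask": make_mask(start_bit, end_bit)}


def execute_stage(ctrl, regs):
    """Execution stage: apply the logical operation only to the masked bits."""
    a, b, mask = regs[ctrl["dst"]], regs[ctrl["src"]], ctrl["mask"]
    regs[ctrl["dst"]] = (a & ~mask) | (ctrl["op"](a, b) & mask)


def run(instructions, regs):
    """Each loop pass is one cycle: execute the previous instruction with its
    latched controls while the E-1 stage prepares the current one."""
    latched = None
    for instr in list(instructions) + [None]:
        if latched is not None:
            execute_stage(latched, regs)
        latched = em1_stage(instr) if instr is not None else None
    return regs
```

For instance, `run([("or", 0, 1, 0, 3)], [0xAA, 0xF0])` ORs only the upper four bits of register 0 with register 1 and leaves the lower four bits of register 0 unchanged.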