Method and apparatus for prefetching branch history information
    1.
    Granted invention patent (status: lapsed)

    Publication No.: US07493480B2

    Publication date: 2009-02-17

    Application No.: US10197714

    Filing date: 2002-07-18

    IPC classification: G06F9/00

    CPC classification: G06F9/3806

    Abstract: A two-level branch history table (TLBHT) is substantially improved by a mechanism that prefetches entries from the very large second-level branch history table (L2 BHT) into the active, very fast first-level branch history table (L1 BHT) before the processor uses them in the branch prediction process, and that at the same time prefetches instructions that would otherwise miss into the instruction cache. A TLBHT is successful because it can prefetch branch entries into the L1 BHT sufficiently ahead of the time each entry is needed. The same property is used to prefetch instructions into the cache ahead of their use. In fact, the timeliness of the prefetches produced by the TLBHT can be used to remove most of the cycle-time penalty incurred by cache misses.

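
    A minimal sketch of the idea, not the patented design: a small L1 table is filled from a large L2 table before lookups need the entries. The class name `TwoLevelBHT`, the LRU replacement policy, and the region-based trigger in `prefetch_region` are assumptions made for illustration; the abstract does not specify them.

```python
from collections import OrderedDict


class TwoLevelBHT:
    """Toy two-level branch history table with region-based prefetch."""

    def __init__(self, l1_capacity=4, region_bits=6):
        self.l1 = OrderedDict()  # small, fast table: branch address -> target
        self.l2 = {}             # large backing table: branch address -> target
        self.l1_capacity = l1_capacity
        self.region_bits = region_bits  # assumed prefetch granularity

    def record(self, branch_addr, target):
        """Remember a taken branch in the large L2 table."""
        self.l2[branch_addr] = target

    def _install(self, branch_addr, target):
        self.l1[branch_addr] = target
        self.l1.move_to_end(branch_addr)
        while len(self.l1) > self.l1_capacity:
            self.l1.popitem(last=False)  # evict the least recently installed

    def prefetch_region(self, addr):
        """Copy every L2 entry in addr's region into L1 ahead of use."""
        region = addr >> self.region_bits
        hits = sorted(a for a in self.l2 if a >> self.region_bits == region)
        for a in hits:
            self._install(a, self.l2[a])
        return hits

    def predict(self, branch_addr):
        """Return the predicted target from L1, or None on an L1 miss."""
        return self.l1.get(branch_addr)


bht = TwoLevelBHT()
bht.record(0x1008, 0x2000)
bht.record(0x1030, 0x1100)
miss_before = bht.predict(0x1008)   # entry is still only in L2
bht.prefetch_region(0x1000)         # fetch running ahead reaches the region
hit_after = bht.predict(0x1008)     # now served from L1
```

    In this toy model the prediction misses until the region is prefetched; a hardware design would trigger the prefetch far enough ahead that the miss is never observed.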

IEEE compliant floating point unit
    2.
    Granted invention patent (status: lapsed)

    Publication No.: US6044454A

    Publication date: 2000-03-28

    Application No.: US26328

    Filing date: 1998-02-19

    Abstract: An IEEE compliant floating point unit mechanism allows variability in the execution of floating point operations according to the IEEE 754 standard and allows variations of the standard to co-exist in hardware or in a combination of hardware and millicode. The FPU has a detector of special conditions which dynamically detects that the hardware execution of an IEEE compliant Binary Floating Point instruction will require millicode emulation. The complete set of events which millicode may emulate is predetermined early in the design process of the hardware. An exception handling unit assists millicode emulation by trapping the result of an exceptional condition without invoking the trap handler. When an exceptional condition is detected during execution, the IEEE 754 standard requires one of two different actions under control of a mask bit. If the mask bit is on, the result is written into an FPR and the trap handler is invoked. Otherwise, a default value is written, a flag is set, and the program continues execution. This allows a variation to the IEEE 754 standard. Two different versions of the function of the Multiply-then-Subtract instruction are implemented for two different IEEE 754 compliant architectures.

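
    The two mask-controlled actions described in the abstract can be sketched as follows. This is an illustrative model only: the function `finish_fp_op` and the `state` dictionary (destination register, flag set, and a list standing in for trap-handler invocations) are hypothetical names, not part of the patent or of any real FPU interface.

```python
import math


def finish_fp_op(result, default_value, exception, mask_on, state):
    """Apply the mask-controlled action to one completed operation.

    state holds 'fpr' (destination register), 'flags' (set of raised
    exception names) and 'trapped' (exceptions handed to a trap handler).
    """
    if exception is None:
        state["fpr"] = result
    elif mask_on:
        # Mask bit on: write the result, then invoke the trap handler.
        state["fpr"] = result
        state["trapped"].append(exception)
    else:
        # Mask bit off: write the default value, set the flag, continue.
        state["fpr"] = default_value
        state["flags"].add(exception)
    return state


masked_off = finish_fp_op(1e308, math.inf, "overflow", False,
                          {"fpr": 0.0, "flags": set(), "trapped": []})
masked_on = finish_fp_op(1e308, math.inf, "overflow", True,
                         {"fpr": 0.0, "flags": set(), "trapped": []})
```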

Carry select and input select adder for late arriving data
    3.

    Publication No.: US5654911A

    Publication date: 1997-08-05

    Application No.: US472962

    Filing date: 1995-06-07

    IPC classification: G06F7/50 G06F7/507

    CPC classification: G06F7/507

    Abstract: An adder which takes advantage of the early arriving bits of a time-skewed operand to provide the result of an add or subtract operation without additional latency. Possible partial results are calculated and then selectively combined according to the late arriving data as that data becomes available. In an embodiment of the present invention, a first operand is partitioned into groups according to the arrival time of the skewed data, and possible partial results for each group are calculated for the full range of partial inputs that affect it. In addition, the high order groups are calculated with and without a borrow (carry) which is propagated from a low order group. Once the delayed partial operands are known and the borrows (carries) determined, the partial results are gated through multiplexers according to the borrows and partial results, and thus the result is provided with a delay similar to the delay in arrival of the skewed operand.
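
    The "with and without a carry" precomputation can be illustrated with a plain carry-select adder. This sketch covers only that part of the abstract; it does not model the input selection for late arriving operand bits, and the function name and the 4-bit group size are assumptions.

```python
def carry_select_add(a, b, width=16, group=4):
    """Add two unsigned integers group by group, carry-select style."""
    mask = (1 << group) - 1
    carry = 0
    result = 0
    for shift in range(0, width, group):
        ga = (a >> shift) & mask
        gb = (b >> shift) & mask
        # Both candidate sums exist before the incoming carry is known.
        candidates = (ga + gb, ga + gb + 1)
        chosen = candidates[carry]  # plays the role of the multiplexer
        result |= (chosen & mask) << shift
        carry = chosen >> group
    return result, carry


total, carry_out = carry_select_add(0x0FFF, 0x0001)
wrapped, wrapped_carry = carry_select_add(0xFFFF, 0x0001)
```

    In hardware the candidate sums of all groups are formed in parallel, so only the chain of multiplexer selections depends on the carries.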

Partitioning of binary quad word format multiply instruction on S/390 processor
    4.
    Granted invention patent (status: lapsed)

    Publication No.: US6021422A

    Publication date: 2000-02-01

    Application No.: US33626

    Filing date: 1998-03-05

    Applicant: Eric Mark Schwarz

    Inventor: Eric Mark Schwarz

    IPC classification: G06F7/52 G06F7/44 G06F7/38

    Abstract: There is a unique partitioning problem in determining how to execute the floating point multiply instruction defined by the IEEE 754 standard for the quad word format on an S/390 processor. Several manufacturers, including IBM and HP, define the binary quad word format to have a 113 bit significand. The IBM S/390 hexadecimal long floating point format has a 56 bit significand, and most S/390 floating point units contain only a long format multiplier. Quad word format multiplication must therefore be executed as a series of several long precision multiplications and extended precision or long precision additions. The S/390 hexadecimal quad word format is easier to implement than the binary format since it has a 112 bit significand and can easily be partitioned into two 56 bit parts. A 113 bit significand, however, just exceeds two partitions and requires a third. For extended precision multiplies, each partition of one operand is multiplied by each partition of the other, so two partitions require only four multiplies but three partitions require nine. Methods for partitioning are disclosed here.

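
    The multiply counts quoted in the abstract (four for two partitions, nine for three) follow from the schoolbook scheme sketched below. This shows the naive partitioning that creates the problem, not the partitioning methods the patent discloses; the function name is hypothetical.

```python
def partitioned_multiply(x, y, sig_bits, part_bits=56):
    """Multiply two sig_bits-wide integers using part_bits-wide multiplies."""
    parts = -(-sig_bits // part_bits)  # ceiling division
    mask = (1 << part_bits) - 1
    xs = [(x >> (i * part_bits)) & mask for i in range(parts)]
    ys = [(y >> (j * part_bits)) & mask for j in range(parts)]
    product = 0
    multiplies = 0
    for i, xp in enumerate(xs):
        for j, yp in enumerate(ys):
            # Each partial product is weighted by its partition positions.
            product += (xp * yp) << ((i + j) * part_bits)
            multiplies += 1
    return product, multiplies


hex_x, hex_y = (1 << 112) - 1, (1 << 111) + 12345
bin_x, bin_y = (1 << 113) - 1, (1 << 112) + 67890
hex_product, hex_count = partitioned_multiply(hex_x, hex_y, sig_bits=112)
bin_product, bin_count = partitioned_multiply(bin_x, bin_y, sig_bits=113)
```

    A single extra significand bit moves the count from four partial products to nine, which is the cost the abstract refers to.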

Parallel calculation of exponent and sticky bit during normalization
    5.
    Granted invention patent (status: lapsed)

    Publication No.: US5757682A

    Publication date: 1998-05-26

    Application No.: US414072

    Filing date: 1995-03-31

    Abstract: A system implementing a methodology for determining the exponent in parallel with determining the fractional shift during normalization, by partitioning the exponent into partial exponent groups according to the fractional shift dataflow, determining all possible partial exponent values for each partial exponent group according to the fractional dataflow, and providing the exponent by selectively combining possible partial exponents from each partial exponent group according to the fractional dataflow. There is also provided a system implementing a methodology for generating the sticky bit during normalization. Sticky bit information is precalculated and multiplexed according to the fractional dataflow. In an embodiment of the invention, group sticky signals are calculated in tree form, each group sticky having a number of possible sticky bits corresponding to the shift increment amount of the multiplexing. The group sticky bits are further multiplexed according to subsequent shift amounts in the fractional dataflow to provide an output sticky bit at substantially the same time as the final fractional shift amount becomes available, and thereby at substantially the same time as the normalized fraction.

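
    A simplified sketch of "precalculate, then multiplex" for the sticky bit: one candidate sticky bit is prepared for every possible left-shift amount, and the normalization shift merely selects among them. The staged, tree-structured grouping of the patent is collapsed into a single selection here, and the function names and bit widths are assumptions.

```python
def sticky_candidates(value, width, kept):
    """Precompute the sticky bit for every possible left-shift amount.

    candidates[s] is the OR of the bits that would fall below the kept
    fraction if value were shifted left by s positions.
    """
    discarded = width - kept
    candidates = []
    for s in range(discarded + 1):
        low_bits = value & ((1 << (discarded - s)) - 1)
        candidates.append(int(low_bits != 0))
    return candidates


def normalize(value, width, kept):
    """Left-shift until the top bit is set and select the matching sticky bit."""
    candidates = sticky_candidates(value, width, kept)  # independent of shift
    shift = width - value.bit_length()                   # leading-zero count
    shifted = (value << shift) & ((1 << width) - 1)
    fraction = shifted >> (width - kept)
    sticky = candidates[min(shift, width - kept)]        # the multiplexer
    return fraction, sticky


frac_a, sticky_a = normalize(0b00010110, width=8, kept=4)
frac_b, sticky_b = normalize(0b00010111, width=8, kept=4)
```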

Address bit decoding for same adder circuitry for RXE instruction format with same XBD location as RX format and dis-jointed extended operation code
    6.
    Granted invention patent (status: lapsed)

    Publication No.: US6105126A

    Publication date: 2000-08-15

    Application No.: US70359

    Filing date: 1998-04-30

    CPC classification: G06F9/355 G06F9/30185

    Abstract: A computer floating point processor with a six cycle pipeline system, where instruction text is fetched prior to the first cycle and, during the first cycle, the fetched instruction is decoded and the base (B) and index (X) register values are read for use in address generation. RXE instructions are of the RX type but are extended by placing the extension of the operation code beyond the first four bytes of the instruction format, and the operation codes are assigned in such a way that the machine may determine the exact format from the first 8 bits of the operation code alone. The ESA/390 instruction formats SS, RR, RX, S, RRE and RI and the new RXE format can be used for fixed point processing as well as floating point processing. Instructions of the RXE format have their R1, X2, B2, and D2 fields in the identical positions in said instruction register as in the RX format, which enables the processor to determine from the first 8 bits of the operation code alone that an instruction being decoded is an RXE format instruction and the register indexed extensions of the RXE format instruction, after which it gates the correct information to said X-B-D adder. During the second cycle the address add of B+X+Displacement is performed and sent to the cache; during the third and fourth cycles the cache is accessed and data is returned, respectively; during a fifth cycle the fetched instruction is executed; and the result is put away in a sixth cycle.

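
    Because R1, X2, B2 and D2 sit in the same positions in both formats, one field extractor and one B+X+D add can serve RX and RXE alike. The sketch below assumes the usual ESA/390 layout (8 bit opcode, 4 bit R1, X2 and B2 fields, 12 bit D2, with the RXE opcode extension in the sixth byte), a two-entry opcode table chosen purely for illustration, and 31 bit addressing; the function and table names are hypothetical.

```python
# Illustrative subset only: 0x58 as an RX opcode, 0xED as an RXE opcode.
FORMAT_BY_OPCODE = {0x58: "RX", 0xED: "RXE"}


def decode_rx_like(instr, regs):
    """Decode an RX or RXE instruction and form its operand address."""
    opcode = instr[0]
    fmt = FORMAT_BY_OPCODE[opcode]       # first 8 bits decide the format
    r1 = instr[1] >> 4
    x2 = instr[1] & 0xF
    b2 = instr[2] >> 4
    d2 = ((instr[2] & 0xF) << 8) | instr[3]
    extension = instr[5] if fmt == "RXE" else None  # beyond the first 4 bytes
    # A register number of 0 contributes zero to the address.
    index = regs[x2] if x2 else 0
    base = regs[b2] if b2 else 0
    address = (base + index + d2) & 0x7FFFFFFF      # assumed 31 bit addressing
    return {"format": fmt, "r1": r1, "extension": extension, "address": address}


regs = [0] * 16
regs[2] = 0x1000   # index register
regs[3] = 0x20     # base register
rx = decode_rx_like(bytes([0x58, 0x12, 0x30, 0x10]), regs)
rxe = decode_rx_like(bytes([0xED, 0x12, 0x30, 0x10, 0x00, 0x1A]), regs)
```

    Both decodes yield the same operand address, since the address fields are read from identical bit positions.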

Floating point binary quad word format multiply instruction unit
    7.
    Granted invention patent (status: lapsed)

    Publication No.: US6055554A

    Publication date: 2000-04-25

    Application No.: US34718

    Filing date: 1998-03-04

    Applicant: Eric Mark Schwarz

    Inventor: Eric Mark Schwarz

    CPC classification: G06F7/4876 G06F7/5324

    Abstract: An IEEE 754 standard floating point multiply instruction for binary extended precision format can be executed with a quad word format on an S/390 processor. The multiplication calculation multiplies each partition by each other. In the multiplication dataflow, if either operand is a denormalized number, the operands are normalized at a stage which creates an expanded exponent range of one more bit, and the calculation continues to a parallel path multiplexor stage; if neither operand is denormalized, the exponent of the number is expanded and the calculation splits into four parallel paths, wherein the two operands' sign bits are processed in a sign calculation block stage, the operands' two 16 bit binary exponents are processed by an exponent conversion block stage, and a partition multiplicand significand block stage receives a 113 bit multiplicand significand input for a fourth path. In this calculation the third and fourth paths converge in a calculation which provides partial products and intermediate sums and finally a final product as a calculation block stage output; this output, the exponent from said second path, and the sign bit from said first path merge to provide a product which is represented in hexadecimal internal format and is converted back to binary format in a calculation block stage and rounded.


Preprocessing of stored target routines for emulating incompatible instructions on a target processor
    8.
    Granted invention patent (status: lapsed)

    Publication No.: US6009261A

    Publication date: 1999-12-28

    Application No.: US991714

    Filing date: 1997-12-16

    IPC classification: G06F9/455

    CPC classification: G06F9/45504

    Abstract: A program translation and execution method stores target routines (for execution by a target processor) corresponding to incompatible instructions, interruptions and authorizations of an incompatible program written for execution on another computer system built to a computer architecture incompatible with the architecture of the target processor's computer system. The disclosed process allows the target processor to emulate incompatible acts expected in the operation of an incompatible program when the target processor itself is incapable of performing the emulated acts. Each of the instructions, interruptions and authorizations found in the incompatible program has one or more corresponding target routines, any of which may need to be preprocessed before it can precisely emulate the execution results required by the incompatible architecture. Target routines (corresponding to the incompatible instruction instances in an incompatible program being emulated) are accessed, patched where necessary, and executed by a target processor to enable the target processor to precisely obtain the execution results of the emulated incompatible program. Before preprocessing, a target routine may not be able to provide the identical execution results required by the incompatible architecture, and the preprocessing may patch one or more of its target instructions to enable the target routine to perform the identical emulation execution of the corresponding incompatible instruction. The patching and other modifications to a target routine are done by one or more preprocessing instructions stored in the target routine.


Method and system of rounding for division or square root: eliminating remainder calculation
    9.
    Granted invention patent (status: lapsed)

    Publication No.: US5764555A

    Publication date: 1998-06-09

    Application No.: US614561

    Filing date: 1996-03-13

    Abstract: A method and system which provide exactly rounded division and square root results for a designated rounding mode independently of a remainder, or of an equivalent calculation of the relationship between the remainder and zero, for predetermined combinations of the rounding mode and the guard digit of an estimate that has several more bits of precision than the exactly rounded result and has an error tolerance magnitude less than the weight of the least significant bit of the estimate. The estimate is generated in accordance with a quadratically converging division or square root algorithm. The method and system are described in connection with IEEE 754-1985 and IBM S/390 binary floating point architectures.


Method and system of rounding for quadratically converging division or square root
    10.
    Granted invention patent (status: lapsed)

    Publication No.: US5729481A

    Publication date: 1998-03-17

    Application No.: US414867

    Filing date: 1995-03-31

    Applicant: Eric Mark Schwarz

    Inventor: Eric Mark Schwarz

    Abstract: A method and system which provide exactly rounded division and square root results for a designated rounding mode independently of a remainder, or of an equivalent calculation of the relationship between the remainder and zero, for predetermined combinations of the rounding mode and the least significant bit of an estimate that has one more bit of precision than the exactly rounded result and has an error tolerance magnitude less than the weight of the least significant bit of the estimate. The estimate is generated in accordance with a quadratically converging division or square root algorithm. The method and system are described in connection with IEEE 754-1985 and IBM S/390 binary floating point architectures.

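
    The abstract does not name the quadratically converging algorithm that produces the estimate. As an illustration of what quadratic convergence means, the sketch below uses the Newton-Raphson reciprocal iteration (an assumption, chosen because it is a well-known member of that class) with exact rational arithmetic, so the squaring of the error at each step is visible. It does not implement the rounding rule of the patent.

```python
from fractions import Fraction


def newton_reciprocal(d, x0, iterations):
    """Refine an estimate of 1/d and return every intermediate estimate."""
    d = Fraction(d)
    x = Fraction(x0)
    estimates = [x]
    for _ in range(iterations):
        x = x * (2 - d * x)  # Newton-Raphson step for f(x) = 1/x - d
        estimates.append(x)
    return estimates


def relative_errors(d, estimates):
    """Relative error 1 - d*x of each estimate of 1/d."""
    return [1 - Fraction(d) * x for x in estimates]


estimates = newton_reciprocal(3, Fraction(1, 4), iterations=3)
errors = relative_errors(3, estimates)
```

    Since 1 - d*x*(2 - d*x) = (1 - d*x)^2, each step squares the relative error, which roughly doubles the number of correct bits per iteration.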