摘要:
A computer processor system having a floating point processor for instructions which are processed in a six cycle pipeline, in which prior to the first cycle of the pipeline an instruction text is fetched, and during the first cycle for the fetched particular instruction it is decoded and the base (B) and index (X) register values are read for use in address generation. Instructions of the RX-type are extended by placing the extension of the operation code beyond the first four bytes of the instruction format and to assign the operation codes in such a way that the machine may determine from the first 8 bits of the operation code alone, the exact format of the instruction. Instructions formats include the ESA/390 instructions SS, RR; RX; S; RRE; RI: and the new RXE instructions. where instructions of the RXE format have their R.sub.1, X.sub.2, B.sub.2, and D.sub.2 fields in the identical positions in said instruction register as in the RX format to enable the processor to determine from the first 8 bits of the operation code alone that an instruction being decoded is an RXE format instruction and the register indexed extensions of the RXE format instruction, after which it gates the correct information to said X-B-D adder. During the second cycle the address add of B+X+Displacement is performed and sent to the cache processor's, and during the third and fourth cycles the cache is respectively accessed and data is returned, and during a fifth cycle execution of the fetched instruction occurs with the result putaway in a sixth cycle.RXE instructions can be used for floating point processing and fixed point processing.
摘要:
A computer processor floating point processor six cycle pipeline system where instruction text is fetched prior to the first cycle and decoded during the first cycle for the fetched particular instruction and the base (B) and index (X) register values are read for use in address generation. RXE Instructions are of the RX-type but extended by placing the extension of the operation code beyond the first four bytes of the instruction format and to assign the operation codes in such a way that the machine may determine the exact format from the first 8 bits of the operation code alone. ESA/390 instructions SS, RR; RX; S; RRE; RI; and the new RXE instructions have a format which can be used for fixed point processing as well as floating point processing where instructions of the RXE format have their R1, X2, B2, and D2 fields in the identical positions in said instruction register as in the RX format to enable the processor to determine from the first 8 bits of the operation code alone that an instruction being decoded is an RXE format instruction and the register indexed extensions of the RXE format instruction, after which it gates the correct information to said X-B-D adder. During the second cycle the address add of B+X+Displacement is performed and sent to the cache processor's, and during the third and fourth cycles the cache is respectively accessed and data is returned, and during a fifth cycle execution of the fetched instruction occurs with the result putaway in a sixth cycle.
摘要翻译:计算机处理器浮点处理器六循环流水线系统,其中指令文本在第一周期之前获取并且在第一周期期间被解码用于所提取的特定指令,并且基准(B)和索引(X)寄存器值被读取用于地址 代。 RXE指令是RX型,但通过将操作码的扩展置于指令格式的前四个字节之外进行扩展,并以这样的方式分配操作码,使得机器可以从前8位确定确切的格式 的操作代码。 ESA / 390指令SS,RR; RX; S; RRE; RI; 并且新的RXE指令具有可用于固定点处理以及浮点处理的格式,其中RXE格式的指令在所述指令寄存器中的相同位置具有其R1,X2,B2和D2字段,如 RX格式,使处理器能够从操作代码的前8位确定正在解码的指令是RXE格式指令和RXE格式指令的寄存器索引扩展,之后它将正确信息锁定到所述XBD加法器 。 在第二周期期间,执行B + X +位移的地址添加并发送到高速缓存处理器,并且在第三和第四周期期间,分别访问高速缓存并返回数据,并且在第五周期期间执行所取出的指令 结果放在第六个周期。
摘要:
A method and system for processing instructions in a floating point unit for executing denormalized numbers in floating point pipeline via serializing uses an instruction unit and having a control unit and a pipelined data flow unit, a shifter and a rounding unit. The floating point unit has an external feedback path for providing intermediate result data from said rounding unit to an input of the pipelined data flow unit to reuse the pipeline for denormalization by passing intermediate results in the pipeline which have a denormalized condition computed after the exponent calculation of the shifting circuit directly from the rounding unit to the top of the dataflow in the pipeline via an external feedback path. The pipelined has two paths which are selected based on the presence of other instructions in the pipeline. If no other instructions are in the pipeline a first path is taken which uses the external feedback path from the rounding unit back into the top of the dataflow. When there are instructions in the pipeline a shifter unit performing normalization of the fraction indicates possible underflow of the exponent, and prepares to hold the exponent and the fraction in a floating point data flow register; and upon detection of exponent underflow during the rounder stage and detection of any other instructions in pipeline; then the control unit forces an interrupt for serialization, and cancels execution of this instruction and other instructions in pipeline.
摘要:
IEEE compliant floating point unit mechanism allows variability in the execution of floating point operations according to the IEEE 754 standard and allowing variability of the standard to co-exist in hardware or in the combination of hardware and millicode. The FPU has a detector of special conditions which dynamically detects an event that the hardware execution of an IEEE compliant Binary Floating Point instruction will require millicode emulation. The complete set of events which millicode may emulate are predetermined early in the design process of the hardware. An exception handling unit assist millicode emulation by trapping the result of an exceptional condition without invoking the trap handler. When an exceptional condition is detected during execution, the IEEE 754 standard requires two different actions under control of a mask bit. If the mask bit is on, the result is written into an FPR and the trap handler is invoked. Otherwise, a default value is written, a flag is set, and the program continues execution. This allows a variation to the IEEE 754 standard. Two different versions of the function of the Multiply-then-Substract instruction are implemented for two different IEEE 754 compliant architectures.
摘要:
A computer system supporting multiple floating point architectures. In an embodiment of the invention, a floating point unit (FPU) is optimized for hex format. The FPU uses a hex internal dataflow with a with an exponent and bias sufficient to support a binary floating point architecture. The FPU includes format conversion means, rounding means, sticky bit calculation means, and special number control means to execute binary floating point operations according to the IEEE 754 standard. An embodiment of the invention provides a system for executing floating point operations in either IBM S/390 hexadecimal format or IEEE 754 binary format.
摘要翻译:支持多种浮点架构的计算机系统。 在本发明的实施例中,针对十六进制格式优化了浮点单元(FPU)。 FPU使用十六进制内部数据流,其中指数和偏差足以支持二进制浮点架构。 FPU包括格式转换装置,舍入装置,粘点计算装置和专用号码控制装置,以根据IEEE 754标准执行二进制浮点运算。 本发明的实施例提供了一种用于以IBM S / 390十六进制格式或IEEE 754二进制格式执行浮点运算的系统。
摘要:
An adder which takes advantage of the early arriving bits of a time skewed operand to provide a result to an add or substract operation without additional latency. Possible partial results are calculated and then selectively combined according to the late arriving data as the late arriving data becomes available. In an embodiment of the present invention, a first operand is partitioned into groups according to the arrival time of the skewed data, and possible partial results for each group are calculated for the full range of partial inputs that affect it. In addition, the high order groups are calculated with and without a borrow (carry) which is propagated from a low order group. Once the delayed partial operands are known and the borrows (carrys) determined the partial results are gated through multiplexers according to the borrows and partial results, and thus the result is provided with a delay similar to the delay in arrival of the skewed operand.
摘要:
There is a unique partitioning problem in determining how to execute the floating point multiply instruction defined by IEEE 754 standard for the quad word format on a S/390 processor. Several manufacturers including IBM and HP define the binary quad word format to have a 113 bit significand. IBM S/390 hexadecimal long floating point format has a 56 bit significand and most S/390 floating point units only contain a long format multiplier. Quad word format multiplication must be executed as a series of several long precision multiplications and extended precision or long precision additions. The S/390 hexadecimal quad word format is easier to implement than binary format since it has a 112 bit significand and can easily be partitioned into two 56 bit parts. But a 113 bit significand would just exceed two partitions and require a third. For extended precision multiplies each partition is multiplied by each other, so if there are two partitions only four multiplies are required but for three partitions this increases to nine multiplies. Methods for partitioning are disclosed here.
摘要翻译:确定如何在S / 390处理器上执行由IEEE 754标准定义的四字格式的浮点乘法指令,存在独特的划分问题。 包括IBM和HP在内的几家制造商将二进制四字格式定义为具有113位有效位数。 IBM S / 390十六进制长浮点格式具有56位有效位数,大多数S / 390浮点单元仅包含长格式乘数。 四字格式乘法必须作为一系列长精度乘法和扩展精度或长精度加法执行。 S / 390十六进制四进制字格式比二进制格式更容易实现,因为它具有112位有效位数,并且可以轻松地分为两个56位的部分。 但是一个113位的有效位数只会超过两个分区,需要三分之一。 对于扩展精度乘法,每个分区彼此相乘,因此如果有两个分区只需要四个乘法,但是对于三个分区,这增加到九个乘法。 这里公开了划分方法。
摘要:
A system implementing a methodology for determining the exponent in parallel with determining the fractional shift during normalization according to partitioning the exponent into partial exponent groups according to the fractional shift data flow, determining all possible partial exponent values for each partial exponent group according to the fractional data flow, and providing the exponent by selectively combining possible partial exponents from each partial exponent group according to the fractional data flow. There is also provided a system implementing a methodology for generating the sticky bit during normalization. Sticky bit information is precalculated and multiplexed according to the fractional dataflow. In an embodiment of the invention, group sticky signals are calculated in tree form, each group sticky having a number of possible sticky bits corresponding to the shift increment amount of the multiplexing. The group sticky bits are further multiplexed according to subsequent shift amounts in the fractional dataflow to provide an output sticky bit at substantially the same time as when the final fractional shift amount is available, and thereby at substantially the same time as the normalized fraction.
摘要:
An IEEE 754 standard floating point multiply instruction for binary extended precision format can be executed with a quad word format on an S/390 process. The multiplication calculation multiplies each partition by each other. In the multiplication calculation process dataflow process of either operand is a denormalized number, they are normalized at a stage which creates an expanded exponent range of one more bit, and the calculation continues to a parallel path multiplexor stage, but if neither operand is denormalized then the exponent of the number is expended and the calculation splits into four parallel paths, wherein two operand's sign bits are processed in a sign calculation block stage, the operands' two 16 bit binary exponents are processed by an exponent conversion block stage, and a partition multiplicand significand block stage receives a 113 bit multiplicand significand input for a fourth path. In this calculation third and fourth paths converge with a calculation which provides partial products and intermediate sums and finally a final product as a calculation block stage output, and this output and the exponent from said second path and the sign bit from said first path merge to provide a product which is represented in hexadecimal internal format and is converted back to binary format in calculation block stage and rounded.
摘要:
Provides a program translation and execution method which stores target routines (for execution by a target processor) corresponding to incompatible instructions, interruptions and authorizations of an incompatible program written for execution on another computer system built to a computer architecture incompatible with the architecture of the target processor's computer system. The disclosed process allows the target processor to emulate incompatible acts expected in the operation of an incompatible program when the target processor itself is incapable of performing the emulated acts. Each of the instructions, interruptions and authorizations found in the incompatible programs has one or more corresponding target routines, any of which may need to be preprocessed before it can precisely emulate the execution results required by the incompatible architecture. Target routines (corresponding to the incompatible instruction instances in an incompatible program being emulated) are accessed, patched where necessary, and executed by a target processor to enable the target processor to precisely obtain the execution results of the emulated incompatible program. Before preprocessing, each target routine may not be able to provide identical execution results as required by the incompatible architecture, and the preprocessing may patch one or more of its target instructions to enable the target routine to perform the identical emulation execution of the corresponding incompatible instruction. The patching and other modifications to a target routine are done by one or more preprocessing instructions stored in the target routine.