摘要:
A system for performing limited out-of order execution of floating point loads. The system includes a plurality of stages making up a pipeline, the stages including an early stage. The system also includes a mechanism for inputting an arithmetic instruction into the pipeline, the arithmetic instruction including a result address. The mechanism also determines if the arithmetic instruction causes a write after write (WAW) condition to occur before writing a result of the arithmetic instruction to the result address. The determining includes comparing the result address to a load address associated with a load instruction subsequent to the arithmetic instruction in the pipeline. The load data associated with the load instruction was written to the load address in the early stage of the pipeline. A WAW condition occurs if the result address is equal to the load address. Writing a result of the arithmetic instruction is suppressed in response to the WAW condition occurring.
摘要:
A system for performing decimal floating point addition. The system includes input registers for inputting a first and second operand for an addition operation. The system also includes a plurality of adder blocks, each calculating a sum of one or more corresponding digits from the first operand and the second operand. Output from each of the adder blocks includes the sum of the corresponding digits and a carry out indicator for the corresponding digits. The calculating is performed during a first clock cycle. The system also includes an intermediate result register for storing the sums of the corresponding digits output from each of the plurality of adder blocks, the storing during the first clock cycle. The system further includes a carry chain for storing the carry out indicator output from each of the plurality of adder blocks, the storing occurring during the first clock cycle. The system further includes an incrementer for adding one to each of the sums stored in the intermediate result register, the incrementing occurring during a second clock cycle. In addition, a mechanism is provided for selecting between each of the sums and the sums incremented by one. The input to the mechanism includes the carry chain. The output includes the final sum of the first operand and the second operand. The selecting occurs during the second clock cycle.
摘要:
A method for leading zero detection. The method includes receiving DPD encoded data representing a three digit BCD number and determining directly from the DPD encoded data if the BCD number represented by the DPD encoded data contains at least one leading zero digit. A group one switch is set to zero if it was determined that the BCD number represented by the DPD encoded data contains at least one leading zero digit and set to one otherwise. The method also includes determining directly from the DPD encoded data if the BCD number represented by the DPD encoded data contains at least two leading zero digits. A group two switch is set to zero if it was determined that the BCD number represented by the DPD encoded data contains at least two leading zero digits and set to one otherwise. The method further includes determining directly from the DPD encoded data if the BCD number represented by the DPD encoded data contains three leading zero digits. A group three switch is set to zero if was determined that the BCD number represented by the DPD encoded data contains three leading zero digits and set to one otherwise.
摘要:
A system for performing floating point arithmetic operations including an input register adapted for receiving an operand. The system also includes a mechanism for performing a shift or masking operation in response to determining that the operand is in an un-normalized format. The system also includes instructions for performing single precision incrementing of the operand in response to determining that the operand is single precision, that the operand requires the incrementing based on the results of a previous operation and that the previous operation did not perform the incrementing. The operand was created in the previous operation. The system further includes instructions for performing double precision incrementing of the operand in response to determining that the operand is double precision, that the operand requires the incrementing based on the results of the previous operation and that the previous operation did not perform the incrementing.
摘要:
A system for performing floating point arithmetic operations including an input register adapted for receiving an operand. The system also includes computer instructions for performing single precision incrementing of the operand in response to determining that the operand is single precision, that the operand requires the incrementing based on the results of a previous operation and that the previous operation did not perform the incrementing. The operand was created in the previous operation. The system further includes instructions for performing double precision incrementing of the operand in response to determining that the operand is double precision, that the operand requires the incrementing based on the results of the previous operation and that the previous operation did not perform the incrementing.
摘要:
A method of processing data employs a new rounding mode called “round for reround” on the original arithmetic instruction in the hardware precision, and then 2) invoking an instruction which specifies a variable rounding precision and possibly explicitly sets the rounding mode which we have called the ReRound instruction. The precise result of the arithmetic operation is first truncated to the hardware format precision “p”, forming an intermediate result. If only zeros are dropped during truncation, then the intermediate result is equal to the precise result, and this result is said to be “exact”, otherwise, it is “inexact”. When the intermediate result is inexact and its least significant digit is either zero or five, then that digit is incremented to one or six respectively forming the rounded result. Thus, when the least significant digit of a rounded result is zero or five the result could be construed to be exact or exactly halfway between two machine representations if it were later rounded to one less digit of precision. For all other values, it is obvious that the result is inexact and not halfway between two machine representations for later roundings to fewer than “p” digits of precision. A nice mathematical property of this rounding mode is that results stay ordered and in a hardware implementation it is guaranteed that the incrementation of the least significant digit does not cause a carry into the next digit of the result.
摘要翻译:处理数据的方法采用在硬件精度上对原始算术指令称为“round for reround”的新的舍入模式,然后2)调用指定变量舍入精度的指令,并可能明确地设置我们称之为舍入模式 ReRound指令。 算术运算的精确结果首先被截断为硬件格式精度“p”,形成中间结果。 如果在截断期间仅删除零,则中间结果等于精确结果,并且该结果被称为“精确”,否则为“不精确”。 当中间结果不精确,其最低有效位为零或五时,则该数字分别增加到一个或六个,分别形成舍入结果。 因此,当舍入结果的最低有效数字为零或五时,如果结果被稍后舍入为一个较小的精度数字,则结果可以被解释为两个机器表示之间的精确或准确的中间。 对于所有其他值,很明显,结果是不精确的,而不是两个机器表示之间的中间,以便稍后的舍入少于“p”位精度。 这种舍入模式的一个很好的数学属性是结果保持有序,并且在硬件实现中,保证最低有效位的递增不会导致结果的下一个数字的进位。
摘要:
A novel system and method for working around a processing flaw in a processor is disclosed. At least one instruction is fetched from a memory location. The instruction is decoded. A set of opcode compare logic, associated with an instruction decode unit and/or a set of global completion table, is used for an opcode compare operation. The compare operation compares the instruction and a set of values within at least one opcode compare register in response to the decoding. The instruction is marked with a pattern based on the opcode compare operation. The pattern indicates that the instruction is associated with a processing flaw. The pattern is separate and distinct from opcode information within the instruction that is utilized by the set of opcode compare logic during the opcode compare operation.
摘要:
A system and method for issuing a processor instruction to multiple processing sections arranged in an out-of-order processing pipeline architecture. The multiple processing sections include a first execution unit with a pipeline length and a second execution unit operating upon data produced by the first execution unit. An instruction issue unit accepts a complex instruction that is cracked into respective micro-ops for the first execution unit and the second execution unit. The instruction issue unit issues the first micro-op to the first execution unit to produce intermediate data. The instruction issue unit then delays for a time period corresponding to the processing pipeline length of the first execution unit. After the delay, a second micro-op is issued to the second execution unit.
摘要:
A system for performing floating point arithmetic operations including a plurality of stages making up a pipeline, the stages including a first stage and a last stage. The system also includes a register file adapted for receiving a store instruction for input to the pipeline, where the data associated with the store instruction is dependent on a previous operation still in the pipeline. The system further includes a store register adapted for outputting the data associated with the store instruction to memory and a control unit having instructions. The instructions are directed to inputting the store instruction into the pipeline and to providing a path for forwarding the data associated with the store instruction from the last stage in the pipeline to the store register for use by the store instruction if the previous operation immediately precedes the store operation in the pipeline and if there is a data type match between the store instruction and the previous operation. In addition, the instructions are directed to inputting the store instruction into the pipeline and to providing a path for forwarding the data associated with the store instruction from the first stage in the pipeline to the store register for use by the store instruction if the previous operation precedes the store operation by one or more stage in the pipeline and if there is a data type match between the store instruction and the previous operation.
摘要:
A method for converting from binary to decimal. The method includes receiving a binary coded decimal (BCD) number made up of one or more sets of three digits. A running sum and a running carry are set to zero. The following steps are performed for each set of three digits in the BCD number in order from the set of three digits containing the three most significant digits of the BCD number to the set of three digits containing the three least significant digits of the BCD number. The steps include: creating six partial products based on the set of three digits, the running sum and the running carry; combining the six partial products into two partial products; and storing the two partial products in the running sum and the running carry. After the loop has been performed for each set of three digits in the BCD number, the running sum and the running carry are combined into a final binary result.