Abstract:
Systems and methods are directed to accelerating operations associated with a microprocessor. Example embodiments improve the operations of the microprocessor by providing devices (e.g., integrated circuits, independent accelerators) configured to use reciprocal or reciprocal square root instructions. Such devices can be further configured to follow the reciprocal or reciprocal square root instructions with multiplication or other instructions to finish division, square root, or other complex operations.
Abstract:
A method for enhancing an accuracy of a sum of a plurality of floating-point numbers. The method receives a floating-point number and generates a plurality of provisional numbers with a value of zero. The method further generates a surjective map from the values of an exponent and a sign of a mantissa to the provisional numbers in the plurality of provisional numbers. The method further maps a value of the exponent and the sign of the mantissa to a first provisional number with the surjective map. The method further generates a test number from the first provisional number and if the test number exceeds a limit, modifies a second provisional number by using at least part of the test number. The method further equates the first provisional number to the test number if the test number does not exceed the limit. The method further sums the plurality of provisional numbers.
Abstract:
An instruction to perform a comparison of a first value and a second value is executed. Based on a control of the instruction, a compare function to be performed is determined. The compare function is one of a plurality of compare functions configured for the instruction, and the compare function has a plurality of options for comparison. A compare option based on the first value and the second value is selected from the plurality of options defined for the compare function, and used to compare the first value and the second value. A result of the comparison is then placed in a select location, the result to be used in processing within a computing environment.
Abstract:
A method for enhancing an accuracy of a sum of a plurality of floating-point numbers. The method receives a floating-point number and generates a plurality of provisional numbers with a value of zero. The method further generates a surjective map from the values of an exponent and a sign of a mantissa to the provisional numbers in the plurality of provisional numbers. The method further maps a value of the exponent and the sign of the mantissa to a first provisional number with the surjective map. The method further generates a test number from the first provisional number and if the test number exceeds a limit, modifies a second provisional number by using at least part of the test number. The method further equates the first provisional number to the test number if the test number does not exceed the limit. The method further sums the plurality of provisional numbers.
Abstract:
A process for propagating an error in a floating-point calculation is disclosed. A floating-point error occurring from the floating-point arithmetic calculation is trapped, and a special value is generated. Information regarding the error is stored as a payload of the special value. Program operations are resumed with the special value applied to further calculations dependent on the floating-point arithmetic calculation.
Abstract:
A set of instructions for implementation in a floating-point unit or other computer processor hardware is disclosed herein. In one embodiment, an extended-range fused multiply-add operation, a first look-up operation, and a second look-up operation are each embodied in hardware instructions configured to be operably executed in a processor. These operations are accompanied by a table which provides a set of defined values in response to various function types, supporting the computation of elementary functions such as reciprocal, square, cube, fourth roots and their reciprocals, exponential, and logarithmic functions. By allowing each of these functions to be computed with a hardware instruction, branching and predicated execution may be reduced or eliminated, while also permitting the use of distributed instructions across a number of execution units.
Abstract:
A system for providing a floating point sum includes an analyzer circuit configured to determine a first status of a first floating point operand and a second status of a second floating point operand based upon data within the first floating point operand and data with the second floating point operand respectively. In addition, the system includes a results circuit coupled to the analyzer circuit. The results circuit is configured to assert a resulting floating point operand containing the sum of the first floating point operand and the second floating point operand and a resulting status embedded within the resulting floating point operand.
Abstract:
Computing an output interval includes producing a first result from a conditional selection using a first operand, a second operand, and a third operand, the operands respectively including a second input interval upper-point, a first input interval upper-point, and a first input interval lower-point. Next, computing an output interval includes producing a second result from the conditional selection, the operands respectively including a second input interval upper-point, the first input interval upper-point, and the first input interval lower-point. Furthermore, computing an output interval includes producing a third result from a conditional division using the first operand, the second operand, and the third operand, the operands respectively including the first result, the second input interval upper-point, and the second input interval lower-point. And finally, a fourth result is produced from the conditional division, the operands respectively including the second result, the second input interval lower-point, and the second input interval upper-point.
Abstract:
A floating point max/min circuit for determining the maximum or minimum of two floating point operands includes a first analysis circuit configured to determine a format of a first floating point operand of the two floating point operands based upon floating point status information encoded within the first floating point operand, a second analysis circuit configured to determine a format of a second floating point operand of the two floating point operands based upon floating point status information encoded within the second floating point operand, a decision circuit, coupled to the first analysis circuit and to the second analysis circuit and responding to a function control signal that indicates the threshold condition is one of a maximum of the two floating point operands and a minimum of the two floating point operands, for generating at least one assembly control signal based on the format of a first floating point operand, the format of a second floating point operand, and the function control signal, and a result assembler circuit, coupled to the decision circuit, for producing a result indicating which of the first floating point operand and the second floating point operand meet the threshold condition, based on the at least one assembly control signal. The format of the floating point operands may be from a group comprising: not-a-number (NaN), positive infinity, negative infinity, normalized, denormalized, positive overflow, negative overflow, positive underflow, negative underflow, inexact, exact, division by zero, invalid operation, positive zero, and negative zero. The result produced may be a third floating point operand having encoded floating point status information, and at least part of the encoded floating point status information in the result may come from either the first floating point operand or the second floating point operand.
Abstract:
A floating point unit (FPU) for processing denormal numbers in floating point notation, a method of processing such numbers in an FPU and a computer system employing the FPU or the method. In one embodiment, the FPU includes: (1) a load unit that receives a denormal number having an exponent portion of a standard length from a source without the FPU and transforms the denormal number into a normalized number having an exponent portion of an expanded length greater than the standard length, (2) a floating point execution core, coupled to the load unit, that processes the normalized number at least once to yield a processed normalized number, the expanded length of the exponent portion allowing the processed normalized number to remain normal during processing thereof and (3) a store unit, coupled to the floating point execution core, that receives the processed normalized number and transforms the processed normalized number back into a denormal number having an exponent portion of the standard length.