摘要:
There is disclosed method, software and apparatus for evaluating a function f in a computing device using a reduction, core approximation and final reconstruction stage. According to one embodiment of the invention, an argument reduction stage uses an approximate reciprocal table in the computing device. According to another embodiment, an approximate reciprocal instruction I is operative on the computing device to use the approximate reciprocal table such that the argument reduction stage provides that—C:=I(X) and R:=X×C−1, the core approximation stage provides that p(R) is computed so that it approximates f(1+R), and the final reconstruction stage provides that T=f(1/C) is fetched and calculated if necessary, and f(X) is reconstructed based on f(X)=f([1/C]×[X×C])=g(f(1/C), f(1+R)).
摘要:
A computer and a method of using the computer to reduce an original argument to obtain a periodic function of the argument. A special number P.sub.j is employed that is close to a nontrivial even-integral multiple .pi.. The technique subtracts a non-negative integral multiple of P.sub.j from the original argument to obtain a first reduced argument. Then, a second non-negative integer multiple of a floating-point representation of .pi./2 is subtracted from the first reduced argument to obtain a second reduced argument. Next, a periodic function of a third argument equal to a sum of the second reduced argument plus the product of the first non-negative integral multiple and a floating-point representation of an offset .delta..sub.j is evaluated to obtain a result.
摘要:
An apparatus and method for creating lookup tables of approximate floating-point quotients which exactly represent the underlying value, within the range of the specified precision. The lookup tables are created without any extraneous data beyond what is needed and also without sacrificing numerical accuracy, and may be creating for any radix.
摘要:
A method suitable for calculating an expression having the form (A/B)K by a processor that features separate sets of floating point units which can operate in parallel for greater speed of execution. The processor issues instructions to determine an approximate reciprocal R0 of a first variable B. Further instructions are issued to raise a second variable to the power of a third variable K by a first set of arithmetic units of the processor, where the second variable is a function of the approximate reciprocal R0. Still further instructions are issued to calculate a polynomial q at a fourth variable delta by a second set of arithmetic units of the processor. The fourth variable delta is also a function of the approximate reciprocal R0. Finally, one or more instructions are issued to multiply the calculated polynomial by the second variable, having been raised to the power of the third variable, to yield (A/B)K.
摘要:
Embodiments of techniques and systems for approximating a function are described. In embodiments, a computing device may receive one or more statistical properties associated with application of an approximation function of a function over a target domain. The computing device may formulate one or more constraints on one or more parameters of a functional form of the approximation function, based at least in part on the one or more statistical properties. The computing device may then determine the one or more parameters subject to the constraints and out put results of the determination. In embodiments, the one or more parameters may be determined through application of an optimization procedure. Other embodiments, may be described and claimed.
摘要:
A new function for calculating the reciprocal residual of a floating-point number X is defined as recip_residual(X)=1−X*recip(X), where recip(X) represents the reciprocal of X. The function may be implemented using a fused multiply-add unit in a processor. The reciprocal value of X, recip(X), may be obtained from a lookup table. The recip_residual function may help reduce the latency of many multiplicative functions that are based on products of multiple numbers and can be expressed in simple terms of functions on each individual number (e.g., log(U*V)=log(U)+log(V)).
摘要:
Apparatus and methods are provided for an improved on-the-fly rounding technique for digit-recurrence algorithms, such as division and square root calculations. According to one embodiment, only two forms of an intermediate result of an operation to be performed by a digit-recurrence algorithm are maintained. A first form is maintained in a first register and a second form is maintained in a second register. Responsive to receiving digits 1 to L−2 of the intermediate result from a digit recurrence unit, where L represents a number of digits that satisfies a predetermined precision for the operation, both forms of the intermediate result are updated by register swapping or concatenation under the control of load and shift control logic and on-the-fly conversion logic. Then, a rounded result is generated by determining digits dL−1 and dL and appending a rounded last digit to the appropriate form of the intermediate result.
摘要:
A computer and a method of using the computer to separate a floating-point number into high and low parts and for evaluating a dominant arithmetic object and a remainder object. The dominant object is associated with the first arithmetic object by using the high parts of the floating-point number. The evaluation of a remainder arithmetic object associates the first arithmetic object with the high and low parts of the floating-point numbers. A sum of the dominant and remainder arithmetic objects returns a value corresponding to the first arithmetic object.
摘要:
Fourier transform computation for distributed processing environments is disclosed. Example methods disclosed herein to compute a Fourier transform of an input data sequence include performing first processing on the input data sequence using a plurality of processors, the first processing resulting in an output data sequence having more data elements than the input data sequence Such example methods also include performing second processing on the output data sequence using the plurality of processors, the output data sequence being permutated among the plurality of processors, each of the processors performing the second processing on a respective permutated portion of the output data sequence to determine a respective, ordered segment of the Fourier transform of the input data sequence.
摘要:
A new function for calculating the reciprocal residual of a floating-point number X is defined as recip_residual(X)=1−X*recip(X), where recip(X) represents the reciprocal of X. The function may be implemented using a fused multiply-add unit in a processor. The reciprocal value of X, recip(X), may be obtained from a lookup table. The recip_residual function may help reduce the latency of many multiplicative functions that are based on products of multiple numbers and can be expressed in simple terms of functions on each individual number (e.g., log(U*V)=log(U)+log(V)).