Abstract:
Methods, apparatus, and articles of manufacture for performing calculations using reduced-width data are disclosed. In particular, an example method determines reduced-width data values associated with generating and evaluating functions. Some of the reduced-width data values are stored within instructions in an instruction memory during a compile phase and retrieved from instruction memory during a runtime phase.
Abstract:
An apparatus, method, and system for performing an enhanced fused multiply-add operation are disclosed. In one embodiment, an apparatus includes an exponent unit. The exponent unit includes a first adder to generate S1, where S1 is the sum of an integer k, the exponent of a floating point value A, and the exponent of a floating point value B. The exponent unit also includes a comparator to generate E1, where E1 is the greater of S1 and the exponent of a floating point value C. The apparatus also includes a partial multiplier, a shifter, and a second adder. The partial multiplier generates the partial products of the mantissas of A and B. The shifter aligns the partial products and the mantissa of C, based on E1. The second adder adds the aligned partial products and the mantissa of C. The apparatus can generate not only (A*B+C) but also (2^k*A*B+C) and the closest integer to (2^k*A*B) in two's complement or floating point format.
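The exponent-unit arithmetic described above can be illustrated in software. The following Python sketch is not the disclosed hardware; it is a reference model under stated assumptions. The function names (`exponent_unit`, `scaled_fma`, `nearest_int_scaled_product`) are hypothetical, exponents follow the `math.frexp` convention (mantissa in [0.5, 1)), and ties in the integer result are assumed to round to even.

```python
import math
from fractions import Fraction


def exponent_unit(k, a, b, c):
    """Return (S1, E1): S1 = k + exp(A) + exp(B), E1 = max(S1, exp(C)).

    Assumption: exponents use the math.frexp convention, which differs
    from the IEEE 754 biased exponent field.
    """
    _, exp_a = math.frexp(a)
    _, exp_b = math.frexp(b)
    _, exp_c = math.frexp(c)
    s1 = k + exp_a + exp_b      # first adder
    e1 = max(s1, exp_c)         # comparator
    return s1, e1


def scaled_fma(k, a, b, c):
    """Reference value of 2**k * a * b + c with a single final rounding."""
    exact = Fraction(2) ** k * Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


def nearest_int_scaled_product(k, a, b):
    """Closest integer to 2**k * a * b (assumed tie rule: round half to even)."""
    return round(Fraction(2) ** k * Fraction(a) * Fraction(b))
```

Exact rational arithmetic stands in for the partial-product datapath here, so the result is rounded only once, which is the defining property of a fused operation.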
Abstract:
Methods, machines, and systems are provided for very high radix division using narrow data paths. A numerator and denominator are received for a very high radix division calculation. An approximate reciprocal of the denominator is obtained from a data structure. The numerator and denominator are pre-scaled by the reciprocal. The denominator is decomposed to an equivalent expression that results in a number of leading insignificant values. A quotient is then assembled iteratively: each iteration modifies a current remainder by forming a first product and subtracting the equivalent expression.
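One way to picture the pre-scaled iteration is the Python sketch below. It is an illustration under assumptions, not the claimed method: the table lookup is replaced by a hypothetical `approx_reciprocal` helper, the denominator is assumed to lie in [1, 2), the "equivalent expression" is taken to be 1 + eps (where eps has many leading zero bits), and the digit width and step count are arbitrary choices.

```python
import math
from fractions import Fraction


def approx_reciprocal(d, bits=8):
    """Stand-in for the table lookup: 1/d rounded to `bits` fractional bits."""
    scale = 1 << bits
    return Fraction(round(scale / Fraction(d)), scale)


def high_radix_divide(n, d, digit_bits=6, steps=4):
    """Return (quotient, remainder) with n*r == quotient*(d*r) + remainder.

    Assumes 1 <= d < 2 so the scaled denominator is close to 1.
    """
    n, d = Fraction(n), Fraction(d)
    r = approx_reciprocal(d)
    n_scaled, d_scaled = n * r, d * r    # pre-scale both operands
    eps = d_scaled - 1                   # d_scaled = 1 + eps, eps is tiny
    quotient = Fraction(0)
    rem = n_scaled
    for j in range(1, steps + 1):
        grain = 1 << (digit_bits * j)
        # Because d_scaled is near 1, the next digit is just the
        # leading bits of the current remainder.
        digit = Fraction(math.floor(rem * grain), grain)
        # Subtract digit * (1 + eps); the product digit * eps is narrow.
        rem = (rem - digit) - digit * eps
        quotient += digit
    return quotient, rem
```

Each pass retires roughly `digit_bits` bits of quotient, and the only multiplication involves the small term `eps`, which is what keeps the data path narrow in this model.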
Abstract:
An apparatus to facilitate compute optimization is disclosed. The apparatus includes sorting logic to sort processing threads into thread groups based on bit depth of floating point thread operations.
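As a rough software analogue of the sorting logic, the Python sketch below groups threads by the bit depth of their floating point operations. The input shape (pairs of thread id and bit depth) and the function name are assumptions made for illustration; the abstract does not specify either.

```python
from collections import defaultdict


def sort_threads_by_bit_depth(threads):
    """Group (thread_id, bit_depth) pairs into thread groups keyed by bit depth."""
    groups = defaultdict(list)
    for thread_id, bit_depth in threads:
        groups[bit_depth].append(thread_id)
    # Return groups ordered by ascending bit depth.
    return dict(sorted(groups.items()))
```

For example, threads with 16-bit operations end up in one group and threads with 32-bit operations in another, so each group can be dispatched to matching execution resources.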
Abstract:
Methods and apparatus to determine a remainder value are disclosed. A disclosed example method involves, during a compilation phase, causing a processor to multiply a dividend value by a first value to generate a second value associated with a product. The first value is associated with a scaled approximate reciprocal of a divisor value, and the scaled approximate reciprocal of the divisor value is determined using a compound exponent value. During a runtime phase, the processor is caused to multiply a third value from the second value. The third value is generated using at least a subset bitfield of the second value. During the runtime phase, the processor is caused to determine a remainder value based on the third value. The processor is caused to store the remainder value in a memory.
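The general idea of replacing a division with a multiplication by a scaled approximate reciprocal can be sketched as follows. This Python example uses the well-known multiply-and-shift technique rather than the exact procedure of the disclosure. Reading the "compound exponent" as the sum of the operand width and ceil(log2(divisor)) is an assumption, as are the function names.

```python
def make_reciprocal(divisor, width=32):
    """'Compile phase': precompute a scaled approximate reciprocal.

    Assumption: the shift is width + ceil(log2(divisor)), which makes the
    quotient exact for every dividend below 2**width.
    """
    extra = (divisor - 1).bit_length()        # ceil(log2(divisor))
    shift = width + extra
    magic = -(-(1 << shift) // divisor)       # ceil(2**shift / divisor)
    return magic, shift


def remainder(dividend, divisor, magic, shift):
    """'Runtime phase': one multiply, one bitfield extraction, one subtract."""
    product = dividend * magic
    quotient = product >> shift               # upper bitfield of the product
    return dividend - quotient * divisor
```

The precomputed pair `(magic, shift)` depends only on the divisor, so it can be prepared once and reused for every dividend at runtime.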
Abstract:
Methods and apparatus for determining approximating polynomials using instruction-embedded coefficients are disclosed. In particular, the methods and apparatus use a plurality of coefficient values stored in a plurality of instructions. The coefficient values are associated with a runtime approximating polynomial of a K-th root family function. The coefficient values and the instructions stored in an instruction memory enable the processor system to determine a K-th root family function approximation value based on the runtime approximating polynomial.
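A small Python sketch can illustrate evaluating an approximating polynomial whose coefficients live in the code itself rather than in a data table. It is an analogy only: the coefficients below are the Taylor series terms of sqrt(1 + t) (the K = 2 case), swapped in for whatever runtime polynomial the disclosure derives, and numeric literals in Python merely stand in for instruction immediates.

```python
import math


def sqrt_near_one(x):
    """Approximate sqrt(x) for x close to 1 using Horner's rule.

    Coefficients are written as literals (the analogue of values embedded
    in instructions). They are the degree-4 Taylor terms of sqrt(1 + t):
    1 + t/2 - t**2/8 + t**3/16 - 5*t**4/128.
    """
    t = x - 1.0
    p = -5.0 / 128.0
    p = p * t + 1.0 / 16.0
    p = p * t - 1.0 / 8.0
    p = p * t + 1.0 / 2.0
    p = p * t + 1.0
    return p
```

Because every coefficient is a constant in the instruction stream, evaluation needs no loads from a separate coefficient table. Accuracy degrades as `x` moves away from 1, so this form only suits a reduced argument range.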
Abstract:
Methods and apparatus for determining a remainder value are disclosed. The methods and apparatus extract a residuary subset bitfield value from a binary value that is calculated using a scaled approximate reciprocal value that is associated with a compound exponent scaling value. The residuary subset bitfield value is part of a range of contiguous bits that is associated with upper and lower boundary bit-position values that are part of the compound exponent scaling value. The methods and apparatus determine the remainder value based on the residuary subset bitfield value.