Method and apparatus for square root generation using bit manipulation and instruction interleaving
    1.
    Granted invention patent
    Status: In force

    Publication No.: US06625632B1

    Publication date: 2003-09-23

    Application No.: US09536836

    Filing date: 2000-03-27

    Applicant: Valeri Kotlov

    Inventor: Valeri Kotlov

    IPC classification: G06F7/552

    CPC classification: G06F7/5525 G06F9/30014

    Abstract: The invention provides improved methods and systems for generating square roots of vector and administrative operands. The methods use bit-manipulation operations to halve intermediate values, generated by a processor reciprocal square root operation, during a multistep square root determination. Such methods can also multiply the original operand (whose square root is being determined) by such an intermediate value, or by a halved or other value derived from it. The invention also provides methods and apparatus for determining the square roots of large groups of numbers by interleaving vector and administrative instructions, taking advantage of necessary delays in the vector processing pipeline to speed overall processing.
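
The idea of halving an intermediate value by bit manipulation can be sketched as follows. This is a minimal illustration, not the patented method: it assumes IEEE 754 single-precision values, uses a hypothetical `rsqrt_estimate` as a stand-in for a processor's reciprocal square root estimate instruction, and refines the estimate with ordinary Newton-Raphson steps. All function names are made up for this example.

```python
import struct

def float_to_bits(v: float) -> int:
    """Return the IEEE 754 single-precision bit pattern of v."""
    return struct.unpack("<I", struct.pack("<f", v))[0]

def bits_to_float(b: int) -> float:
    """Interpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", b))[0]

def halve_by_bits(v: float) -> float:
    """Halve a normal, nonzero float32 by subtracting 1 from its exponent field.

    No multiplication is used. Assumes the biased exponent is greater than 1.
    """
    return bits_to_float(float_to_bits(v) - (1 << 23))

def rsqrt_estimate(x: float) -> float:
    """Hypothetical stand-in for a hardware estimate: 1/sqrt(x) with the
    mantissa truncated to 8 bits."""
    return bits_to_float(float_to_bits(x ** -0.5) & 0xFFFF8000)

def sqrt_via_rsqrt(x: float, steps: int = 2) -> float:
    """Approximate sqrt(x) for x > 0 as x * y, where y approximates 1/sqrt(x)."""
    y = rsqrt_estimate(x)
    for _ in range(steps):
        half_y = halve_by_bits(y)            # y / 2 via exponent decrement
        y = y + half_y * (1.0 - x * y * y)   # Newton-Raphson step
    return x * y                             # multiply by the original operand

print(sqrt_via_rsqrt(2.0))
```

Because `halve_by_bits` rounds its input to single precision, the result is accurate only to roughly single precision, whatever the number of refinement steps.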


    Method and apparatus for calculating a power of an operand
    2.
    Granted invention patent
    Status: In force

    Publication No.: US06381625B2

    Publication date: 2002-04-30

    Application No.: US09782474

    Filing date: 2001-02-12

    IPC classification: G06F7/552

    Abstract: A multiplier capable of performing signed and unsigned scalar and vector multiplication is disclosed. The multiplier is configured to receive signed or unsigned multiplier and multiplicand operands in scalar or packed vector form. An effective sign for the multiplier and multiplicand operands may be calculated and used to create and select a number of partial products according to Booth's algorithm. Once the partial products have been created and selected, they may be summed and the results may be output. The results may be signed or unsigned, and may represent vector or scalar quantities. When a vector multiplication is performed, the multiplier may be configured to generate and select partial products so as to effectively isolate the multiplication process for each pair of vector components. The multiplier may also be configured to sum the products of the vector components to form the vector dot product. The final product may be output in segments so as to require fewer bus lines. The segments may be rounded by adding a rounding constant. Rounding and normalization may be performed in two paths, one assuming an overflow will occur, the other assuming no overflow will occur. The multiplier may also be configured to perform iterative calculations to evaluate constant powers of an operand. Intermediate products that are formed may be rounded and normalized in two paths and then compressed and stored for use in the next iteration. An adjustment constant may also be added to increase the frequency of exactly rounded results.
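
Selecting partial products with Booth's algorithm can be illustrated in software. The sketch below uses the textbook radix-4 Booth recoding for a signed scalar multiplier only. It does not model the disclosed hardware, its effective-sign handling, or its vector mode, and the function names and the 16-bit default width are assumptions made for this example.

```python
def booth_radix4_digits(multiplier: int, bits: int = 16) -> list:
    """Recode a two's-complement multiplier into radix-4 Booth digits.

    Each digit lies in {-2, -1, 0, 1, 2} and is computed from an overlapping
    3-bit window: digit = b[i] + b[i-1] - 2*b[i+1]. `bits` must be even and
    the multiplier must fit in `bits` bits as a signed value.
    """
    u = multiplier & ((1 << bits) - 1)   # two's-complement bit pattern
    digits = []
    prev = 0                             # implicit bit to the right of bit 0
    for i in range(0, bits, 2):
        b0 = (u >> i) & 1
        b1 = (u >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits

def booth_multiply(multiplicand: int, multiplier: int, bits: int = 16) -> int:
    """Sum the selected partial products (digit * multiplicand * 4**k)."""
    digits = booth_radix4_digits(multiplier, bits)
    return sum((d * multiplicand) * (4 ** k) for k, d in enumerate(digits))

print(booth_radix4_digits(7, 4))   # [-1, 2], since -1 + 2*4 == 7
print(booth_multiply(-3, 7))       # -21
```

Radix-4 recoding halves the number of partial products compared with one product per multiplier bit, which is why it is common in hardware multipliers.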


    Fast calculation of (A/B)^K by a parallel floating-point processor
    3.
    Granted invention patent
    Status: Lapsed

    Publication No.: US06598063B1

    Publication date: 2003-07-22

    Application No.: US09638442

    Filing date: 2000-08-14

    IPC classification: G06F7/552

    CPC classification: G06F7/552 G06F7/548

    Abstract: A method suitable for calculating an expression of the form (A/B)^K on a processor that features separate sets of floating-point units which can operate in parallel for greater speed of execution. The processor issues instructions to determine an approximate reciprocal R0 of a first variable B. Further instructions are issued for a first set of arithmetic units of the processor to raise a second variable to the power of a third variable K, where the second variable is a function of the approximate reciprocal R0. Still further instructions are issued for a second set of arithmetic units of the processor to evaluate a polynomial q at a fourth variable delta, which is also a function of the approximate reciprocal R0. Finally, one or more instructions are issued to multiply the evaluated polynomial by the second variable, raised to the power of the third variable, to yield (A/B)^K.
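
One way to read this decomposition is sketched below. The abstract does not define the second variable, delta, or the polynomial q, so the choices here are assumptions: the second variable is taken as A*R0, delta as 1 - B*R0, and q as a truncated binomial series for (1 - delta)^(-K). With those choices, A/B = (A*R0) / (1 - delta), so (A/B)^K = (A*R0)^K * q(delta) up to truncation error. The helper `approx_reciprocal` is a hypothetical stand-in for a low-precision hardware estimate.

```python
import math

def approx_reciprocal(b: float) -> float:
    """Hypothetical low-precision estimate of 1/b (about 8 mantissa bits)."""
    m, e = math.frexp(1.0 / b)
    return math.ldexp(math.floor(m * 256) / 256, e)

def q_poly(delta: float, k: float, degree: int = 4) -> float:
    """Truncated series for (1 - delta)**(-k), evaluated with Horner's rule.

    Coefficients follow c[0] = 1, c[n] = c[n-1] * (k + n - 1) / n.
    """
    coeffs = [1.0]
    for n in range(1, degree + 1):
        coeffs.append(coeffs[-1] * (k + n - 1) / n)
    result = 0.0
    for c in reversed(coeffs):
        result = result * delta + c
    return result

def power_of_quotient(a: float, b: float, k: float) -> float:
    """Approximate (a/b)**k for positive a and b without a full division."""
    r0 = approx_reciprocal(b)
    # The two computations below are independent of each other, so separate
    # arithmetic units could evaluate them in parallel.
    powered = (a * r0) ** k          # second variable raised to the power K
    correction = q_poly(1.0 - b * r0, k)
    return powered * correction

print(power_of_quotient(7.0, 3.0, 3))
```

Because delta is small whenever R0 is a reasonable estimate, a low-degree polynomial is enough to recover most of the precision lost in the reciprocal estimate.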


    Computer method and apparatus for division and square root operations using signed digit
    4.
    Granted invention patent
    Status: In force

    Publication No.: US06779012B2

    Publication date: 2004-08-17

    Application No.: US10419454

    Filing date: 2003-04-18

    IPC classification: G06F7/552

    Abstract: Computer method and apparatus for performing a square root or division operation generating a root or quotient. A partial remainder is stored in radix-2 or radix-4 signed digit format. A decoder is provided for computing a root or quotient digit, and a correction term dependent on a number of the most significant digits of the partial remainder. An adder is provided for computing the sum of the signed digit partial remainder and the correction term in binary format, and providing the result in signed digit format. The adder computes a carry-out independent of the carry-in bit and a sum dependent on the carry-in bit, providing a fast adder independent of carry-propagate delays. A scaler multiplies the result output from the adder, in signed digit format, by two to provide a signed digit next partial remainder.


    Method and circuit for envelope detection using a peel cone approximation
    5.
    Granted invention patent
    Status: In force

    Publication No.: US06553399B1

    Publication date: 2003-04-22

    Application No.: US09481141

    Filing date: 2000-01-12

    IPC classification: G06F7/552

    CPC classification: G06F7/552 G06F2207/5525

    Abstract: The present invention discloses an envelope detection circuit using a peel cone approximation concept. The envelope detection circuit comprises an absolute value comparison mechanism, a read-only memory and a multiplier/adder mechanism. In particular, the present invention uses a divider to generate an address of the read-only memory, reducing both error and hardware cost.
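
The abstract does not spell out the peel cone approximation itself, so the sketch below shows only a generic scheme with the same building blocks: compare absolute values, divide to form a table address, look up two coefficients, then multiply and add. It approximates sqrt(I^2 + Q^2) as a*max + b*min with piecewise-linear coefficients. The table size, the coefficient rule (chords of sqrt(1 + r^2)), and all names are assumptions for illustration.

```python
import math

SEGMENTS = 8  # hypothetical number of ROM entries

def build_rom(segments: int = SEGMENTS) -> list:
    """Precompute (a, b) pairs so that sqrt(1 + r*r) ~= a + b*r on each
    segment of r = min/max in [0, 1], using the chord through the endpoints."""
    rom = []
    for k in range(segments):
        r0, r1 = k / segments, (k + 1) / segments
        f0, f1 = math.sqrt(1 + r0 * r0), math.sqrt(1 + r1 * r1)
        b = (f1 - f0) / (r1 - r0)
        a = f0 - b * r0
        rom.append((a, b))
    return rom

ROM = build_rom()

def envelope(i: float, q: float) -> float:
    """Approximate sqrt(i*i + q*q) with a compare, a divide, a table lookup,
    two multiplies and an add."""
    big, small = max(abs(i), abs(q)), min(abs(i), abs(q))  # absolute value comparison
    if big == 0:
        return 0.0
    address = min(int(small / big * SEGMENTS), SEGMENTS - 1)  # divider forms the address
    a, b = ROM[address]                                       # read-only memory lookup
    return a * big + b * small                                # multiplier/adder

print(envelope(-5.0, 12.0))  # close to 13
```

With 8 segments the chord error of this particular scheme stays below about 0.2% of the larger input, and a larger table reduces it further.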


    Single precision inverse square root generator

    Publication No.: US06654777B1

    Publication date: 2003-11-25

    Application No.: US09627221

    Filing date: 2000-07-27

    IPC classification: G06F7/552

    Abstract: A floating point inverse square root circuit is disclosed. The circuit is configured to receive a floating point value comprised of a sign bit, an exponent field, and a mantissa field. The inverse square root circuit includes a lookup table configured to receive at least a portion of the floating point value and further configured to generate an initial approximation (x0) of the inverse square root of the floating point value from the received portion of the floating point value. The inverse square root circuit further includes a first estimation circuit that receives the initial approximation from the lookup table and at least a portion of a value L derived from the floating point value mantissa field (M), and that is further configured to produce a first approximation (x1) of the floating point value's inverse square root based upon L and x0, where x1 is a more accurate estimate of the inverse square root than x0. The first estimation circuit may include first, second, and third fixed point multiplication units and first and second fixed point adders, where the first multiplication unit is configured to square the initial approximation x0, the first fixed point adder is configured to receive as its inputs the initial approximation x0 and the output of a first shift register that receives the initial approximation x0 as its input, and the second multiplication unit is configured to multiply the output of the first multiplication unit by the initial approximation x0. The third multiplication unit may be configured to multiply the output of the second multiplication unit by L, and the second adder may be configured to add the output of the first adder with a shifted and 2's complemented version of the output of the third multiplier to produce the first approximation x1. The value L may comprise the normalized mantissa field if the exponent of the floating point value is odd and two times the normalized mantissa field if the exponent of the floating point value is even.
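
The datapath described above matches one Newton-Raphson step for the inverse square root, x1 = 1.5*x0 - 0.5*L*x0^3, assuming each shift is a one-bit right shift (a halving). The sketch below follows the same sequence of multiplies and adds using ordinary floating-point arithmetic rather than fixed point. The function name is made up for this example, and x0 is supplied by the caller instead of a lookup table.

```python
def refine_inverse_sqrt(L: float, x0: float) -> float:
    """One refinement step toward 1/sqrt(L), given an initial estimate x0."""
    squared = x0 * x0             # first multiplier: x0^2
    cubed = squared * x0          # second multiplier: x0^3
    scaled = cubed * L            # third multiplier: L * x0^3
    first_sum = x0 + x0 / 2       # first adder: x0 plus x0 shifted right by one
    return first_sum - scaled / 2 # second adder: add the shifted, negated product

x0 = 0.7                          # rough estimate of 1/sqrt(2)
x1 = refine_inverse_sqrt(2.0, x0)
print(x1)                         # closer to 0.70710678... than x0
```

Each such step roughly doubles the number of correct bits, so the accuracy of the lookup table determines how many steps a given precision needs.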

    Floating point square root and reciprocal square root computation unit in a processor
    7.
    Granted invention patent
    Status: In force

    Publication No.: US06349319B1

    Publication date: 2002-02-19

    Application No.: US09240765

    Filing date: 1999-01-29

    IPC classification: G06F7/552

    Abstract: A method of computing a square root or a reciprocal square root of a number in a computing device uses a piece-wise quadratic approximation of the number. The square root computation uses the piece-wise quadratic approximation in the form $\sqrt{X} = \bar{A}_i x^2 + \bar{B}_i x + \bar{C}_i$ in each interval i. The reciprocal square root computation uses the piece-wise quadratic approximation in the form $1/\sqrt{X} = A_i x^2 + B_i x + C_i$ in each interval i. The coefficients $\bar{A}_i$, $\bar{B}_i$, and $\bar{C}_i$, and $A_i$, $B_i$, and $C_i$ are derived for the square root operation and for the reciprocal square root operation to reduce the least mean square error using a least squares approximation of a plurality of equally-spaced points within an interval. In one embodiment, 256 equally-spaced intervals are defined to represent the 23 bits of the mantissa. The coefficients are stored in a storage and accessed during execution of the square root or reciprocal square root computation instruction.
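
A table-driven piecewise quadratic approximation of this kind might be sketched as follows. The sketch covers only the square root of a mantissa value in [1, 2) and makes several assumptions not stated in the abstract: x is taken as the position within the interval scaled to [0, 1), 16 equally spaced sample points are fitted per interval, and the least-squares fit is solved through the normal equations. All names are hypothetical.

```python
import math

INTERVALS = 256   # as in the embodiment described above
SAMPLES = 16      # assumed number of fitting points per interval

def solve3(matrix, rhs):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    a = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [p - f * q for p, q in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]

def fit_interval(i: int):
    """Least-squares coefficients (A, B, C) of A*t^2 + B*t + C for sqrt on
    interval i, where t in [0, 1] is the position inside the interval."""
    ts = [j / (SAMPLES - 1) for j in range(SAMPLES)]
    ys = [math.sqrt(1.0 + (i + t) / INTERVALS) for t in ts]
    s = [sum(t ** k for t in ts) for k in range(5)]           # s[k] = sum of t^k
    normal = [[s[4], s[3], s[2]],
              [s[3], s[2], s[1]],
              [s[2], s[1], s[0]]]
    rhs = [sum(y * t ** k for t, y in zip(ts, ys)) for k in (2, 1, 0)]
    return solve3(normal, rhs)

TABLE = [fit_interval(i) for i in range(INTERVALS)]           # coefficient storage

def sqrt_mantissa(m: float) -> float:
    """Approximate sqrt(m) for 1 <= m < 2 with one table lookup."""
    pos = (m - 1.0) * INTERVALS
    i = int(pos)                  # interval index (top 8 bits of the fraction)
    t = pos - i                   # position within the interval
    a, b, c = TABLE[i]
    return (a * t + b) * t + c    # Horner evaluation of the quadratic

print(sqrt_mantissa(1.37), math.sqrt(1.37))
```

A full square root would also halve the exponent and handle odd exponents, for example by using a second table for the range [2, 4). That part is omitted here.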
