Execution circuitry for floating-point power operation

    Publication number: US12118332B2

    Publication date: 2024-10-15

    Application number: US18045577

    Filing date: 2022-10-11

    Applicant: Apple Inc.

    CPC classification number: G06F7/552

    Abstract: Techniques are disclosed relating to dedicated power function circuitry for a floating-point power instruction. In some embodiments, execution circuitry is configured to execute a floating-point power instruction to evaluate the power function $x^y$ as $2^{y \log_2 x}$. In some embodiments, base-2 logarithm circuitry is configured to evaluate a base-2 logarithm for a first input (e.g., $\log_2 x$) by determining coefficients for a polynomial function and evaluating the polynomial function using the determined coefficients and the first input. In some embodiments, multiplication circuitry multiplies the base-2 logarithm result by a second input to generate a multiplication result. In some embodiments, base-2 power function circuitry is configured to evaluate a base-2 power function for the multiplication result. Disclosed techniques may advantageously increase performance and reduce power consumption of floating-point power function operations with reasonable area and accuracy, relative to traditional techniques.
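
The three stages named in this abstract (base-2 logarithm by polynomial, multiply, base-2 power) can be illustrated in software. The following is only a rough sketch, not the patented circuit: the segment count `SEGMENTS`, the cubic Taylor coefficients, and the function names `log2_poly` and `pow_sketch` are all assumptions made for illustration.

```python
import math

SEGMENTS = 16  # hypothetical number of coefficient sets (one per mantissa segment)


def _coefficients(segment: int):
    """Return the segment centre c and cubic Taylor coefficients of log2 around c."""
    c = 1.0 + (segment + 0.5) / SEGMENTS
    ln2 = math.log(2.0)
    return c, (
        math.log2(c),
        1.0 / (c * ln2),
        -1.0 / (2.0 * c * c * ln2),
        1.0 / (3.0 * c ** 3 * ln2),
    )


def log2_poly(x: float) -> float:
    """Approximate log2(x) for x > 0 with a per-segment cubic polynomial."""
    if x <= 0.0:
        raise ValueError("this sketch only handles x > 0")
    frac, exp = math.frexp(x)          # x = frac * 2**exp, frac in [0.5, 1)
    m, e = frac * 2.0, exp - 1         # x = m * 2**e, m in [1, 2)
    segment = min(int((m - 1.0) * SEGMENTS), SEGMENTS - 1)
    c, (a0, a1, a2, a3) = _coefficients(segment)   # "determine coefficients"
    d = m - c
    return e + a0 + d * (a1 + d * (a2 + d * a3))   # "evaluate the polynomial"


def pow_sketch(x: float, y: float) -> float:
    """Evaluate x**y as 2**(y * log2(x)) using the polynomial logarithm above."""
    product = y * log2_poly(x)         # multiplication stage
    return 2.0 ** product              # base-2 power stage


print(pow_sketch(3.0, 2.5), 3.0 ** 2.5)
```

With these assumed parameters the logarithm is accurate to roughly six or seven decimal digits, so the final result is an approximation rather than a correctly rounded power.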

    Small multiplier after initial approximation for operations with increasing precision

    Publication number: US11853718B2

    Publication date: 2023-12-26

    Application number: US18097316

    Filing date: 2023-01-16

    Inventor: Leonard Rarick

    Abstract: In an aspect, a processor includes circuitry for iterative refinement approaches, e.g., Newton-Raphson, to evaluating functions such as square root, reciprocal, and division. The circuitry includes circuitry for producing an initial approximation, which can include a lookup table (LUT). The LUT may produce an output that (with implementation-dependent processing) forms an initial approximation of a value, with a number of bits of precision. A limited-precision multiplier multiplies that initial approximation with another value; an output of the limited-precision multiplier goes to a full-precision multiplier circuit that performs remaining multiplications required for iteration(s) in the particular refinement process being implemented. For example, in division, the output being calculated is for a reciprocal of the divisor. The full-precision multiplier circuit requires a first number of clock cycles to complete, and both the small multiplier and the initial approximation circuitry complete within the first number of clock cycles.
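
As an illustration of the refinement scheme this abstract describes, the sketch below computes a reciprocal by Newton-Raphson iteration starting from a small lookup table. It is a software analogy only: the table size `LUT_BITS`, the midpoint table entries, and the iteration count are assumptions, and ordinary floating-point multiplication stands in for both the small and the full-precision multiplier.

```python
import math

LUT_BITS = 5  # hypothetical index width: a 32-entry table
# Each entry approximates 1/m at the midpoint of its mantissa segment, m in [1, 2).
RECIP_LUT = [1.0 / (1.0 + (i + 0.5) / 2 ** LUT_BITS) for i in range(2 ** LUT_BITS)]


def reciprocal(d: float, iterations: int = 3) -> float:
    """Approximate 1/d for d > 0 by LUT lookup followed by Newton-Raphson steps."""
    if d <= 0.0:
        raise ValueError("this sketch only handles d > 0")
    frac, exp = math.frexp(d)            # d = frac * 2**exp, frac in [0.5, 1)
    m = frac * 2.0                       # mantissa in [1, 2); d = m * 2**(exp - 1)
    r = RECIP_LUT[int((m - 1.0) * 2 ** LUT_BITS)]   # initial approximation (~6 bits)
    for _ in range(iterations):
        # The first product m * r involves a low-precision r, which is where a
        # narrow multiplier could be used; later products need full width.
        r = r * (2.0 - m * r)            # each step roughly doubles the correct bits
    return math.ldexp(r, -(exp - 1))     # undo the exponent scaling


def divide(a: float, b: float) -> float:
    """Division expressed as multiplication by the refined reciprocal."""
    return a * reciprocal(b)


print(divide(1.0, 3.0))
```

Because the error roughly squares on every step, three iterations from a 6-bit starting value are enough to approach double precision in this sketch.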

    SMALL MULTIPLIER AFTER INITIAL APPROXIMATION FOR OPERATIONS WITH INCREASING PRECISION

    Publication number: US20230214186A1

    Publication date: 2023-07-06

    Application number: US18097316

    Filing date: 2023-01-16

    Inventor: Leonard Rarick

    Abstract: In an aspect, a processor includes circuitry for iterative refinement approaches, e.g., Newton-Raphson, to evaluating functions such as square root, reciprocal, and division. The circuitry includes circuitry for producing an initial approximation, which can include a lookup table (LUT). The LUT may produce an output that (with implementation-dependent processing) forms an initial approximation of a value, with a number of bits of precision. A limited-precision multiplier multiplies that initial approximation with another value; an output of the limited-precision multiplier goes to a full-precision multiplier circuit that performs remaining multiplications required for iteration(s) in the particular refinement process being implemented. For example, in division, the output being calculated is for a reciprocal of the divisor. The full-precision multiplier circuit requires a first number of clock cycles to complete, and both the small multiplier and the initial approximation circuitry complete within the first number of clock cycles.

    Cubic root of a Galois field element

    Publication number: US09804828B2

    Publication date: 2017-10-31

    Application number: US14551110

    Filing date: 2014-11-24

    Applicant: APPLE INC.

    CPC classification number: G06F7/724 G06F7/552 G06F7/5525 G06F2207/5526

    Abstract: A method includes receiving a first element of a Galois Field of order $q^m$, where q is a prime number and m is a positive integer. The first element is raised to a predetermined power so as to form a second element z, wherein the predetermined power is a function of $q^m$ and an integer p, where p is a prime number which divides $q^m - 1$. The second element z is raised to a p-th power to form a third element. If the third element equals the first element, the second element multiplied by a p-th root of unity raised to a respective power selected from a set of integers between 0 and p−1 is output as at least one root of the first element.
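
The steps in this abstract can be tried out on a small prime field. The sketch below is a simplification under stated assumptions: it works only in GF(q) with q prime (that is, m = 1), it assumes p does not divide (q − 1)/p, and the exponent it uses (the inverse of p modulo (q − 1)/p) is one choice that makes the check succeed for p-th powers, not necessarily the predetermined power used in the patent. The function name `pth_roots` is hypothetical.

```python
def pth_roots(a: int, q: int, p: int) -> list[int]:
    """Return all p-th roots of a in GF(q), or [] if a is not a p-th power.

    Assumes q is prime, p is a prime dividing q - 1, and p does not divide (q - 1) // p.
    """
    n = (q - 1) // p
    if (q - 1) % p != 0 or n % p == 0:
        raise ValueError("sketch requires p | q - 1 and p not dividing (q - 1) / p")
    a %= q
    if a == 0:
        return [0]
    e = pow(p, -1, n)                 # assumed exponent: inverse of p modulo n
    z = pow(a, e, q)                  # second element: candidate root
    if pow(z, p, q) != a:             # third element must equal the first element
        return []
    # A primitive p-th root of unity: g**n for the first g where that is not 1.
    omega = next(w for g in range(2, q) if (w := pow(g, n, q)) != 1)
    return sorted(z * pow(omega, k, q) % q for k in range(p))


print(pth_roots(8, 13, 3))   # cube roots of 8 modulo 13
print(pth_roots(2, 13, 3))   # 2 is not a cube modulo 13
```

For q = 13 and p = 3 the candidate root of 8 is 5, and multiplying by the cube roots of unity 1, 3, and 9 gives the three roots 2, 5, and 6.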

    SYSTEM AND METHOD FOR ROUNDING RECIPROCAL SQUARE ROOT RESULTS OF INPUT FLOATING POINT NUMBERS

    Publication number: US20170109134A1

    Publication date: 2017-04-20

    Application number: US15293541

    Filing date: 2016-10-14

    Abstract: Methods and systems for determining whether an infinitely precise result of a reciprocal square root operation performed on an input floating point number is greater than a particular number in a first floating point precision. The method includes calculating the square of the particular number in a second lower floating point precision; calculating an error in the calculated square due to the second floating point precision; calculating a first delta value in the first floating point precision by calculating the square multiplied by the input floating point number less one; calculating a second delta value by calculating the error multiplied by the input floating point number plus the first delta value; and outputting an indication of whether the infinitely precise result of the reciprocal square root operation is greater than the particular number based on the second delta term.
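
The comparison in this abstract rests on the identity that 1/sqrt(x) is greater than y exactly when x·y² − 1 is negative (for positive x and y). The sketch below models the two-step delta computation with exact rational arithmetic so the algebra is easy to follow. It is a toy model under assumptions: `round_to_bits` stands in for the lower-precision square, the 12-bit width is arbitrary, and real hardware would round the delta values as well.

```python
import math
from fractions import Fraction


def round_to_bits(v: Fraction, bits: int) -> Fraction:
    """Round a positive rational to `bits` significant bits (toy lower precision)."""
    exponent = math.floor(math.log2(float(v)))
    ulp = Fraction(2) ** (exponent - bits + 1)
    return round(v / ulp) * ulp


def rsqrt_exceeds(x: Fraction, y: Fraction, bits: int = 12) -> bool:
    """Return True if the exact value of 1/sqrt(x) is greater than y."""
    square = round_to_bits(y * y, bits)   # square of y in the lower precision
    error = y * y - square                # error introduced by that rounding
    delta1 = square * x - 1               # first delta: square * x - 1
    delta2 = error * x + delta1           # second delta: equals x * y**2 - 1 here
    return delta2 < 0                     # negative means y is below 1/sqrt(x)


x = Fraction(2)                                # 1/sqrt(2) is about 0.70710678
print(rsqrt_exceeds(x, Fraction(181, 256)))    # 0.70703125 lies below it
print(rsqrt_exceeds(x, Fraction(182, 256)))    # 0.7109375 lies above it
```

Splitting the square into a rounded part and an error term keeps each multiplication narrow while still recovering the sign of x·y² − 1.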

    PERFORMING A COMPARISON COMPUTATION IN A COMPUTER SYSTEM
    Type: Patent application
    Legal status: Active

    Publication number: US20160041946A1

    Publication date: 2016-02-11

    Application number: US14452315

    Filing date: 2014-08-05

    Inventor: Leonard Rarick

    CPC classification number: G06F7/52 G06F7/552 G06F7/5525 G06F2207/5521

    Abstract: A method and computer system are provided for performing a comparison computation, e.g. for use in a check procedure for a reciprocal square root operation. The comparison computation compares a multiplication of three values with a predetermined value. The computer system performs the multiplication using multiplier logic which is configured to perform multiply operations in which two values are multiplied together. A first and second of the three values are multiplied to determine a first intermediate result, $w_1$. The digits of $w_1$ are separated into two portions, $w_{1,1}$ and $w_{1,2}$. The third of the three values is multiplied with $w_{1,2}$ and the result is added into a multiplication of the third of the three values with $w_{1,1}$ to thereby determine the result of multiplying the three values together. In this way the comparison is performed with high accuracy, whilst keeping the area and power consumption of the multiplier logic low.
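
The split-and-accumulate multiplication described above can be mimicked with integers. In this sketch, which is illustrative only, `multiply2` stands in for the two-input multiplier logic, the 16-bit split point `low_bits` is an arbitrary assumption, and all inputs are taken to be non-negative integers.

```python
def multiply2(u: int, v: int) -> int:
    """Stand-in for multiplier logic that multiplies exactly two values."""
    return u * v


def triple_product(a: int, b: int, c: int, low_bits: int = 16) -> int:
    """Compute a * b * c using only two-input multiplications."""
    w1 = multiply2(a, b)                       # first intermediate result
    w1_low = w1 & ((1 << low_bits) - 1)        # low-order digits of w1
    w1_high = w1 >> low_bits                   # high-order digits of w1
    high_part = multiply2(c, w1_high) << low_bits
    return high_part + multiply2(c, w1_low)    # low product added into high product


def compare_product(a: int, b: int, c: int, threshold: int) -> int:
    """Return -1, 0, or 1 as a * b * c is below, equal to, or above the threshold."""
    product = triple_product(a, b, c)
    return (product > threshold) - (product < threshold)


print(triple_product(123456, 654321, 789) == 123456 * 654321 * 789)
```

Splitting the intermediate result means neither of the later multiplications needs an operand as wide as the full product of the first two values.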

    Computing floating-point polynomials in an integrated circuit device
    Type: Granted patent
    Legal status: Active

    Publication number: US09053045B1

    Publication date: 2015-06-09

    Application number: US13789882

    Filing date: 2013-03-08

    Abstract: Polynomial circuitry for calculating a polynomial having terms including powers of an input variable, where the input variable is represented by a mantissa and an exponent, includes at least one respective coefficient table for each respective term, each respective coefficient table being loaded with a plurality of respective instances of a coefficient for said respective term, each respective instance being shifted by a different number of bits. The circuitry also includes decoder circuitry for selecting one of the respective instances of the coefficient for each respective term based on the exponent and on a range, from among a plurality of ranges, of values into which that input variable falls.
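
One way to read this abstract: if the input is x = m · 2^(−s), then each term c_k · x^k equals (c_k · 2^(−k·s)) · m^k, so storing every coefficient pre-shifted by each possible amount lets the datapath work on the mantissa alone. The sketch below follows that reading in fixed point. The polynomial `COEFFS`, the widths `FRAC_BITS` and `MAX_SHIFT`, and the restriction to 0 < x < 1 are assumptions, and the range-based selection mentioned in the abstract is left out.

```python
import math

FRAC_BITS = 30                           # hypothetical fixed-point fraction width
MAX_SHIFT = 8                            # hypothetical largest supported exponent shift
COEFFS = [1.0, 1.0, 0.5, 1.0 / 6.0]      # hypothetical polynomial 1 + x + x^2/2 + x^3/6

# TABLES[k][s] holds the coefficient of x**k pre-shifted right by k * s bits.
TABLES = [
    [int(round(c * (1 << FRAC_BITS))) >> (k * s) for s in range(MAX_SHIFT + 1)]
    for k, c in enumerate(COEFFS)
]


def poly_eval(x: float) -> float:
    """Evaluate the polynomial for 0 < x < 1 using pre-shifted coefficients."""
    if not 0.0 < x < 1.0:
        raise ValueError("this sketch only handles 0 < x < 1")
    m, e = math.frexp(x)                 # x = m * 2**e, m in [0.5, 1), e <= 0
    s = -e                               # the exponent selects the table instance
    if s > MAX_SHIFT:
        raise ValueError("exponent outside the table range of this sketch")
    m_fixed = int(round(m * (1 << FRAC_BITS)))
    acc = 0
    for k in reversed(range(len(COEFFS))):
        # Horner step on the mantissa only; the shift is already in the table entry.
        acc = ((acc * m_fixed) >> FRAC_BITS) + TABLES[k][s]
    return acc / (1 << FRAC_BITS)


print(poly_eval(0.3))
```

The benefit suggested by this arrangement is that no run-time shifter is needed per term, at the cost of larger coefficient tables.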

    ARITHMETIC OPERATION DEVICE, CONTROL METHOD, AND PROGRAM
    Type: Patent application
    Legal status: Active

    Publication number: US20140365546A1

    Publication date: 2014-12-11

    Application number: US14366129

    Filing date: 2013-02-15

    Abstract: Provided is an arithmetic operation device including a plurality of shift registers each constituted by first to (N+1)th registers and a control unit configured to cause the shift registers to move stored values. The control unit causes the stored values to be output from a predetermined pair of registers constituting the first shift register while causing the stored values to move so that all combinations of a pair of stored values selectable from the stored values are output, and causes the stored values to be output from a predetermined pair of registers constituting the other shift register while causing the stored values to move.

    Specialized processing block for programmable integrated circuit device
    Type: Granted patent
    Legal status: Active

    Publication number: US08543634B1

    Publication date: 2013-09-24

    Application number: US13435133

    Filing date: 2012-03-30

    CPC classification number: G06F7/552 G06F2207/5523

    Abstract: A specialized processing block such as a DSP block may be enhanced by including direct connections that allow the block output to be directly connected to either the multiplier inputs or the adder inputs of another such block. A programmable integrated circuit device may include a plurality of such specialized processing blocks. The specialized processing block includes a multiplier having two multiplicand inputs and a product output, an adder having as one adder input the product output of the multiplier, and having a second adder input and an adder output, a direct-connect output of the adder output to a first other one of the specialized processing blocks, and a direct-connect input from a second other one of the specialized processing blocks. The direct-connect input connects a direct-connect output of that second other one of the specialized processing blocks to a first one of the multiplicand inputs.
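
Feeding one block's adder output into the next block's multiplicand input is the data flow needed for Horner-style polynomial evaluation. The sketch below models that chaining in software; it is an assumed use case chosen for illustration, and the `Block` class and `horner_chain` function are hypothetical names rather than anything defined in the patent.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Toy model of one processing block: a multiplier feeding an adder."""

    addend: float  # value on the second adder input

    def evaluate(self, multiplicand_a: float, multiplicand_b: float) -> float:
        product = multiplicand_a * multiplicand_b   # multiplier stage
        return product + self.addend                # adder stage


def horner_chain(coeffs: list[float], x: float) -> float:
    """Evaluate a polynomial (highest-order coefficient first) with chained blocks."""
    direct = coeffs[0]
    for c in coeffs[1:]:
        # The previous block's adder output arrives on a multiplicand input.
        direct = Block(addend=c).evaluate(direct, x)
    return direct


print(horner_chain([2.0, -3.0, 1.0], 4.0))   # 2*x**2 - 3*x + 1 at x = 4
```

Each loop iteration corresponds to one block in the chain, so a degree-n polynomial uses n blocks in this model.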
