Small multiplier after initial approximation for operations with increasing precision

    Publication No.: US11853718B2

    Publication date: 2023-12-26

    Application No.: US18097316

    Filing date: 2023-01-16

    Inventor: Leonard Rarick

    Abstract: In an aspect, a processor includes circuitry for iterative refinement approaches, e.g., Newton-Raphson, to evaluating functions, such as square root, reciprocal, and for division. The circuitry includes circuitry for producing an initial approximation; which can include a LookUp Table (LUT). LUT may produce an output that (with implementation-dependent processing) forms an initial approximation of a value, with a number of bits of precision. A limited-precision multiplier multiplies that initial approximation with another value; an output of the limited precision multiplier goes to a full precision multiplier circuit that performs remaining multiplications required for iteration(s) in the particular refinement process being implemented. For example, in division, the output being calculated is for a reciprocal of the divisor. The full-precision multiplier circuit requires a first number of clock cycles to complete, and both the small multiplier and the initial approximation circuitry complete within the first number of clock cycles.
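
    The data flow in this abstract can be sketched in fixed-point integer arithmetic. The following is a minimal illustration for the reciprocal case, not the patented circuit: the table width `LUT_BITS`, the working width `FRAC_BITS`, the midpoint-based table contents, and the function names are all assumptions made for the example. The first product multiplies the short LUT output by the full-width operand (the "small multiplier" step), and every later product uses full-width operands.

```python
from fractions import Fraction

LUT_BITS = 7     # index width of the illustrative lookup table (assumption)
FRAC_BITS = 60   # fraction bits of the full-precision fixed-point format (assumption)


def build_lut(bits=LUT_BITS):
    """Reciprocal estimates for d in [1, 2), one per interval of width 2**-bits."""
    table = []
    for i in range(1 << bits):
        midpoint = Fraction(2 * ((1 << bits) + i) + 1, 1 << (bits + 1))
        # Each entry keeps only bits + 2 fraction bits, so it is a short operand.
        table.append(round((1 << (bits + 2)) / midpoint))
    return table


def reciprocal(d, lut, iterations=3):
    """Approximate 1/d for fixed-point d in [1, 2) with FRAC_BITS fraction bits."""
    index = (d >> (FRAC_BITS - LUT_BITS)) & ((1 << LUT_BITS) - 1)
    x0 = lut[index]
    # Small multiplier: the narrow LUT output times the full-width operand.
    p = (x0 * d) >> (LUT_BITS + 2)
    x = x0 << (FRAC_BITS - LUT_BITS - 2)
    two = 2 << FRAC_BITS
    # Remaining multiplications of the Newton-Raphson steps use full-width operands.
    x = (x * (two - p)) >> FRAC_BITS
    for _ in range(iterations - 1):
        p = (x * d) >> FRAC_BITS
        x = (x * (two - p)) >> FRAC_BITS
    return x
```

    For example, `reciprocal(3 << (FRAC_BITS - 1), build_lut())` approximates 1/1.5 in the same fixed-point format. Each Newton-Raphson step roughly doubles the number of correct bits, which is why only the first product can make do with a narrow operand.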

    SMALL MULTIPLIER AFTER INITIAL APPROXIMATION FOR OPERATIONS WITH INCREASING PRECISION

    Publication No.: US20230214186A1

    Publication date: 2023-07-06

    Application No.: US18097316

    Filing date: 2023-01-16

    Inventor: Leonard Rarick

    Abstract: In an aspect, a processor includes circuitry for iterative refinement approaches, e.g., Newton-Raphson, to evaluating functions, such as square root, reciprocal, and for division. The circuitry includes circuitry for producing an initial approximation; which can include a LookUp Table (LUT). LUT may produce an output that (with implementation-dependent processing) forms an initial approximation of a value, with a number of bits of precision. A limited-precision multiplier multiplies that initial approximation with another value; an output of the limited precision multiplier goes to a full precision multiplier circuit that performs remaining multiplications required for iteration(s) in the particular refinement process being implemented. For example, in division, the output being calculated is for a reciprocal of the divisor. The full-precision multiplier circuit requires a first number of clock cycles to complete, and both the small multiplier and the initial approximation circuitry complete within the first number of clock cycles.

    AES Hardware Implementation
    3.
    Patent application

    Publication No.: US20170373836A1

    Publication date: 2017-12-28

    Application No.: US15633988

    Filing date: 2017-06-27

    Inventor: Leonard Rarick

    Abstract: A method of performing at least one of end-to-end Advanced Encryption Standard (AES) encryption and end-to-end AES decryption in an instruction execution module comprising hardware logic in a processor having an instruction set, receives in response to a particular instruction set being executed, key values and text data identified by operands in the executed instruction, the received key values defining an initial round key and forming current key values and the received text data defining an initial state array to be processed in an initial round and forming a current state array; and for each round of a plurality of rounds of AES encryption or decryption, modifying the current key values and modifying the current state array by: processing the current state array using at least a portion of the current key values; generating key values based upon the current key values for use in a subsequent round; and updating the current key values to replace at least a portion of the current key values with the generated key values to form a round key for use in a subsequent round.

    Rounding floating point numbers
    4.
    Granted patent
    Status: in force

    Publication No.: US09489174B2

    Publication date: 2016-11-08

    Application No.: US14498183

    Filing date: 2014-09-26

    Inventor: Leonard Rarick

    CPC classification number: G06F7/483 G06F7/49957 G06F7/5443

    Abstract: Embodiments disclosed pertain to apparatuses, systems, and methods for floating point operations. Disclosed embodiments pertain to a circuit that is capable of processing both a normal and denormal inputs and outputting normal and denormal results, and where a rounding module is used advantageously to reduce operational latency of the circuit.

    PERFORMING A COMPARISON COMPUTATION IN A COMPUTER SYSTEM
    5.
    Patent application
    Status: in force

    Publication No.: US20160041946A1

    Publication date: 2016-02-11

    Application No.: US14452315

    Filing date: 2014-08-05

    Inventor: Leonard Rarick

    CPC classification number: G06F7/52 G06F7/552 G06F7/5525 G06F2207/5521

    Abstract: A method and computer system are provided for performing a comparison computation, e.g. for use in a check procedure for a reciprocal square root operation. The comparison computation compares a multiplication of three values with a predetermined value. The computer system performs the multiplication using multiplier logic which is configured to perform multiply operations in which two values are multiplied together. A first and second of the three values are multiplied to determine a first intermediate result, w1. The digits of w1 are separated into two portions, w1,1 and w1,2. The third of the three values is multiplied with w1,2 and the result is added into a multiplication of the third of the three values with w1,1 to thereby determine the result of multiplying the three values together. In this way the comparison is performed with high accuracy, whilst keeping the area and power consumption of the multiplier logic low.
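
    The splitting step can be checked with plain integer arithmetic. The sketch below is an illustration rather than the patented logic: it assumes non-negative integer operands, takes w1,1 to be the more significant portion of w1 (the abstract does not say which portion is which), and uses hypothetical function names and a hypothetical split position `low_bits`.

```python
def product_of_three(a, b, c, low_bits):
    """Multiply three non-negative integers using only two-operand multiplies."""
    w1 = a * b                           # first intermediate result
    w1_hi = w1 >> low_bits               # upper portion (taken here as w1,1)
    w1_lo = w1 & ((1 << low_bits) - 1)   # lower portion (taken here as w1,2)
    # c * w1_lo is added into c * w1_hi at the matching bit position.
    return ((c * w1_hi) << low_bits) + c * w1_lo


def compare_product(a, b, c, target, low_bits=24):
    """Return -1, 0 or 1 as a*b*c is below, equal to or above target."""
    product = product_of_three(a, b, c, low_bits)
    return (product > target) - (product < target)
```

    Each partial multiply has one operand no wider than a portion of w1, so the multiplier never needs to accept the full double-width intermediate result as a single input.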

    Small multiplier after initial approximation for operations with increasing precision

    Publication No.: US12217020B2

    Publication date: 2025-02-04

    Application No.: US18395836

    Filing date: 2023-12-26

    Inventor: Leonard Rarick

    Abstract: In an aspect, a processor includes circuitry for iterative refinement approaches, e.g., Newton-Raphson, to evaluating functions, such as square root, reciprocal, and for division. The circuitry includes circuitry for producing an initial approximation; which can include a LookUp Table (LUT). LUT may produce an output that (with implementation-dependent processing) forms an initial approximation of a value, with a number of bits of precision. A limited-precision multiplier multiplies that initial approximation with another value; an output of the limited precision multiplier goes to a full precision multiplier circuit that performs remaining multiplications required for iteration(s) in the particular refinement process being implemented. For example, in division, the output being calculated is for a reciprocal of the divisor. The full-precision multiplier circuit requires a first number of clock cycles to complete, and both the small multiplier and the initial approximation circuitry complete within the first number of clock cycles.

    Check procedure for floating point operations

    Publication No.: US10416960B2

    Publication date: 2019-09-17

    Application No.: US15611595

    Filing date: 2017-06-01

    Abstract: Method and computer system for implementing an operation on ≥1 floating point input, in accordance with a rounding mode, e.g. using a Newton-Raphson technique. The floating point result comprises a p-bit mantissa. An unrounded proposed mantissa result is determined using the Newton-Raphson technique, wherein a p-bit rounded proposed mantissa result, t, corresponds to a rounding of the unrounded proposed mantissa result in accordance with the rounding mode, with k leading zeroes. If an increment to the (m−k)th bit of the unrounded result would affect the p-bit rounded result then the input(s) and bits of the unrounded result are used to determine a check parameter which is indicative of a relationship between an exact result and the unrounded result if the (m−k)th bit were incremented. The p-bit mantissa of the floating point result, is determined in dependence upon the check parameter, to be either t or t+1.
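
    The "either t or t+1" decision can be illustrated with a much simplified integer analogue. The sketch below is not the claimed procedure: it handles only a truncating integer square root, assumes the proposed result `t` is already known to be either correct or one too low, and uses a hypothetical function name. The check parameter here is the sign of the residual obtained when the candidate is incremented.

```python
def resolve_sqrt(n, t):
    """Return floor(sqrt(n)) given a proposal t with t <= sqrt(n) < t + 2."""
    # Check parameter: relates the exact result to the incremented candidate.
    check = n - (t + 1) * (t + 1)
    return t + 1 if check >= 0 else t
```

    The check uses exact integer arithmetic on the input and the candidate, so the final choice does not depend on how accurate the iterative approximation happened to be, only on its stated error bound.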

    Check procedure for floating point operations

    Publication No.: US09678714B2

    Publication date: 2017-06-13

    Application No.: US14328753

    Filing date: 2014-07-11

    CPC classification number: G06F7/483 G06F7/499 G06F7/535 G06F7/5525

    Abstract: Method and computer system for implementing an operation on ≥1 floating point input, in accordance with a rounding mode, e.g. using a Newton-Raphson technique. The floating point result comprises a p-bit mantissa. An unrounded proposed mantissa result is determined using the Newton-Raphson technique, wherein a p-bit rounded proposed mantissa result, t, corresponds to a rounding of the unrounded proposed mantissa result in accordance with the rounding mode, with k leading zeroes. If an increment to the (m−k)th bit of the unrounded result would affect the p-bit rounded result then the input(s) and bits of the unrounded result are used to determine a check parameter which is indicative of a relationship between an exact result and the unrounded result if the (m−k)th bit were incremented. The p-bit mantissa of the floating point result, is determined in dependence upon the check parameter, to be either t or t+1.

    UNIFIED MULTIPLY UNIT
    9.
    Patent application
    Status: in force

    Publication No.: US20160188295A1

    Publication date: 2016-06-30

    Application No.: US14584948

    Filing date: 2014-12-29

    Inventor: Leonard Rarick

    CPC classification number: G06F7/487

    Abstract: Embodiments disclosed pertain to apparatuses, systems, and methods for performing multi-precision single instruction multiple data (SIMD) operations on integer, fixed point and floating point operands. Disclosed embodiments pertain to a circuit that is capable of performing concurrent multiply, fused multiply-add, rounding, saturation, and dot products on the above operand types. In addition, the circuit may facilitate 64-bit multiplication when Newton-Raphson, divide and square root operations are performed.

    Variable Length Execution Pipeline
    10.
    Patent application
    Status: in force

    Publication No.: US20160092237A1

    Publication date: 2016-03-31

    Application No.: US14502689

    Filing date: 2014-09-30

    Abstract: In an aspect, a pipelined execution resource can produce an intermediate result for use in an iterative approximation algorithm in an odd number of clock cycles. The pipelined execution resource executes SIMD requests by staggering commencement of execution of the requests from a SIMD instruction. When executing one or more operations for a SIMD iterative approximation algorithm, and an operation for another SIMD iterative approximation algorithm is ready to begin execution, control logic causes intermediate results completed by the pipelined execution resource to pass through a wait state, before being used in a subsequent computation. This wait state presents two open scheduling cycles in which both parts of the next SIMD instruction can begin execution. Although the wait state increases latency to complete an in-progress algorithm, a total throughput of execution on the pipeline increases.
