Performing a comparison computation in a computer system

    Publication number: US10037191B2

    Publication date: 2018-07-31

    Application number: US15874642

    Filing date: 2018-01-18

    Inventor: Leonard Rarick

    CPC classification number: G06F7/52 G06F7/552 G06F7/5525 G06F2207/5521

    Abstract: A method and computer system are provided for performing a comparison computation, e.g. for use in a check procedure for a reciprocal square root operation. The comparison computation compares a multiplication of three values with a predetermined value. The computer system performs the multiplication using multiplier logic which is configured to perform multiply operations in which two values are multiplied together. A first and second of the three values are multiplied to determine a first intermediate result, w1. The digits of w1 are separated into two portions, w1,1 and w1,2. The third of the three values is multiplied with w1,2 and the result is added into a multiplication of the third of the three values with w1,1 to thereby determine the result of multiplying the three values together. In this way the comparison is performed with high accuracy, while keeping the area and power consumption of the multiplier logic low.
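The split-and-accumulate idea in this abstract can be illustrated with a minimal sketch. It assumes non-negative integer (fixed-point) operands and assumes that w1,1 is the most significant portion of w1 and w1,2 the least significant; the abstract does not state either. The names `multiply_three_split`, `compare_product` and `low_bits` are hypothetical and chosen only for this illustration.

```python
def multiply_three_split(a: int, b: int, c: int, low_bits: int = 16) -> int:
    """Return a * b * c using only two-operand multiplies on a split first product."""
    w1 = a * b                               # first intermediate result, w1
    w1_high = w1 >> low_bits                 # assumed to correspond to w1,1
    w1_low = w1 & ((1 << low_bits) - 1)      # assumed to correspond to w1,2
    low_product = c * w1_low                 # third value times w1,2
    # Add the low partial product into the aligned high partial product.
    return ((c * w1_high) << low_bits) + low_product


def compare_product(a: int, b: int, c: int, target: int, low_bits: int = 16) -> int:
    """Return -1, 0 or 1 as a * b * c is below, equal to or above target."""
    product = multiply_three_split(a, b, c, low_bits)
    return (product > target) - (product < target)


# Fixed-point check of r ~ 1/sqrt(x) with 8 fractional bits: r * r * x should equal 1.0.
FRAC_BITS = 8
r = 128                      # 0.5 in fixed point
x = 1024                     # 4.0 in fixed point
one = 1 << (3 * FRAC_BITS)   # 1.0 at the scale of a three-operand product
print(compare_product(r, r, x, one))   # prints 0
```

In software the split changes nothing numerically. In hardware, the point suggested by the abstract is that each multiply operand can stay narrower than the full width of w1.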

    Variable Length Execution Pipeline
    12.
    Patent application

    Publication number: US20170102942A1

    Publication date: 2017-04-13

    Application number: US15385544

    Filing date: 2016-12-20

    Abstract: In an aspect, a pipelined execution resource can produce an intermediate result for use in an iterative approximation algorithm in an odd number of clock cycles. The pipelined execution resource executes SIMD requests by staggering commencement of execution of the requests from a SIMD instruction. When executing one or more operations for a SIMD iterative approximation algorithm, and an operation for another SIMD iterative approximation algorithm is ready to begin execution, control logic causes intermediate results completed by the pipelined execution resource to pass through a wait state, before being used in a subsequent computation. This wait state presents two open scheduling cycles in which both parts of the next SIMD instruction can begin execution. Although the wait state increases latency to complete an in-progress algorithm, a total throughput of execution on the pipeline increases.

    Implementing a square root operation in a computer system

    Publication number: US09612800B2

    Publication date: 2017-04-04

    Application number: US14452358

    Filing date: 2014-08-05

    Inventor: Leonard Rarick

    CPC classification number: G06F7/5525 G06F2207/5355

    Abstract: A method and computer system are provided for implementing a square root operation using an iterative converging approximation technique. The method includes fewer computations than conventional methods, and only includes computations which are simple to implement in hardware on a computer system, such as multiplication, addition, subtraction and shifting. Therefore, the methods described herein are adapted specifically for being performed on a computer system, e.g. in hardware, and allow the computer system to perform a square root operation with low latency and with low power consumption.
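As a point of reference for iterative converging approximation, the sketch below uses the textbook Newton-Raphson iteration for the reciprocal square root. It is not the method claimed in the patent, which the abstract says needs fewer computations. It relies only on multiplication, subtraction and halving (a shift in fixed-point hardware). The function name `sqrt_newton`, the starting guess 0.7 and the restriction to inputs in [1, 4) are assumptions made for this illustration.

```python
import math


def sqrt_newton(x: float, iterations: int = 8) -> float:
    """Approximate sqrt(x) for 1 <= x < 4 via Newton-Raphson on 1/sqrt(x)."""
    if not 1.0 <= x < 4.0:
        raise ValueError("this sketch expects 1 <= x < 4")
    y = 0.7  # assumed starting guess for 1/sqrt(x); converges over the whole range
    for _ in range(iterations):
        # y <- y * (3 - x * y^2) / 2; the halving is a shift in fixed-point hardware.
        y = y * (3.0 - x * y * y) * 0.5
    return x * y  # sqrt(x) = x * (1 / sqrt(x))


print(abs(sqrt_newton(2.0) - math.sqrt(2.0)) < 1e-12)   # prints True
```

Inputs outside [1, 4) would first need range reduction, for example by adjusting the exponent by an even amount, which this sketch leaves out.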

    ROUNDING FLOATING POINT NUMBERS
    14.
    Patent application
    Legal status: In force

    Publication number: US20160092167A1

    Publication date: 2016-03-31

    Application number: US14498183

    Filing date: 2014-09-26

    Inventor: Leonard Rarick

    CPC classification number: G06F7/483 G06F7/49957 G06F7/5443

    Abstract: Embodiments disclosed pertain to apparatuses, systems, and methods for floating point operations. Disclosed embodiments pertain to a circuit that is capable of processing both normal and denormal inputs and outputting normal and denormal results, and where a rounding module is used advantageously to reduce operational latency of the circuit.


    SMALL MULTIPLIER AFTER INITIAL APPROXIMATION FOR OPERATIONS WITH INCREASING PRECISION

    Publication number: US20240241695A1

    Publication date: 2024-07-18

    Application number: US18395836

    Filing date: 2023-12-26

    Inventor: Leonard Rarick

    Abstract: In an aspect, a processor includes circuitry for iterative refinement approaches, e.g., Newton-Raphson, to evaluating functions, such as square root and reciprocal, and for division. The circuitry includes circuitry for producing an initial approximation, which can include a LookUp Table (LUT). The LUT may produce an output that (with implementation-dependent processing) forms an initial approximation of a value, with a number of bits of precision. A limited-precision multiplier multiplies that initial approximation with another value; an output of the limited-precision multiplier goes to a full-precision multiplier circuit that performs the remaining multiplications required for iteration(s) in the particular refinement process being implemented. For example, in division, the output being calculated is for a reciprocal of the divisor. The full-precision multiplier circuit requires a first number of clock cycles to complete, and both the small multiplier and the initial approximation circuitry complete within the first number of clock cycles.
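The combination of a lookup-table seed and Newton-Raphson refinement can be sketched in software as follows. This is a generic illustration of the technique, not the circuit described in the abstract: the table size (16 entries), the midpoint-based table contents, the divisor range [1, 2) and the names `RECIP_LUT`, `reciprocal` and `divide` are all assumptions. The first product `d * x` uses the low-precision table output, which is the kind of multiply the abstract assigns to a limited-precision multiplier.

```python
LUT_BITS = 4                       # assumed table index width
LUT_SIZE = 1 << LUT_BITS
# Each entry approximates 1/d at the midpoint of its sub-interval of [1, 2).
RECIP_LUT = [1.0 / (1.0 + (i + 0.5) / LUT_SIZE) for i in range(LUT_SIZE)]


def reciprocal(d: float, iterations: int = 4) -> float:
    """Approximate 1/d for 1 <= d < 2 from a table seed plus Newton-Raphson steps."""
    if not 1.0 <= d < 2.0:
        raise ValueError("this sketch expects 1 <= d < 2")
    x = RECIP_LUT[int((d - 1.0) * LUT_SIZE)]   # initial approximation, a few good bits
    for _ in range(iterations):
        # x <- x * (2 - d * x); the number of correct bits roughly doubles per step.
        x = x * (2.0 - d * x)
    return x


def divide(n: float, d: float) -> float:
    """Divide by multiplying the numerator with the refined reciprocal of the divisor."""
    return n * reciprocal(d)


print(abs(divide(7.0, 1.75) - 4.0) < 1e-12)   # prints True
```

A larger table shortens the refinement because each Newton-Raphson step roughly doubles the number of correct bits, so the seed precision determines how many full-precision multiplies remain.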

    Small multiplier after initial approximation for operations with increasing precision

    Publication number: US11579844B2

    Publication date: 2023-02-14

    Application number: US17209809

    Filing date: 2021-03-23

    Inventor: Leonard Rarick

    Abstract: In an aspect, a processor includes circuitry for iterative refinement approaches, e.g., Newton-Raphson, to evaluating functions, such as square root and reciprocal, and for division. The circuitry includes circuitry for producing an initial approximation, which can include a LookUp Table (LUT). The LUT may produce an output that (with implementation-dependent processing) forms an initial approximation of a value, with a number of bits of precision. A limited-precision multiplier multiplies that initial approximation with another value; an output of the limited-precision multiplier goes to a full-precision multiplier circuit that performs the remaining multiplications required for iteration(s) in the particular refinement process being implemented. For example, in division, the output being calculated is for a reciprocal of the divisor. The full-precision multiplier circuit requires a first number of clock cycles to complete, and both the small multiplier and the initial approximation circuitry complete within the first number of clock cycles.

    Unified multiply unit
    17.
    Granted patent

    Publication number: US10255041B2

    Publication date: 2019-04-09

    Application number: US15621388

    Filing date: 2017-06-13

    Inventor: Leonard Rarick

    Abstract: Embodiments disclosed pertain to apparatuses, systems, and methods for performing multi-precision single instruction multiple data (SIMD) operations on integer, fixed point and floating point operands. Disclosed embodiments pertain to a circuit that is capable of performing concurrent multiply, fused multiply-add, rounding, saturation, and dot products on the above operand types. In addition, the circuit may facilitate 64-bit multiplication when Newton-Raphson, divide and square root operations are performed.

    Rounding floating point numbers
    18.
    Granted patent

    Publication number: US10146503B2

    Publication date: 2018-12-04

    Application number: US15292368

    Filing date: 2016-10-13

    Inventor: Leonard Rarick

    Abstract: Embodiments disclosed pertain to apparatuses, systems, and methods for floating point operations. Disclosed embodiments pertain to a circuit that is capable of processing both normal and denormal inputs and outputting normal and denormal results, and where a rounding module is used advantageously to reduce operational latency of the circuit.

    Variable length execution pipeline
    19.
    Granted patent

    Publication number: US09996345B2

    Publication date: 2018-06-12

    Application number: US15385544

    Filing date: 2016-12-20

    Abstract: In an aspect, a pipelined execution resource can produce an intermediate result for use in an iterative approximation algorithm in an odd number of clock cycles. The pipelined execution resource executes SIMD requests by staggering commencement of execution of the requests from a SIMD instruction. When executing one or more operations for a SIMD iterative approximation algorithm, and an operation for another SIMD iterative approximation algorithm is ready to begin execution, control logic causes intermediate results completed by the pipelined execution resource to pass through a wait state, before being used in a subsequent computation. This wait state presents two open scheduling cycles in which both parts of the next SIMD instruction can begin execution. Although the wait state increases latency to complete an in-progress algorithm, a total throughput of execution on the pipeline increases.

    Unified Multiply Unit
    20.
    Patent application

    Publication number: US20170277514A1

    Publication date: 2017-09-28

    Application number: US15621388

    Filing date: 2017-06-13

    Inventor: Leonard Rarick

    CPC classification number: G06F7/487

    Abstract: Embodiments disclosed pertain to apparatuses, systems, and methods for performing multi-precision single instruction multiple data (SIMD) operations on integer, fixed point and floating point operands. Disclosed embodiments pertain to a circuit that is capable of performing concurrent multiply, fused multiply-add, rounding, saturation, and dot products on the above operand types. In addition, the circuit may facilitate 64-bit multiplication when Newton-Raphson, divide and square root operations are performed.
