Deep neural network hardware accelerator based on power exponential quantization

    公开(公告)号:US12154026B2

    公开(公告)日:2024-11-26

    申请号:US17284480

    申请日:2020-01-09

    Abstract: A deep neural network hardware accelerator comprises: an AXI-4 bus interface, an input cache area, an output cache area, a weighting cache area, a weighting index cache area, an encoding module, a configurable state controller module, and a PE array. The input cache area and the output cache area are designed as a line cache structure; an encoder encodes weightings according to an ordered quantization set, the quantization set storing the possible value of the absolute value of all of the weightings after quantization. During the calculation of the accelerator, the PE unit reads data from the input cache area and the weighting index cache area to perform shift calculation, and sends the calculation result to the output cache area. The accelerator uses shift operations to replace floating point multiplication operations, reducing the requirements for computing resources, storage resources, and communication bandwidth, and increasing the calculation efficiency of the accelerator.

    Execution circuitry for floating-point power operation

    公开(公告)号:US12118332B2

    公开(公告)日:2024-10-15

    申请号:US18045577

    申请日:2022-10-11

    Applicant: Apple Inc.

    CPC classification number: G06F7/552

    Abstract: Techniques are disclosed relating to dedicated power function circuitry for a floating-point power instruction. In some embodiments, execution circuitry is configured to execute a floating-point power instruction to evaluate the power function xy as 2y log2x. In some embodiments, base-2 logarithm circuitry is configured to evaluate a base-2 logarithm for a first input (e.g., log2 x) by determining coefficients for a polynomial function and evaluating the polynomial function using the determined coefficients and the first input. In some embodiments, multiplication circuitry multiplies the base-2 logarithm result by a second input to generate a multiplication result. In some embodiments, base-2 power function circuitry is configured to evaluate a base-2 power function for the multiplication result. Disclosed techniques may advantageously increase performance and reduce power consumption of floating-point power function operations with reasonable area and accuracy, relative to traditional techniques.

    HARDWARE TO PERFORM SQUARING
    3.
    发明公开

    公开(公告)号:US20240134607A1

    公开(公告)日:2024-04-25

    申请号:US18240387

    申请日:2023-08-31

    CPC classification number: G06F7/5525 G06F7/405

    Abstract: Methods of calculating a square of an input number in hardware logic are described. An m-bit number is received and Booth encoding is performed on different groups of three consecutive bits selected from the input to generate an encoded value for each of the groups. For each group, the method comprises forming a truncated string from the input number, generating an updated version of the truncated number and selecting a bit string based on the encoded value, the selected bit string comprising zeros or a left-shifted version of the updated version of the truncated number sign extended to a bit-width of 2m bits. The method further comprises combining the selected bit strings and square and sign bits for each group into an addition array; and summing the bits in the addition array.

    Small multiplier after initial approximation for operations with increasing precision

    公开(公告)号:US11853718B2

    公开(公告)日:2023-12-26

    申请号:US18097316

    申请日:2023-01-16

    Inventor: Leonard Rarick

    Abstract: In an aspect, a processor includes circuitry for iterative refinement approaches, e.g., Newton-Raphson, to evaluating functions, such as square root, reciprocal, and for division. The circuitry includes circuitry for producing an initial approximation; which can include a LookUp Table (LUT). LUT may produce an output that (with implementation-dependent processing) forms an initial approximation of a value, with a number of bits of precision. A limited-precision multiplier multiplies that initial approximation with another value; an output of the limited precision multiplier goes to a full precision multiplier circuit that performs remaining multiplications required for iteration(s) in the particular refinement process being implemented. For example, in division, the output being calculated is for a reciprocal of the divisor. The full-precision multiplier circuit requires a first number of clock cycles to complete, and both the small multiplier and the initial approximation circuitry complete within the first number of clock cycles.

    Execution unit for evaluating functions using Newton Raphson iterations

    公开(公告)号:US11847428B2

    公开(公告)日:2023-12-19

    申请号:US17660688

    申请日:2022-04-26

    Abstract: An execution unit for a processor, the execution unit comprising: a look up table having a plurality of entries, each of the plurality of entries comprising an initial estimate for a result of an operation; a preparatory circuit configured to search the look up table using an index value dependent upon the operand to locate an entry comprising a first initial estimate for a result of the operation; a plurality of processing circuits comprising at least one multiplier circuit; and control circuitry configured to provide the first initial estimate to the at least one multiplier circuit of the plurality of processing circuits so as perform processing, by the plurality of processing units, of the first initial estimate to generate the function result, said processing comprising applying one or more Newton Raphson iterations to the first initial estimate.

    Neural network learning apparatus for deep learning and method thereof

    公开(公告)号:US11803744B2

    公开(公告)日:2023-10-31

    申请号:US16730459

    申请日:2019-12-30

    CPC classification number: G06N3/08 G06F7/485 G06F7/4876 G06F7/5525

    Abstract: Disclosed is a neural network learning apparatus for deep learning and a method thereof. A neural network learning apparatus for deep learning according to an embodiment of the present disclosure includes an input interface, a memory, and a learning processor for applying a Gradient Descent algorithm to a neural network model, and the learning processor may transform a cumulative change function of the gradient for an error function into an inverse square root function in the Gradient Descent algorithm, and operate an inverse square root approximate value by using a Newton-Raphson method for the transformed inverse square root function. The neural network learning apparatus for deep learning of the present disclosure may be connected or converged with an Artificial Intelligence module, an Unmanned Aerial Vehicle (UAV), a robot, an Augmented Reality (AR) apparatus, a Virtual Reality (VR), or a 5G network service-related apparatus, etc.

    SQUARE ROOT CALCULATIONS ON AN ASSOCIATIVE PROCESSING UNIT

    公开(公告)号:US20230221925A1

    公开(公告)日:2023-07-13

    申请号:US18150317

    申请日:2023-01-05

    CPC classification number: G06F7/5525

    Abstract: A method for calculating a square root B having N bits of a number X having 2N bits includes iterating on bits bi of square root B starting from the most significant bit until the least significant bit of square root B. For each iteration, the method includes locating a 1 at the squared location of bit bi in a CHECK variable, determining the value of bit bi from the result of a comparison of number X with a function of all previously found bits and a previous comparison outcome, shifting all previously found bits right 1 location in a CHECK variable, and adding the determined value of bit bi into its squared location in the CHECK variable.

    SMALL MULTIPLIER AFTER INITIAL APPROXIMATION FOR OPERATIONS WITH INCREASING PRECISION

    公开(公告)号:US20230214186A1

    公开(公告)日:2023-07-06

    申请号:US18097316

    申请日:2023-01-16

    Inventor: Leonard Rarick

    Abstract: In an aspect, a processor includes circuitry for iterative refinement approaches, e.g., Newton-Raphson, to evaluating functions, such as square root, reciprocal, and for division. The circuitry includes circuitry for producing an initial approximation; which can include a LookUp Table (LUT). LUT may produce an output that (with implementation-dependent processing) forms an initial approximation of a value, with a number of bits of precision. A limited-precision multiplier multiplies that initial approximation with another value; an output of the limited precision multiplier goes to a full precision multiplier circuit that performs remaining multiplications required for iteration(s) in the particular refinement process being implemented. For example, in division, the output being calculated is for a reciprocal of the divisor. The full-precision multiplier circuit requires a first number of clock cycles to complete, and both the small multiplier and the initial approximation circuitry complete within the first number of clock cycles.

    Modeling expectation mismatches
    9.
    发明授权

    公开(公告)号:US11599842B2

    公开(公告)日:2023-03-07

    申请号:US17149910

    申请日:2021-01-15

    Abstract: Embodiments determine mismatches in evaluations. Embodiments receive a first evaluation of an employee from a supervisor of the employee, the first evaluation including supervisor comment ratings and supervisor numerical ratings, each of the supervisor comment ratings and supervisor numerical ratings corresponding to an evaluation category. Embodiments receive a second evaluation of the employee from the employee, the second evaluation including employee comment ratings and employee numerical ratings, each of the employee comment ratings and employee numerical ratings corresponding to the evaluation category. Embodiments determine first sentiment polarity scores of the supervisor comment ratings and second sentiment polarity scores of the employee comment ratings. Embodiments determine polarity mismatch scores based on the first sentiment polarity scores and the second sentiment polarity scores and determine average differential ratings based on the supervisor numerical ratings and the employee numerical ratings. Embodiments combine the polarity mismatch scores and the average differential ratings.

Patent Agency Ranking