-
Publication number: US12154026B2
Publication date: 2024-11-26
Application number: US17284480
Filing date: 2020-01-09
Applicant: SOUTHEAST UNIVERSITY
Inventor: Shengli Lu , Wei Pang , Ruili Wu , Yingbo Fan , Hao Liu , Cheng Huang
Abstract: A deep neural network hardware accelerator comprises: an AXI-4 bus interface, an input cache area, an output cache area, a weighting cache area, a weighting index cache area, an encoding module, a configurable state controller module, and a PE array. The input cache area and the output cache area are designed as a line cache structure; an encoder encodes weightings according to an ordered quantization set, the quantization set storing the possible value of the absolute value of all of the weightings after quantization. During the calculation of the accelerator, the PE unit reads data from the input cache area and the weighting index cache area to perform shift calculation, and sends the calculation result to the output cache area. The accelerator uses shift operations to replace floating point multiplication operations, reducing the requirements for computing resources, storage resources, and communication bandwidth, and increasing the calculation efficiency of the accelerator.
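As a rough illustration of the shift-for-multiply idea in this abstract, the Python sketch below assumes the quantization set holds only powers of two, so each weight can be stored as a sign plus an index into the set. The names `QUANT_SET`, `encode_weight`, and `pe_mac` are hypothetical and do not come from the patent.

```python
# Assumed quantization set: ordered absolute weight values, all powers of two.
# Index i therefore stands for a left shift by i bits.
QUANT_SET = [1, 2, 4, 8]


def encode_weight(w):
    """Encode a quantized weight as (sign, index into QUANT_SET)."""
    sign = -1 if w < 0 else 1
    return sign, QUANT_SET.index(abs(w))


def pe_mac(inputs, encoded_weights):
    """Multiply-accumulate in which every multiplication is a shift."""
    acc = 0
    for x, (sign, idx) in zip(inputs, encoded_weights):
        acc += sign * (x << idx)  # x * 2**idx, with the sign applied separately
    return acc


weights = [2, -4, 1]
encoded = [encode_weight(w) for w in weights]
result = pe_mac([3, 5, 7], encoded)  # 3*2 - 5*4 + 7*1
```

Storing only the index keeps each weight to a few bits, which is one plausible reading of how the design reduces storage and bandwidth.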
-
Publication number: US12118332B2
Publication date: 2024-10-15
Application number: US18045577
Filing date: 2022-10-11
Applicant: Apple Inc.
Inventor: Ali Sazegari , Segev Elmalem , O-Cheng Chang , Jingwei Zhang , Ido Soffair , Aaftab A. Munshi
IPC: G06F7/552
CPC classification number: G06F7/552
Abstract: Techniques are disclosed relating to dedicated power function circuitry for a floating-point power instruction. In some embodiments, execution circuitry is configured to execute a floating-point power instruction to evaluate the power function x^y as 2^(y * log2(x)). In some embodiments, base-2 logarithm circuitry is configured to evaluate a base-2 logarithm for a first input (e.g., log2(x)) by determining coefficients for a polynomial function and evaluating the polynomial function using the determined coefficients and the first input. In some embodiments, multiplication circuitry multiplies the base-2 logarithm result by a second input to generate a multiplication result. In some embodiments, base-2 power function circuitry is configured to evaluate a base-2 power function for the multiplication result. Disclosed techniques may advantageously increase performance and reduce power consumption of floating-point power function operations with reasonable area and accuracy, relative to traditional techniques.
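The three-stage decomposition described here (base-2 logarithm, multiply, base-2 power) can be sketched in Python as follows. This is only an illustration of the identity: `math.log2` stands in for the polynomial-based logarithm circuitry, and the function name `pow_via_log2` is hypothetical.

```python
import math


def pow_via_log2(x, y):
    """Evaluate x**y as 2**(y * log2(x)) for x > 0."""
    if x <= 0:
        raise ValueError("this sketch only handles x > 0")
    log_result = math.log2(x)   # stage 1: base-2 logarithm of the first input
    product = y * log_result    # stage 2: multiply by the second input
    return 2.0 ** product       # stage 3: base-2 power of the product


value = pow_via_log2(9.0, 0.5)  # approximately 3.0
```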
-
Publication number: US20240134607A1
Publication date: 2024-04-25
Application number: US18240387
Filing date: 2023-08-31
Applicant: Imagination Technologies Limited
Inventor: Max Freiburghaus , William Wheeler , Daniel Ley
CPC classification number: G06F7/5525 , G06F7/405
Abstract: Methods of calculating a square of an input number in hardware logic are described. An m-bit number is received and Booth encoding is performed on different groups of three consecutive bits selected from the input to generate an encoded value for each of the groups. For each group, the method comprises forming a truncated string from the input number, generating an updated version of the truncated number and selecting a bit string based on the encoded value, the selected bit string comprising zeros or a left-shifted version of the updated version of the truncated number sign extended to a bit-width of 2m bits. The method further comprises combining the selected bit strings and square and sign bits for each group into an addition array; and summing the bits in the addition array.
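For orientation, the sketch below shows plain radix-4 Booth encoding applied to squaring: overlapping groups of three consecutive bits are encoded into digits in {-2, -1, 0, 1, 2}, and each digit selects a shifted multiple of the input. It does not model the truncated strings or the separate square and sign bits described in the abstract, and the function names are hypothetical.

```python
def booth_digits(x, m):
    """Radix-4 Booth digits of an unsigned m-bit integer x, least significant first."""
    padded = x << 1            # append the implicit bit x[-1] = 0
    n_groups = m // 2 + 1      # enough groups to absorb the leading zero bits
    digits = []
    for i in range(n_groups):
        low = (padded >> (2 * i)) & 1       # x[2i-1]
        mid = (padded >> (2 * i + 1)) & 1   # x[2i]
        high = (padded >> (2 * i + 2)) & 1  # x[2i+1]
        digits.append(low + mid - 2 * high)
    return digits


def square_booth(x, m):
    """Square x by summing Booth-selected, left-shifted multiples of x."""
    total = 0
    for i, d in enumerate(booth_digits(x, m)):
        total += d * (x << (2 * i))  # digit i has weight 4**i
    return total
```

Each Booth digit replaces two partial-product rows with one, which is why the technique roughly halves the height of the addition array.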
-
Publication number: US11853718B2
Publication date: 2023-12-26
Application number: US18097316
Filing date: 2023-01-16
Applicant: Imagination Technologies Limited
Inventor: Leonard Rarick
CPC classification number: G06F7/552 , G06F7/523 , G06F7/537 , G06F7/5525 , G06F2207/5351
Abstract: In an aspect, a processor includes circuitry for iterative refinement approaches, e.g., Newton-Raphson, to evaluating functions, such as square root, reciprocal, and for division. The circuitry includes circuitry for producing an initial approximation; which can include a LookUp Table (LUT). LUT may produce an output that (with implementation-dependent processing) forms an initial approximation of a value, with a number of bits of precision. A limited-precision multiplier multiplies that initial approximation with another value; an output of the limited precision multiplier goes to a full precision multiplier circuit that performs remaining multiplications required for iteration(s) in the particular refinement process being implemented. For example, in division, the output being calculated is for a reciprocal of the divisor. The full-precision multiplier circuit requires a first number of clock cycles to complete, and both the small multiplier and the initial approximation circuitry complete within the first number of clock cycles.
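A minimal Python sketch of the general scheme, a lookup-table seed refined by Newton-Raphson iterations to obtain a reciprocal for division, is given below. The table size, the midpoint seeding rule, and the names `make_lut`, `reciprocal`, and `divide` are assumptions for illustration; the split between a limited-precision and a full-precision multiplier is not modeled.

```python
def make_lut(bits=4):
    """Seed table for 1/d with d in [1, 2): one entry per interval midpoint."""
    n = 1 << bits
    return [1.0 / (1.0 + (i + 0.5) / n) for i in range(n)]


LUT = make_lut()


def reciprocal(d, iterations=3):
    """Approximate 1/d for 1 <= d < 2 by Newton-Raphson refinement."""
    if not 1.0 <= d < 2.0:
        raise ValueError("this sketch expects a normalized divisor in [1, 2)")
    r = LUT[int((d - 1.0) * len(LUT))]  # initial approximation from the table
    for _ in range(iterations):
        r = r * (2.0 - d * r)           # each step roughly squares the relative error
    return r


def divide(a, d):
    """Division as multiplication by the refined reciprocal of the divisor."""
    return a * reciprocal(d)
```

With a 16-entry table the seed is accurate to about 3%, so three iterations are enough to reach roughly double-precision accuracy in this sketch.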
-
Publication number: US11847428B2
Publication date: 2023-12-19
Application number: US17660688
Filing date: 2022-04-26
Applicant: Graphcore Limited
Inventor: Jonathan Mangnall , Stephen Felix
CPC classification number: G06F7/483 , G06F1/03 , G06F7/535 , G06F7/5525 , G06F2101/12 , G06F2207/5355 , G06F2207/5356
Abstract: An execution unit for a processor, the execution unit comprising: a look up table having a plurality of entries, each of the plurality of entries comprising an initial estimate for a result of an operation; a preparatory circuit configured to search the look up table using an index value dependent upon the operand to locate an entry comprising a first initial estimate for a result of the operation; a plurality of processing circuits comprising at least one multiplier circuit; and control circuitry configured to provide the first initial estimate to the at least one multiplier circuit of the plurality of processing circuits so as to perform processing, by the plurality of processing units, of the first initial estimate to generate the function result, said processing comprising applying one or more Newton Raphson iterations to the first initial estimate.
-
Publication number: US11803744B2
Publication date: 2023-10-31
Application number: US16730459
Filing date: 2019-12-30
Applicant: LG Electronics Inc.
Inventor: Byoung Jo Kim , Sang Chul Kim
CPC classification number: G06N3/08 , G06F7/485 , G06F7/4876 , G06F7/5525
Abstract: Disclosed is a neural network learning apparatus for deep learning and a method thereof. A neural network learning apparatus for deep learning according to an embodiment of the present disclosure includes an input interface, a memory, and a learning processor for applying a Gradient Descent algorithm to a neural network model, and the learning processor may transform a cumulative change function of the gradient for an error function into an inverse square root function in the Gradient Descent algorithm, and operate an inverse square root approximate value by using a Newton-Raphson method for the transformed inverse square root function. The neural network learning apparatus for deep learning of the present disclosure may be connected or converged with an Artificial Intelligence module, an Unmanned Aerial Vehicle (UAV), a robot, an Augmented Reality (AR) apparatus, a Virtual Reality (VR), or a 5G network service-related apparatus, etc.
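The Newton-Raphson iteration for an inverse square root can be sketched in Python as below. The Adagrad-style update is an assumption about what the "cumulative change function of the gradient" might look like; the abstract does not name a specific optimizer, and the function names, the exponent-based initial guess, and the default parameters are all hypothetical.

```python
import math


def inv_sqrt_newton(a, iterations=8):
    """Approximate 1/sqrt(a) for a > 0 using y <- y * (1.5 - 0.5 * a * y * y)."""
    if a <= 0:
        raise ValueError("a must be positive")
    _, exponent = math.frexp(a)       # a = mantissa * 2**exponent, mantissa in [0.5, 1)
    y = 2.0 ** (-(exponent // 2))     # initial guess; keeps a * y * y inside [0.5, 2)
    for _ in range(iterations):
        y = y * (1.5 - 0.5 * a * y * y)  # multiplications only, no division or sqrt
    return y


def adagrad_style_step(w, grad, accum, lr=0.1, eps=1e-8):
    """One assumed Adagrad-like update that uses the inverse square root above."""
    accum = accum + grad * grad
    w = w - lr * grad * inv_sqrt_newton(accum + eps)
    return w, accum


w, accum = adagrad_style_step(1.0, 2.0, 0.0)
```

The appeal of this formulation is that the update needs only multiplications and additions, which are cheaper in hardware than a division by a square root.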
-
Publication number: US20230221925A1
Publication date: 2023-07-13
Application number: US18150317
Filing date: 2023-01-05
Applicant: GSI Technology Inc.
Inventor: Eyal AMIEL , Moshe LAZER , Samuel LIFSCHES
IPC: G06F7/552
CPC classification number: G06F7/5525
Abstract: A method for calculating a square root B having N bits of a number X having 2N bits includes iterating on bits b_i of square root B starting from the most significant bit until the least significant bit of square root B. For each iteration, the method includes locating a 1 at the squared location of bit b_i in a CHECK variable, determining the value of bit b_i from the result of a comparison of number X with a function of all previously found bits and a previous comparison outcome, shifting all previously found bits right 1 location in a CHECK variable, and adding the determined value of bit b_i into its squared location in the CHECK variable.
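The bit-by-bit idea, deciding each bit of the root from the most significant to the least significant by a comparison against X, can be illustrated with the textbook trial-bit method below. This is not the CHECK-variable formulation of the abstract (it squares the candidate directly rather than updating a running comparison value), and the name `isqrt_bitwise` is hypothetical.

```python
def isqrt_bitwise(x, n_bits):
    """Floor square root of x, where 0 <= x < 2**(2 * n_bits)."""
    if not 0 <= x < (1 << (2 * n_bits)):
        raise ValueError("x must fit in 2 * n_bits bits")
    root = 0
    for i in range(n_bits - 1, -1, -1):   # most significant bit first
        candidate = root | (1 << i)       # tentatively set bit i
        if candidate * candidate <= x:    # keep the bit only if the square still fits
            root = candidate
    return root
```

Hardware versions typically avoid the full squaring in each step by incrementally updating the comparison value, which appears to be the role of the CHECK variable in the abstract.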
-
Publication number: US20230214186A1
Publication date: 2023-07-06
Application number: US18097316
Filing date: 2023-01-16
Applicant: Imagination Technologies Limited
Inventor: Leonard Rarick
CPC classification number: G06F7/552 , G06F7/523 , G06F7/537 , G06F7/5525 , G06F2207/5351
Abstract: In an aspect, a processor includes circuitry for iterative refinement approaches, e.g., Newton-Raphson, to evaluating functions, such as square root, reciprocal, and for division. The circuitry includes circuitry for producing an initial approximation; which can include a LookUp Table (LUT). LUT may produce an output that (with implementation-dependent processing) forms an initial approximation of a value, with a number of bits of precision. A limited-precision multiplier multiplies that initial approximation with another value; an output of the limited precision multiplier goes to a full precision multiplier circuit that performs remaining multiplications required for iteration(s) in the particular refinement process being implemented. For example, in division, the output being calculated is for a reciprocal of the divisor. The full-precision multiplier circuit requires a first number of clock cycles to complete, and both the small multiplier and the initial approximation circuitry complete within the first number of clock cycles.
-
Publication number: US11599842B2
Publication date: 2023-03-07
Application number: US17149910
Filing date: 2021-01-15
Applicant: Oracle International Corporation
Inventor: Anand Kumar Singh , Manish Manish , Mohamed Fazil , Anirban Majumdar
IPC: G06Q10/0639 , G06Q10/105 , G06F40/30 , G06F40/284 , G06N3/08 , G06F7/552
Abstract: Embodiments determine mismatches in evaluations. Embodiments receive a first evaluation of an employee from a supervisor of the employee, the first evaluation including supervisor comment ratings and supervisor numerical ratings, each of the supervisor comment ratings and supervisor numerical ratings corresponding to an evaluation category. Embodiments receive a second evaluation of the employee from the employee, the second evaluation including employee comment ratings and employee numerical ratings, each of the employee comment ratings and employee numerical ratings corresponding to the evaluation category. Embodiments determine first sentiment polarity scores of the supervisor comment ratings and second sentiment polarity scores of the employee comment ratings. Embodiments determine polarity mismatch scores based on the first sentiment polarity scores and the second sentiment polarity scores and determine average differential ratings based on the supervisor numerical ratings and the employee numerical ratings. Embodiments combine the polarity mismatch scores and the average differential ratings.
-
Publication number: US20230061618A1
Publication date: 2023-03-02
Application number: US17463374
Filing date: 2021-08-31
Applicant: Intel Corporation
Inventor: Menachem ADELMAN , Alexander HEINECKE , Robert VALENTINE , Zeev SPERBER , Amit GRADSTEIN , Mark CHARNEY , Evangelos GEORGANAS , Dhiraj KALAMKAR , Christopher HUGHES , Cristina ANDERSON
Abstract: Techniques for performing square root or reciprocal square root calculations on BF16 data elements in response to an instruction are described. An example of an instruction is one that includes fields for an opcode, an identification of a location of a packed data source operand, and an identification of a packed data destination operand, wherein the opcode is to indicate that execution circuitry is to perform, for each data element position of the packed data source operand, a calculation of a square root value of a BF16 data element in that position and store a result of each square root into a corresponding data element position of the packed data destination operand.
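To make the element-wise behavior concrete, the Python sketch below emulates a packed BF16 square root in software. It assumes BF16 is the upper 16 bits of an IEEE 754 single-precision value and uses round-to-nearest-even when narrowing; the rounding mode, the restriction to non-negative finite inputs, and all function names are assumptions, not details taken from the abstract.

```python
import math
import struct


def bf16_to_float(bits):
    """Widen a 16-bit BF16 pattern to a Python float via float32."""
    return struct.unpack(">f", struct.pack(">I", (bits & 0xFFFF) << 16))[0]


def float_to_bf16(value):
    """Narrow a finite float to BF16 bits with round-to-nearest-even."""
    u32 = struct.unpack(">I", struct.pack(">f", value))[0]
    rounding = 0x7FFF + ((u32 >> 16) & 1)  # ties go to the even 16-bit result
    return ((u32 + rounding) >> 16) & 0xFFFF


def packed_sqrt_bf16(src):
    """Square root of each BF16 element, stored at the same element position."""
    return [float_to_bf16(math.sqrt(bf16_to_float(b))) for b in src]


dst = packed_sqrt_bf16([0x4080, 0x3F80, 0x4110])  # inputs 4.0, 1.0, 9.0
```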
-