-
Publication No.: US11640537B2
Publication Date: 2023-05-02
Application No.: US16378107
Filing Date: 2019-04-08
Applicant: Intel Corporation
Inventor: Bharat Daga , Krishnakumar Nair , Pradeep Janedula , Aravind Babu Srinivasan , Bijoy Pazhanimala , Ambili Vengallur
IPC: G06N3/10
Abstract: An apparatus to facilitate execution of non-linear function operations is disclosed. The apparatus comprises accelerator circuitry including a compute grid having a plurality of processing elements to execute neural network computations, store values resulting from the neural network computations, and perform piecewise linear (PWL) approximations of one or more non-linear functions using the stored values as input data.
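The abstract describes processing elements that feed stored computation results into piecewise linear approximations of non-linear functions. Below is a minimal Python sketch of the PWL technique itself: precompute slope/intercept pairs per segment, then evaluate y = m*x + b on the segment containing each input. The segment count, input range, and the choice of sigmoid are illustrative assumptions, not details from the patent.

```python
# Minimal sketch of piecewise linear (PWL) approximation of a
# non-linear activation function. Segment count, range, and the
# sigmoid target are assumptions for illustration.
import numpy as np

def build_pwl_table(fn, lo, hi, segments):
    """Precompute slope/intercept pairs for each linear segment."""
    xs = np.linspace(lo, hi, segments + 1)
    ys = fn(xs)
    slopes = (ys[1:] - ys[:-1]) / (xs[1:] - xs[:-1])
    intercepts = ys[:-1] - slopes * xs[:-1]
    return xs, slopes, intercepts

def pwl_eval(x, xs, slopes, intercepts):
    """Find each input's segment, then apply y = m*x + b."""
    idx = np.clip(np.searchsorted(xs, x) - 1, 0, len(slopes) - 1)
    return slopes[idx] * x + intercepts[idx]

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
xs, m, b = build_pwl_table(sigmoid, -8.0, 8.0, 16)
x = np.array([-3.7, 0.0, 2.5])
print(pwl_eval(x, xs, m, b))  # close to sigmoid(x)
```

In hardware, the slope/intercept table would typically live in a small lookup memory indexed by the high-order bits of the input, which is what makes PWL attractive for accelerator compute grids.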
-
Publication No.: US12254061B2
Publication Date: 2025-03-18
Application No.: US17256195
Filing Date: 2018-09-27
Applicant: Intel Corporation
Inventor: Maciej Urbanski , Brian J. Hickmann , Michael Rotzin , Krishnakumar Nair , Andrew Yang , Brian S. Morris , Dennis Bradford
Abstract: Methods and apparatuses relating to performing vector multiplication are described, as are hardware accelerators to perform vector multiplication. In one embodiment, a combined fixed-point and floating-point vector multiplication circuit includes at least one switch to change the circuit between a first mode and a second mode. In the first mode, each multiplier of a set of multipliers is to multiply mantissas from a same element position of a first floating-point vector and a second floating-point vector to produce a corresponding product, shift the corresponding products with a set of shift registers based on a maximum exponent of exponents for the corresponding products determined by a maximum exponent determiner to produce shifted products, perform a numeric conversion operation on the shifted products with a set of numeric conversion circuits based on sign bits from the same element position of the first floating-point vector and the second floating-point vector to produce signed representations of the shifted products, add the signed representations of the shifted products with a set of adders to produce a single product, and normalize the single product with a normalization circuit based on the maximum exponent into a single floating-point resultant. In the second mode, each multiplier of the set of multipliers is to multiply values from a same element position of a first integer vector and a second integer vector to produce a corresponding product, and add each corresponding product with the set of adders to produce a single integer resultant.
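As a rough software analogue of the first (floating-point) mode, the sketch below multiplies integer mantissas element-wise, right-shifts each product to align it with the maximum product exponent, applies the sign, accumulates with plain addition, and rescales at the end. The 10-bit mantissa width and the use of Python's frexp decomposition are assumptions for illustration; the circuit's actual bit widths and rounding behavior are not specified here.

```python
# Software sketch of the floating-point mode: mantissa multiply,
# align to the maximum exponent, sign, add, normalize.
# MANT_BITS = 10 is an illustrative assumption.
import math

MANT_BITS = 10

def decompose(v):
    """Split a float into sign, integer mantissa, and exponent."""
    m, e = math.frexp(v)                     # v = m * 2**e, 0.5 <= |m| < 1
    mant = round(abs(m) * (1 << MANT_BITS))  # fixed-point mantissa
    return (1 if v < 0 else 0), mant, e

def fp_dot(a_vec, b_vec):
    signs, prods, exps = [], [], []
    for a, b in zip(a_vec, b_vec):
        sa, ma, ea = decompose(a)
        sb, mb, eb = decompose(b)
        signs.append(sa ^ sb)
        prods.append(ma * mb)                # mantissa multiply
        exps.append(ea + eb)                 # product exponent
    emax = max(exps)                         # "maximum exponent determiner"
    acc = 0
    for s, p, e in zip(signs, prods, exps):
        shifted = p >> (emax - e)            # align to shared max exponent
        acc += -shifted if s else shifted    # signed representation + add
    # normalize the accumulated product back to a float
    return acc * 2.0 ** (emax - 2 * MANT_BITS)

print(fp_dot([1.5, -2.0, 0.25], [2.0, 0.5, 4.0]))  # == 3.0
```

The appeal of this organization is that the same multipliers and adder tree serve both modes: in integer mode the shift, sign-conversion, and normalization stages are simply bypassed.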
-
Publication No.: US20200320403A1
Publication Date: 2020-10-08
Application No.: US16378107
Filing Date: 2019-04-08
Applicant: Intel Corporation
Inventor: Bharat Daga , Krishnakumar Nair , Pradeep Janedula , Aravind Babu Srinivasan , Bijoy Pazhanimala , Ambili Vengallur
IPC: G06N3/10
Abstract: An apparatus to facilitate execution of non-linear function operations is disclosed. The apparatus comprises accelerator circuitry including a compute grid having a plurality of processing elements to execute neural network computations, store values resulting from the neural network computations, and perform piecewise linear (PWL) approximations of one or more non-linear functions using the stored values as input data.
-
Publication No.: US20190042944A1
Publication Date: 2019-02-07
Application No.: US16004243
Filing Date: 2018-06-08
Applicant: Intel Corporation
Inventor: Krishnakumar Nair , Andrew Yang , Brian Morris
Abstract: The present disclosure is directed to systems and methods for training neural networks using a tensor that includes a plurality of FP16 values and a plurality of bits that define an exponent shared by some or all of the FP16 values included in the tensor. The FP16 values may include IEEE 754 format 16-bit floating-point values, and the tensor may include a plurality of bits defining the shared exponent. The tensor may include a shared exponent and FP16 values that include a variable bit-length mantissa and a variable bit-length exponent that may be dynamically set by processor circuitry. The tensor may include a shared exponent and FP16 values that include a variable bit-length mantissa; a variable bit-length exponent that may be dynamically set by processor circuitry; and a shared exponent switch set by the processor circuitry to selectively combine the FP16 value exponent with the shared exponent.
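A minimal sketch of the shared-exponent idea: the tensor stores FP16 values together with extra bits holding a common exponent, so each element effectively represents stored_value * 2**shared_exponent. The max-magnitude scaling rule below is an illustrative assumption; the patent's variable bit-length mantissa/exponent formats and exponent switch are not modeled.

```python
# Sketch of a tensor carrying a shared exponent alongside FP16 values.
# The scaling rule (fit the largest magnitude) is an assumption.
import numpy as np

class SharedExpTensor:
    def __init__(self, values):
        values = np.asarray(values, dtype=np.float32)
        # choose a shared exponent so the largest magnitude scales into
        # a comfortable FP16 range
        max_exp = int(np.ceil(np.log2(np.max(np.abs(values)) + 1e-30)))
        self.shared_exp = max_exp            # bits defining the shared exponent
        self.fp16 = (values * 2.0 ** -max_exp).astype(np.float16)

    def to_float32(self):
        return self.fp16.astype(np.float32) * 2.0 ** self.shared_exp

t = SharedExpTensor([1.0e5, -2.0e5, 3.5e4])  # 2e5 overflows plain FP16
print(t.shared_exp, t.fp16)
print(t.to_float32())                        # approximately the originals
```

The point of the shared exponent is dynamic range: values that individually overflow or underflow FP16 can still be represented, while per-element storage stays at 16 bits.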
-
Publication No.: US20190042094A1
Publication Date: 2019-02-07
Application No.: US16024812
Filing Date: 2018-06-30
Applicant: Intel Corporation
Inventor: Krishnakumar Nair , Andrew Yang , Michael Rotzin , Nitin Garegrat , Tom Schebye , Tony Werner
Abstract: An apparatus and method for converting tensor data. For example, one embodiment of a method comprises: fetching source tensor blocks of a source tensor data structure, each source tensor block comprising a plurality of source tensor data elements having a first numeric representation, wherein the source tensor data structure comprises a predefined structural arrangement of source tensor blocks; converting the one or more source tensor blocks into one or more destination tensor blocks comprising a plurality of destination tensor data elements having a second numeric representation different from the first numeric representation, wherein the one or more source tensor blocks are converted to one or more corresponding destination tensor blocks in a specified order based on the first and second numeric representations; and storing each individual destination tensor block in a designated memory region to maintain coherency with the predefined structural arrangement of the source tensor blocks.
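A minimal sketch of the blockwise conversion described above, using float32 as the first numeric representation and float16 as the second. Each destination block is written at the offset that mirrors its source block, keeping the destination arrangement coherent with the source. The block size and the dtype pairing are assumptions for illustration.

```python
# Blockwise tensor conversion sketch: fetch fixed-size source blocks,
# convert each element float32 -> float16, and store every destination
# block at the offset mirroring its source block.
# BLOCK and the dtype pairing are illustrative assumptions.
import numpy as np

BLOCK = 4  # elements per tensor block (assumed)

def convert_tensor(src: np.ndarray) -> np.ndarray:
    flat = src.reshape(-1)
    dst = np.empty_like(flat, dtype=np.float16)
    # convert blocks in order so the destination layout stays coherent
    for off in range(0, flat.size, BLOCK):
        block = flat[off:off + BLOCK]                    # fetch source block
        dst[off:off + BLOCK] = block.astype(np.float16)  # convert + store
    return dst.reshape(src.shape)

src = np.arange(16, dtype=np.float32).reshape(4, 4) / 3.0
print(convert_tensor(src))
```

Preserving the block arrangement matters when downstream kernels address tensor blocks by position: conversion must not reorder or repack blocks, only change each element's numeric representation.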
-
Publication No.: US12205035B2
Publication Date: 2025-01-21
Application No.: US16004243
Filing Date: 2018-06-08
Applicant: Intel Corporation
Inventor: Krishnakumar Nair , Andrew Yang , Brian Morris
Abstract: The present disclosure is directed to systems and methods for training neural networks using a tensor that includes a plurality of FP16 values and a plurality of bits that define an exponent shared by some or all of the FP16 values included in the tensor. The FP16 values may include IEEE 754 format 16-bit floating-point values, and the tensor may include a plurality of bits defining the shared exponent. The tensor may include a shared exponent and FP16 values that include a variable bit-length mantissa and a variable bit-length exponent that may be dynamically set by processor circuitry. The tensor may include a shared exponent and FP16 values that include a variable bit-length mantissa; a variable bit-length exponent that may be dynamically set by processor circuitry; and a shared exponent switch set by the processor circuitry to selectively combine the FP16 value exponent with the shared exponent.
-
Publication No.: US20240028905A1
Publication Date: 2024-01-25
Application No.: US18478554
Filing Date: 2023-09-29
Applicant: Intel Corporation
Inventor: Krishnakumar Nair , Andrew Yang , Brian Morris
CPC classification number: G06N3/084 , G06N3/063 , G06N3/045 , G06F9/3013
Abstract: The present disclosure is directed to systems and methods for training neural networks using a tensor that includes a plurality of FP16 values and a plurality of bits that define an exponent shared by some or all of the FP16 values included in the tensor. The FP16 values may include IEEE 754 format 16-bit floating-point values, and the tensor may include a plurality of bits defining the shared exponent. The tensor may include a shared exponent and FP16 values that include a variable bit-length mantissa and a variable bit-length exponent that may be dynamically set by processor circuitry. The tensor may include a shared exponent and FP16 values that include a variable bit-length mantissa; a variable bit-length exponent that may be dynamically set by processor circuitry; and a shared exponent switch set by the processor circuitry to selectively combine the FP16 value exponent with the shared exponent.
-
Publication No.: US10761757B2
Publication Date: 2020-09-01
Application No.: US16024812
Filing Date: 2018-06-30
Applicant: Intel Corporation
Inventor: Krishnakumar Nair , Andrew Yang , Michael Rotzin , Nitin Garegrat , Tom Schebye , Tony Werner
Abstract: An apparatus and method for converting tensor data. For example, one embodiment of a method comprises: fetching source tensor blocks of a source tensor data structure, each source tensor block comprising a plurality of source tensor data elements having a first numeric representation, wherein the source tensor data structure comprises a predefined structural arrangement of source tensor blocks; converting the one or more source tensor blocks into one or more destination tensor blocks comprising a plurality of destination tensor data elements having a second numeric representation different from the first numeric representation, wherein the one or more source tensor blocks are converted to one or more corresponding destination tensor blocks in a specified order based on the first and second numeric representations; and storing each individual destination tensor block in a designated memory region to maintain coherency with the predefined structural arrangement of the source tensor blocks.