-
Publication Number: US20190303750A1
Publication Date: 2019-10-03
Application Number: US16443548
Application Date: 2019-06-17
Applicant: Intel Corporation
Inventor: Raghavan KUMAR, Gregory K. CHEN, Huseyin Ekin SUMBUL, Phil KNAG, Ram KRISHNAMURTHY
Abstract: Examples described herein relate to a neural network whose matrix weights are selected from a set of weights stored in a memory on-chip with a processing engine for generating multiply and carry operations. The number of weights in the stored set can be less than the number of weights in the matrix, thereby reducing the amount of memory used to store the matrix weights. The weights in the memory can be generated during training using gradients from back propagation. Weights in the memory can be selected using a tabulation hash calculation on entries in a table.
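The weight-sharing scheme lends itself to a short illustration. The sketch below is a minimal, assumed reconstruction (the bank size, key width, and table sizes are all invented for the example, not taken from the patent): a tabulation hash of each weight's matrix index picks one entry from a small on-chip bank, so the full matrix never needs to be stored.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_SHARED = 64                      # size of the on-chip weight bank (assumed)
shared_weights = rng.standard_normal(NUM_SHARED).astype(np.float32)

# Tabulation hash: split a 16-bit key into two 8-bit chunks and XOR
# their random-table entries.
T = rng.integers(0, 2**32, size=(2, 256), dtype=np.uint64)

def tab_hash(key: int) -> int:
    h = T[0][key & 0xFF] ^ T[1][(key >> 8) & 0xFF]
    return int(h % NUM_SHARED)

def weight(row: int, col: int, num_cols: int) -> float:
    """Fetch the shared weight that stands in for matrix entry (row, col)."""
    return shared_weights[tab_hash(row * num_cols + col)]

# Materialize a virtual 128x128 weight matrix from the 64-entry bank.
W = np.array([[weight(r, c, 128) for c in range(128)] for r in range(128)])
print(W.shape, "distinct values:", len(np.unique(W)))  # at most NUM_SHARED
```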
-
Publication Number: US20230334006A1
Publication Date: 2023-10-19
Application Number: US18212079
Application Date: 2023-06-20
Applicant: Intel Corporation
Inventor: Huseyin Ekin SUMBUL, Gregory K. CHEN, Phil KNAG, Raghavan KUMAR, Ram KRISHNAMURTHY
CPC classification number: G06F15/8046, G06F17/153, G06N3/063
Abstract: A compute near memory (CNM) convolution accelerator enables a convolutional neural network (CNN) to use dedicated acceleration to achieve efficient in-place convolution operations with less impact on memory and energy consumption. A 2D convolution operation is reformulated as 1D row-wise convolution. The 1D row-wise convolution enables the CNM convolution accelerator to process input activations row-by-row, while using the weights one-by-one. Lightweight access circuits provide the ability to stream both weights and input rows as vectors to MAC units, which in turn enables modules of the CNM convolution accelerator to implement convolution for both [1×1] and chosen [n×n] sized filters.
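The row-wise reformulation can be shown functionally. The sketch below assumes 'valid' padding and stride 1 (the abstract does not specify either) and demonstrates that a 2D convolution decomposes into per-row 1D convolutions, letting input activations be consumed row-by-row while the filter rows stream in one at a time.

```python
import numpy as np

def conv2d_rowwise(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    H, W = x.shape
    n, _ = k.shape                       # square [n x n] filter
    out = np.zeros((H - n + 1, W - n + 1))
    for r in range(out.shape[0]):        # each output row...
        for i in range(n):               # ...accumulates n 1D row convolutions
            # np.convolve with a reversed kernel row == 1D correlation
            out[r] += np.convolve(x[r + i], k[i][::-1], mode="valid")
    return out

x = np.arange(36, dtype=float).reshape(6, 6)
k = np.ones((3, 3))
ref = np.array([[x[i:i+3, j:j+3].sum() for j in range(4)] for i in range(4)])
assert np.allclose(conv2d_rowwise(x, k), ref)
```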
-
Publication Number: US20230297819A1
Publication Date: 2023-09-21
Application Number: US18201291
Application Date: 2023-05-24
Applicant: Intel Corporation
Inventor: Ram KRISHNAMURTHY, Gregory K. CHEN, Raghavan KUMAR, Phil KNAG, Huseyin Ekin SUMBUL, Deepak Vinayak KADETOTAD
CPC classification number: G06N3/063, G06F17/16, G06N3/04, G06F7/5443
Abstract: An apparatus is described. The apparatus includes a circuit to process a binary neural network. The circuit includes an array of processing cores, wherein the processing cores are to process different respective areas of a weight matrix of the binary neural network. The processing cores each include add circuitry to add only those weights of an i-th layer of the binary neural network that are to be effectively multiplied by a non-zero nodal output of the (i−1)-th layer.
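The add-only trick is easy to verify numerically. The sketch below is illustrative rather than a model of the circuit: with the previous layer's outputs restricted to {0, 1}, every multiply reduces to either adding the weight or skipping it.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.choice([-1, 1], size=(4, 8))      # binary weight matrix, layer i
a_prev = rng.integers(0, 2, size=8)       # binary outputs of layer i-1

# Conventional multiply-accumulate:
ref = W @ a_prev

# Add-only formulation: skip multiplies entirely and add only the
# weight columns whose layer-(i-1) output is non-zero.
active = np.flatnonzero(a_prev)
add_only = W[:, active].sum(axis=1)

assert np.array_equal(ref, add_only)
print(add_only)
```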
-
Publication Number: US20200034148A1
Publication Date: 2020-01-30
Application Number: US16586975
Application Date: 2019-09-28
Applicant: Intel Corporation
Inventor: Huseyin Ekin SUMBUL, Gregory K. CHEN, Phil KNAG, Raghavan KUMAR, Ram KRISHNAMURTHY
Abstract: A compute near memory (CNM) convolution accelerator enables a convolutional neural network (CNN) to use dedicated acceleration to achieve efficient in-place convolution operations with less impact on memory and energy consumption. A 2D convolution operation is reformulated as 1D row-wise convolution. The 1D row-wise convolution enables the CNM convolution accelerator to process input activations row-by-row, while using the weights one-by-one. Lightweight access circuits provide the ability to stream both weights and input rows as vectors to MAC units, which in turn enables modules of the CNM convolution accelerator to implement convolution for both [1×1] and chosen [n×n] sized filters.
-
Publication Number: US20190065151A1
Publication Date: 2019-02-28
Application Number: US16145569
Application Date: 2018-09-28
Applicant: Intel Corporation
Inventor: Gregory K. CHEN, Raghavan KUMAR, Huseyin Ekin SUMBUL, Phil KNAG, Ram KRISHNAMURTHY, Sasikanth MANIPATRUNI, Amrita MATHURIYA, Abhishek SHARMA, Ian A. YOUNG
Abstract: A memory device is provided that includes a plurality of subarrays of memory cells to store static weights and a plurality of digital full-adder circuits between the subarrays. The digital full-adder circuits eliminate the need to move data from the memory device to a processor to perform machine learning calculations. Rows of full-adder circuits are distributed between sub-arrays of memory cells to increase the effective memory bandwidth and reduce the time to perform matrix-vector multiplications in the memory device by performing bit-serial dot-product primitives in the form of accumulating m 1-bit × n-bit multiplications.
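The bit-serial dot-product primitive can be checked in a few lines. In the sketch below (the bit widths are assumed for illustration), each pass consumes one bit plane of the inputs, so every "multiplication" is a masked add of weights, which is exactly the kind of operation a row of full-adders can accumulate.

```python
import numpy as np

rng = np.random.default_rng(2)
m, IN_BITS = 16, 4
x = rng.integers(0, 2**IN_BITS, size=m)       # m 4-bit inputs (assumed width)
w = rng.integers(-8, 8, size=m)               # m signed n-bit weights

acc = 0
for b in range(IN_BITS):                      # one bit plane per pass
    bits = (x >> b) & 1                       # 1-bit slice of every input
    partial = int(w[bits == 1].sum())         # m 1-bit x n-bit mults = masked add
    acc += partial << b                       # weight the plane by 2**b

assert acc == int(np.dot(x, w))
print(acc)
```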
-
Publication Number: US20190042949A1
Publication Date: 2019-02-07
Application Number: US16147143
Application Date: 2018-09-28
Applicant: Intel Corporation
Inventor: Ian A. YOUNG, Ram KRISHNAMURTHY, Sasikanth MANIPATRUNI, Gregory K. CHEN, Amrita MATHURIYA, Abhishek SHARMA, Raghavan KUMAR, Phil KNAG, Huseyin Ekin SUMBUL
Abstract: A semiconductor chip is described. The semiconductor chip includes a compute-in-memory (CIM) circuit to implement a neural network in hardware. The semiconductor chip also includes at least one output that presents samples of voltages generated at a node of the CIM circuit in response to a range of neural network input values applied to the CIM circuit, so that the CIM circuit can be optimized for the neural network.
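How sampling over an input range supports optimization can be sketched, though the abstract gives no detail; the voltage model and the gain/offset fit below are entirely invented for illustration.

```python
import numpy as np

def cim_node_voltage(code: int) -> float:
    # Stand-in for the real analog node: ideal slope 1 mV/code plus an
    # assumed offset and gain error, with a little measurement noise.
    return 0.012 + 0.97e-3 * code + np.random.default_rng(code).normal(0, 1e-4)

codes = np.arange(0, 256, 8)                    # input sweep (assumed range)
samples = np.array([cim_node_voltage(c) for c in codes])

gain, offset = np.polyfit(codes, samples, 1)    # first-order fit of the samples
corrected = (samples - offset) / gain           # calibrated code estimates
print(f"fitted gain={gain*1e3:.3f} mV/code, offset={offset*1e3:.2f} mV")
```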
-
Publication Number: US20220107867A1
Publication Date: 2022-04-07
Application Number: US17553623
Application Date: 2021-12-16
Applicant: Intel Corporation
Inventor: Wei WU, Carlos TOKUNAGA, Gregory K. CHEN
IPC: G06F11/10
Abstract: A near memory compute system includes multiple computation nodes, such as nodes for parallel distributed processing. The nodes include a memory device to store data and compute hardware to perform a computation on the data. Error correction code (ECC) logic performs ECC on the data prior to computation by the compute hardware. Each node also includes residue check logic to perform a residue check on the result of the computation.
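The division of labor between the two checks can be illustrated with integer arithmetic. The sketch below uses mod-3 residues, a common choice (the abstract does not name a modulus), and reduces the ECC step to a stand-in parity test; a real node would use a correcting code such as SECDED.

```python
def residue3(x: int) -> int:
    return x % 3

def parity_ok(word: int, stored_parity: int) -> bool:
    # Stand-in for the ECC check run on data BEFORE it enters the compute
    # hardware; real ECC would also correct single-bit errors.
    return bin(word).count("1") % 2 == stored_parity

def checked_multiply(a: int, b: int) -> int:
    result = a * b
    # Residue check on the RESULT: residues must be congruent mod 3.
    assert residue3(result) == (residue3(a) * residue3(b)) % 3, "compute fault"
    return result

a, b = 1234, 5678
assert parity_ok(a, bin(a).count("1") % 2)      # ECC-style check pre-compute
print(checked_multiply(a, b))                   # residue check post-compute

# A single-bit fault in the multiplier always breaks the congruence,
# since 2**k mod 3 is never 0:
faulty = (a * b) ^ 0x4                          # flip one result bit
print(residue3(faulty) == (residue3(a) * residue3(b)) % 3)  # False
```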
-
Publication Number: US20200233923A1
Publication Date: 2020-07-23
Application Number: US16839013
Application Date: 2020-04-02
Applicant: Intel Corporation
Inventor: Phil KNAG, Gregory K. CHEN, Raghavan KUMAR, Huseyin Ekin SUMBUL, Abhishek SHARMA, Sasikanth MANIPATRUNI, Amrita MATHURIYA, Ram KRISHNAMURTHY, Ian A. YOUNG
IPC: G06F17/16, G06N3/063, G11C8/08, G11C7/12, G11C7/10, G06F9/30, G11C11/56, G11C11/418, G11C11/419
Abstract: A binary CIM circuit enables all memory cells in a memory array to be effectively accessible simultaneously for computation, using fixed pulse widths on the wordlines and equal capacitance on the bitlines. The fixed pulse widths and equal capacitance ensure that the minimum voltage drop on the bitline represents one least significant bit (LSB), so the bitline voltage swing remains safely within the maximum allowable range. The binary CIM circuit thereby maximizes the effective memory bandwidth of a memory array for a given maximum bitline voltage range.
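The LSB budget implied by the abstract reduces to simple arithmetic. All numbers in the sketch below are assumed for illustration, not taken from the patent.

```python
V_MAX_SWING = 0.4        # maximum allowable bitline swing in volts (assumed)
N_ROWS = 64              # wordlines pulsed simultaneously (assumed)

# Equal bitline capacitance plus fixed wordline pulse widths make every
# accessed cell discharge the bitline by the same unit step, so one LSB
# maps to one fixed voltage drop:
v_lsb = V_MAX_SWING / N_ROWS
print(f"one LSB = {v_lsb*1e3:.2f} mV per cell")

# Worst case: all 64 cells discharge, landing exactly at the budget.
worst = N_ROWS * v_lsb
assert worst <= V_MAX_SWING
print(f"worst-case swing = {worst:.3f} V (within the {V_MAX_SWING} V range)")
```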
-
Publication Number: US20200026498A1
Publication Date: 2020-01-23
Application Number: US16586648
Application Date: 2019-09-27
Applicant: Intel Corporation
Inventor: Huseyin Ekin SUMBUL, Gregory K. CHEN, Phil KNAG, Raghavan KUMAR, Ram KRISHNAMURTHY
Abstract: A memory circuit includes a number (X) of multiply-accumulate (MAC) circuits that are dynamically configurable. The MAC circuits can either compute an output based on computations of X elements of the input vector with the weight vector, or compute the output based on computations of a single element of the input vector with the weight vector, with each element having a one-bit or multibit length. A first memory can hold the input vector having a width of X elements, and a second memory can store the weight vector. The MAC circuits form a MAC array on-chip with the first memory.
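The two configurations can be sketched functionally. The mode names and shapes below are invented for the example; the abstract does not name them.

```python
import numpy as np

rng = np.random.default_rng(3)
X = 8
x = rng.integers(0, 16, size=X)          # input vector held in the first memory
w = rng.integers(-8, 8, size=X)          # weight vector held in the second memory

def mac_array(mode: str, elem: int = 0):
    if mode == "vector":
        # All X MAC units pair element x[i] with weight w[i] and the
        # partial products reduce into a single accumulated output.
        return int(np.dot(x, w))
    if mode == "scalar":
        # A single input element is shared across the MAC units, each of
        # which multiplies it with its own weight (X partial outputs).
        return (x[elem] * w).tolist()
    raise ValueError(mode)

print(mac_array("vector"))
print(mac_array("scalar", elem=2))
```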
-
Publication Number: US20200019847A1
Publication Date: 2020-01-16
Application Number: US16583217
Application Date: 2019-09-25
Applicant: Intel Corporation
Inventor: Ram KRISHNAMURTHY, Gregory K. CHEN, Raghavan KUMAR, Phil KNAG, Huseyin Ekin SUMBUL, Deepak Vinayak KADETOTAD
Abstract: An apparatus is described. The apparatus includes a circuit to process a binary neural network. The circuit includes an array of processing cores, wherein the processing cores are to process different respective areas of a weight matrix of the binary neural network. The processing cores each include add circuitry to add only those weights of an i-th layer of the binary neural network that are to be effectively multiplied by a non-zero nodal output of the (i−1)-th layer.
-