-
Publication Number: US11061646B2
Publication Date: 2021-07-13
Application Number: US16147004
Application Date: 2018-09-28
Applicant: Intel Corporation
Inventor: Huseyin Ekin Sumbul , Phil Knag , Gregory K. Chen , Raghavan Kumar , Abhishek Sharma , Sasikanth Manipatruni , Amrita Mathuriya , Ram Krishnamurthy , Ian A. Young
IPC: G06F7/544 , G11C8/10 , G11C8/08 , G11C7/12 , G11C11/4094 , G11C7/10 , G11C11/56 , G11C11/4091 , G06G7/16 , G11C11/419
Abstract: Compute-in-memory circuits and techniques are described. In one example, a memory device includes an array of memory cells, the array including multiple sub-arrays. Each of the sub-arrays receives a different voltage. The memory device also includes capacitors coupled with conductive access lines of each of the multiple sub-arrays, and circuitry coupled with the capacitors to share charge between the capacitors in response to a signal. In another example, a computing device, such as a machine learning accelerator, includes a first memory array and a second memory array. The computing device also includes an analog processor circuit coupled with the first and second memory arrays to receive first analog input voltages from the first memory array and second analog input voltages from the second memory array, perform one or more operations on the first and second analog input voltages, and output an analog output voltage.
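The charge-sharing step this abstract describes reduces to a capacitance-weighted average: each capacitor holds Q = C·V, and closing a sharing switch redistributes the total charge over the total capacitance. Below is a minimal behavioral sketch; the capacitance and voltage values are illustrative assumptions, not taken from the patent.

```python
# Minimal model of capacitive charge sharing: each capacitor stores
# Q = C * V, and sharing redistributes total charge over total capacitance.

def share_charge(caps):
    """caps: list of (capacitance_F, voltage_V) pairs.
    Returns the common voltage after the sharing switch closes."""
    total_charge = sum(c * v for c, v in caps)
    total_cap = sum(c for c, _ in caps)
    return total_charge / total_cap

# Two sub-arrays drive their local capacitors to different analog voltages;
# sharing yields a capacitance-weighted average (illustrative 1 fF caps).
caps = [(1e-15, 0.8), (1e-15, 0.2)]
print(share_charge(caps))  # 0.5 V: the averaged analog result
```

With equal capacitances the result is a plain average, which is one way an analog combination across sub-arrays can be formed without moving digital data.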
-
Publication Number: US20200242459A1
Publication Date: 2020-07-30
Application Number: US16262583
Application Date: 2019-01-30
Applicant: Intel Corporation
Inventor: Sasikanth Manipatruni , Ram Krishnamurthy , Amrita Mathuriya , Dmitri Nikonov , Ian Young
Abstract: Techniques are provided for implementing a hybrid processing architecture comprising a general-purpose processor (CPU) and a neural processing unit (NPU), coupled to an analog in-memory artificial intelligence (AI) processor. According to an embodiment, the hybrid processor implements an AI instruction set including instructions to perform analog in-memory computations. The AI processor comprises one or more neural network (NN) layers, the NN layers including memory circuitry and analog processing circuitry. The memory circuitry is configured to store the weighting factors and the input data. The analog processing circuitry is configured to perform analog calculations on the stored weighting factors and the stored input data in accordance with the execution, by the NPU, of instructions from the AI instruction set. The AI instruction set includes instructions to perform dot products, multiplication, differencing, normalization, pooling, thresholding, transposition, and backpropagation training. The NN layers are configured as convolutional NN layers and/or fully connected NN layers.
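A software model of an instruction set like the one listed here (dot product, multiplication, differencing, pooling, and so on) can be sketched as a small dispatch table. The operation names and semantics below are illustrative assumptions; in the described architecture the NPU would issue these operations to analog in-memory circuitry rather than run them in software.

```python
# Hypothetical software stand-in for an AI instruction set with
# dot-product, multiply, difference, and pooling operations.

def dot(a, b):        return sum(x * y for x, y in zip(a, b))
def mul(a, b):        return [x * y for x, y in zip(a, b)]
def diff(a, b):       return [x - y for x, y in zip(a, b)]
def pool_max(a, k=2): return [max(a[i:i + k]) for i in range(0, len(a), k)]

AI_ISA = {"DOT": dot, "MUL": mul, "DIFF": diff, "POOL": pool_max}

def execute(op, *args):
    """Dispatch one instruction from the (assumed) AI instruction set."""
    return AI_ISA[op](*args)

weights, inputs = [0.5, -1.0, 2.0], [1.0, 2.0, 3.0]
print(execute("DOT", weights, inputs))  # 4.5
```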
-
Publication Number: US10705967B2
Publication Date: 2020-07-07
Application Number: US16160270
Application Date: 2018-10-15
Applicant: Intel Corporation
Inventor: Amrita Mathuriya , Sasikanth Manipatruni , Victor Lee , Huseyin Sumbul , Gregory Chen , Raghavan Kumar , Phil Knag , Ram Krishnamurthy , Ian Young , Abhishek Sharma
Abstract: The present disclosure is directed to systems and methods of implementing a neural network using in-memory mathematical operations performed by pipelined SRAM architecture (PISA) circuitry disposed in on-chip processor memory circuitry. A high-level compiler may be provided to compile data representative of a multi-layer neural network model and one or more neural network data inputs from a first high-level programming language to an intermediate domain-specific language (DSL). A low-level compiler may be provided to compile the representative data from the intermediate DSL to multiple instruction sets in accordance with an instruction set architecture (ISA), such that each of the multiple instruction sets corresponds to a single respective layer of the multi-layer neural network model. Each of the multiple instruction sets may be assigned to a respective SRAM array of the PISA circuitry for in-memory execution. Thus, the systems and methods described herein beneficially leverage the on-chip processor memory circuitry to perform a relatively large number of in-memory vector/tensor calculations in furtherance of neural network processing without burdening the processor circuitry.
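The two-stage flow this abstract outlines (high-level model, then an intermediate DSL, then one ISA instruction set per layer mapped onto an SRAM array) can be sketched as a pair of lowering passes. Everything below, including the layer dictionaries, field names, and the `*_kernel` strings, is a hypothetical stand-in rather than the patent's actual DSL or ISA.

```python
# Schematic two-stage compilation: model -> DSL nodes -> per-layer
# instruction sets, each assigned to one SRAM array of the PISA circuitry.

layers = [
    {"type": "conv",  "weights_shape": (32, 3, 3, 3)},
    {"type": "dense", "weights_shape": (128, 10)},
]

def high_level_compile(model):
    """Lower each model layer to a node in the intermediate DSL (assumed form)."""
    return [{"op": layer["type"], "shape": layer["weights_shape"]}
            for layer in model]

def low_level_compile(dsl_nodes):
    """Emit one instruction set per layer and assign it to an SRAM array,
    mirroring the one-layer-per-instruction-set mapping described."""
    return [{"isa_program": f"{node['op']}_kernel", "sram_array": i}
            for i, node in enumerate(dsl_nodes)]

for program in low_level_compile(high_level_compile(layers)):
    print(program)
```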
-
Publication Number: US11727260B2
Publication Date: 2023-08-15
Application Number: US17484828
Application Date: 2021-09-24
Applicant: Intel Corporation
Inventor: Abhishek Sharma , Jack T. Kavalieros , Ian A. Young , Ram Krishnamurthy , Sasikanth Manipatruni , Uygar Avci , Gregory K. Chen , Amrita Mathuriya , Raghavan Kumar , Phil Knag , Huseyin Ekin Sumbul , Nazila Haratipour , Van H. Le
IPC: G06N3/063 , H01L27/108 , H01L27/11502 , G06N3/04 , G06F17/16 , H01L27/11 , G11C11/54 , G11C7/10 , G11C11/419 , G11C11/409 , G11C11/22 , G06N3/065 , H10B10/00 , H10B12/00 , H10B53/00
CPC classification number: G06N3/065 , G06F17/16 , G06N3/04 , G11C7/1006 , G11C7/1039 , G11C11/54 , H10B10/18 , H10B12/01 , H10B12/033 , H10B12/20 , H10B12/50 , H10B53/00 , G11C11/221 , G11C11/409 , G11C11/419
Abstract: An apparatus is described. The apparatus includes a compute-in-memory (CIM) circuit for implementing a neural network disposed on a semiconductor chip. The CIM circuit includes a mathematical computation circuit coupled to a memory array. The memory array includes an embedded dynamic random access memory (eDRAM) memory array. Another apparatus is described. The apparatus includes a compute-in-memory (CIM) circuit for implementing a neural network disposed on a semiconductor chip. The CIM circuit includes a mathematical computation circuit coupled to a memory array. The mathematical computation circuit includes a switched capacitor circuit. The switched capacitor circuit includes a back-end-of-line (BEOL) capacitor coupled to a thin film transistor within the metal/dielectric layers of the semiconductor chip. Another apparatus is described. The apparatus includes a compute-in-memory (CIM) circuit for implementing a neural network disposed on a semiconductor chip. The CIM circuit includes a mathematical computation circuit coupled to a memory array. The mathematical computation circuit includes an accumulation circuit. The accumulation circuit includes a ferroelectric BEOL capacitor to store a value to be accumulated with other values stored by other ferroelectric BEOL capacitors.
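The third embodiment stores each partial value on a ferroelectric BEOL capacitor and then accumulates across capacitors. A toy behavioral model follows; the class name and values are invented for illustration, and the real accumulation happens as analog charge, not a list sum.

```python
# Behavioral stand-in for capacitor-based accumulation: each (ferroelectric
# BEOL) capacitor holds one partial value; the accumulation circuit combines them.

class CapacitorAccumulator:
    def __init__(self):
        self.stored = []           # one entry per capacitor (illustrative)

    def store(self, value):
        self.stored.append(value)  # write a partial result onto a capacitor

    def accumulate(self):
        # Combining charge across all capacitors yields the accumulated
        # result read out by the mathematical computation circuit.
        return sum(self.stored)

acc = CapacitorAccumulator()
for partial in [0.3, -0.1, 0.7]:
    acc.store(partial)
print(acc.accumulate())  # 0.9
```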
-
Publication Number: US11416165B2
Publication Date: 2022-08-16
Application Number: US16160482
Application Date: 2018-10-15
Applicant: Intel Corporation
Inventor: Amrita Mathuriya , Sasikanth Manipatruni , Victor Lee , Huseyin Sumbul , Gregory Chen , Raghavan Kumar , Phil Knag , Ram Krishnamurthy , Ian Young , Abhishek Sharma
IPC: G06F12/00 , G06F3/06 , G06F12/1081 , G06N3/04 , G06F12/0802 , G06N3/063 , G06F12/0875 , G06F12/0897
Abstract: The present disclosure is directed to systems and methods of implementing a neural network using in-memory, bit-serial, mathematical operations performed by pipelined SRAM architecture (bit-serial PISA) circuitry disposed in on-chip processor memory circuitry. The on-chip processor memory circuitry may include processor last level cache (LLC) circuitry. The bit-serial PISA circuitry is coupled to PISA memory circuitry via a relatively high-bandwidth connection to beneficially facilitate the storage and retrieval of layer weights by the bit-serial PISA circuitry during execution. Direct memory access (DMA) circuitry transfers the neural network model and input data from system memory to the bit-serial PISA memory, and also transfers output data from the PISA memory circuitry to system memory circuitry. Thus, the systems and methods described herein beneficially leverage the on-chip processor memory circuitry to perform a relatively large number of vector/tensor calculations without burdening the processor circuitry.
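The core of a bit-serial operation of this kind is consuming one operand a bit at a time and accumulating shifted partial products. A minimal software model is sketched below; the loop structure, bit width, and operand values are assumptions for illustration, not the circuit's actual dataflow.

```python
# Bit-serial multiply-accumulate: the activation is consumed one bit plane
# at a time; each set bit contributes a shifted copy of the weight.

def bit_serial_mac(weights, activations, bits=8):
    """Compute sum(w * a) by iterating over activation bits serially."""
    acc = 0
    for bit in range(bits):
        for w, a in zip(weights, activations):
            if (a >> bit) & 1:   # 1-bit slice of the activation
                acc += w << bit  # shifted partial product
    return acc

w, a = [3, 5, 7], [2, 4, 6]
print(bit_serial_mac(w, a), sum(x * y for x, y in zip(w, a)))  # 68 68
```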
-
Publication Number: US10831446B2
Publication Date: 2020-11-10
Application Number: US16145569
Application Date: 2018-09-28
Applicant: Intel Corporation
Inventor: Gregory K. Chen , Raghavan Kumar , Huseyin Ekin Sumbul , Phil Knag , Ram Krishnamurthy , Sasikanth Manipatruni , Amrita Mathuriya , Abhishek Sharma , Ian A. Young
Abstract: A memory device that includes a plurality of subarrays of memory cells to store static weights and a plurality of digital full-adder circuits between the subarrays of memory cells is provided. The digital full-adder circuits in the memory device eliminate the need to move data from the memory device to a processor to perform machine learning calculations. Rows of full-adder circuits are distributed between sub-arrays of memory cells to increase the effective memory bandwidth and reduce the time to perform matrix-vector multiplications in the memory device by performing bit-serial dot-product primitives in the form of accumulating m 1-bit × n-bit multiplications.
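The primitive named at the end, accumulating m products of a 1-bit input with an n-bit stored weight, is just a gated sum: each 1-bit × n-bit multiply either passes the weight through or contributes zero. The sketch below shows that primitive and how a full dot product is built from it one input bit plane at a time; function names and the 4-bit width are illustrative assumptions.

```python
# Bit-serial dot-product primitive: m gated adds of n-bit weights,
# as rows of full adders between sub-arrays would accumulate them.

def dot_1bit_by_nbit(input_bits, weights):
    """input_bits: m single-bit inputs; weights: m stored n-bit weights."""
    return sum(w for b, w in zip(input_bits, weights) if b)

def dot_serial(inputs, weights, bits=4):
    """Full dot product from the 1-bit primitive, one bit plane at a time,
    with shift-and-add between planes."""
    return sum(dot_1bit_by_nbit([(x >> k) & 1 for x in inputs], weights) << k
               for k in range(bits))

print(dot_serial([2, 4, 6], [3, 5, 7]))  # 68
```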
-
Publication Number: US20190043560A1
Publication Date: 2019-02-07
Application Number: US16146473
Application Date: 2018-09-28
Applicant: Intel Corporation
Inventor: Huseyin Ekin Sumbul , Gregory K. Chen , Raghavan Kumar , Phil Knag , Abhishek Sharma , Sasikanth Manipatruni , Amrita Mathuriya , Ram Krishnamurthy , Ian A. Young
IPC: G11C11/418 , G06F7/544 , G06F9/30 , G11C13/00 , G11C11/419
Abstract: A memory circuit has compute-in-memory circuitry that enables a multiply-accumulate (MAC) operation based on shared charge. Row access circuitry drives multiple rows of a memory array to multiply a first data word with a second data word stored in the memory array. The row access circuitry drives the multiple rows based on the bit pattern of the first data word. Column access circuitry drives a column of the memory array when the rows are driven. Accessed rows discharge the column line in an accumulative fashion. Sensing circuitry can sense voltage on the column line. A processor in the memory circuit computes a MAC value based on the voltage sensed on the column.
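A behavioral way to read this abstract: each accessed row whose driven bit coincides with a stored 1 pulls the column line down by one step, so the sensed voltage encodes the count of coinciding 1s, i.e. the MAC value. The precharge voltage and per-cell discharge step below are invented constants for illustration.

```python
# Shared-charge MAC model: rows are driven per the bits of the first word,
# each hit discharges the column line one step, and sensing inverts the map.

V_PRECHARGE = 1.0  # column precharge voltage (assumed)
DELTA_V = 0.05     # per-cell discharge step (assumed)

def column_mac(word_a_bits, word_b_bits):
    """Count positions where the driven row bit AND the stored bit are 1."""
    hits = sum(a & b for a, b in zip(word_a_bits, word_b_bits))
    v_column = V_PRECHARGE - hits * DELTA_V       # accumulative discharge
    return round((V_PRECHARGE - v_column) / DELTA_V)  # sensed MAC value

print(column_mac([1, 0, 1, 1], [1, 1, 1, 0]))  # 2 coinciding 1s
```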
-
Publication Number: US11347994B2
Publication Date: 2022-05-31
Application Number: US16160466
Application Date: 2018-10-15
Applicant: Intel Corporation
Inventor: Amrita Mathuriya , Sasikanth Manipatruni , Victor Lee , Huseyin Sumbul , Gregory Chen , Raghavan Kumar , Phil Knag , Ram Krishnamurthy , Ian Young , Abhishek Sharma
IPC: G06F3/06 , G06N3/04 , G06F12/0875 , G06N3/063
Abstract: The present disclosure is directed to systems and methods of bit-serial, in-memory, execution of at least an nth layer of a multi-layer neural network in a first on-chip processor memory circuitry portion contemporaneous with prefetching and storing layer weights associated with the (n+1)st layer of the multi-layer neural network in a second on-chip processor memory circuitry portion. The storage of layer weights in on-chip processor memory circuitry beneficially decreases the time required to transfer the layer weights upon execution of the (n+1)st layer of the multi-layer neural network by the first on-chip processor memory circuitry portion. In addition, the on-chip processor memory circuitry may include a third on-chip processor memory circuitry portion used to store intermediate and/or final input/output values associated with one or more layers included in the multi-layer neural network.
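The overlap described here, executing layer n from one memory portion while prefetching layer n+1 weights into another, is a classic double-buffering pattern. In the sketch below a thread stands in for the concurrent transfer engine; all names, the layer count, and the sleep-based latency are invented for illustration.

```python
# Double-buffered layer execution: run layer n while layer n+1 weights
# are prefetched into a second on-chip memory portion.

import threading
import time

def prefetch(weights_store, layer):
    time.sleep(0.01)  # stand-in for transfer latency into on-chip memory
    weights_store[layer] = f"weights[{layer}]"

def run_network(num_layers):
    weights = {0: "weights[0]"}            # layer 0 weights already resident
    for n in range(num_layers):
        t = None
        if n + 1 < num_layers:             # start prefetching layer n+1
            t = threading.Thread(target=prefetch, args=(weights, n + 1))
            t.start()
        print(f"executing layer {n} with {weights[n]}")
        if t:
            t.join()                       # layer n+1 weights ready before next step

run_network(3)
```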
-
Publication Number: US11294985B2
Publication Date: 2022-04-05
Application Number: US16175229
Application Date: 2018-10-30
Applicant: Intel Corporation
Inventor: Amrita Mathuriya , Sasikanth Manipatruni , Dmitri Nikonov , Ian Young , Ram Krishnamurthy
Abstract: Techniques are provided for efficient matrix multiplication using in-memory analog parallel processing, with applications for neural networks and artificial intelligence processors. A methodology implementing the techniques according to an embodiment includes storing two matrices in-memory. The first matrix is stored in transposed form such that the transposed first matrix has the same number of rows as the second matrix. The method further includes reading columns of the matrices from the memory in parallel, using disclosed bit line functional read operations and cross bit line functional read operations, which are employed to generate analog dot products between the columns. Each of the dot products corresponds to an element of the matrix multiplication product of the two matrices. In some embodiments, one of the matrices may be used to store neural network weighting factors, and the other matrix may be used to store input data to be processed by the neural network.
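The storage scheme here stores the first matrix transposed so that it has the same number of rows as the second, letting each output element come from a dot product of one column of Aᵀ with one column of B. A plain-Python sketch of that arrangement follows (in hardware the column reads and dot products happen in parallel via the functional read operations; here they are sequential loops).

```python
# Matrix multiply with the first operand stored transposed: columns of A^T
# line up row-for-row with columns of B, so each output element is a
# column-by-column dot product.

def matmul_via_transpose(a, b):
    """Compute A @ B using A stored as A^T."""
    at = [list(col) for col in zip(*a)]  # store A in transposed form
    rows_out, cols_out, inner = len(a), len(b[0]), len(at)
    return [[sum(at[k][i] * b[k][j] for k in range(inner))
             for j in range(cols_out)]
            for i in range(rows_out)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul_via_transpose(A, B))  # [[19, 22], [43, 50]]
```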
-
Publication Number: US11016701B2
Publication Date: 2021-05-25
Application Number: US16146878
Application Date: 2018-09-28
Applicant: Intel Corporation
Inventor: Ian Young , Ram Krishnamurthy , Sasikanth Manipatruni , Amrita Mathuriya , Abhishek Sharma , Raghavan Kumar , Phil Knag , Huseyin Sumbul , Gregory Chen
Abstract: Techniques and mechanisms for a memory device to perform in-memory computing based on a logic state which is detected with a voltage-controlled oscillator (VCO). In an embodiment, a VCO circuit of the memory device receives from a memory array a first signal indicating a logic state that is based on one or more currently stored data bits. The VCO provides a conversion from the logic state being indicated by a voltage characteristic of the first signal to the logic state being indicated by a corresponding frequency characteristic of a cyclical signal. Based on the frequency characteristic, the logic state is identified and communicated for use in an in-memory computation at the memory device. In another embodiment, a result of the in-memory computation is written back to the memory array.
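The conversion chain described, voltage to frequency to logic state, can be modeled behaviorally: the bit-line voltage sets the oscillator frequency, a counter tallies cycles over a fixed window, and the count is thresholded back to a logic state. The gain, base frequency, window, and threshold below are illustrative assumptions.

```python
# VCO-based sensing model: voltage -> oscillation frequency -> cycle count
# in a fixed window -> thresholded logic state.

F0 = 1.0e9       # oscillator base frequency in Hz (assumed)
KVCO = 2.0e9     # frequency gain in Hz per volt (assumed)
WINDOW = 100e-9  # counting window in seconds (assumed)

def vco_read(bitline_voltage):
    freq = F0 + KVCO * bitline_voltage            # voltage -> frequency
    count = int(freq * WINDOW)                    # cycles counted in window
    threshold = int((F0 + KVCO * 0.5) * WINDOW)   # midpoint decision level
    return 1 if count >= threshold else 0         # frequency -> logic state

print(vco_read(0.9), vco_read(0.1))  # 1 0
```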
-