-
Publication No.: US10705967B2
Publication Date: 2020-07-07
Application No.: US16160270
Filing Date: 2018-10-15
Applicant: Intel Corporation
Inventor: Amrita Mathuriya , Sasikanth Manipatruni , Victor Lee , Huseyin Sumbul , Gregory Chen , Raghavan Kumar , Phil Knag , Ram Krishnamurthy , Ian Young , Abhishek Sharma
Abstract: The present disclosure is directed to systems and methods of implementing a neural network using in-memory mathematical operations performed by pipelined SRAM architecture (PISA) circuitry disposed in on-chip processor memory circuitry. A high-level compiler may be provided to compile data representative of a multi-layer neural network model and one or more neural network data inputs from a first high-level programming language to an intermediate domain-specific language (DSL). A low-level compiler may be provided to compile the representative data from the intermediate DSL to multiple instruction sets in accordance with an instruction set architecture (ISA), such that each of the multiple instruction sets corresponds to a single respective layer of the multi-layer neural network model. Each of the multiple instruction sets may be assigned to a respective SRAM array of the PISA circuitry for in-memory execution. Thus, the systems and methods described herein beneficially leverage the on-chip processor memory circuitry to perform a relatively large number of in-memory vector/tensor calculations in furtherance of neural network processing without burdening the processor circuitry.
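As a rough illustration of the last two steps in this abstract (one instruction set per layer, each assigned to its own SRAM array), the following Python sketch may help. The `Layer` type, the instruction mnemonics, and the one-to-one placement policy are hypothetical; the abstract does not specify the ISA or how arrays are chosen.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    op: str  # e.g. "conv" or "fc"; hypothetical layer kinds


def lower_to_instruction_sets(layers):
    """Emit one instruction set per layer (mnemonics are made up for illustration)."""
    return [
        [f"LOAD_WEIGHTS {layer.name}",
         f"{layer.op.upper()} {layer.name}",
         f"STORE_OUT {layer.name}"]
        for layer in layers
    ]


def assign_to_arrays(instruction_sets, num_arrays):
    """Map instruction set i to SRAM array i; assumes at least one array per layer."""
    if len(instruction_sets) > num_arrays:
        raise ValueError("more layers than SRAM arrays")
    return {array_id: iset for array_id, iset in enumerate(instruction_sets)}


model = [Layer("l0", "conv"), Layer("l1", "fc")]
placement = assign_to_arrays(lower_to_instruction_sets(model), num_arrays=4)
```

Here `placement` maps an array index to the instruction list for exactly one layer, which mirrors the per-layer correspondence the abstract describes.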
-
Publication No.: US20190187208A1
Publication Date: 2019-06-20
Application No.: US15846047
Filing Date: 2017-12-18
Applicant: Intel Corporation
Inventor: Amit Agarwal , Ram Krishnamurthy , Satish Damaraju , Steven Hsu , Simeon Realov
IPC: G01R31/317 , H03K3/037 , G01R31/3177
Abstract: An apparatus is provided which comprises: a multi-bit quad latch with an internally coupled level sensitive scan circuitry; and a combinational logic coupled to an output of the multi-bit quad latch. Another apparatus is provided which comprises: a plurality of sequential logic circuitries; and a clocking circuitry comprising inverters, wherein the clocking circuitry is shared by the plurality of sequential logic circuitries.
-
Publication No.: US12243148B2
Publication Date: 2025-03-04
Application No.: US17070095
Filing Date: 2020-10-14
Applicant: Intel Corporation
Inventor: Vivek De , Ram Krishnamurthy , Amit Agarwal , Steven Hsu , Monodeep Kar
Abstract: A method comprising: dividing a 3D space into a voxel grid comprising a plurality of voxels; associating a plurality of distance values with the plurality of voxels, each distance value based on a distance to a boundary of an object; selecting an approximate interpolation mode for stepping a ray through a first one or more voxels of the 3D space responsive to the first one or more voxels having distance values greater than a first threshold; detecting the ray reaching a second one or more voxels having distance values less than the first threshold; and responsively selecting a precise interpolation mode for stepping the ray through the second one or more voxels.
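The mode switch in this method can be sketched in Python as follows. This is only an illustration under stated assumptions: the abstract does not define the two interpolation modes, so the sketch takes "approximate" to mean a nearest-voxel lookup and "precise" to mean trilinear interpolation, and the sphere test object, `threshold`, and `eps` values are hypothetical.

```python
import math


def make_grid(n, center, radius):
    """Store, per voxel, the distance from the voxel to the surface of a sphere."""
    return [[[math.dist((x, y, z), center) - radius for z in range(n)]
             for y in range(n)] for x in range(n)]


def nearest(grid, p):
    """Approximate mode (assumed): read the closest voxel's distance value."""
    n = len(grid)
    i, j, k = (min(max(int(round(c)), 0), n - 1) for c in p)
    return grid[i][j][k]


def trilinear(grid, p):
    """Precise mode (assumed): trilinear interpolation of the 8 surrounding voxels."""
    n = len(grid)
    base = [min(max(int(math.floor(c)), 0), n - 2) for c in p]
    f = [min(max(c - b, 0.0), 1.0) for c, b in zip(p, base)]
    d = 0.0
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = ((f[0] if dx else 1 - f[0])
                     * (f[1] if dy else 1 - f[1])
                     * (f[2] if dz else 1 - f[2]))
                d += w * grid[base[0] + dx][base[1] + dy][base[2] + dz]
    return d


def march(grid, origin, direction, threshold=2.0, eps=0.05, max_steps=200):
    """Step a ray through the grid; return (hit point or None, modes used per step)."""
    p, modes = list(origin), []
    for _ in range(max_steps):
        coarse = nearest(grid, p)
        if coarse > threshold:
            modes.append("approx")
            step = coarse - 1.0  # margin covers the nearest-voxel lookup error
        else:
            modes.append("precise")
            step = trilinear(grid, p)
            if step < eps:
                return tuple(p), modes
        p = [c + step * u for c, u in zip(p, direction)]
        if any(c < 0 or c > len(grid) - 1 for c in p):
            return None, modes
    return None, modes


grid = make_grid(16, center=(8, 8, 8), radius=3)
hit, modes = march(grid, origin=(0.0, 8.0, 8.0), direction=(1.0, 0.0, 0.0))
```

Far from the object the ray takes large steps using the cheap lookup; once the stored distances drop to the threshold or below, it switches to the costlier interpolation to locate the boundary.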
-
Publication No.: US11812599B2
Publication Date: 2023-11-07
Application No.: US17670248
Filing Date: 2022-02-11
Applicant: Intel Corporation
Inventor: Abhishek Sharma , Noriyuki Sato , Sarah Atanasov , Huseyin Ekin Sumbul , Gregory K. Chen , Phil Knag , Ram Krishnamurthy , Hui Jae Yoo , Van H. Le
IPC: G11C8/00 , H10B12/00 , H01L27/12 , G11C11/4096
CPC classification number: H10B12/00 , G11C11/4096 , H01L27/124 , H01L27/1207 , H01L27/1225 , H01L27/1255 , H01L27/1266
Abstract: Examples herein relate to a memory device comprising an eDRAM memory cell, where the eDRAM memory cell can include a write circuit formed at least partially over a storage cell and a read circuit formed at least partially under the storage cell; a compute near memory device bonded to the memory device; a processor; and an interface from the memory device to the processor. In some examples, circuitry that provides an output of the memory device to emulate the output read rate of an SRAM memory device comprises one or more of: a controller, a multiplexer, or a register. A surface of the memory device can be bonded to a compute near memory device or other circuitry. In some examples, a layer with read circuitry can be bonded to a layer with storage cells. Any layers can be bonded together using techniques described herein.
-
Publication No.: US11727260B2
Publication Date: 2023-08-15
Application No.: US17484828
Filing Date: 2021-09-24
Applicant: Intel Corporation
Inventor: Abhishek Sharma , Jack T. Kavalieros , Ian A. Young , Ram Krishnamurthy , Sasikanth Manipatruni , Uygar Avci , Gregory K. Chen , Amrita Mathuriya , Raghavan Kumar , Phil Knag , Huseyin Ekin Sumbul , Nazila Haratipour , Van H. Le
IPC: G06N3/063 , H01L27/108 , H01L27/11502 , G06N3/04 , G06F17/16 , H01L27/11 , G11C11/54 , G11C7/10 , G11C11/419 , G11C11/409 , G11C11/22 , G06N3/065 , H10B10/00 , H10B12/00 , H10B53/00
CPC classification number: G06N3/065 , G06F17/16 , G06N3/04 , G11C7/1006 , G11C7/1039 , G11C11/54 , H10B10/18 , H10B12/01 , H10B12/033 , H10B12/20 , H10B12/50 , H10B53/00 , G11C11/221 , G11C11/409 , G11C11/419
Abstract: An apparatus is described. The apparatus includes a compute-in-memory (CIM) circuit for implementing a neural network disposed on a semiconductor chip. The CIM circuit includes a mathematical computation circuit coupled to a memory array. The memory array includes an embedded dynamic random access memory (eDRAM) memory array. Another apparatus is described. The apparatus includes a compute-in-memory (CIM) circuit for implementing a neural network disposed on a semiconductor chip. The CIM circuit includes a mathematical computation circuit coupled to a memory array. The mathematical computation circuit includes a switched capacitor circuit. The switched capacitor circuit includes a back-end-of-line (BEOL) capacitor coupled to a thin film transistor within the metal/dielectric layers of the semiconductor chip. Another apparatus is described. The apparatus includes a compute-in-memory (CIM) circuit for implementing a neural network disposed on a semiconductor chip. The CIM circuit includes a mathematical computation circuit coupled to a memory array. The mathematical computation circuit includes an accumulation circuit. The accumulation circuit includes a ferroelectric BEOL capacitor to store a value to be accumulated with other values stored by other ferroelectric BEOL capacitors.
-
Publication No.: US11416165B2
Publication Date: 2022-08-16
Application No.: US16160482
Filing Date: 2018-10-15
Applicant: Intel Corporation
Inventor: Amrita Mathuriya , Sasikanth Manipatruni , Victor Lee , Huseyin Sumbul , Gregory Chen , Raghavan Kumar , Phil Knag , Ram Krishnamurthy , Ian Young , Abhishek Sharma
IPC: G06F12/00 , G06F3/06 , G06F12/1081 , G06N3/04 , G06F12/0802 , G06N3/063 , G06F12/0875 , G06F12/0897
Abstract: The present disclosure is directed to systems and methods of implementing a neural network using in-memory, bit-serial, mathematical operations performed by bit-serial pipelined SRAM architecture (bit-serial PISA) circuitry disposed in on-chip processor memory circuitry. The on-chip processor memory circuitry may include processor last level cache (LLC) circuitry. The bit-serial PISA circuitry is coupled to PISA memory circuitry via a relatively high-bandwidth connection to beneficially facilitate the storage and retrieval of layer weights by the bit-serial PISA circuitry during execution. Direct memory access (DMA) circuitry transfers the neural network model and input data from system memory to the PISA memory circuitry and also transfers output data from the PISA memory circuitry to system memory circuitry. Thus, the systems and methods described herein beneficially leverage the on-chip processor memory circuitry to perform a relatively large number of vector/tensor calculations without burdening the processor circuitry.
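For readers unfamiliar with bit-serial arithmetic, the following Python sketch shows the generic technique on a dot product: one bit position of every activation is processed per pass, and the partial sums are shifted and accumulated. It is not taken from the patent; the function name, the unsigned-activation assumption, and the 8-bit default are illustrative choices.

```python
def bit_serial_dot(weights, activations, bits=8):
    """Dot product computed one activation bit-plane at a time.

    Assumes activations are unsigned integers below 2**bits; weights may be
    any integers.
    """
    acc = 0
    for b in range(bits):
        # Sum the weights whose activation has bit b set (one "cycle" of work).
        partial = sum(w for w, x in zip(weights, activations) if (x >> b) & 1)
        # Weight the partial sum by the significance of the bit position.
        acc += partial << b
    return acc


result = bit_serial_dot([1, -2, 3], [5, 7, 2])  # same value as 1*5 - 2*7 + 3*2
```

Each pass needs only additions gated by a single bit, which is why the approach suits simple logic placed next to memory arrays.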
-
Publication No.: US20210397414A1
Publication Date: 2021-12-23
Application No.: US17358868
Filing Date: 2021-06-25
Applicant: Intel Corporation
Inventor: Arnab Raha , Mark A. Anders , Martin Power , Martin Langhammer , Himanshu Kaul , Debabrata Mohapatra , Gautham Chinya , Cormac Brick , Ram Krishnamurthy
Abstract: Systems, apparatuses and methods may provide for multi-precision multiply-accumulate (MAC) technology that includes a plurality of arithmetic blocks, wherein each of the arithmetic blocks contains multiple multipliers, and logic to combine multipliers within each arithmetic block, across multiple arithmetic blocks, or both. In one example, one or more intermediate multipliers are of a size that is less than the precisions supported by the arithmetic blocks containing the one or more intermediate multipliers.
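The idea of combining narrow multipliers into a wider one can be illustrated with a small Python sketch. The 4-bit and 8-bit widths, the unsigned operands, and the function names are assumptions made for the example; the abstract does not state which precisions or combination scheme the design uses.

```python
def mul4(a, b):
    """Stand-in for one narrow hardware multiplier: 4-bit x 4-bit, unsigned."""
    if not (0 <= a < 16 and 0 <= b < 16):
        raise ValueError("operands must fit in 4 bits")
    return a * b


def mul8_from_mul4(a, b):
    """Build an 8-bit x 8-bit unsigned product from four 4-bit multipliers."""
    a_hi, a_lo = a >> 4, a & 0xF
    b_hi, b_lo = b >> 4, b & 0xF
    return ((mul4(a_hi, b_hi) << 8)
            + ((mul4(a_hi, b_lo) + mul4(a_lo, b_hi)) << 4)
            + mul4(a_lo, b_lo))


def mac(pairs, multiply):
    """Multiply-accumulate over (a, b) pairs using the chosen multiplier."""
    return sum(multiply(a, b) for a, b in pairs)


low_precision = mac([(3, 5), (15, 15)], mul4)
high_precision = mac([(200, 123), (255, 255)], mul8_from_mul4)
```

The same four narrow multipliers either serve four independent 4-bit products or cooperate on one 8-bit product, which is the trade-off a multi-precision MAC exploits.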
-
Publication No.: US11157799B2
Publication Date: 2021-10-26
Application No.: US16299014
Filing Date: 2019-03-11
Applicant: Intel Corporation
Inventor: Huseyin E. Sumbul , Gregory K. Chen , Raghavan Kumar , Phil Christopher Knag , Ram Krishnamurthy
Abstract: A neuromorphic computing system is provided which comprises: a synapse core; and a pre-synaptic neuron, a first post-synaptic neuron, and a second post-synaptic neuron coupled to the synapse core, wherein the synapse core is to: receive a request from the pre-synaptic neuron, and generate, in response to the request, a first address of the first post-synaptic neuron and a second address of the second post-synaptic neuron, wherein the first address and the second address are not stored in the synapse core prior to receiving the request.
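One way to picture addresses that are generated on request instead of stored is a deterministic function of the requesting neuron's identifier. The Python sketch below uses a hash for this purpose; the hash-based scheme, the `SynapseCore` class, and its parameters are purely hypothetical, since the abstract does not say how the addresses are produced.

```python
import hashlib


class SynapseCore:
    """Toy model: keeps no connectivity table, only a seed and the neuron count."""

    def __init__(self, num_neurons, seed=0):
        self.num_neurons = num_neurons
        self.seed = seed

    def fanout(self, pre_id, count=2):
        """Derive post-synaptic addresses for a request from neuron `pre_id`."""
        addresses = []
        for k in range(count):
            digest = hashlib.sha256(f"{self.seed}:{pre_id}:{k}".encode()).digest()
            addresses.append(int.from_bytes(digest[:4], "big") % self.num_neurons)
        return addresses


core = SynapseCore(num_neurons=1024)
first_address, second_address = core.fanout(pre_id=7)
```

Because the mapping is recomputed identically on every request, the core needs no per-synapse address storage, at the cost of some computation per spike.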
-
Publication No.: US11054470B1
Publication Date: 2021-07-06
Application No.: US16725689
Filing Date: 2019-12-23
Applicant: Intel Corporation
Inventor: Amit Agarwal , Steven Hsu , Anupama Ambardar Thaploo , Simeon Realov , Ram Krishnamurthy
IPC: G01R31/3185 , G01R31/317 , H03K3/037 , G01R31/3177 , H03K3/038
Abstract: A family of novel, low-power, min-drive-strength, double-edge triggered (DET) input data multiplexer (Mux-D) scan flip-flops (FFs) is provided. The flip-flop takes advantage of the absence of a state node in the slave to remove the data inverters of a traditional DET FF and save power, without affecting flip-flop functionality under coupling/glitch scenarios.
-
Publication No.: US20210203323A1
Publication Date: 2021-07-01
Application No.: US16727742
Filing Date: 2019-12-26
Applicant: Intel Corporation
Inventor: Steven Hsu , Amit Agarwal , Simeon Realov , Ram Krishnamurthy
Abstract: A parasitic-aware single-edge triggered flip-flop reduces clock power through layout optimization, enabled through process-circuit co-optimization. The static pass-gate master-slave flip-flop utilizes novel layout optimization enabling significant power reduction. The layout removes the clock poly over notches in the diffusion area. Poly lines implement clock nodes. The poly lines are aligned between n-type and p-type active regions.