-
Publication No.: US11791819B2
Publication Date: 2023-10-17
Application No.: US16727742
Filing Date: 2019-12-26
Applicant: Intel Corporation
Inventor: Steven Hsu , Amit Agarwal , Simeon Realov , Ram Krishnamurthy
CPC classification number: H03K19/0016 , G11C7/222 , H03K3/012 , H03K3/0372 , H03K5/135 , H03K19/0013
Abstract: A parasitic-aware single-edge triggered flip-flop reduces clock power through layout optimization enabled by process-circuit co-optimization. The static pass-gate master-slave flip-flop uses a novel layout that removes the clock poly over notches in the diffusion area, enabling significant power reduction. Poly lines implement the clock nodes and are aligned between the n-type and p-type active regions.
-
Publication No.: US11726950B2
Publication Date: 2023-08-15
Application No.: US16586975
Filing Date: 2019-09-28
Applicant: Intel Corporation
Inventor: Huseyin Ekin Sumbul , Gregory K. Chen , Phil Knag , Raghavan Kumar , Ram Krishnamurthy
CPC classification number: G06F15/8046 , G06F17/153 , G06N3/063
Abstract: A compute near memory (CNM) convolution accelerator enables a convolutional neural network (CNN) to use dedicated acceleration to achieve efficient in-place convolution operations with less impact on memory and energy consumption. A 2D convolution operation is reformulated as a 1D row-wise convolution, which enables the CNM convolution accelerator to process input activations row-by-row while using the weights one-by-one. Lightweight access circuits stream both weights and input rows as vectors to MAC units, which in turn enables modules of the CNM convolution accelerator to implement convolution for both [1×1] and selected [n×n] filter sizes.
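The row-wise reformulation can be illustrated with a small, hypothetical reference model (plain NumPy, not the accelerator's MAC hardware): each output row is accumulated from k separate 1D convolutions between one input row and one row of the filter, which mirrors the access pattern the abstract describes for streaming input rows and weights.

```python
import numpy as np

def conv2d_as_rowwise(x, w):
    """2D 'valid' convolution expressed as a sum of 1D row-wise convolutions.

    x: [H, W] input activation plane, w: [k, k] filter.
    Hypothetical reference model; the accelerator streams input rows and
    weights to MAC units rather than materializing arrays like this.
    """
    H, W = x.shape
    k = w.shape[0]
    out = np.zeros((H - k + 1, W - k + 1))
    for r in range(out.shape[0]):          # one output row at a time
        for i in range(k):                 # accumulate k 1D row-wise convolutions
            row = x[r + i]                 # input activations, read row-by-row
            taps = w[i]                    # filter weights, used one row at a time
            for c in range(out.shape[1]):
                out[r, c] += np.dot(row[c:c + k], taps)
    return out

# Check against a direct 2D sliding-window implementation
x = np.random.rand(6, 6)
w = np.random.rand(3, 3)
ref = np.array([[np.sum(x[i:i + 3, j:j + 3] * w) for j in range(4)] for i in range(4)])
assert np.allclose(conv2d_as_rowwise(x, w), ref)
```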
-
Publication No.: US11347994B2
Publication Date: 2022-05-31
Application No.: US16160466
Filing Date: 2018-10-15
Applicant: Intel Corporation
Inventor: Amrita Mathuriya , Sasikanth Manipatruni , Victor Lee , Huseyin Sumbul , Gregory Chen , Raghavan Kumar , Phil Knag , Ram Krishnamurthy , Ian Young , Abhishek Sharma
IPC: G06F3/06 , G06N3/04 , G06F12/0875 , G06N3/063
Abstract: The present disclosure is directed to systems and methods of bit-serial, in-memory, execution of at least an nth layer of a multi-layer neural network in a first on-chip processor memory circuitry portion contemporaneous with prefetching and storing layer weights associated with the (n+1)st layer of the multi-layer neural network in a second on-chip processor memory circuitry portion. The storage of layer weights in on-chip processor memory circuitry beneficially decreases the time required to transfer the layer weights upon execution of the (n+1)st layer of the multi-layer neural network by the first on-chip processor memory circuitry portion. In addition, the on-chip processor memory circuitry may include a third on-chip processor memory circuitry portion used to store intermediate and/or final input/output values associated with one or more layers included in the multi-layer neural network.
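The execution/prefetch overlap can be sketched as a simple double-buffering loop. The helpers `fetch_weights` and `execute_layer` and the thread-based model below are illustrative assumptions only, standing in for the first and second on-chip memory portions and their access circuits; the `activations` argument plays the role of the third portion holding intermediate input/output values.

```python
import threading

def run_network(layers, fetch_weights, execute_layer, activations):
    """Illustrative double-buffering of layer weights.

    Two 'on-chip' buffers alternate roles: one holds the weights of the layer
    being executed, the other receives the prefetched weights of the next
    layer while execution is in flight.
    """
    buffers = [fetch_weights(layers[0]), None]   # portion A primed, portion B empty
    for n, layer in enumerate(layers):
        cur, nxt = n % 2, (n + 1) % 2
        prefetch = None
        if n + 1 < len(layers):
            # Start prefetching layer n+1 weights into the other buffer
            # while layer n executes (modeled here with a background thread).
            def _prefetch(i=n + 1, slot=nxt):
                buffers[slot] = fetch_weights(layers[i])
            prefetch = threading.Thread(target=_prefetch)
            prefetch.start()
        activations = execute_layer(layer, buffers[cur], activations)
        if prefetch is not None:
            prefetch.join()                      # weights for layer n+1 are ready
    return activations

# Minimal usage with stand-in helpers
out = run_network(["fc1", "fc2", "fc3"],
                  fetch_weights=lambda l: f"weights({l})",
                  execute_layer=lambda l, w, a: a + [(l, w)],
                  activations=[])
```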
-
Publication No.: US11294985B2
Publication Date: 2022-04-05
Application No.: US16175229
Filing Date: 2018-10-30
Applicant: Intel Corporation
Inventor: Amrita Mathuriya , Sasikanth Manipatruni , Dmitri Nikonov , Ian Young , Ram Krishnamurthy
Abstract: Techniques are provided for efficient matrix multiplication using in-memory analog parallel processing, with applications for neural networks and artificial intelligence processors. A methodology implementing the techniques according to an embodiment includes storing two matrices in-memory. The first matrix is stored in transposed form such that the transposed first matrix has the same number of rows as the second matrix. The method further includes reading columns of the matrices from the memory in parallel, using disclosed bit line functional read operations and cross bit line functional read operations, which are employed to generate analog dot products between the columns. Each of the dot products corresponds to an element of the matrix multiplication product of the two matrices. In some embodiments, one of the matrices may be used to store neural network weighting factors, and the other matrix may be used to store input data to be processed by the neural network.
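A minimal NumPy sketch (a software model under stated assumptions, not the analog bit-line hardware) shows why storing the first matrix in transposed form gives it the same row count as the second matrix, so that every element of the product is a dot product between two stored columns:

```python
import numpy as np

# Store A transposed so that A_T and B have the same number of rows; each
# element of C = A @ B is then a dot product between a column of A_T and a
# column of B, which the hardware computes in analog via (cross) bit line
# functional reads.
A = np.random.rand(3, 4)        # e.g. neural-network weighting factors
B = np.random.rand(4, 5)        # e.g. input data to be processed
A_T = A.T                       # stored form: shape (4, 3), same row count as B

C = np.empty((A.shape[0], B.shape[1]))
for i in range(A_T.shape[1]):               # columns of the stored, transposed A
    for j in range(B.shape[1]):             # columns of B
        C[i, j] = np.dot(A_T[:, i], B[:, j])   # analog dot product in hardware

assert np.allclose(C, A @ B)
```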
-
Publication No.: US11251186B2
Publication Date: 2022-02-15
Application No.: US16827542
Filing Date: 2020-03-23
Applicant: Intel Corporation
Inventor: Abhishek Sharma , Noriyuki Sato , Sarah Atanasov , Huseyin Ekin Sumbul , Gregory K. Chen , Phil Knag , Ram Krishnamurthy , Hui Jae Yoo , Van H. Le
IPC: G11C11/24 , H01L27/108 , H01L27/12 , G11C11/4096
Abstract: Examples herein relate to a memory device comprising an eDRAM memory cell, where the eDRAM memory cell can include a write circuit formed at least partially over a storage cell and a read circuit formed at least partially under the storage cell; a compute near memory device bonded to the memory device; a processor; and an interface from the memory device to the processor. In some examples, circuitry is included to provide an output of the memory device that emulates the output read rate of an SRAM memory device; the circuitry comprises one or more of a controller, a multiplexer, or a register. A surface of the memory device can be bonded to a compute near memory device or other circuitry. In some examples, a layer with read circuitry can be bonded to a layer with storage cells. Any layers can be bonded together using the techniques described herein.
-
Publication No.: US20210407168A1
Publication Date: 2021-12-30
Application No.: US17070095
Filing Date: 2020-10-14
Applicant: Intel Corporation
Inventor: Vivek De , Ram Krishnamurthy , Amit Agarwal , Steven Hsu , Monodeep Kar
Abstract: A method comprising: dividing a 3D space into a voxel grid comprising a plurality of voxels; associating a plurality of distance values with the plurality of voxels, each distance value based on a distance to a boundary of an object; selecting an approximate interpolation mode for stepping a ray through a first one or more voxels of the 3D space responsive to the first one or more voxels having distance values greater than a first threshold; detecting the ray reaching a second one or more voxels having distance values less than the first threshold; and responsively selecting a precise interpolation mode for stepping the ray through the second one or more voxels.
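A minimal software sketch of the claimed mode switch might look like the following; the grid layout, the `nearest_voxel` approximate lookup, and the `trilinear` precise interpolation are illustrative assumptions, not the patented implementation.

```python
import numpy as np

def march_ray(grid, origin, direction, threshold, max_steps=256, eps=1e-3):
    """Two-mode ray stepping over a voxel grid of distance values.

    grid: 3D array of distances to the nearest object boundary (one per voxel).
    While sampled distances exceed `threshold`, a cheap approximate lookup
    (nearest voxel) is used; once they drop below it, a precise trilinear
    interpolation mode is selected, mirroring the claimed mode switch.
    """
    direction = direction / np.linalg.norm(direction)
    p = np.asarray(origin, dtype=float)
    for _ in range(max_steps):
        d_approx = nearest_voxel(grid, p)          # approximate interpolation mode
        if d_approx > threshold:
            p += direction * d_approx              # large, cheap steps far from surfaces
            continue
        d = trilinear(grid, p)                     # precise interpolation mode
        if d < eps:
            return p                               # ray reached the object boundary
        p += direction * d
    return None                                    # no hit within the step budget

def nearest_voxel(grid, p):
    i = np.clip(np.round(p).astype(int), 0, np.array(grid.shape) - 1)
    return grid[tuple(i)]

def trilinear(grid, p):
    i0 = np.clip(np.floor(p).astype(int), 0, np.array(grid.shape) - 2)
    f = p - i0
    d = 0.0
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = ((dx * f[0] + (1 - dx) * (1 - f[0])) *
                     (dy * f[1] + (1 - dy) * (1 - f[1])) *
                     (dz * f[2] + (1 - dz) * (1 - f[2])))
                d += w * grid[i0[0] + dx, i0[1] + dy, i0[2] + dz]
    return d
```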
-
Publication No.: US20210407039A1
Publication Date: 2021-12-30
Application No.: US16917791
Filing Date: 2020-06-30
Applicant: Intel Corporation
Inventor: Vivek De , Ram Krishnamurthy , Amit Agarwal , Steven Hsu , Monodeep Kar
Abstract: A method comprising: dividing a 3D space into a voxel grid comprising a plurality of voxels; associating a plurality of distance values with the plurality of voxels, each distance value based on a distance to a boundary of an object; selecting an approximate interpolation mode for stepping a ray through a first one or more voxels of the 3D space responsive to the first one or more voxels having distance values greater than a first threshold; detecting the ray reaching a second one or more voxels having distance values less than the first threshold; and responsively selecting a precise interpolation mode for stepping the ray through the second one or more voxels.
-
Publication No.: US20210281250A1
Publication Date: 2021-09-09
Application No.: US16813558
Filing Date: 2020-03-09
Applicant: Intel Corporation
Inventor: Steven Hsu , Amit Agarwal , Simeon Realov , Satish Damaraju , Ram Krishnamurthy
IPC: H03K3/037 , G06F3/06 , G01R31/3177 , G06F1/06
Abstract: A new family of shared-clock single-edge triggered flip-flops reduces the number of internal clock devices from eight to six to reduce clock power. The static pass-gate master-slave flip-flop has no performance penalty compared to flip-flops with eight clock devices, thus enabling significant power reduction. The flip-flop intelligently maintains the same polarity between the master and slave stages, which enables sharing of the master tristate and slave state feedback clock devices without risk of charge sharing across all combinations of clock and data toggling. Because of this, the state of the flip-flop remains undisturbed and is robust against charge-sharing noise. A multi-bit time-borrowing internally stitched flip-flop is also described, which enables internal stitching of scan in a high-performance time-borrowing flip-flop without incurring an increase in layout area.
-
Publication No.: US20210194468A1
Publication Date: 2021-06-24
Application No.: US16725689
Filing Date: 2019-12-23
Applicant: Intel Corporation
Inventor: Amit Agarwal , Steven Hsu , Anupama Ambardar Thaploo , Simeon Realov , Ram Krishnamurthy
IPC: H03K3/037 , H03K3/038 , G01R31/3177 , G01R31/317
Abstract: A family of novel, low-power, minimum-drive-strength, double-edge triggered (DET) input data multiplexer (Mux-D) scan flip-flops (FFs) is provided. The flip-flop takes advantage of the absence of a state node in the slave stage to remove the data inverters of a traditional DET FF, saving power without affecting flip-flop functionality under coupling/glitch scenarios.
-
Publication No.: US11016701B2
Publication Date: 2021-05-25
Application No.: US16146878
Filing Date: 2018-09-28
Applicant: Intel Corporation
Inventor: Ian Young , Ram Krishnamurthy , Sasikanth Manipatruni , Amrita Mathuriya , Abhishek Sharma , Raghavan Kumar , Phil Knag , Huseyin Sumbul , Gregory Chen
Abstract: Techniques and mechanisms for a memory device to perform in-memory computing based on a logic state that is detected with a voltage-controlled oscillator (VCO). In an embodiment, a VCO circuit of the memory device receives from a memory array a first signal indicating a logic state that is based on one or more currently stored data bits. The VCO converts the logic state from being indicated by a voltage characteristic of the first signal to being indicated by a corresponding frequency characteristic of a cyclical signal. Based on the frequency characteristic, the logic state is identified and communicated for use in an in-memory computation at the memory device. In another embodiment, a result of the in-memory computation is written back to the memory array.
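As a rough software model of the sensing path (the linear VCO characteristic, the frequency range, and the counting window below are assumptions, not the patented circuit), the voltage-indicated state is mapped to a frequency, and a cycle count over a fixed window recovers the logic state:

```python
def detect_logic_state(bitline_voltage, count_window_ns,
                       f_low_mhz=200.0, f_high_mhz=800.0):
    """Illustrative model of VCO-based sensing.

    The bitline voltage sets the VCO frequency; counting oscillation cycles in
    a fixed window converts that frequency back into a digital logic state.
    """
    # Hypothetical linear voltage-to-frequency characteristic of the VCO.
    freq_mhz = f_low_mhz + (f_high_mhz - f_low_mhz) * bitline_voltage
    cycles = int(freq_mhz * 1e6 * count_window_ns * 1e-9)       # counter output
    mid_freq_mhz = 0.5 * (f_low_mhz + f_high_mhz)
    threshold = int(mid_freq_mhz * 1e6 * count_window_ns * 1e-9)
    return 1 if cycles >= threshold else 0

# Example: a high bitline voltage yields a high cycle count -> logic '1'.
assert detect_logic_state(0.9, count_window_ns=100) == 1
assert detect_logic_state(0.1, count_window_ns=100) == 0
```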