Compute near memory convolution accelerator

    Publication Number: US11726950B2

    Publication Date: 2023-08-15

    Application Number: US16586975

    Application Date: 2019-09-28

    CPC classification number: G06F15/8046 G06F17/153 G06N3/063

    Abstract: A compute near memory (CNM) convolution accelerator enables a convolutional neural network (CNN) to use dedicated acceleration for efficient in-place convolution operations with reduced memory and energy overhead. A 2D convolution operation is reformulated as a 1D row-wise convolution, which lets the CNM convolution accelerator process input activations row-by-row while using the weights one-by-one. Lightweight access circuits stream both weights and input rows as vectors to MAC units, which in turn enables modules of the CNM convolution accelerator to implement convolution for both [1×1] and selected [n×n] filter sizes.
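
    A minimal NumPy sketch of the row-wise reformulation described above: the 2D convolution is computed as a sum of 1D row convolutions, streaming one input row and one weight at a time. The function name and the 'valid' sliding-window convention are assumptions for illustration, not the patented circuit.

        import numpy as np

        def conv2d_rowwise(x, w):
            """2D convolution as a sum of 1D row-wise convolutions."""
            H, W = x.shape
            n = w.shape[0]                        # assume an [n x n] filter
            out = np.zeros((H - n + 1, W - n + 1))
            for i in range(out.shape[0]):         # one output row at a time
                for r in range(n):                # stream input rows row-by-row
                    row = x[i + r]
                    for k in range(n):            # use the weights one-by-one
                        out[i] += w[r, k] * row[k : k + out.shape[1]]
            return out

        # Quick self-check against a direct sliding-window reference.
        x, w = np.random.rand(6, 6), np.random.rand(3, 3)
        ref = np.array([[(x[i:i+3, j:j+3] * w).sum() for j in range(4)]
                        for i in range(4)])
        assert np.allclose(conv2d_rowwise(x, w), ref)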

    Weight prefetch for in-memory neural network execution

    Publication Number: US11347994B2

    Publication Date: 2022-05-31

    Application Number: US16160466

    Application Date: 2018-10-15

    Abstract: The present disclosure is directed to systems and methods of bit-serial, in-memory execution of at least an nth layer of a multi-layer neural network in a first on-chip processor memory circuitry portion, contemporaneous with prefetching and storing the layer weights associated with the (n+1)st layer in a second on-chip processor memory circuitry portion. Storing layer weights in on-chip processor memory circuitry decreases the time required to transfer them when the first on-chip processor memory circuitry portion executes the (n+1)st layer. In addition, the on-chip processor memory circuitry may include a third portion used to store intermediate and/or final input/output values associated with one or more layers of the multi-layer neural network.
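
    A behavioral sketch of the double-buffered prefetch described above, using Python threads to model the overlap: layer n executes out of one on-chip memory portion while the weights for layer n+1 are fetched into the other. The fetch_weights and execute_layer callables are hypothetical stand-ins for the on-chip transfer and compute paths.

        import threading

        def run_network(layers, fetch_weights, execute_layer, x):
            """Execute layer n while prefetching layer n+1's weights."""
            buffers = [fetch_weights(0), None]    # two on-chip weight portions
            for n in range(len(layers)):
                prefetch = None
                if n + 1 < len(layers):
                    def job(i=n + 1):             # fill the idle portion
                        buffers[i % 2] = fetch_weights(i)
                    prefetch = threading.Thread(target=job)
                    prefetch.start()
                # compute overlaps the weight transfer started above
                x = execute_layer(layers[n], buffers[n % 2], x)
                if prefetch:
                    prefetch.join()               # weights ready for layer n+1
            return x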

    Efficient analog in-memory matrix multiplication processor

    Publication Number: US11294985B2

    Publication Date: 2022-04-05

    Application Number: US16175229

    Application Date: 2018-10-30

    Abstract: Techniques are provided for efficient matrix multiplication using in-memory analog parallel processing, with applications for neural networks and artificial intelligence processors. A methodology implementing the techniques according to an embodiment includes storing two matrices in memory, the first in transposed form so that the transposed first matrix has the same number of rows as the second matrix. The method further includes reading columns of the matrices from the memory in parallel, using the disclosed bit line functional read and cross bit line functional read operations to generate analog dot products between the columns. Each dot product corresponds to an element of the matrix multiplication product of the two matrices. In some embodiments, one matrix may store neural network weighting factors and the other matrix the input data to be processed by the neural network.
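
    A digital stand-in for the column-wise scheme described above: with the first matrix stored transposed, A^T and B share a row count, and each element of C = A x B is the dot product of one column of A^T with one column of B. Here np.dot replaces the analog functional reads; all names are illustrative.

        import numpy as np

        def matmul_columnwise(a, b):
            """C = A x B computed from column reads of A^T and B only."""
            at = a.T                              # stored transposed form
            assert at.shape[0] == b.shape[0]      # same number of rows
            m, n = at.shape[1], b.shape[1]
            c = np.empty((m, n))
            for i in range(m):
                for j in range(n):
                    # each "functional read" pairs two in-memory columns
                    c[i, j] = np.dot(at[:, i], b[:, j])
            return c

        a, b = np.random.rand(3, 5), np.random.rand(5, 4)
        assert np.allclose(matmul_columnwise(a, b), a @ b)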

    Compute near memory with backend memory

    Publication Number: US11251186B2

    Publication Date: 2022-02-15

    Application Number: US16827542

    Application Date: 2020-03-23

    Abstract: Examples herein relate to a memory device comprising an eDRAM memory cell that can include a write circuit formed at least partially over a storage cell and a read circuit formed at least partially under the storage cell; a compute near memory device bonded to the memory device; a processor; and an interface from the memory device to the processor. In some examples, the circuitry that lets the memory device's output emulate the read rate of an SRAM memory device comprises one or more of a controller, a multiplexer, or a register. A surface of the memory device can be bonded to a compute near memory device or other circuitry, and a layer with read circuitry can be bonded to a layer with storage cells. Any layers can be bonded together using the techniques described herein.
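
    A behavioral sketch of the SRAM-read-rate emulation named above, under the assumption that several slower eDRAM banks are read in parallel, latched into registers, and drained by a multiplexer at one word per fast cycle. The bank organization and API are illustrative, not the patented circuit.

        from collections import deque

        def sram_rate_stream(banks, schedule):
            """Emit bank-parallel eDRAM reads at one word per fast cycle."""
            regs = deque()                        # the register stage
            out = []
            for group in schedule:                # one slow access per group
                # all banks are read in parallel and latched together
                regs.extend(banks[b][addr] for b, addr in group)
                while regs:                       # the multiplexer then emits
                    out.append(regs.popleft())    # one word per fast cycle
            return out

        # With as many banks as the eDRAM/SRAM latency ratio, the register
        # stage never runs dry, so the output presents an SRAM-like rate.
        banks = [[10, 11], [20, 21], [30, 31], [40, 41]]
        words = sram_rate_stream(banks, [[(b, 0) for b in range(4)],
                                         [(b, 1) for b in range(4)]])
        assert words == [10, 20, 30, 40, 11, 21, 31, 41]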

    APPARATUS AND METHOD FOR APPROXIMATE TRILINEAR INTERPOLATION FOR SCENE RECONSTRUCTION

    Publication Number: US20210407168A1

    Publication Date: 2021-12-30

    Application Number: US17070095

    Application Date: 2020-10-14

    Abstract: A method comprising: dividing a 3D space into a voxel grid comprising a plurality of voxels; associating a plurality of distance values with the plurality of voxels, each distance value based on a distance to a boundary of an object; selecting an approximate interpolation mode for stepping a ray through a first one or more voxels of the 3D space responsive to the first one or more voxels having distance values greater than a first threshold; detecting the ray reaching a second one or more voxels having distance values less than the first threshold; and responsively selecting a precise interpolation mode for stepping the ray through the second one or more voxels.
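
    A sketch of the two-mode ray stepping claimed above, treating the voxel grid as a signed-distance field: far from any surface a single nearest-voxel fetch stands in for the approximate mode, and once the distance falls below the threshold the ray switches to full trilinear interpolation. The nearest-voxel approximation and all names are illustrative assumptions.

        import numpy as np

        def trilinear(grid, p):
            """Precise mode: blend the 8 voxels surrounding point p."""
            i0 = np.clip(np.floor(p).astype(int), 0, np.array(grid.shape) - 2)
            f = p - i0
            c = grid[i0[0]:i0[0]+2, i0[1]:i0[1]+2, i0[2]:i0[2]+2].astype(float)
            for axis in range(3):                 # collapse one axis per pass
                c = c[0] * (1 - f[axis]) + c[1] * f[axis]
            return c

        def step_ray(grid, origin, direction, threshold, max_steps=256):
            """March a ray through a signed-distance voxel grid."""
            p = np.asarray(origin, dtype=float)
            d = np.asarray(direction, dtype=float)
            d /= np.linalg.norm(d)
            for _ in range(max_steps):
                idx = np.clip(np.round(p).astype(int), 0,
                              np.array(grid.shape) - 1)
                dist = grid[tuple(idx)]           # approximate mode: one fetch
                if dist <= threshold:
                    dist = trilinear(grid, p)     # precise mode near surface
                    if dist < 1e-3:
                        return p                  # surface reached
                p = p + dist * d                  # safe sphere-trace step
            return None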

    LOW-POWER SINGLE-EDGE TRIGGERED FLIP-FLOP, AND TIME BORROWING INTERNALLY STITCHED FLIP-FLOP

    Publication Number: US20210281250A1

    Publication Date: 2021-09-09

    Application Number: US16813558

    Application Date: 2020-03-09

    Abstract: A new family of shared-clock single-edge triggered flip-flops reduces the number of internal clock devices from eight to six, reducing clock power. The static pass-gate master-slave flip-flop carries no performance penalty relative to flip-flops with eight clock devices, enabling significant power reduction. The flip-flop maintains the same polarity between the master and slave stages, which allows the master tristate and slave state feedback clock devices to be shared without risk of charge sharing across all combinations of clock and data toggling. As a result, the state of the flip-flop remains undisturbed and is robust to charge-sharing noise. A multi-bit time-borrowing internally stitched flip-flop is also described, which enables internal stitching of scan in a high-performance time-borrowing flip-flop without incurring an increase in layout area.
