DUAL-SIX-TRANSISTOR (D6T) IN-MEMORY COMPUTING (IMC) ACCELERATOR SUPPORTING ALWAYS-LINEAR DISCHARGE AND REDUCING DIGITAL STEPS

    Publication number: US20240233815A9

    Publication date: 2024-07-11

    Application number: US18377840

    Application date: 2023-10-09

    CPC classification number: G11C11/419 G11C8/16 G11C11/54

    Abstract: A dual-six-transistor (D6T) in-memory computing (IMC) accelerator supporting always-linear discharge and reducing digital steps is provided. In the IMC accelerator, three effective techniques are proposed: (1) A D6T bitcell can reliably run at 0.4 V and enter a standby mode at 0.26 V, to support parallel processing of dual decoupled ports. (2) An always-linear discharge and convolution mechanism (ALDCM) not only reduces a voltage of a bit line (BL), but also keeps linear calculation throughout an entire voltage range of the BL. (3) A bypass of a bias voltage time converter (BVTC) reduces digital steps, but still keeps high energy efficiency and computing density at a low voltage. A measurement result of the IMC accelerator shows that the IMC accelerator achieves an average energy efficiency of 8918 TOPS/W (8b×8b), and an average computing density of 38.6 TOPS/mm² (8b×8b) in a 55 nm CMOS technology.

    DUAL-SIX-TRANSISTOR (D6T) IN-MEMORY COMPUTING (IMC) ACCELERATOR SUPPORTING ALWAYS-LINEAR DISCHARGE AND REDUCING DIGITAL STEPS

    Publication number: US20240135989A1

    Publication date: 2024-04-25

    Application number: US18377840

    Application date: 2023-10-08

    CPC classification number: G11C11/419 G11C8/16 G11C11/54

    Abstract: A dual-six-transistor (D6T) in-memory computing (IMC) accelerator supporting always-linear discharge and reducing digital steps is provided. In the IMC accelerator, three effective techniques are proposed: (1) A D6T bitcell can reliably run at 0.4 V and enter a standby mode at 0.26 V, to support parallel processing of dual decoupled ports. (2) An always-linear discharge and convolution mechanism (ALDCM) not only reduces a voltage of a bit line (BL), but also keeps linear calculation throughout an entire voltage range of the BL. (3) A bypass of a bias voltage time converter (BVTC) reduces digital steps, but still keeps high energy efficiency and computing density at a low voltage. A measurement result of the IMC accelerator shows that the IMC accelerator achieves an average energy efficiency of 8918 TOPS/W (8b×8b), and an average computing density of 38.6 TOPS/mm² (8b×8b) in a 55 nm CMOS technology.

    ENERGY-EFFICIENT CRYOGENIC-IN-MEMORY-COMPUTING (CIMC) ACCELERATOR

    Publication number: US20240221811A1

    Publication date: 2024-07-04

    Application number: US18229698

    Application date: 2023-08-03

    Abstract: An energy-efficient cryogenic-in-memory-computing (CIMC) accelerator includes cryogenic 3T (C3T) macros. Each of the C3T macros comprises a C3T array containing M rows×N columns of bitcells. An input signal is converted into a timing sequence signal of a corresponding pulse width by using a digital timing sequence converter array. A C3T bitcell of a corresponding row in the C3T macro is controlled to perform charging and discharging on a read bit line (RBL) of a corresponding column. A voltage on the RBL of the corresponding column is sampled by a sense amplifier configured in each C3T macro to obtain a final result. With adaptive reference voltage configuration and storage on the chip, this design can achieve fast and low-power Boolean/convolutional computing.

    ULTRA-LOW-VOLTAGE STATIC RANDOM ACCESS MEMORY (SRAM) CELL FOR ELIMINATING HALF-SELECT DISTURBANCE UNDER BIT INTERLEAVING STRUCTURE

    Publication number: US20240212748A1

    Publication date: 2024-06-27

    Application number: US18233350

    Application date: 2023-08-14

    CPC classification number: G11C11/419

    Abstract: An ultra-low-voltage static random access memory (SRAM) cell for eliminating half-select disturbance under a bit interleaving structure includes a cross-coupled inverter pair, two N-type write transistors NM1 and NM2, two P-type write transistors PM1 and PM2, and two N-type transistors NM3 and NM4, where the two N-type transistors NM3 and NM4 form a readout path. The present disclosure can be applied to applications with a storage requirement at an ultra-low voltage, especially applications with certain requirements for the access speed and reliability of an SRAM at a low voltage. Compared with other SRAM cells, the ultra-low-voltage SRAM cell can achieve higher read and write working frequencies with similar energy consumption.

    ENHANCED DYNAMIC RANDOM ACCESS MEMORY (EDRAM)-BASED COMPUTING-IN-MEMORY (CIM) CONVOLUTIONAL NEURAL NETWORK (CNN) ACCELERATOR

    Publication number: US20230196079A1

    Publication date: 2023-06-22

    Application number: US18009341

    Application date: 2022-08-05

    CPC classification number: G06N3/0464 G06F5/16

    Abstract: An enhanced dynamic random access memory (eDRAM)-based computing-in-memory (CIM) convolutional neural network (CNN) accelerator comprises four P2ARAM blocks, where each of the P2ARAM blocks includes a 5T1C ping-pong eDRAM bit cell array composed of 64×16 5T1C ping-pong eDRAM bit cells. In each of the P2ARAM blocks, 64×2 digital time converters convert a 4-bit activation value into different pulse widths from a row direction and input the pulse widths into the 5T1C ping-pong eDRAM bit cell array for calculation. A total of 16×2 convolution results are output in a column direction of the 5T1C ping-pong eDRAM bit cell array. The CNN accelerator uses the 5T1C ping-pong eDRAM bit cells to perform multi-bit storage and convolution in parallel. An S2M-ADC scheme is proposed to allot an area of an input sampling capacitor of an ABL to sign-numerical SAR ADC units of a C-DAC array without adding area overhead.

    EFFICIENT K-NEAREST NEIGHBOR SEARCH ALGORITHM FOR THREE-DIMENSIONAL (3D) LIDAR POINT CLOUD IN UNMANNED DRIVING

    Publication number: US20220148281A1

    Publication date: 2022-05-12

    Application number: US17593852

    Application date: 2021-06-09

    Inventor: Hao SUN Yajun HA

    Abstract: An efficient K-nearest neighbor search algorithm for three-dimensional (3D) lidar point cloud in unmanned driving and a use of the foregoing K-nearest neighbor search algorithm in a point cloud map matching process in the unmanned driving are provided. A novel data structure for fast K-nearest neighbor search is used, such that each voxel or sub-voxel includes a proper quantity of points to reduce redundant search. The novel K-nearest neighbor search algorithm is based on a double segmentation voxel structure (DSVS) and a field programmable gate array (FPGA). By means of the novel K-nearest neighbor search algorithm, nearest neighbors are searched for only in a neighboring expected area near a search point, thereby reducing search of redundant points. In addition, an optimized data transmission and access policy is used, which makes the algorithm better suited to the characteristics of the FPGA.
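
As a rough illustration of the voxel-restricted search idea described in the abstract above, the following Python sketch hashes points into a uniform voxel grid and looks for neighbors only in the 27 voxels around the query point. It is a simplified single-level stand-in, not the double segmentation voxel structure (DSVS) or its FPGA implementation; the function names `build_voxel_grid` and `knn_in_neighborhood` and the `voxel_size` parameter are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product


def build_voxel_grid(points, voxel_size):
    """Hash each 3D point into the voxel (integer cell index) that contains it."""
    grid = defaultdict(list)
    for p in points:
        key = tuple(math.floor(c / voxel_size) for c in p)
        grid[key].append(p)
    return grid


def knn_in_neighborhood(grid, query, k, voxel_size):
    """Return up to k nearest points, drawn only from the 27 voxels around query.

    Assumption: a fixed 3x3x3 neighborhood stands in for the "neighboring
    expected area"; the result is approximate if true neighbors lie outside it.
    """
    center = tuple(math.floor(c / voxel_size) for c in query)
    candidates = []
    for offset in product((-1, 0, 1), repeat=3):
        key = tuple(c + o for c, o in zip(center, offset))
        # .get() avoids creating empty voxels in the defaultdict.
        candidates.extend(grid.get(key, ()))
    candidates.sort(key=lambda p: math.dist(p, query))
    return candidates[:k]


points = [(0.1, 0.1, 0.1), (0.2, 0.1, 0.1), (0.9, 0.9, 0.9),
          (5.0, 5.0, 5.0), (1.1, 0.1, 0.1)]
grid = build_voxel_grid(points, voxel_size=1.0)
nearest = knn_in_neighborhood(grid, (0.0, 0.0, 0.0), k=2, voxel_size=1.0)
```

Because only adjacent voxels are inspected, the distant point at (5.0, 5.0, 5.0) is never visited. The trade-off in this simplified version is that it may return fewer than k points, or miss a true neighbor outside the neighborhood, when voxels are sparsely populated.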

    EFFICIENT PARALLEL COMPUTING METHOD FOR BOX FILTER

    Publication number: US20210248764A1

    Publication date: 2021-08-12

    Application number: US17054169

    Application date: 2020-06-17

    Abstract: An efficient parallel computing method for a box filter includes: step 1, with respect to a given degree of parallelism N and a radius r of the filter kernel, establishing a first architecture provided without an extra register and a second architecture provided with the extra register; step 2, building a first adder tree for the first architecture and a second adder tree for the second architecture, respectively; step 3, searching the first adder tree and the second adder tree from top to bottom, calculating the pixel average corresponding to each filter kernel by using the first adder tree and the second adder tree, respectively, and counting the resources consumed by the first architecture and the second architecture, respectively; and step 4, selecting whichever of the first architecture and the second architecture consumes fewer resources for computing the box filter.

    STATIC RANDOM-ACCESS MEMORY (SRAM) CELL FOR HIGH-SPEED CONTENT-ADDRESSABLE MEMORY AND IN-MEMORY BOOLEAN LOGIC OPERATION

    Publication number: US20230197154A1

    Publication date: 2023-06-22

    Application number: US17802968

    Application date: 2021-09-22

    Inventor: Jian CHEN Yajun HA

    CPC classification number: G11C15/04

    Abstract: A static random-access memory (SRAM) cell for high-speed content-addressable memory (CAM) and in-memory Boolean logic operations includes a standard 6T-SRAM and two additional positive-channel metal oxide semiconductor (PMOS) access transistors P1 and P2, whose read word lines are RWLR and RWLL respectively; under their control, a differential read port RBL/RBL is formed. The SRAM cell is suitable for multi-row address selection, and is typically applied to in-memory high-speed CAM and in-memory Boolean logic operations. Due to PMOS device characteristics, the structure design of the SRAM cell can avoid read disturbance generated by an in-memory SRAM, and ensure that the SRAM can perform in-memory CAM and in-memory Boolean logic operations stably at a high speed. In addition, this SRAM-based IMC solution supports commercial CMOS technology, and has an opportunity to leverage a large number of existing on-chip SRAM caches.

    PURE INTEGER QUANTIZATION METHOD FOR LIGHTWEIGHT NEURAL NETWORK (LNN)

    Publication number: US20230196095A1

    Publication date: 2023-06-22

    Application number: US17799933

    Application date: 2021-09-22

    CPC classification number: G06N3/08

    Abstract: A pure integer quantization method for a lightweight neural network (LNN) is provided. The method includes the following steps: acquiring a maximum value of each pixel in each of the channels of the feature map of a current layer; dividing a value of each pixel in each of the channels of the feature map by a t-th power of the maximum value, t∈[0,1]; multiplying a weight in each of the channels by the maximum value of each pixel in each of the channels of the corresponding feature map; and convolving the processed feature map with the processed weight to acquire the feature map of a next layer. The algorithm is verified on SkyNet and MobileNet respectively, and lossless INT8 quantization on SkyNet and maximum quantization accuracy so far on MobileNetv2 are achieved.
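
The per-channel rescaling steps in the abstract above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented method: it assumes non-negative (for example post-ReLU) activations, uses a 1×1 convolution in place of a general convolution, and assumes the weights are multiplied by the same t-th power of the channel maximum so that the convolution output is unchanged. The names `channel_maxima`, `equalize`, and `pointwise_conv` are hypothetical.

```python
def channel_maxima(feature_map):
    """Return the largest value in each channel of a [C][H][W] nested list."""
    return [max(max(row) for row in channel) for channel in feature_map]


def equalize(feature_map, weights, t):
    """Divide each channel by max**t and multiply its weights by max**t.

    feature_map: [C][H][W] non-negative activations (assumption).
    weights:     [O][C] weights of a 1x1 convolution (simplifying assumption).
    t:           exponent in [0, 1].
    """
    # Clamp the maximum so an all-zero channel does not divide by zero.
    scales = [max(m, 1e-12) ** t for m in channel_maxima(feature_map)]
    scaled_fm = [[[v / s for v in row] for row in channel]
                 for channel, s in zip(feature_map, scales)]
    scaled_w = [[w * s for w, s in zip(out_row, scales)] for out_row in weights]
    return scaled_fm, scaled_w


def pointwise_conv(feature_map, weights):
    """1x1 convolution: out[o][y][x] = sum over c of weights[o][c] * fm[c][y][x]."""
    channels = len(feature_map)
    height, width = len(feature_map[0]), len(feature_map[0][0])
    return [[[sum(w_row[c] * feature_map[c][y][x] for c in range(channels))
              for x in range(width)]
             for y in range(height)]
            for w_row in weights]


fm = [[[1.0, 2.0], [4.0, 8.0]],
      [[0.5, 0.25], [0.125, 0.0625]]]
w = [[1.0, -2.0], [0.5, 3.0]]
scaled_fm, scaled_w = equalize(fm, w, t=0.5)
reference = pointwise_conv(fm, w)
rescaled = pointwise_conv(scaled_fm, scaled_w)
```

Under these assumptions `reference` and `rescaled` agree up to floating-point rounding, while the two channels of `scaled_fm` have much closer value ranges than those of `fm`, which is what makes a single integer quantization scale less lossy.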
