-
1.
Publication (Announcement) No.: US20240233815A9
Publication (Announcement) Date: 2024-07-11
Application No.: US18377840
Application Date: 2023-10-09
Applicant: SHANGHAITECH UNIVERSITY
Inventor: Hongtu ZHANG , Yuhao SHU , Yajun HA
IPC: G11C11/419 , G11C8/16 , G11C11/54
CPC classification number: G11C11/419 , G11C8/16 , G11C11/54
Abstract: A dual-six-transistor (D6T) in-memory computing (IMC) accelerator supporting always-linear discharge and reducing digital steps is provided. Three effective techniques are proposed in the IMC accelerator: (1) A D6T bitcell can reliably run at 0.4 V and enter a standby mode at 0.26 V to support parallel processing of dual decoupled ports. (2) An always-linear discharge and convolution mechanism (ALDCM) not only reduces the voltage of a bit line (BL), but also keeps the calculation linear throughout the entire voltage range of the BL. (3) A bypass of a bias voltage time converter (BVTC) reduces digital steps while still keeping high energy efficiency and computing density at a low voltage. Measurement results show that the IMC accelerator achieves an average energy efficiency of 8918 TOPS/W (8b×8b) and an average computing density of 38.6 TOPS/mm² (8b×8b) in a 55 nm CMOS technology.
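The pulse-width-modulated, linearly discharging bit line described above can be illustrated with a simple behavioral model. The following Python sketch is a hedged approximation of that general idea only (a voltage drop proportional to the activation pulse width times the stored weight bit); it does not model the patented D6T bitcell, the ALDCM, or the BVTC, and all function names and constants are illustrative assumptions.

```python
# Hedged behavioral sketch, not the patented circuit: each activation is
# encoded as a discharge pulse width, each stored weight bit gates whether
# the column bit line (BL) discharges during that pulse, and the per-unit
# voltage drop is assumed to stay linear over the whole BL range.
import numpy as np

def imc_column_mac(activations, weight_bits, v_bl=0.4, lsb_drop=0.001):
    """Accumulate a binary-weight dot product as a BL voltage drop.

    activations : integer activations, each mapped to a pulse width (counts)
    weight_bits : 0/1 weights stored in the column's bitcells
    v_bl        : precharged bit-line voltage (illustrative 0.4 V operation)
    lsb_drop    : assumed linear voltage drop per unit pulse
    """
    pulse_widths = np.asarray(activations, dtype=np.int64)
    weights = np.asarray(weight_bits, dtype=np.int64)
    total_drop = lsb_drop * np.dot(pulse_widths, weights)
    return v_bl - total_drop  # a readout/ADC stage would digitize this voltage

# Example: 8 rows, 4-bit activations, binary weights
rng = np.random.default_rng(0)
acts = rng.integers(0, 16, size=8)
wts = rng.integers(0, 2, size=8)
print(imc_column_mac(acts, wts))
```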
-
2.
Publication (Announcement) No.: US20240135989A1
Publication (Announcement) Date: 2024-04-25
Application No.: US18377840
Application Date: 2023-10-08
Applicant: SHANGHAITECH UNIVERSITY
Inventor: Hongtu ZHANG , Yuhao SHU , Yajun HA
IPC: G11C11/419 , G11C8/16 , G11C11/54
CPC classification number: G11C11/419 , G11C8/16 , G11C11/54
Abstract: A dual-six-transistor (D6T) in-memory computing (IMC) accelerator supporting always-linear discharge and reducing digital steps is provided. Three effective techniques are proposed in the IMC accelerator: (1) A D6T bitcell can reliably run at 0.4 V and enter a standby mode at 0.26 V to support parallel processing of dual decoupled ports. (2) An always-linear discharge and convolution mechanism (ALDCM) not only reduces the voltage of a bit line (BL), but also keeps the calculation linear throughout the entire voltage range of the BL. (3) A bypass of a bias voltage time converter (BVTC) reduces digital steps while still keeping high energy efficiency and computing density at a low voltage. Measurement results show that the IMC accelerator achieves an average energy efficiency of 8918 TOPS/W (8b×8b) and an average computing density of 38.6 TOPS/mm² (8b×8b) in a 55 nm CMOS technology.
-
3.
Publication (Announcement) No.: US20240230907A1
Publication (Announcement) Date: 2024-07-11
Application No.: US18387859
Application Date: 2023-11-08
Applicant: SHANGHAITECH UNIVERSITY
Inventor: Jianzhong XIAO , Hao SUN , Qi DENG , Yajun HA
CPC classification number: G01S17/89 , G06T7/70 , G06T2207/10028 , G06T2207/20021
Abstract: An efficient K-nearest neighbor (KNN) method for a single-frame LiDAR point cloud, accelerated by a field-programmable gate array (FPGA), and an application of the method are provided. In the method, a data structure is established based on point cloud projection and a distance scale. The data structure ensures that points adjacent in space are organized in adjacent memories, can be constructed efficiently, and supports an efficient nearest-point search mode.
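As a rough illustration of the projection-and-distance-scale idea (adjacent points stored in adjacent memory, with search confined to neighboring cells), here is a hedged Python sketch. The grid-bucket layout, cell size, and search ring are assumptions; the patented data structure and its FPGA implementation are not reproduced.

```python
# Hedged sketch: project points onto a 2-D grid, sort them so each cell's
# points are contiguous in memory, and answer a KNN query by scanning only
# the query's cell and its neighbors. All names and parameters are assumed.
import numpy as np

def build_grid(points, cell_size):
    """Sort points by the grid cell of their (x, y) projection.

    Returns the sorted points plus, for every cell, the [start, end) slice
    into the sorted array, so one cell's points sit in adjacent memory.
    """
    ij = np.floor(points[:, :2] / cell_size).astype(np.int64)
    order = np.lexsort((ij[:, 1], ij[:, 0]))
    sorted_pts, sorted_ij = points[order], ij[order]
    cells, start = {}, 0
    for k in range(1, len(sorted_ij) + 1):
        if k == len(sorted_ij) or tuple(sorted_ij[k]) != tuple(sorted_ij[k - 1]):
            cells[tuple(sorted_ij[k - 1])] = (start, k)
            start = k
    return sorted_pts, cells

def knn(query, sorted_pts, cells, cell_size, k=5, ring=1):
    """Gather candidates from the query's cell and its neighbors, keep the k closest."""
    ci, cj = int(np.floor(query[0] / cell_size)), int(np.floor(query[1] / cell_size))
    cand = []
    for di in range(-ring, ring + 1):
        for dj in range(-ring, ring + 1):
            span = cells.get((ci + di, cj + dj))
            if span:
                cand.append(sorted_pts[span[0]:span[1]])
    cand = np.vstack(cand) if cand else np.empty((0, points_dim := 3))
    d = np.linalg.norm(cand - query, axis=1)
    return cand[np.argsort(d)[:k]]

# Example usage with a random cloud
pts = np.random.default_rng(1).uniform(0, 10, size=(1000, 3))
spts, cmap = build_grid(pts, cell_size=1.0)
print(knn(np.array([5.0, 5.0, 1.0]), spts, cmap, cell_size=1.0, k=3))
```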
-
4.
Publication (Announcement) No.: US20240221811A1
Publication (Announcement) Date: 2024-07-04
Application No.: US18229698
Application Date: 2023-08-03
Applicant: SHANGHAITECH UNIVERSITY
Inventor: Yuhao SHU , Hongtu ZHANG , Yajun HA
IPC: G11C11/405 , G06F17/15 , G11C11/4091 , G11C11/4096 , H03K19/20
CPC classification number: G11C11/405 , G06F17/153 , G11C11/4091 , G11C11/4096 , H03K19/20
Abstract: An energy-efficient cryogenic in-memory computing (CIMC) accelerator includes cryogenic 3T (C3T) macros, each comprising a C3T array of M rows × N columns of bitcells. An input signal is converted into a timing sequence signal of a corresponding pulse width by a digital timing sequence converter array. A C3T bitcell of the corresponding row in the C3T macro is controlled to charge and discharge the read bit line (RBL) of the corresponding column. The voltage on the RBL of the corresponding column is sampled by a sense amplifier configured in each C3T macro to obtain the final result. With adaptive reference-voltage configuration and on-chip storage, this design can achieve fast and low-power Boolean/convolutional computing.
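A hedged behavioral sketch of the read path described above (input converted to a pulse width, selected bitcells charging or discharging the RBL, and a sense amplifier comparing the result against a configurable reference) is shown below in Python. The supply, step, and reference values are illustrative assumptions, not parameters from the patent.

```python
# Hedged behavioral model of one RBL column: pulse widths (the digital timing
# sequence) times the stored bits set how far the precharged RBL discharges,
# and a sense amplifier thresholds the result against a configurable v_ref.
import numpy as np

def rbl_column_result(pulse_widths, stored_bits, v_dd=1.0,
                      step_per_unit=0.01, v_ref=0.5):
    """Return (sense-amplifier decision, final RBL voltage) for one column."""
    pulse_widths = np.asarray(pulse_widths, dtype=float)
    stored_bits = np.asarray(stored_bits, dtype=float)   # 1 -> discharge path on
    drop = step_per_unit * float(np.dot(pulse_widths, stored_bits))
    v_rbl = float(np.clip(v_dd - drop, 0.0, v_dd))
    return int(v_rbl > v_ref), v_rbl

# Placing v_ref differently selects between a Boolean decision and reading
# the analog value out for convolution-style accumulation.
print(rbl_column_result([4, 4], [1, 1]))
```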
-
5.
Publication (Announcement) No.: US20240212748A1
Publication (Announcement) Date: 2024-06-27
Application No.: US18233350
Application Date: 2023-08-14
Applicant: SHANGHAITECH UNIVERSITY
Inventor: Yifei LI , Jian CHEN , Yajun HA , Hongyu CHEN
IPC: G11C11/419
CPC classification number: G11C11/419
Abstract: An ultra-low-voltage static random access memory (SRAM) cell for eliminating half-select disturbance under a bit-interleaving structure includes a cross-coupled inverter pair, two N-type write transistors NM1 and NM2, two P-type write transistors PM1 and PM2, and two N-type transistors NM3 and NM4, where NM3 and NM4 form a readout path. The present disclosure can be applied to applications requiring storage at an ultra-low voltage, especially those with certain requirements for the access speed and reliability of an SRAM at a low voltage. Compared with other SRAM cells, the ultra-low-voltage SRAM cell achieves higher read and write working frequencies with similar energy consumption.
-
6.
Publication (Announcement) No.: US20230196079A1
Publication (Announcement) Date: 2023-06-22
Application No.: US18009341
Application Date: 2022-08-05
Applicant: SHANGHAITECH UNIVERSITY
Inventor: Hongtu ZHANG , Yuhao SHU , Yajun HA
IPC: G06N3/0464 , G06F5/16
CPC classification number: G06N3/0464 , G06F5/16
Abstract: An embedded dynamic random access memory (eDRAM)-based computing-in-memory (CIM) convolutional neural network (CNN) accelerator comprises four P2ARAM blocks, where each of the P2ARAM blocks includes a 5T1C ping-pong eDRAM bit cell array composed of 64×16 5T1C ping-pong eDRAM bit cells. In each of the P2ARAM blocks, 64×2 digital time converters convert a 4-bit activation value into different pulse widths along the row direction and input the pulse widths into the 5T1C ping-pong eDRAM bit cell array for calculation. A total of 16×2 convolution results are output along the column direction of the array. The CNN accelerator uses the 5T1C ping-pong eDRAM bit cells to perform multi-bit storage and convolution in parallel. An S2M-ADC scheme is proposed to allocate the area of the input sampling capacitor of an ABL to the sign-numerical SAR ADC units of a C-DAC array without adding area overhead.
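To illustrate only the ping-pong aspect of the P2ARAM block (one half of the bit cell array computes a convolution while the other half is loaded with the next weights, then the roles swap), here is a hedged Python sketch. The 64×16 dimensions follow the abstract; the class and method names, and the modeling of pulse-width inputs as plain integers, are assumptions.

```python
# Hedged sketch of the ping-pong scheduling idea only; no circuit, timing,
# or ADC behavior from the patent is modeled here.
import numpy as np

class PingPongBank:
    def __init__(self, rows=64, cols=16):
        self.banks = [np.zeros((rows, cols)), np.zeros((rows, cols))]
        self.active = 0                               # bank used for compute

    def load_next(self, weights):
        self.banks[1 - self.active][:] = weights      # background weight load

    def compute(self, activations):
        # 4-bit activations enter as pulse widths along the 64 rows; the
        # 16 column sums stand in for the convolution results per phase.
        return np.asarray(activations) @ self.banks[self.active]

    def swap(self):
        self.active = 1 - self.active                 # roles exchange each phase

# Example usage: load weights in the background, swap, then compute
bank = PingPongBank()
bank.load_next(np.random.rand(64, 16))
bank.swap()
print(bank.compute(np.random.randint(0, 16, 64)).shape)   # (16,)
```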
-
7.
Publication (Announcement) No.: US20220148281A1
Publication (Announcement) Date: 2022-05-12
Application No.: US17593852
Application Date: 2021-06-09
Applicant: SHANGHAITECH UNIVERSITY
Abstract: An efficient K-nearest neighbor search algorithm for three-dimensional (3D) LiDAR point clouds in unmanned driving, and a use of the algorithm in the point cloud map matching process of unmanned driving, are provided. A novel data structure for fast K-nearest neighbor search is used, such that each voxel or sub-voxel contains a proper quantity of points to reduce redundant search. The novel K-nearest neighbor search algorithm is based on a double segmentation voxel structure (DSVS) and a field programmable gate array (FPGA). By means of the algorithm, nearest neighbors are searched for only in an expected neighborhood around a search point, thereby reducing the search of redundant points. In addition, an optimized data transmission and access policy is used, which makes the algorithm better fit the characteristics of the FPGA.
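The double-segmentation idea (re-splitting overfull voxels into sub-voxels so every leaf holds a manageable number of points) can be sketched in a few lines of Python. This is a hedged illustration only; the voxel size, point budget, and dictionary layout are assumptions, and the FPGA-oriented memory and transfer policy are not modeled.

```python
# Hedged sketch of the two-level (double segmentation) voxel grouping only.
import numpy as np
from collections import defaultdict

def double_segment(points, voxel, max_pts=64):
    """Group points by voxel; re-split overfull voxels into half-size sub-voxels."""
    coarse = defaultdict(list)
    for p in points:
        coarse[tuple((p // voxel).astype(int))].append(p)
    leaves = {}
    for key, pts in coarse.items():
        if len(pts) <= max_pts:
            leaves[key] = np.asarray(pts)             # already a proper quantity
        else:                                         # second segmentation pass
            fine = defaultdict(list)
            for p in pts:
                fine[tuple((p // (voxel / 2)).astype(int))].append(p)
            for sub, sub_pts in fine.items():
                leaves[(key, sub)] = np.asarray(sub_pts)
    return leaves

# Example usage: most leaves stay small even where the cloud is dense
cloud = np.random.default_rng(2).uniform(0, 20, size=(5000, 3))
leaves = double_segment(cloud, voxel=2.0, max_pts=64)
print(len(leaves), max(len(v) for v in leaves.values()))
```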
-
8.
Publication (Announcement) No.: US20210248764A1
Publication (Announcement) Date: 2021-08-12
Application No.: US17054169
Application Date: 2020-06-17
Applicant: SHANGHAITECH UNIVERSITY
Inventor: Xinzhe LIU , Fupeng CHEN , Yajun HA
Abstract: An efficient parallel computing method for a box filter includes: step 1, with respect to a given degree of parallelism N and a filter kernel radius r, establishing a first architecture provided without an extra register and a second architecture provided with the extra register; step 2, building a first adder tree for the first architecture and a second adder tree for the second architecture, respectively; step 3, searching the first adder tree and the second adder tree from top to bottom, calculating the pixel average corresponding to each filter kernel by using the first adder tree and the second adder tree, respectively, and counting the resources consumed by the first architecture and the second architecture, respectively; and step 4, selecting, from the first architecture and the second architecture, the architecture consuming fewer resources for computing the box filter.
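The resource-driven choice in steps 3 and 4 can be illustrated with a hedged Python sketch: a reference 1-D box filter plus two toy adder-count models, one without output-to-output reuse and one with it (the role the extra register plays). The counting formulas and names are assumptions for illustration, not the patented adder-tree construction.

```python
# Hedged sketch: compute N outputs of a (2r+1)-wide box filter per step and
# compare two illustrative adder-count models to pick the cheaper one.
import numpy as np

def box_filter_row(row, r, n_parallel):
    """row: 1-D NumPy array. Compute (2r+1)-wide box averages, N outputs per step."""
    w = 2 * r + 1
    out = np.empty(len(row) - w + 1)
    for base in range(0, len(out), n_parallel):
        for j in range(base, min(base + n_parallel, len(out))):
            out[j] = row[j:j + w].sum() / w
    return out

def adds_without_reuse(n_parallel, r):
    return n_parallel * 2 * r            # each output sums 2r+1 terms on its own

def adds_with_reuse(n_parallel, r):
    # first output sums 2r+1 terms; each extra output updates the running sum
    # (one add and one subtract), which is what the extra register enables
    return 2 * r + 2 * (n_parallel - 1)

# Step 4 analogue: pick the architecture with the smaller (toy) adder count
N, r = 8, 3
print(min(("no-reuse", adds_without_reuse(N, r)),
          ("reuse", adds_with_reuse(N, r)), key=lambda t: t[1]))
```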
-
9.
Publication (Announcement) No.: US20230197154A1
Publication (Announcement) Date: 2023-06-22
Application No.: US17802968
Application Date: 2021-09-22
Applicant: SHANGHAITECH UNIVERSITY
IPC: G11C15/04
CPC classification number: G11C15/04
Abstract: A static random-access memory (SRAM) cell for high-speed content-addressable memory (CAM) and in-memory Boolean logic operations includes a standard 6T SRAM and two additional PMOS access transistors, where the read word lines of the two positive-channel metal oxide semiconductor (PMOS) access transistors P1 and P2 are RWLR and RWLL, respectively, and under their control a differential read port formed by RBL and its complement is created. The SRAM cell is suitable for multi-row address selection and is typically applied to in-memory high-speed CAM and in-memory Boolean logic operations. Due to PMOS device characteristics, the structural design of the SRAM cell avoids the read disturbance generated by an in-memory SRAM and ensures that the SRAM can perform in-memory CAM and in-memory Boolean logic operations stably at a high speed. In addition, this SRAM-based IMC solution supports commercial CMOS technology and can leverage the large number of existing on-chip SRAM caches.
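As a purely behavioral illustration of multi-row selection and CAM matching on a precharged line (the line stays high only if no selected cell pulls it down), here is a hedged Python sketch. It carries no transistor-level or polarity information from the patented cell; the functions and encodings are assumptions.

```python
# Hedged logic-level illustration only; not the patented 6T+2PMOS circuit.

def wired_read(stored_bits, selected_rows):
    """Precharged read line stays high (1) only if no selected cell discharges it."""
    return int(not any(stored_bits[r] for r in selected_rows))

def cam_match(stored_word, search_word):
    """A stored word matches the search key when no bit position mismatches."""
    return int(all(s == q for s, q in zip(stored_word, search_word)))

print(wired_read([1, 0, 1, 0], [1, 3]))   # 1: neither selected cell stores a 1
print(cam_match([1, 0, 1], [1, 0, 1]))    # 1: full match, line stays high
```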
-
10.
Publication (Announcement) No.: US20230196095A1
Publication (Announcement) Date: 2023-06-22
Application No.: US17799933
Application Date: 2021-09-22
Applicant: SHANGHAITECH UNIVERSITY
Inventor: Weixiong JIANG , Yajun HA
IPC: G06N3/08
CPC classification number: G06N3/08
Abstract: A pure integer quantization method for a lightweight neural network (LNN) is provided. The method includes the following steps: acquiring the maximum pixel value of each channel of the feature map of the current layer; dividing the value of each pixel in each channel of the feature map by the t-th power of that channel's maximum value, where t∈[0,1]; multiplying the weight of each channel by the maximum value of the corresponding channel of the feature map; and convolving the processed feature map with the processed weight to acquire the feature map of the next layer. The algorithm is verified on SkyNet and MobileNetV2, achieving lossless INT8 quantization on SkyNet and the highest quantization accuracy reported so far on MobileNetV2.
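The four steps above map directly onto a small NumPy sketch. This is a minimal, hedged illustration assuming a (C, H, W) feature map, a (K, C, kh, kw) weight tensor, and a stride-1 "valid" convolution; the third step multiplies by the channel maximum exactly as the abstract states, and all names and shapes are assumptions.

```python
# Hedged sketch of the four listed steps; tensor layouts and the naive
# convolution loop are illustrative choices, not the authors' implementation.
import numpy as np

def requantize_layer(feature_map, weights, t=0.5):
    """feature_map: (C, H, W); weights: (K, C, kh, kw); t in [0, 1]."""
    # Step 1: per-channel maximum of the current layer's feature map
    ch_max = feature_map.reshape(feature_map.shape[0], -1).max(axis=1)
    ch_max = np.maximum(ch_max, 1e-12)               # guard against division by zero
    # Step 2: divide each pixel of a channel by that channel's max ** t
    fm_scaled = feature_map / (ch_max ** t)[:, None, None]
    # Step 3: multiply each weight by the matching input channel's maximum
    # (following the abstract's wording)
    w_scaled = weights * ch_max[None, :, None, None]
    # Step 4: convolve the scaled feature map with the scaled weights
    K, C, kh, kw = w_scaled.shape
    H, W = feature_map.shape[1:]
    out = np.zeros((K, H - kh + 1, W - kw + 1))
    for k in range(K):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[k, i, j] = np.sum(fm_scaled[:, i:i + kh, j:j + kw] * w_scaled[k])
    return out

# Example usage with random data
fm = np.random.rand(3, 8, 8).astype(np.float32)
wt = np.random.rand(4, 3, 3, 3).astype(np.float32)
print(requantize_layer(fm, wt, t=0.5).shape)         # (4, 6, 6)
```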