-
Publication No.: US11875244B2
Publication Date: 2024-01-16
Application No.: US18009341
Application Date: 2022-08-05
Applicant: SHANGHAITECH UNIVERSITY
Inventor: Hongtu Zhang , Yuhao Shu , Yajun Ha
IPC: G06N3/0464 , G06F5/16
CPC classification number: G06N3/0464 , G06F5/16
Abstract: An enhanced dynamic random access memory (eDRAM)-based computing-in-memory (CIM) convolutional neural network (CNN) accelerator comprises four P2ARAM blocks, each including a 5T1C ping-pong eDRAM bit cell array composed of 64×16 5T1C ping-pong eDRAM bit cells. In each P2ARAM block, 64×2 digital-to-time converters convert 4-bit activation values into pulses of different widths and feed them into the 5T1C ping-pong eDRAM bit cell array along the row direction for calculation; a total of 16×2 convolution results are output along the column direction of the array. The CNN accelerator uses the 5T1C ping-pong eDRAM bit cells to perform multi-bit storage and convolution in parallel. An S2M-ADC scheme is proposed that allots the area of the input sampling capacitor of an ABL to the sign-numerical SAR ADC units of a C-DAC array without adding area overhead.
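For intuition, below is a minimal behavioral sketch of the time-domain multiply-accumulate idea the abstract describes: each 4-bit activation is encoded as a pulse width, each bit cell contributes charge in proportion to that pulse width times its stored weight bit, and each column accumulates one dot product. This is not the patent's implementation; the Python setting, all names, the single-bank view (ignoring the ×2 ping-pong duplication), and the single-bit-weight simplification are illustrative assumptions.

```python
# Hypothetical behavioral model of a pulse-width (time-domain) CIM column.
# Dimensions match the 64x16 array mentioned in the abstract; everything
# else is an illustrative assumption, not a detail from the patent.

import numpy as np

ROWS, COLS = 64, 16

rng = np.random.default_rng(0)
activations = rng.integers(0, 16, size=ROWS)      # 4-bit activations (0..15)
weights = rng.integers(0, 2, size=(ROWS, COLS))   # one weight bit per cell

def dtc_encode(a4):
    """Model a digital-to-time converter: 4-bit code -> pulse width."""
    return int(a4)  # pulse width in arbitrary time units

def column_mac(pulses, col_weights):
    """Charge on one bit line ~ sum(pulse_width * stored weight bit)."""
    return sum(p * w for p, w in zip(pulses, col_weights))

pulses = [dtc_encode(a) for a in activations]
results = [column_mac(pulses, weights[:, c]) for c in range(COLS)]

# The analog column sums would then be digitized by per-column SAR ADCs.
# Sanity check: the model reproduces the digital dot product exactly.
assert results == list(activations @ weights)
print(results)
```

In the actual scheme each quantity is an analog charge rather than an integer, and the 64×2 DTCs and 16×2 outputs reflect the ping-pong operation of the two cell banks; the sketch keeps only the arithmetic.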
-
Publication No.: US11762700B2
Publication Date: 2023-09-19
Application No.: US18098746
Application Date: 2023-01-19
Applicant: SHANGHAITECH UNIVERSITY
Inventor: Hongtu Zhang , Yuhao Shu , Yajun Ha
CPC classification number: G06F9/5027 , G06F7/50 , G06F7/523 , H03K19/21
Abstract: A high-energy-efficiency binary neural network accelerator applicable to the artificial intelligence Internet of Things is provided. 0.3-0.6 V sub/near-threshold 10T1C multiplication bit units with series capacitors perform charge-domain binary convolution. A process-variation-tolerant differential voltage amplification array between the bit lines and the DACs provides robust pre-amplification for batch normalization operations at 0.3 V. A lazy bit line reset scheme further reduces energy with negligible inference accuracy loss. The resulting in-memory-computing binary neural network accelerator chip achieves peak energy efficiencies of 18.5 POPS/W and 6.06 POPS/W, improvements of 21× and 135× over previous macro- and system-level work [9, 11], respectively.
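As a reference for what the charge-domain bit cells compute, below is a minimal sketch of binary convolution in its standard XNOR/popcount form, i.e. a ±1 dot product evaluated on bits. The mapping (+1 → 1, -1 → 0) is common BNN practice; the vector length and all names are illustrative assumptions, not details from the patent.

```python
# Minimal sketch of binary (XNOR/popcount) convolution: the digital
# reference for the +/-1 dot product that the analog charge-domain
# array approximates. All names and sizes are illustrative.

import numpy as np

rng = np.random.default_rng(1)
x = rng.choice([-1, +1], size=256)   # binarized activations
w = rng.choice([-1, +1], size=256)   # binarized weights

# Reference dot product over +/-1 values.
dot_ref = int(x @ w)

# Bit-level form: encode +1 -> 1, -1 -> 0; XNOR marks agreements, and
# dot = agreements - disagreements = 2 * popcount(XNOR) - N.
xb = (x > 0).astype(np.uint8)
wb = (w > 0).astype(np.uint8)
xnor = 1 - (xb ^ wb)
dot_bits = 2 * int(xnor.sum()) - x.size

assert dot_ref == dot_bits
print(dot_ref)
```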
-