-
Publication No.: US11934459B2
Publication Date: 2024-03-19
Application No.: US17799278
Application Date: 2021-09-22
Applicant: SHANGHAITECH UNIVERSITY
Inventor: Guangyao Yan, Xinzhe Liu, Yajun Ha, Hui Wang
IPC: G06T7/162, G06F16/901, G06T7/13
CPC classification number: G06F16/9024, G06T7/13, G06T7/162, G06T2207/20072
Abstract: A ripple push method for a graph cut includes: obtaining an excess flow ef(v) of a current node v; traversing four edges connecting the current node v in top, bottom, left and right directions, and determining whether each of the four edges is a pushable edge; calculating, according to different weight functions, a maximum push value of each of the four edges by efw=ef(v)*W, where W denotes a weight function; and traversing the four edges, recording a pushable flow of each of the four edges, and pushing out a calculated flow. The ripple push method explores different push weight functions, and significantly improves the actual parallelism of the push-relabel algorithm.
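The per-node step the abstract describes can be sketched in Python. This is a minimal illustration, not the patented implementation: the grid encoding, the dictionary data structures, the name `weight_fn`, and the admissibility test (the standard push-relabel condition height(v) = height(u) + 1) are all assumptions, since the abstract does not fix them.

```python
# Hypothetical sketch of one "ripple push" step: the excess flow ef(v) at grid
# node v is distributed over its four pushable edges according to a weight
# function W (efw = ef(v) * W), instead of saturating edges one at a time.

def ripple_push(excess, capacity, height, v, weight_fn):
    """One ripple-push step at grid node v = (row, col).

    excess, height: dicts mapping nodes to excess flow / label height.
    capacity: dict mapping directed edges (u, w) to residual capacity.
    weight_fn: assumed interface returning a weight in [0, 1] per edge.
    """
    r, c = v
    # The four edges in the top, bottom, left and right directions.
    neighbors = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    ef_v = excess.get(v, 0.0)
    if ef_v <= 0:
        return

    # An edge is pushable if it has residual capacity and is admissible
    # (standard push-relabel condition, assumed here).
    pushable = [u for u in neighbors
                if capacity.get((v, u), 0.0) > 0
                and height.get(v, 0) == height.get(u, 0) + 1]

    for u in pushable:
        efw = ef_v * weight_fn(v, u)  # maximum push value for this edge
        flow = min(efw, capacity[(v, u)], excess[v])
        if flow <= 0:
            continue
        # Push the calculated flow and update the residual graph.
        capacity[(v, u)] -= flow
        capacity[(u, v)] = capacity.get((u, v), 0.0) + flow
        excess[v] -= flow
        excess[u] = excess.get(u, 0.0) + flow
```

Because each pushable edge receives a weighted share of the excess rather than a sequential saturation, many nodes can execute this step concurrently, which is the source of the parallelism the abstract claims.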
-
12.
Publication No.: US20230179315A1
Publication Date: 2023-06-08
Application No.: US18049932
Application Date: 2022-10-26
Applicant: IMEC VZW, ShanghaiTech University
Inventor: Xinzhe Liu, Raees Kizhakkumkara Muhamad, Dessislava Nikolova, Yajun Ha, Francky Catthoor, Fupeng Chen, Peter Schelkens, David Blinder
IPC: H04J11/00
CPC classification number: H04J11/00
Abstract: Example embodiments relate to methods for disseminating scaling information and applications thereof in very large scale integration (VLSI) implementations of fixed-point fast Fourier transforms (FFTs). One embodiment includes a method for disseminating scaling information in a system. The system includes a linear decomposable transformation process and an inverse process of the linear decomposable transformation process. The inverse process of the linear decomposable transformation process is defined, in time or space, as an inverse linear decomposable transformation process. The linear decomposable transformation process is separated from the inverse linear decomposable transformation process. The linear decomposable transformation process or the inverse linear decomposable transformation process is able to be performed first and is defined as a linear decomposable transformation I. The other remaining process is performed subsequently and is defined as a linear decomposable transformation II. The method for disseminating scaling information is used for a bit width-optimized and energy-saving hardware implementation.
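The general idea of carrying scaling information from a forward transform (transformation I) to its inverse (transformation II) can be illustrated with a block-floating-point-style radix-2 FFT. This is an illustrative sketch only: the unconditional per-stage halving, the float arithmetic standing in for fixed point, and the conjugate-trick inverse are assumptions, not the patented dissemination method.

```python
# Illustrative sketch: a forward fixed-point-style FFT scales the block down
# each stage to avoid overflow and records how often it did so; the inverse
# transform consumes that disseminated scale exponent to restore magnitudes.

import cmath

def fft_with_scaling(x):
    """Radix-2 DIT FFT; returns (scaled spectrum, scale exponent)."""
    n = len(x)
    x = list(x)
    scale_exp = 0
    # Bit-reversal permutation.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            x[i], x[j] = x[j], x[i]
    length = 2
    while length <= n:
        # Scale the whole block down by 2 each stage (overflow avoidance)
        # and remember it in scale_exp: the "scaling information".
        x = [v / 2 for v in x]
        scale_exp += 1
        w_step = cmath.exp(-2j * cmath.pi / length)
        for start in range(0, n, length):
            w = 1 + 0j
            for k in range(length // 2):
                a = x[start + k]
                b = x[start + k + length // 2] * w
                x[start + k] = a + b
                x[start + k + length // 2] = a - b
                w *= w_step
        length *= 2
    return x, scale_exp

def ifft_restore(spectrum, scale_exp):
    """Inverse transform that consumes the disseminated scale exponent."""
    n = len(spectrum)
    conj = [v.conjugate() for v in spectrum]
    y, fwd_exp = fft_with_scaling(conj)
    # Undo both transforms' scaling plus the 1/n of the inverse FFT.
    factor = (2 ** (scale_exp + fwd_exp)) / n
    return [v.conjugate() * factor for v in y]
```

Because the inverse stage knows exactly how much the forward stage scaled, intermediate bit widths can stay narrow without losing the final magnitude, which is the bit-width and energy benefit the abstract targets.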
-
13.
Publication No.: US12292888B1
Publication Date: 2025-05-06
Application No.: US18985065
Application Date: 2024-12-18
Applicant: SHANGHAITECH UNIVERSITY
IPC: G06F16/2453
Abstract: A fast and energy-efficient K-nearest neighbors search accelerator for a large-scale point cloud is provided. A nearest sub-voxel-selection (NSVS) framework that performs search based on a double-segmentation-voxel-structure (DSVS) search structure is constructed, and a K-nearest neighbors search algorithm for a large-scale point cloud map is implemented on a field programmable gate array (FPGA). The K-nearest neighbors search accelerator is configured for constructing the DSVS search structure, and searching for K-nearest neighbors based on the DSVS search structure. An experimental result on a KITTI dataset shows that the K-nearest neighbors search accelerator has a search speed 9.1 times faster than a state-of-the-art FPGA implementation. In addition, the K-nearest neighbors search accelerator also achieves an optimal energy efficiency, and the optimal energy efficiency is 11.5 times and 13.5 times higher than state-of-the-art FPGA and GPU implementations respectively.
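The voxel-based search strategy such an accelerator builds on can be sketched in software: points are bucketed into a voxel grid, and a query expands outward ring by ring until no closer point can remain unexamined. The single-level grid and ring-expansion order are simplifying assumptions; the patent's DSVS structure uses a double segmentation not reproduced here.

```python
# Hypothetical software sketch of voxel-grid K-nearest-neighbors search.

import math
from collections import defaultdict

def build_voxel_grid(points, voxel_size):
    """Bucket 3D points into a dict keyed by integer voxel coordinates."""
    grid = defaultdict(list)
    for p in points:
        key = tuple(int(math.floor(c / voxel_size)) for c in p)
        grid[key].append(p)
    return grid

def knn_search(grid, voxel_size, query, k):
    """Return the k points nearest to query, expanding voxel rings outward."""
    total = sum(len(ps) for ps in grid.values())
    k = min(k, total)
    if k == 0:
        return []
    qkey = tuple(int(math.floor(c / voxel_size)) for c in query)
    best = []  # (distance, point) pairs, kept sorted, length <= k
    ring = 0
    while True:
        # Gather candidates from voxels at Chebyshev distance == ring.
        for dx in range(-ring, ring + 1):
            for dy in range(-ring, ring + 1):
                for dz in range(-ring, ring + 1):
                    if max(abs(dx), abs(dy), abs(dz)) != ring:
                        continue
                    key = (qkey[0] + dx, qkey[1] + dy, qkey[2] + dz)
                    for p in grid.get(key, []):
                        best.append((math.dist(query, p), p))
        best.sort(key=lambda t: t[0])
        best = best[:k]
        # Any point in an unexamined voxel is at least ring * voxel_size
        # away, so we can stop once the current k-th neighbor is closer.
        if len(best) == k and best[-1][0] <= ring * voxel_size:
            break
        ring += 1
    return [p for _, p in best]
```

The hardware win comes from the same property the sketch shows: each query touches only a few voxels' worth of candidates instead of the whole point cloud.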
-
Publication No.: US12181911B2
Publication Date: 2024-12-31
Application No.: US18224579
Application Date: 2023-07-21
Applicant: SHANGHAITECH UNIVERSITY
Inventor: Weixiong Jiang, Yajun Ha
IPC: G06F1/26, G06F1/08, H04B17/364
Abstract: An automatic overclocking controller based on circuit delay measurement is provided, including a central processing unit (CPU), a clock generator, and a timing delay monitor (TDM) controller. Compared with the prior art, the present disclosure has the following innovative points: a two-dimension multi-frame fusion (2D-MFF) technology is used to process the sampling result and eliminate sampling noise, and an automatic overclocking controller running on a heterogeneous field programmable gate array (FPGA) can automatically search for the highest frequency at which an accelerator can operate safely.
-
Publication No.: US11934954B2
Publication Date: 2024-03-19
Application No.: US17799933
Application Date: 2021-09-22
Applicant: SHANGHAITECH UNIVERSITY
Inventor: Weixiong Jiang, Yajun Ha
IPC: G06N3/08
CPC classification number: G06N3/08
Abstract: A pure integer quantization method for a lightweight neural network (LNN) is provided. The method includes the following steps: acquiring the maximum value of each pixel in each channel of the feature map of the current layer; dividing the value of each pixel in each channel of the feature map by the t-th power of the maximum value, where t∈[0,1]; multiplying the weight in each channel by the maximum value of each pixel in each channel of the corresponding feature map; and convolving the processed feature map with the processed weight to acquire the feature map of the next layer. The algorithm is verified on SkyNet and MobileNet respectively, achieving lossless INT8 quantization on SkyNet and the highest quantization accuracy to date on MobileNetv2.
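The four steps above can be sketched numerically with plain Python lists. Several simplifications are assumed for brevity: the "convolution" is a 1x1 per-channel weighted sum, and the same factor max**t removed from the feature map is folded back into the weights so the output is provably unchanged (the abstract folds in the channel maximum itself).

```python
# Minimal sketch of the per-channel rescaling steps from the abstract.

def rescale_layer(feature_map, weights, t=0.5):
    """feature_map: C x H x W nested lists; weights: C_out x C (1x1 conv)."""
    # Step 1: maximum value within each input channel of the feature map.
    ch_max = [max(max(row) for row in ch) for ch in feature_map]
    # Step 2: divide every pixel of a channel by that channel's max ** t.
    fm_scaled = [[[px / (m ** t) for px in row] for row in ch]
                 for ch, m in zip(feature_map, ch_max)]
    # Step 3: fold the removed scale back into each channel's weight
    # (max ** t here, an illustrative choice keeping the output exact).
    w_scaled = [[wc * (m ** t) for wc, m in zip(w_row, ch_max)]
                for w_row in weights]
    # Step 4: "convolve" the scaled feature map with the scaled weights.
    C = len(feature_map)
    H, W = len(feature_map[0]), len(feature_map[0][0])
    return [[[sum(w_scaled[o][c] * fm_scaled[c][y][x] for c in range(C))
              for x in range(W)] for y in range(H)]
            for o in range(len(weights))]
```

The point of the rescaling is that after step 2 every channel of the feature map lies in a comparable numeric range, which is what makes aggressive integer quantization feasible without per-layer floating-point scale factors.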
-
Publication No.: US11762700B2
Publication Date: 2023-09-19
Application No.: US18098746
Application Date: 2023-01-19
Applicant: SHANGHAITECH UNIVERSITY
Inventor: Hongtu Zhang, Yuhao Shu, Yajun Ha
CPC classification number: G06F9/5027, G06F7/50, G06F7/523, H03K19/21
Abstract: A high-energy-efficiency binary neural network accelerator applicable to the artificial intelligence Internet of Things is provided. 0.3-0.6 V sub/near-threshold 10T1C multiplication bit units with series capacitors are configured for charge-domain binary convolution. An anti-process-deviation differential voltage amplification array between the bit lines and the DACs is configured for robust pre-amplification in 0.3 V batch normalization operations. A lazy bit line reset scheme further reduces energy, with negligible inference accuracy loss. The binary neural network accelerator chip based on in-memory computing achieves peak energy efficiencies of 18.5 POPS/W and 6.06 POPS/W, which are improvements of 21× and 135× over previous macro and system work [9, 11], respectively.
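The arithmetic such a chip evaluates in the charge domain can be sketched in software as XNOR-popcount binary convolution: with activations and weights constrained to {-1, +1} and encoded as bits, a dot product reduces to 2·popcount(XNOR(a, w)) − n. This illustrates binary convolution in general, not the 10T1C circuit itself; the bit-packing convention is an assumption.

```python
# Software model of the binary dot product a BNN accelerator computes.

def binary_dot(a_bits, w_bits, n):
    """Dot product of two n-element {-1, +1} vectors packed as integers
    (bit == 1 encodes +1, bit == 0 encodes -1)."""
    mask = (1 << n) - 1
    xnor = ~(a_bits ^ w_bits) & mask      # 1 wherever the signs agree
    matches = bin(xnor).count("1")        # popcount of agreements
    # Each agreement contributes +1, each disagreement -1.
    return 2 * matches - n
```

In the accelerator, the popcount is what the series-capacitor bit cells accumulate as charge on the bit line, so one analog readout replaces n digital additions.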
-
Publication No.: US11094071B1
Publication Date: 2021-08-17
Application No.: US17054169
Application Date: 2020-06-17
Applicant: SHANGHAITECH UNIVERSITY
Inventor: Xinzhe Liu, Fupeng Chen, Yajun Ha
Abstract: An efficient parallel computing method for a box filter includes: step 1, with respect to a given degree of parallelism N and a radius r of the filter kernel, establishing a first architecture without an extra register and a second architecture with the extra register; step 2, building a first adder tree for the first architecture and a second adder tree for the second architecture, respectively; step 3, searching the first adder tree and the second adder tree from top to bottom, calculating the pixel average corresponding to each filter kernel by using each adder tree, and counting the resources consumed by the first architecture and the second architecture, respectively; and step 4, selecting the architecture that consumes fewer resources for computing the box filter.
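A software reference for the function both architectures compute may help: each output pixel is the average of the (2r+1)×(2r+1) window around it. A summed-area table stands in for the hardware adder trees here; this is a functional sketch with border clamping as an assumption, not the patented parallel architecture.

```python
# Reference box filter via a summed-area table (integral image).

def box_filter(img, r):
    """img: list of rows of numbers. Returns the same-size averaged image,
    clamping windows at the image borders."""
    h, w = len(img), len(img[0])
    # Summed-area table with one row/column of zero padding.
    sat = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            sat[y + 1][x + 1] = (img[y][x] + sat[y][x + 1]
                                 + sat[y + 1][x] - sat[y][x])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - r), min(h - 1, y + r)
            x0, x1 = max(0, x - r), min(w - 1, x + r)
            # Window sum from four corner lookups, then average.
            total = (sat[y1 + 1][x1 + 1] - sat[y0][x1 + 1]
                     - sat[y1 + 1][x0] + sat[y0][x0])
            out[y][x] = total / ((y1 - y0 + 1) * (x1 - x0 + 1))
    return out
```

The hardware question the patent addresses is orthogonal to this reference: for N outputs per cycle, whether sharing partial sums through an extra register or rebalancing the adder tree costs fewer resources depends on N and r, hence the two candidate architectures and the selection step.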
-