-
Publication No.: US11934459B2
Publication Date: 2024-03-19
Application No.: US17799278
Application Date: 2021-09-22
Applicant: SHANGHAITECH UNIVERSITY
Inventor: Guangyao Yan, Xinzhe Liu, Yajun Ha, Hui Wang
IPC: G06T7/162, G06F16/901, G06T7/13
CPC classification number: G06F16/9024, G06T7/13, G06T7/162, G06T2207/20072
Abstract: A ripple push method for a graph cut includes: obtaining an excess flow ef(v) of a current node v; traversing four edges connecting the current node v in top, bottom, left and right directions, and determining whether each of the four edges is a pushable edge; calculating, according to different weight functions, a maximum push value of each of the four edges by efw=ef(v)*W, where W denotes a weight function; and traversing the four edges, recording a pushable flow of each of the four edges, and pushing out a calculated flow. The ripple push method explores different push weight functions, and significantly improves the actual parallelism of the push-relabel algorithm.
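The per-node step the abstract describes can be sketched in Python. This is a minimal illustration, not the patented implementation: the grid encoding, the dictionary data structures, the name `weight_fn`, and the admissibility test (the standard push-relabel condition height(v) = height(u) + 1) are all assumptions, since the abstract does not fix them.

```python
# Hypothetical sketch of one "ripple push" step: the excess flow ef(v) at grid
# node v is distributed over its four pushable edges according to a weight
# function W (efw = ef(v) * W), instead of saturating edges one at a time.

def ripple_push(excess, capacity, height, v, weight_fn):
    """One ripple-push step at grid node v = (row, col).

    excess, height: dicts mapping nodes to excess flow / label height.
    capacity: dict mapping directed edges (u, w) to residual capacity.
    weight_fn: assumed interface returning a weight in [0, 1] per edge.
    """
    r, c = v
    # The four edges in the top, bottom, left and right directions.
    neighbors = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    ef_v = excess.get(v, 0.0)
    if ef_v <= 0:
        return

    # An edge is pushable if it has residual capacity and is admissible
    # (standard push-relabel condition, assumed here).
    pushable = [u for u in neighbors
                if capacity.get((v, u), 0.0) > 0
                and height.get(v, 0) == height.get(u, 0) + 1]

    for u in pushable:
        efw = ef_v * weight_fn(v, u)  # maximum push value for this edge
        flow = min(efw, capacity[(v, u)], excess[v])
        if flow <= 0:
            continue
        # Push the calculated flow and update the residual graph.
        capacity[(v, u)] -= flow
        capacity[(u, v)] = capacity.get((u, v), 0.0) + flow
        excess[v] -= flow
        excess[u] = excess.get(u, 0.0) + flow
```

Because each pushable edge receives a weighted share of the excess rather than a sequential saturation, many nodes can execute this step concurrently, which is the source of the parallelism the abstract claims.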
-
12.
Publication No.: US20230179315A1
Publication Date: 2023-06-08
Application No.: US18049932
Application Date: 2022-10-26
Applicant: IMEC VZW, ShanghaiTech University
Inventor: Xinzhe Liu, Raees Kizhakkumkara Muhamad, Dessislava Nikolova, Yajun Ha, Francky Catthoor, Fupeng Chen, Peter Schelkens, David Blinder
IPC: H04J11/00
CPC classification number: H04J11/00
Abstract: Example embodiments relate to methods for disseminating scaling information and applications thereof in very large scale integration (VLSI) implementations of fixed-point fast Fourier transforms (FFTs). One embodiment includes a method for disseminating scaling information in a system. The system includes a linear decomposable transformation process and an inverse process of the linear decomposable transformation process. The inverse process of the linear decomposable transformation process is defined, in time or space, as an inverse linear decomposable transformation process. The linear decomposable transformation process is separated from the inverse linear decomposable transformation process. The linear decomposable transformation process or the inverse linear decomposable transformation process is able to be performed first and is defined as a linear decomposable transformation I. The other remaining process is performed subsequently and is defined as a linear decomposable transformation II. The method for disseminating scaling information is used for a bit width-optimized and energy-saving hardware implementation.
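The general idea of carrying scaling information from a forward transform (transformation I) to its inverse (transformation II) can be illustrated with a block-floating-point-style radix-2 FFT. This is an illustrative sketch only: the unconditional per-stage halving, the float arithmetic standing in for fixed point, and the conjugate-trick inverse are assumptions, not the patented dissemination method.

```python
# Illustrative sketch: a forward fixed-point-style FFT scales the block down
# each stage to avoid overflow and records how often it did so; the inverse
# transform consumes that disseminated scale exponent to restore magnitudes.

import cmath

def fft_with_scaling(x):
    """Radix-2 DIT FFT; returns (scaled spectrum, scale exponent)."""
    n = len(x)
    x = list(x)
    scale_exp = 0
    # Bit-reversal permutation.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            x[i], x[j] = x[j], x[i]
    length = 2
    while length <= n:
        # Scale the whole block down by 2 each stage (overflow avoidance)
        # and remember it in scale_exp: the "scaling information".
        x = [v / 2 for v in x]
        scale_exp += 1
        w_step = cmath.exp(-2j * cmath.pi / length)
        for start in range(0, n, length):
            w = 1 + 0j
            for k in range(length // 2):
                a = x[start + k]
                b = x[start + k + length // 2] * w
                x[start + k] = a + b
                x[start + k + length // 2] = a - b
                w *= w_step
        length *= 2
    return x, scale_exp

def ifft_restore(spectrum, scale_exp):
    """Inverse transform that consumes the disseminated scale exponent."""
    n = len(spectrum)
    conj = [v.conjugate() for v in spectrum]
    y, fwd_exp = fft_with_scaling(conj)
    # Undo both transforms' scaling plus the 1/n of the inverse FFT.
    factor = (2 ** (scale_exp + fwd_exp)) / n
    return [v.conjugate() * factor for v in y]
```

Because the inverse stage knows exactly how much the forward stage scaled, intermediate bit widths can stay narrow without losing the final magnitude, which is the bit-width and energy benefit the abstract targets.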
-
13.
Publication No.: US12292888B1
Publication Date: 2025-05-06
Application No.: US18985065
Application Date: 2024-12-18
Applicant: SHANGHAITECH UNIVERSITY
IPC: G06F16/2453
Abstract: A fast and energy-efficient K-nearest neighbors search accelerator for a large-scale point cloud is provided. A nearest sub-voxel-selection (NSVS) framework that performs search based on a double-segmentation-voxel-structure (DSVS) search structure is constructed, and a K-nearest neighbors search algorithm for a large-scale point cloud map is implemented on a field programmable gate array (FPGA). The K-nearest neighbors search accelerator is configured for constructing the DSVS search structure, and searching for K-nearest neighbors based on the DSVS search structure. An experimental result on a KITTI dataset shows that the K-nearest neighbors search accelerator has a search speed 9.1 times faster than a state-of-the-art FPGA implementation. In addition, the K-nearest neighbors search accelerator also achieves an optimal energy efficiency, and the optimal energy efficiency is 11.5 times and 13.5 times higher than state-of-the-art FPGA and GPU implementations respectively.
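The voxel-based search strategy such an accelerator builds on can be sketched in software: points are bucketed into a voxel grid, and a query expands outward ring by ring until no closer point can remain unexamined. The single-level grid and ring-expansion order are simplifying assumptions; the patent's DSVS structure uses a double segmentation not reproduced here.

```python
# Hypothetical software sketch of voxel-grid K-nearest-neighbors search.

import math
from collections import defaultdict

def build_voxel_grid(points, voxel_size):
    """Bucket 3D points into a dict keyed by integer voxel coordinates."""
    grid = defaultdict(list)
    for p in points:
        key = tuple(int(math.floor(c / voxel_size)) for c in p)
        grid[key].append(p)
    return grid

def knn_search(grid, voxel_size, query, k):
    """Return the k points nearest to query, expanding voxel rings outward."""
    total = sum(len(ps) for ps in grid.values())
    k = min(k, total)
    if k == 0:
        return []
    qkey = tuple(int(math.floor(c / voxel_size)) for c in query)
    best = []  # (distance, point) pairs, kept sorted, length <= k
    ring = 0
    while True:
        # Gather candidates from voxels at Chebyshev distance == ring.
        for dx in range(-ring, ring + 1):
            for dy in range(-ring, ring + 1):
                for dz in range(-ring, ring + 1):
                    if max(abs(dx), abs(dy), abs(dz)) != ring:
                        continue
                    key = (qkey[0] + dx, qkey[1] + dy, qkey[2] + dz)
                    for p in grid.get(key, []):
                        best.append((math.dist(query, p), p))
        best.sort(key=lambda t: t[0])
        best = best[:k]
        # Any point in an unexamined voxel is at least ring * voxel_size
        # away, so we can stop once the current k-th neighbor is closer.
        if len(best) == k and best[-1][0] <= ring * voxel_size:
            break
        ring += 1
    return [p for _, p in best]
```

The hardware win comes from the same property the sketch shows: each query touches only a few voxels' worth of candidates instead of the whole point cloud.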
-
Publication No.: US12181911B2
Publication Date: 2024-12-31
Application No.: US18224579
Application Date: 2023-07-21
Applicant: SHANGHAITECH UNIVERSITY
Inventor: Weixiong Jiang, Yajun Ha
IPC: G06F1/26, G06F1/08, H04B17/364
Abstract: An automatic overclocking controller based on circuit delay measurement is provided, including a central processing unit (CPU), a clock generator, and a timing delay monitor (TDM) controller. Compared with the prior art, the present disclosure has the following innovative points: a two-dimension multi-frame fusion (2D-MFF) technology is used to process the sampling result and eliminate sampling noise, and an automatic overclocking controller running on a heterogeneous field programmable gate array (FPGA) can automatically search for the highest frequency at which an accelerator can operate safely.
-
Publication No.: US11934954B2
Publication Date: 2024-03-19
Application No.: US17799933
Application Date: 2021-09-22
Applicant: SHANGHAITECH UNIVERSITY
Inventor: Weixiong Jiang, Yajun Ha
IPC: G06N3/08
CPC classification number: G06N3/08
Abstract: A pure integer quantization method for a lightweight neural network (LNN) is provided. The method includes the following steps: acquiring the maximum value of each pixel in each channel of the feature map of the current layer; dividing the value of each pixel in each channel of the feature map by the t-th power of the maximum value, where t∈[0,1]; multiplying the weight in each channel by the maximum value of each pixel in each channel of the corresponding feature map; and convolving the processed feature map with the processed weight to acquire the feature map of the next layer. The algorithm is verified on SkyNet and MobileNet respectively, achieving lossless INT8 quantization on SkyNet and the highest quantization accuracy to date on MobileNetv2.
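The four steps above can be sketched numerically with plain Python lists. Several simplifications are assumed for brevity: the "convolution" is a 1x1 per-channel weighted sum, and the same factor max**t removed from the feature map is folded back into the weights so the output is provably unchanged (the abstract folds in the channel maximum itself).

```python
# Minimal sketch of the per-channel rescaling steps from the abstract.

def rescale_layer(feature_map, weights, t=0.5):
    """feature_map: C x H x W nested lists; weights: C_out x C (1x1 conv)."""
    # Step 1: maximum value within each input channel of the feature map.
    ch_max = [max(max(row) for row in ch) for ch in feature_map]
    # Step 2: divide every pixel of a channel by that channel's max ** t.
    fm_scaled = [[[px / (m ** t) for px in row] for row in ch]
                 for ch, m in zip(feature_map, ch_max)]
    # Step 3: fold the removed scale back into each channel's weight
    # (max ** t here, an illustrative choice keeping the output exact).
    w_scaled = [[wc * (m ** t) for wc, m in zip(w_row, ch_max)]
                for w_row in weights]
    # Step 4: "convolve" the scaled feature map with the scaled weights.
    C = len(feature_map)
    H, W = len(feature_map[0]), len(feature_map[0][0])
    return [[[sum(w_scaled[o][c] * fm_scaled[c][y][x] for c in range(C))
              for x in range(W)] for y in range(H)]
            for o in range(len(weights))]
```

The point of the rescaling is that after step 2 every channel of the feature map lies in a comparable numeric range, which is what makes aggressive integer quantization feasible without per-layer floating-point scale factors.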
-
Publication No.: US11762700B2
Publication Date: 2023-09-19
Application No.: US18098746
Application Date: 2023-01-19
Applicant: SHANGHAITECH UNIVERSITY
Inventor: Hongtu Zhang, Yuhao Shu, Yajun Ha
CPC classification number: G06F9/5027, G06F7/50, G06F7/523, H03K19/21
Abstract: A high-energy-efficiency binary neural network accelerator applicable to the artificial intelligence Internet of Things is provided. 0.3-0.6 V sub/near-threshold 10T1C multiplication bit units with series capacitors are configured for charge-domain binary convolution. An anti-process-deviation differential voltage amplification array between the bit lines and the DACs is configured for robust pre-amplification in 0.3 V batch normalization operations. A lazy bit line reset scheme further reduces energy, with negligible inference accuracy loss. The binary neural network accelerator chip based on in-memory computing achieves peak energy efficiencies of 18.5 POPS/W and 6.06 POPS/W, which are improvements of 21× and 135× over previous macro and system work [9, 11], respectively.
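The arithmetic such a chip evaluates in the charge domain can be sketched in software as XNOR-popcount binary convolution: with activations and weights constrained to {-1, +1} and encoded as bits, a dot product reduces to 2·popcount(XNOR(a, w)) − n. This illustrates binary convolution in general, not the 10T1C circuit itself; the bit-packing convention is an assumption.

```python
# Software model of the binary dot product a BNN accelerator computes.

def binary_dot(a_bits, w_bits, n):
    """Dot product of two n-element {-1, +1} vectors packed as integers
    (bit == 1 encodes +1, bit == 0 encodes -1)."""
    mask = (1 << n) - 1
    xnor = ~(a_bits ^ w_bits) & mask      # 1 wherever the signs agree
    matches = bin(xnor).count("1")        # popcount of agreements
    # Each agreement contributes +1, each disagreement -1.
    return 2 * matches - n
```

In the accelerator, the popcount is what the series-capacitor bit cells accumulate as charge on the bit line, so one analog readout replaces n digital additions.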
-
Publication No.: US11094071B1
Publication Date: 2021-08-17
Application No.: US17054169
Application Date: 2020-06-17
Applicant: SHANGHAITECH UNIVERSITY
Inventor: Xinzhe Liu, Fupeng Chen, Yajun Ha
Abstract: An efficient parallel computing method for a box filter includes: step 1, with respect to a given degree of parallelism N and a radius r of the filter kernel, establishing a first architecture without an extra register and a second architecture with the extra register; step 2, building a first adder tree for the first architecture and a second adder tree for the second architecture, respectively; step 3, searching the first adder tree and the second adder tree from top to bottom, calculating the pixel average corresponding to each filter kernel by using each adder tree, and counting the resources consumed by the first architecture and the second architecture, respectively; and step 4, selecting the architecture that consumes fewer resources for computing the box filter.
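A software reference for the function both architectures compute may help: each output pixel is the average of the (2r+1)×(2r+1) window around it. A summed-area table stands in for the hardware adder trees here; this is a functional sketch with border clamping as an assumption, not the patented parallel architecture.

```python
# Reference box filter via a summed-area table (integral image).

def box_filter(img, r):
    """img: list of rows of numbers. Returns the same-size averaged image,
    clamping windows at the image borders."""
    h, w = len(img), len(img[0])
    # Summed-area table with one row/column of zero padding.
    sat = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            sat[y + 1][x + 1] = (img[y][x] + sat[y][x + 1]
                                 + sat[y + 1][x] - sat[y][x])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - r), min(h - 1, y + r)
            x0, x1 = max(0, x - r), min(w - 1, x + r)
            # Window sum from four corner lookups, then average.
            total = (sat[y1 + 1][x1 + 1] - sat[y0][x1 + 1]
                     - sat[y1 + 1][x0] + sat[y0][x0])
            out[y][x] = total / ((y1 - y0 + 1) * (x1 - x0 + 1))
    return out
```

The hardware question the patent addresses is orthogonal to this reference: for N outputs per cycle, whether sharing partial sums through an extra register or rebalancing the adder tree costs fewer resources depends on N and r, hence the two candidate architectures and the selection step.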
-