Ripple push method for graph cut
    Granted invention patent

    Publication No.: US11934459B2

    Publication date: 2024-03-19

    Application No.: US17799278

    Filing date: 2021-09-22

    CPC classification number: G06F16/9024 G06T7/13 G06T7/162 G06T2207/20072

    Abstract: A ripple push method for a graph cut includes: obtaining an excess flow ef(v) of a current node v; traversing four edges connecting the current node v in top, bottom, left and right directions, and determining whether each of the four edges is a pushable edge; calculating, according to different weight functions, a maximum push value of each of the four edges by efw=ef(v)*W, where W denotes a weight function; and traversing the four edges, recording a pushable flow of each of the four edges, and pushing out a calculated flow. The ripple push method explores different push weight functions, and significantly improves the actual parallelism of the push-relabel algorithm.
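A minimal sketch of the push step described above, assuming a uniform weight function W = 1/(number of pushable edges) and the usual push-relabel admissibility test (v one level above u); the abstract does not specify its actual weight functions, so these are illustrative assumptions:

```python
def ripple_push(v, excess, height, residual, neighbors):
    """Push the excess flow ef(v) out of node v across its pushable edges.

    neighbors[v] -> the (top, bottom, left, right) nodes adjacent to v.
    An edge (v, u) is pushable if it has residual capacity and v sits one
    level above u (the standard push-relabel admissibility condition).
    """
    ef_v = excess[v]
    pushable = [u for u in neighbors[v]
                if residual.get((v, u), 0) > 0 and height[v] == height[u] + 1]
    if ef_v <= 0 or not pushable:
        return 0.0
    W = 1.0 / len(pushable)                  # assumed uniform weight function
    pushed_total = 0.0
    for u in pushable:
        efw = ef_v * W                       # maximum push value for this edge
        delta = min(efw, residual[(v, u)])   # pushable flow actually sent
        residual[(v, u)] -= delta
        residual[(u, v)] = residual.get((u, v), 0) + delta
        excess[v] -= delta
        excess[u] += delta
        pushed_total += delta
    return pushed_total
```

Because every pushable edge receives a fraction of ef(v) in the same step, many nodes can push concurrently, which is the source of the parallelism the abstract refers to.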

    Method for Disseminating Scaling Information and Application Thereof in VLSI Implementation of Fixed-Point FFT

    Publication No.: US20230179315A1

    Publication date: 2023-06-08

    Application No.: US18049932

    Filing date: 2022-10-26

    CPC classification number: H04J11/00

    Abstract: Example embodiments relate to methods for disseminating scaling information and applications thereof in very large scale integration (VLSI) implementations of fixed-point fast Fourier transforms (FFTs). One embodiment includes a method for disseminating scaling information in a system. The system includes a linear decomposable transformation process and an inverse process of the linear decomposable transformation process. The inverse process of the linear decomposable transformation process is defined, in time or space, as an inverse linear decomposable transformation process. The linear decomposable transformation process is separated from the inverse linear decomposable transformation process. The linear decomposable transformation process or the inverse linear decomposable transformation process is able to be performed first and is defined as a linear decomposable transformation I. The other remaining process is performed subsequently and is defined as a linear decomposable transformation II. The method for disseminating scaling information is used for a bit width-optimized and energy-saving hardware implementation.
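The abstract does not detail the dissemination scheme itself, but a common concrete instance in fixed-point FFT hardware is block scaling: each butterfly stage right-shifts the data to prevent overflow, and the accumulated shift count is the scaling information handed to the inverse transform, which undoes it. A minimal floating-point model of that idea (the per-stage halving policy and all function names are illustrative assumptions, not the patent's method):

```python
import cmath

def bit_reverse(x):
    """Return x permuted into bit-reversed order (radix-2 DIT input order)."""
    n = len(x)
    a = list(x)
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    return a

def fft_core(x, scale_each_stage):
    """Iterative radix-2 DIT FFT; optionally halve after every stage
    (modeling a 1-bit right shift) and count the shifts."""
    n = len(x)
    a = bit_reverse(x)
    shifts = 0
    size = 2
    while size <= n:
        half = size // 2
        step = cmath.exp(-2j * cmath.pi / size)
        for start in range(0, n, size):
            w = 1.0 + 0j
            for k in range(half):
                t = w * a[start + k + half]
                a[start + k + half] = a[start + k] - t
                a[start + k] = a[start + k] + t
                w *= step
        if scale_each_stage:
            a = [v / 2 for v in a]   # overflow guard: 1-bit shift per stage
            shifts += 1
        size *= 2
    return a, shifts

def forward(x):
    """Forward FFT with per-stage scaling; shifts is the scaling info."""
    return fft_core(x, True)

def inverse(X, shifts):
    """Inverse FFT that consumes the disseminated shift count to restore
    the original magnitudes (conjugation trick for the inverse transform)."""
    n = len(X)
    y, _ = fft_core([v.conjugate() for v in X], False)
    return [v.conjugate() * (2 ** shifts) / n for v in y]
```

The point of the hand-off is that the forward transform never needs headroom for full-magnitude results, and the inverse transform needs only the integer shift count, not the intermediate data, to compensate.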

    Fast and energy-efficient K-nearest neighbor (KNN) search accelerator for large-scale point cloud

    Publication No.: US12292888B1

    Publication date: 2025-05-06

    Application No.: US18985065

    Filing date: 2024-12-18

    Inventors: Yunhao Hu; Yajun Ha

    Abstract: A fast and energy-efficient K-nearest neighbors search accelerator for a large-scale point cloud is provided. A nearest sub-voxel-selection (NSVS) framework that performs search based on a double-segmentation-voxel-structure (DSVS) search structure is constructed, and a K-nearest neighbors search algorithm for a large-scale point cloud map is implemented on a field programmable gate array (FPGA). The K-nearest neighbors search accelerator is configured for constructing the DSVS search structure, and searching for K-nearest neighbors based on the DSVS search structure. Experimental results on the KITTI dataset show that the K-nearest neighbors search accelerator achieves a search speed 9.1 times faster than a state-of-the-art FPGA implementation. In addition, it achieves the best energy efficiency, 11.5 times and 13.5 times higher than state-of-the-art FPGA and GPU implementations, respectively.
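The abstract does not describe the DSVS structure in enough detail to reproduce it, but the general family of techniques it belongs to (voxelize the cloud, then search the query's voxel and expanding shells of neighboring voxels) can be sketched as follows; the voxel-hash layout, the shell-by-shell traversal, and the early-exit bound are all assumptions for illustration:

```python
import math
import heapq
from collections import defaultdict

def build_voxel_grid(points, voxel):
    """Hash each 3D point into its voxel cell (a coarse spatial index)."""
    grid = defaultdict(list)
    for p in points:
        key = tuple(int(math.floor(c / voxel)) for c in p)
        grid[key].append(p)
    return grid

def knn(grid, voxel, q, k, max_ring=3):
    """Search the query's voxel first, then expanding shells of voxels,
    keeping the k closest points seen so far (max-heap on distance)."""
    qk = tuple(int(math.floor(c / voxel)) for c in q)
    heap = []  # entries are (-dist^2, point)
    for ring in range(max_ring + 1):
        for dx in range(-ring, ring + 1):
            for dy in range(-ring, ring + 1):
                for dz in range(-ring, ring + 1):
                    if max(abs(dx), abs(dy), abs(dz)) != ring:
                        continue  # shell only; inner voxels already visited
                    cell = (qk[0] + dx, qk[1] + dy, qk[2] + dz)
                    for p in grid.get(cell, []):
                        d2 = sum((a - b) ** 2 for a, b in zip(p, q))
                        if len(heap) < k:
                            heapq.heappush(heap, (-d2, p))
                        elif d2 < -heap[0][0]:
                            heapq.heapreplace(heap, (-d2, p))
        # stop once k found and the next shell cannot contain anything
        # closer (every point there is at least ring*voxel away from q)
        if len(heap) == k and -heap[0][0] <= (ring * voxel) ** 2:
            break
    return sorted((math.sqrt(-d2), p) for d2, p in heap)
```

The appeal for hardware is that each shell's cells can be probed independently, so an FPGA can stream many candidate distances per cycle while the heap update stays small.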

    Automatic overclocking controller based on circuit delay measurement

    Publication No.: US12181911B2

    Publication date: 2024-12-31

    Application No.: US18224579

    Filing date: 2023-07-21

    Abstract: An automatic overclocking controller based on circuit delay measurement is provided, including a central processing unit (CPU), a clock generator, and a timing delay monitor (TDM) controller. Compared with the prior art, the present disclosure has the following innovations: a two-dimension-multi-frame fusion (2D-MFF) technology processes the sampling result to eliminate sampling noise, and an automatic overclocking controller running on a heterogeneous field programmable gate array (FPGA) can search for the highest frequency at which an accelerator can operate safely.
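A hedged model of the control loop such a controller might run; read_tdm_slack(f) is a hypothetical stand-in for sampling the timing delay monitor at frequency f, and the real 2D-MFF fuses samples across two dimensions and multiple frames, which is reduced here to a plain average over repeated frames:

```python
def fuse_frames(frames):
    """Multi-frame fusion, modeled as averaging repeated TDM samples to
    suppress sampling noise (the real 2D-MFF is more elaborate)."""
    return sum(frames) / len(frames)

def search_max_freq(read_tdm_slack, f_start, f_step, margin, n_frames=8):
    """Raise the clock step by step until the fused timing slack drops
    below the safety margin; return the last frequency that still met it.
    A real controller would also bound the search range."""
    f = f_start
    while True:
        slack = fuse_frames([read_tdm_slack(f) for _ in range(n_frames)])
        if slack < margin:
            return f - f_step   # previous frequency was the safe maximum
        f += f_step
```

With a toy delay model such as slack(f) = 1000/f - 2.0 ns (clock period minus a fixed data-path delay), the loop climbs until the margin is violated and reports the last safe frequency.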

    Pure integer quantization method for lightweight neural network (LNN)

    Publication No.: US11934954B2

    Publication date: 2024-03-19

    Application No.: US17799933

    Filing date: 2021-09-22

    CPC classification number: G06N3/08

    Abstract: A pure integer quantization method for a lightweight neural network (LNN) is provided. The method includes the following steps: acquiring a maximum value of each pixel in each of the channels of the feature map of a current layer; dividing a value of each pixel in each of the channels of the feature map by a t-th power of the maximum value, t∈[0,1]; multiplying a weight in each of the channels by the maximum value of each pixel in each of the channels of the corresponding feature map; and convolving the processed feature map with the processed weight to acquire the feature map of a next layer. The algorithm is verified on SkyNet and MobileNetV2, achieving lossless INT8 quantization on SkyNet and the highest quantization accuracy to date on MobileNetV2.
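The steps above can be sketched for the 1x1-convolution case as follows. One interpretive assumption: folding the same t-th power of the channel maximum into the weights (rather than the full maximum) keeps the convolution output exactly unchanged, and that is how the abstract's wording is read here:

```python
def rebalance(feature, weight, t=0.5):
    """feature: [C][H][W] nested lists; weight: [C_out][C_in] (1x1 conv).
    Divide each feature channel by max^t and multiply the matching
    input-channel weights by the same max^t, so conv output is unchanged
    while the feature-map dynamic range shrinks (easier to quantize)."""
    maxes = [max(max(row) for row in ch) for ch in feature]
    feat = [[[v / (m ** t) for v in row] for row in ch]
            for ch, m in zip(feature, maxes)]
    w = [[wc * (m ** t) for wc, m in zip(wo, maxes)] for wo in weight]
    return feat, w

def conv1x1(feature, weight):
    """Plain 1x1 convolution: weighted sum over input channels."""
    C, H, W = len(feature), len(feature[0]), len(feature[0][0])
    return [[[sum(weight[o][c] * feature[c][i][j] for c in range(C))
              for j in range(W)] for i in range(H)]
            for o in range(len(weight))]
```

Setting t=0 leaves the feature map untouched, t=1 normalizes every channel to a peak of 1; intermediate t splits the dynamic range between activations and weights, which is the knob the method tunes per network.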

    Efficient parallel computing method for box filter

    Publication No.: US11094071B1

    Publication date: 2021-08-17

    Application No.: US17054169

    Filing date: 2020-06-17

    Abstract: An efficient parallel computing method for a box filter includes: step 1, with respect to a given degree of parallelism N and a radius r of the filter kernel, establishing a first architecture without an extra register and a second architecture with an extra register; step 2, building a first adder tree for the first architecture and a second adder tree for the second architecture, respectively; step 3, searching the first adder tree and the second adder tree from top to bottom, calculating the pixel average corresponding to each filter kernel by using the first adder tree and the second adder tree, respectively, and counting the resources consumed by the first architecture and the second architecture, respectively; and, step 4, selecting, from the first architecture and the second architecture, the one that consumes fewer resources to compute the box filter.
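The window sums that the adder trees produce in parallel can be modeled serially with a prefix sum; a minimal one-row sketch with clamped borders (the function name and border policy are assumptions, the arithmetic is the standard box-filter mean):

```python
def box_filter_1d(row, r):
    """Box filter of radius r via prefix sums: each output is the mean of
    the window [i-r, i+r], clamped at the borders. The hardware adder
    trees in the architectures above compute N such window sums per
    cycle; this serial version shows the arithmetic being parallelized."""
    n = len(row)
    prefix = [0.0]
    for v in row:
        prefix.append(prefix[-1] + v)      # prefix[i] = sum of row[:i]
    out = []
    for i in range(n):
        lo, hi = max(0, i - r), min(n - 1, i + r)
        out.append((prefix[hi + 1] - prefix[lo]) / (hi - lo + 1))
    return out
```

The architecture choice in step 4 trades the extra register (which lets partial sums be reused across adjacent windows) against adder-tree depth, which is why resource counts must be measured for both before selecting one.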
