-
11.
Publication No.: US20230192123A1
Publication date: 2023-06-22
Application No.: US17802148
Filing date: 2021-09-22
Applicant: SHANGHAITECH UNIVERSITY
IPC: B60W60/00
CPC classification number: B60W60/001, B60W2420/52, B60W2554/4049
Abstract: A normal distributions transform (NDT) method for LiDAR point cloud localization in unmanned driving is provided. The method proposes a non-recursive, memory-efficient data structure, the occupation-aware voxel structure (OAVS), which speeds up each search operation. Compared with a tree-based structure, OAVS is easy to parallelize and consumes only about 1/10 of the memory. Based on OAVS, the method proposes a semantic-assisted OAVS-based (SEO)-NDT algorithm, which significantly reduces the number of search operations, redefines a parameter affecting the number of search operations, and removes a redundant search operation. In addition, the method proposes a streaming field-programmable gate array (FPGA) accelerator architecture, which further improves the real-time performance and energy efficiency of the SEO-NDT algorithm. The method meets the real-time and high-precision requirements of smart vehicles for three-dimensional (3D) LiDAR localization.
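The abstract does not describe the OAVS layout itself. As a rough illustration of what a non-recursive, occupancy-aware voxel lookup can look like, the sketch below stores only occupied voxels in a hash map, so a search is one key computation and one probe rather than a tree descent. The hash-map layout, the function names, and `voxel_size` are assumptions made for illustration, not the patented structure.

```python
import math
from collections import defaultdict

def voxel_key(point, voxel_size):
    """Map a 3D point to the integer index of the voxel that contains it."""
    return tuple(math.floor(c / voxel_size) for c in point)

def build_occupied_voxels(points, voxel_size):
    """Group points by voxel; only voxels that contain points get an entry."""
    voxels = defaultdict(list)
    for p in points:
        voxels[voxel_key(p, voxel_size)].append(p)
    return dict(voxels)

def lookup(voxels, query, voxel_size):
    """Non-recursive search: one key computation and one dictionary probe."""
    return voxels.get(voxel_key(query, voxel_size), [])

# Hypothetical sample data: two points share a voxel, one lies elsewhere.
points = [(0.2, 0.1, 0.0), (0.4, 0.3, 0.1), (5.1, 5.2, 0.0)]
grid = build_occupied_voxels(points, voxel_size=1.0)
```

Because empty voxels have no entry, memory grows with the number of occupied voxels rather than with the volume of the map.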
-
Publication No.: US20220309217A1
Publication date: 2022-09-29
Application No.: US17595194
Filing date: 2021-06-09
Applicant: SHANGHAITECH UNIVERSITY
IPC: G06F30/34, G06F30/3323, G06F30/337
Abstract: An optimized reconfiguration algorithm based on dynamic voltage and frequency scaling (DVFS) is provided, with the following main contributions. First, it proposes a DVFS-based reconfiguration method that schedules user tasks according to their degree of parallelism (DOP) so as to reconfigure more parallel user tasks, thereby achieving higher reliability. Second, it proposes a K-means-based heuristic approximation algorithm that minimizes the delay of the DVFS-based reconfiguration scheduling algorithm. Third, it proposes a K-means-based method that reduces the memory overhead caused by DVFS-based reconfiguration scheduling. Overall, the algorithm improves the reliability of a field-programmable gate array (FPGA) system and minimizes the area overhead of the hardware circuit.
-
Publication No.: US20210390725A1
Publication date: 2021-12-16
Application No.: US17286488
Filing date: 2019-09-20
Applicant: ShanghaiTech University
Inventor: Fupeng CHEN, Heng YU, Yajun HA
Abstract: The present disclosure provides an adaptive stereo matching optimization method, apparatus, and device, and a storage medium. The method includes: acquiring images of at least two perspectives of the same target scene, accordingly obtaining, through calculation, disparity value ranges corresponding to pixels in the target scene; and obtaining optimized depth value ranges by adjusting the disparity value ranges of the pixels in the target scene in real time through an adaptive stereo matching model; adjusting an execution cycle in the adaptive stereo matching model in real time through a DVFS algorithm according to a resource constraint condition of the processing system; and/or training on a plurality of scene image data sets through a convolutional neural network, so that the specific function parameters in the adaptive stereo matching model are correspondingly adjusted in real time according to the acquired different scene images.
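The link between a disparity range and a depth range can be made concrete with the standard rectified-stereo relation. The symbols below are assumptions that do not appear in the abstract: $f$ is the focal length in pixels, $B$ the baseline between the two cameras, $d$ the disparity, and $Z$ the depth.

```latex
Z = \frac{f B}{d},
\qquad
d \in [d_{\min}, d_{\max}]
\;\Longrightarrow\;
Z \in \left[ \frac{f B}{d_{\max}},\; \frac{f B}{d_{\min}} \right]
```

Under this relation, narrowing the disparity range searched for a pixel narrows its depth range in the same step.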
-
Publication No.: US20210249069A1
Publication date: 2021-08-12
Application No.: US17051783
Filing date: 2020-06-17
Applicant: SHANGHAITECH UNIVERSITY
IPC: G11C11/412, G11C11/419, H01L27/11
Abstract: A low-power SRAM memory cell includes five word lines and four bit lines. The five word lines are a first word line, a second word line, a third word line, a fourth word line and a fifth word line. The four bit lines are a first bit line, a second bit line, a third bit line, and a fourth bit line. During the operation process of calculating a binary 10×11, the first word line is 1, the second word line is 0, the third word line is 0, the fourth word line is 1, the high bit stored in the bit cell is 1, and the low bit is 1. The voltage value of the fifth word line is 0.73 volt. At this time, the first bit line, the second bit line, and the third bit line do not discharge, while the fourth bit line discharges.
-
Publication No.: US20240289914A1
Publication date: 2024-08-29
Application No.: US18537836
Filing date: 2023-12-13
Applicant: SHANGHAITECH UNIVERSITY
CPC classification number: G06T1/20, G06F9/3851
Abstract: A graphics processing unit (GPU)-based logic rewriting acceleration method comprising parallelizing sub-procedures of And-Inverter Graph (AIG)-based logic rewriting. A recursive sub-procedure of the AIG-based logic rewriting is redesigned to be non-recursive, to provide sufficient parallelism for a GPU. In order to parallelize a replacement step on the GPU, the present disclosure uses a lock to ensure mutually exclusive access, which inevitably damages scalability of inter-node parallelism. In order to fully utilize the inter-node parallelism on a large scale, the present disclosure proposes a work scheduler that adds nodes with non-overlapping maximum fan-out-free cones (MFFCs) to a group, such that nodes in an MFFC can be deleted simultaneously without a conflict. In order to simultaneously create and delete a same node, the present disclosure also proposes a GPU-friendly graphical data structure to support these concurrent operations.
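The grouping idea behind the work scheduler can be sketched as follows. This is a sequential, greedy illustration under stated assumptions: the function name, the dictionary input format, and the first-fit strategy are hypothetical, and the abstract does not say how the actual scheduler forms its groups.

```python
def group_by_disjoint_mffc(mffcs):
    """Greedily partition root nodes into groups whose MFFCs share no node.

    mffcs: dict mapping a root node id to the set of node ids in its MFFC.
    Returns a list of groups; each group is a list of root node ids.
    """
    groups = []  # each entry: (roots in the group, union of their MFFC nodes)
    for root, cone in mffcs.items():
        for roots, covered in groups:
            if covered.isdisjoint(cone):
                # No shared node: this cone can be deleted alongside the group.
                roots.append(root)
                covered.update(cone)
                break
        else:
            # Overlaps every existing group, so start a new one.
            groups.append(([root], set(cone)))
    return [roots for roots, _ in groups]

# Hypothetical example: cones "a" and "b" share node "x", "c" is independent.
example = {"a": {"a", "x"}, "b": {"b", "x"}, "c": {"c", "y"}}
result = group_by_disjoint_mffc(example)
```

Roots placed in the same group have disjoint cones, so their nodes can be removed in parallel without two threads touching the same node.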
-
Publication No.: US20240233796A1
Publication date: 2024-07-11
Application No.: US18505128
Filing date: 2023-11-09
Applicant: SHANGHAITECH UNIVERSITY
Inventor: Yuhao SHU, Hongtu ZHANG, Yajun HA
IPC: G11C11/402, G11C11/408, G11C11/4091, G11C11/4096
CPC classification number: G11C11/4023, G11C11/4087, G11C11/4091, G11C11/4096
Abstract: An energy-efficient memory for cryogenic computing is provided. The energy-efficient memory includes a plurality of memory banks, where each of the memory banks includes a cryogenic semi-static, dual-port, boost-free gain cell (CSDB-GC) macro module, a universal address decoder, and a different address decoder. The CSDB-GC macro module includes a plurality of columns of local blocks, and each of the local blocks includes a plurality of CSDB-GC memory cells. A final measurement result of a 16 Kb CSDB-eDRAM shows that the 16 Kb CSDB-eDRAM achieves data retention time (DRT) of 16.67 seconds, which is 2.6 times longer than DRT of a state-of-the-art cryogenic eDRAM at a temperature of 4.2 K, and achieves lower refresh power (0.11 pW/Kb). In addition, the 16 Kb CSDB-eDRAM also achieves shorter access time, namely, 710 ps (1.41 GHz). Compared with the state-of-the-art work, the 16 Kb CSDB-eDRAM has a lowest dynamic power consumption overhead, namely, 49.23 uW/Kb.
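The frequency quoted next to the access time is consistent with taking the reciprocal of 710 ps. This reading is an assumption; the abstract does not state how the 1.41 GHz figure was derived.

```latex
f = \frac{1}{t_{\text{access}}}
  = \frac{1}{710 \times 10^{-12}\,\text{s}}
  \approx 1.41 \times 10^{9}\,\text{Hz}
  = 1.41\,\text{GHz}
```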
-
17.
Publication No.: US20240212175A1
Publication date: 2024-06-27
Application No.: US18518465
Filing date: 2023-11-23
Applicant: SHANGHAITECH UNIVERSITY
Inventor: Chengzhang HE, Yajun HA
CPC classification number: G06T7/32, G06T7/337, G06T7/37, G06T17/00, G06T2207/20048, G06T2210/56
Abstract: A global registration method based on spherical harmonic transformation (SHT) and iterative optimization is provided. Two assumptions are provided: firstly, it is predefined that a minimum percentage of a correct matching pair in an input point cloud is represented as a limit on a quantity of outliers in the point cloud, and secondly, a distance threshold used to determine the correct matching pair is preset based on a scenario and represented as a limited distance of an outlier in the point cloud. In the algorithm provided, the point cloud first undergoes coarse registration to obtain a plurality of search domains, and the search domains are sorted based on an evaluation criterion. A branch and bound method is used to exclude an incorrect search domain and obtain a final registration result.
-
18.
Publication No.: US20240143883A1
Publication date: 2024-05-02
Application No.: US18203662
Filing date: 2023-05-31
Applicant: SHANGHAITECH UNIVERSITY
Inventor: Jianwen LUO, Yajun HA
IPC: G06F30/347, G06F30/31
CPC classification number: G06F30/347, G06F30/31
Abstract: A layout method for a scalable multi-die network-on-chip FPGA architecture is provided. An application of the aforementioned layout method for the scalable multi-die network-on-chip FPGA architecture is further provided. A scalable multi-die FPGA architecture based on network-on-chip and a corresponding hierarchical recursive layout algorithm are provided, aiming to directly map a register transfer level dataflow design generated by existing high-level synthesis onto the provided interconnection architecture. The layout method can exploit the potential for hierarchical topology and make more efficient use of dedicated interconnection resources, such as cross-die nets, network-on-chips, and high-speed transceivers.
-
19.
Publication No.: US20240127466A1
Publication date: 2024-04-18
Application No.: US18369884
Filing date: 2023-09-19
Applicant: SHANGHAITECH UNIVERSITY
IPC: G06T7/521, G06F18/2135
CPC classification number: G06T7/521, G06F18/2135, G06T2207/10028
Abstract: An energy-efficient point cloud feature extraction method based on a field-programmable gate array (FPGA) is mapped onto the FPGA for running. The method is applied to point cloud feature extraction in unmanned driving or in an intelligent robot. Compared with existing technical solutions, the method has the following innovative points: a low-complexity projection method for organizing unordered and sparse point clouds, a highly parallel method for extracting coarse-grained feature points, and a highly parallel method for selecting fine-grained feature points.
-
Publication No.: US20230195793A1
Publication date: 2023-06-22
Application No.: US17799278
Filing date: 2021-09-22
Applicant: SHANGHAITECH UNIVERSITY
Inventor: Guangyao YAN, Xinzhe LIU, Yajun HA, Hui WANG
IPC: G06F16/901
CPC classification number: G06F16/9024
Abstract: A ripple push method for a graph cut includes: obtaining an excess flow ef(v) of a current node v; traversing four edges connecting the current node v in top, bottom, left and right directions, and determining whether each of the four edges is a pushable edge; calculating, according to different weight functions, a maximum push value of each of the four edges by efw=ef(v)*W, where W denotes a weight function; and traversing the four edges, recording a pushable flow of each of the four edges, and pushing out a calculated flow. The ripple push method explores different push weight functions, and significantly improves the actual parallelism of the push-relabel algorithm.
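The four steps listed in the abstract can be sketched for a single node as follows. Several details are assumptions, because the abstract does not specify them: an edge is treated as pushable when it has residual capacity and the node's label is exactly one above its neighbor's (the usual push-relabel admissibility rule), each push is capped by the edge's residual capacity, and the uniform weight of 1/4 in the example is only one possible weight function. All names are hypothetical.

```python
DIRECTIONS = ("top", "bottom", "left", "right")

def ripple_push(excess, residual, height, neighbor_height, weights):
    """Push the excess of one grid node along its four edges.

    excess: excess flow ef(v) of the current node v.
    residual: dict direction -> residual capacity of that edge.
    height: label of the current node.
    neighbor_height: dict direction -> label of the neighboring node.
    weights: dict direction -> weight W applied to that edge.
    Returns (flow pushed per direction, excess left at the node).
    """
    pushed = {}
    for d in DIRECTIONS:
        # Assumed pushability test: spare capacity and a downhill neighbor.
        pushable = residual[d] > 0 and height == neighbor_height[d] + 1
        if not pushable:
            pushed[d] = 0.0
            continue
        max_push = excess * weights[d]          # ef_w = ef(v) * W
        pushed[d] = min(max_push, residual[d])  # capped by edge capacity
    remaining = excess - sum(pushed.values())
    return pushed, remaining

# Hypothetical example with a uniform weight of 1/4 per direction.
flows, left_over = ripple_push(
    excess=8.0,
    residual={"top": 5.0, "bottom": 1.0, "left": 0.0, "right": 3.0},
    height=2,
    neighbor_height={"top": 1, "bottom": 1, "left": 1, "right": 2},
    weights={d: 0.25 for d in DIRECTIONS},
)
```

In this example the top edge carries 2.0, the bottom edge is limited to 1.0 by its capacity, and the left and right edges are not pushable, so 5.0 of the excess stays at the node. The function does not update the residual capacities; a full implementation would do so after each push.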