-
1.
Publication No.: US20210081347A1
Publication date: 2021-03-18
Application No.: US16896464
Filing date: 2020-06-09
Inventor: Xiaofei LIAO, Fan ZHANG, Long ZHENG, Hai JIN, Zhiyuan SHAO
Abstract: A graph processing optimization method that addresses problems such as the low computation-to-communication ratio in graph environments and the high communication overhead and load imbalance of graph processing in heterogeneous environments. The method reduces communication overhead between accelerators by optimizing graph partitioning, thereby improving system scalability.
-
2.
Publication No.: US20240330369A1
Publication date: 2024-10-03
Application No.: US18610495
Filing date: 2024-03-20
Inventor: Long ZHENG, Haiheng HE, Xiaofei LIAO, Hai JIN, Dan CHEN, Yu HUANG
IPC: G06F16/901, G06F40/30
CPC classification number: G06F16/9024, G06F40/30
Abstract: A method for incremental metapath storage and dynamic maintenance is provided, which includes: reformatting metapath instances of a designated metapath type, taken from a designated heterogeneous graph, into path graphs; executing graph updating tasks and performing dynamic maintenance on the path graphs by traversing each path graph to locate the metapath updates and updating the path graph accordingly; for metapaths with a length greater than 2 and a symmetrical central portion, performing a central merge operation to simplify the path graph and then performing the subsequent restoration operation; and performing the restoration operation directly on path graphs that do not meet the merging conditions. The present disclosure utilizes the characteristics of graph updates to obtain the locality of metapath updates, and combines the internal relationship characteristics of metapath instances to greatly speed up metapath generation and achieve real-time inference of dynamic heterogeneous graph models.
-
3.
Publication No.: US20200242072A1
Publication date: 2020-07-30
Application No.: US16722082
Filing date: 2019-12-20
Inventor: Xiaofei LIAO, Hai JIN, Long ZHENG, Chengbo YANG
IPC: G06F15/76, G06F16/901
Abstract: An FPGA-based graph data processing method is provided for executing graph traversals on a graph having characteristics of a small-world network by using a first processor that is a CPU and a second processor that is an FPGA in communicative connection with the first processor. The first processor sends graph data to be traversed to the second processor and, after the second processor has completed the graph traversals of the graph data by executing level traversals, obtains result data of the graph traversals from the second processor for result output. The second processor comprises a sparsity processing module and a density processing module: the sparsity processing module operates in a beginning stage and/or an ending stage of the graph traversals, and the density processing module, which has a higher degree of parallelism than the sparsity processing module, operates in the intermediate stage of the graph traversals.
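The stage-dependent traversal described in this abstract can be illustrated in software. The following is only a minimal sketch, not the patented hardware design: it assumes an undirected graph stored as an adjacency dictionary in which every vertex appears as a key, and it switches between a sparse (frontier-driven) step and a dense (scan-all-unvisited) step using a hypothetical `dense_threshold` on the frontier size. The abstract does not state the switching criterion, so that rule and all names here are assumptions.

```python
from typing import Dict, List, Set


def hybrid_bfs(adj: Dict[int, List[int]], source: int,
               dense_threshold: float = 0.3) -> Dict[int, int]:
    """Level-by-level BFS that returns a vertex -> level mapping.

    Assumes `adj` describes an undirected graph and lists every vertex as a key.
    `dense_threshold` is a hypothetical tuning knob, not taken from the patent.
    """
    level = {source: 0}
    frontier: Set[int] = {source}
    depth = 0
    n = len(adj)
    while frontier:
        depth += 1
        next_frontier: Set[int] = set()
        if len(frontier) > dense_threshold * n:
            # Dense step: every unvisited vertex checks whether one of its
            # neighbors is in the current frontier. The checks are independent,
            # which is what makes this stage highly parallel.
            for v in adj:
                if v not in level and any(u in frontier for u in adj[v]):
                    next_frontier.add(v)
        else:
            # Sparse step: only the few frontier vertices expand their edges.
            for u in frontier:
                for v in adj[u]:
                    if v not in level:
                        next_frontier.add(v)
        for v in next_frontier:
            level[v] = depth
        frontier = next_frontier
    return level
```

On a small-world graph the frontier is typically small in the first and last levels and large in between, so a rule of this kind would select the sparse step at the beginning and end and the dense step in the middle, matching the stages named in the abstract.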
-
4.
Publication No.: US20240220541A1
Publication date: 2024-07-04
Application No.: US18497233
Filing date: 2023-10-30
Inventor: Long ZHENG, Chaoqiang LIU, Xiaofei LIAO, Hai JIN, Yu HUANG, Zhaozeng AN
IPC: G06F16/901
CPC classification number: G06F16/9024
Abstract: An FPGA-based method and system for accelerating graph construction is provided, the method including: sampling the neighborhood of each vertex in stored data and recording a traversal order of the vertices; according to the vertex traversal order, grouping the vertices into blocks and processing them at block granularity, so as to at least obtain distance values between every two sampled neighbors of the vertices in each block; according to said distance values, updating the neighborhoods of the two relevant vertices; and, once all of the blocks have been processed, starting a new iteration, until a satisfactory precision or a predetermined limit on the number of iterations has been reached. The present disclosure utilizes the advantages of the FPGA platform, including flexibility, low power consumption and high parallelism, combined with the characteristics of the graph construction algorithm, thereby greatly improving construction speed and reducing processing power consumption, so as to enable large-scale graph construction tasks to be processed in the datacenter.
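The iteration described in this abstract resembles neighbor-refinement approaches to approximate k-nearest-neighbor graph construction. The sketch below is a simplified software illustration under stated assumptions, not the FPGA implementation: the random initialization, the squared Euclidean distance, the parameters `sample_size` and `block_size`, and the "no updates in a full pass" stopping rule (standing in for the precision criterion) are all hypothetical choices, and it requires more points than `k`.

```python
import random
from typing import List, Sequence, Tuple


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (an assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def try_insert(neigh: List[Tuple[float, int]], cand: int, d: float, k: int) -> bool:
    """Insert (d, cand) into a sorted neighbor list if it improves it."""
    if any(j == cand for _, j in neigh):
        return False
    if len(neigh) < k:
        neigh.append((d, cand))
    elif d < neigh[-1][0]:
        neigh[-1] = (d, cand)  # replace the current worst neighbor
    else:
        return False
    neigh.sort()
    return True


def build_knn_graph(points: Sequence[Sequence[float]], k: int, sample_size: int = 4,
                    block_size: int = 2, max_iters: int = 10,
                    seed: int = 0) -> List[List[int]]:
    rng = random.Random(seed)
    n = len(points)  # assumes n > k
    # Start from random neighborhoods of size k.
    neigh: List[List[Tuple[float, int]]] = []
    for i in range(n):
        others = rng.sample([j for j in range(n) if j != i], k)
        neigh.append(sorted((sq_dist(points[i], points[j]), j) for j in others))
    order = list(range(n))  # recorded vertex traversal order
    for _ in range(max_iters):
        updates = 0
        # Process the vertices block by block in traversal order.
        for start in range(0, n, block_size):
            for v in order[start:start + block_size]:
                ids = [j for _, j in neigh[v]]
                sampled = rng.sample(ids, min(sample_size, len(ids)))
                # Compare every pair of sampled neighbors of v and let each
                # of the two vertices adopt the other if it is close enough.
                for x in range(len(sampled)):
                    for y in range(x + 1, len(sampled)):
                        a, b = sampled[x], sampled[y]
                        d = sq_dist(points[a], points[b])
                        updates += try_insert(neigh[a], b, d, k)
                        updates += try_insert(neigh[b], a, d, k)
        if updates == 0:  # stand-in for the precision-based stopping rule
            break
    return [[j for _, j in nl] for nl in neigh]
```

Because the pairwise distance computations inside a block are independent of one another, they are the natural unit to parallelize, which is presumably why the abstract processes vertices at block granularity.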
-
5.
Publication No.: US20230367815A1
Publication date: 2023-11-16
Application No.: US17945792
Filing date: 2022-09-15
Inventor: Yu ZHANG, Jin ZHAO, Qiange SHEN, Xinyu JIANG, Hui YU, Hao QI, Yun YANG, Shijun LI, Xiaofei LIAO, Hai JIN
IPC: G06F16/901
CPC classification number: G06F16/9024
Abstract: The present invention relates to an energy-efficient collaborative method and apparatus for graph processing, wherein the apparatus comprises at least: a dependency path prefetching unit for receiving active vertex information and prefetching the edges of a graph partition along a dependency path, starting with an active vertex in a circular queue; and a direct dependency managing unit for converting the dependency relationship between the head and tail vertices of a core dependency path into a direct dependency and managing it in a cache, and for updating dependency indexes according to dynamic changes in the graph structure during dynamic graph processing, so as to ensure accurate results of graph processing. The accelerator of the present invention is capable of being integrated into a multi-core processor, thereby processing multiple paths on multiple processor kernels with high concurrency, and in turn accelerating the dissemination of vertex states in a graph to speed up convergence during graph processing.
-
6.
Publication No.: US20200272907A1
Publication date: 2020-08-27
Application No.: US16748284
Filing date: 2020-01-21
Inventor: Hai JIN, Xiaofei LIAO, Long ZHENG, Haikun LIU, Xi GE
Abstract: A deep learning heterogeneous computing method based on layer-wide memory allocation, comprising at least the steps of: traversing a neural network model so as to acquire a training operational sequence and a number of layers L thereof; calculating a memory room R1 required by the data involved in operation at the ith layer of the neural network model under a double-buffer configuration, where 1≤i≤L; altering a layer structure of the ith layer and updating the training operational sequence; distributing all the data across a memory room of the CPU and the memory room of the GPU according to a data placement method; and performing iterative computation at each said layer successively based on the training operational sequence so as to complete neural network training.
-
7.
Publication No.: US20240061779A1
Publication date: 2024-02-22
Application No.: US18145565
Filing date: 2022-12-22
Inventor: Long ZHENG, Qinggang WANG, Xiaofei LIAO, Ao HU, Hai JIN
IPC: G06F12/0806, G06F12/10
CPC classification number: G06F12/0806, G06F12/10, G06F2212/1016
Abstract: The present invention relates to a hardware accelerator for hypergraph processing and its operating method, the hardware accelerator comprising: a data loader, for, in the presence of a data-centric load-trigger-reduce execution model, reading hypergraph partition data from an off-chip memory successively according to the hypergraph data structure and an order of hypergraph partitions; an address translator, for deploying the hypergraph data into a private register of a processor and/or into a buffer memory according to a priority level of the loaded data, and recording corresponding offset information; a task trigger, for generating computing tasks according to the loaded data, and scheduling the computing tasks into the processor; the processor, for receiving and executing the computing tasks; and a reducer, for scheduling intermediate results into a first-priority-data reducer unit or a second-priority-data reducer unit depending on the priority level of the data, so as to execute a reducing operation on the intermediate results. In view of the shortcomings of task-centric hardware accelerators, the present invention can prevent any possible data conflict during parallel execution of multiple processing units.
-
8.
Publication No.: US20240053892A1
Publication date: 2024-02-15
Application No.: US18145552
Filing date: 2022-12-22
Inventor: Long ZHENG, Qinggang WANG, Xiaofei LIAO, Zhaozeng AN, Hai JIN
IPC: G06F3/06
CPC classification number: G06F3/061, G06F3/0673, G06F3/0656
Abstract: The present invention relates to a dynamic memory management apparatus and method for HLS, the apparatus at least comprising: several searching and caching modules and several modifying and writing-back modules, wherein the searching and caching modules are in connection with a DRAM storing module and a BRAM buffer, respectively, and the modifying and writing-back modules are in connection with the DRAM storing module and the BRAM buffer, respectively, wherein the BRAM buffer is for caching information about nodes on a search path and registering information about modifications made to the nodes; the searching and caching module is for reading node data from the DRAM storing module according to received operators and node addresses, and writing the node data into the BRAM buffer; and the modifying and writing-back module reads the node data from the BRAM buffer and writes the node data back into the DRAM storing module. To address the low execution efficiency of a traditional operating system transplanted directly to the FPGA, the present invention utilizes the large capacity of the DRAM on the FPGA to realize efficient dynamic memory allocation and deallocation, and to improve the usability and code reusability of HLS.
-
9.
Publication No.: US20210191763A1
Publication date: 2021-06-24
Application No.: US16947055
Filing date: 2020-07-16
Inventor: Xiaofei LIAO, Yicheng CHEN, Yu ZHANG, Hai JIN, Jin ZHAO, Xiang ZHAO, Beibei SI
IPC: G06F9/48, G06F30/331, G06F9/30, G06F9/38, G06F9/50
Abstract: The present disclosure relates to an FPGA-based dynamic graph processing method, comprising: where graph mirrors of a dynamic graph that have successive timestamps define an increment therebetween, a pre-processing module dividing the graph mirror having the latter timestamp into at least one path unit in a manner that incremental computing for any vertex only depends on a preorder vertex of that vertex; an FPGA processing module storing at least two said path units into an on-chip memory directly linked to threads in a manner that every thread unit is able to process the path unit independently; the thread unit determining an increment value between the successive timestamps of the preorder vertex while updating a state value of the preorder vertex, and transferring the increment value to a succeeding vertex adjacent to the preorder vertex in a transfer direction determined by the path unit, so as to update the state value of the succeeding vertex.
-
10.
Publication No.: US20210182200A1
Publication date: 2021-06-17
Application No.: US16933357
Filing date: 2020-07-20
Inventor: Xiaofei LIAO, Yu HUANG, Long ZHENG, Hai JIN
IPC: G06F12/0862
Abstract: The present invention relates to a graph-computing-oriented heterogeneous in-memory computing apparatus, comprising a memory control unit, a digital signal processing unit, and a plurality of analog signal processing units using the memory control unit.