-
Publication number: US20210081347A1
Publication date: 2021-03-18
Application number: US16896464
Filing date: 2020-06-09
Inventor: Xiaofei LIAO , Fan ZHANG , Long ZHENG , Hai JIN , Zhiyuan SHAO
Abstract: A graph processing optimization method that addresses problems such as the low computation-to-communication ratio of graph workloads, and the high communication overhead and load imbalance of graph processing in heterogeneous environments. The method reduces the communication overhead between accelerators by optimizing graph partitioning, thereby improving system scalability.
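As a rough illustration of the partitioning idea the abstract describes (fewer cross-partition edges means less accelerator-to-accelerator traffic), the sketch below greedily places each vertex in the partition holding most of its already-placed neighbors. The greedy policy and all names are illustrative assumptions, not the patented method.

```python
# Illustrative sketch only: communication-aware greedy graph partitioning.
from collections import defaultdict

def greedy_partition(edges, num_vertices, k):
    """Place each vertex where most of its placed neighbors live, under a balance cap."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    part = [-1] * num_vertices          # partition assignment per vertex
    sizes = [0] * k
    cap = (num_vertices + k - 1) // k   # simple balance constraint
    for v in range(num_vertices):
        scores = [0] * k                # neighbors already placed in each partition
        for n in adj[v]:
            if part[n] != -1:
                scores[part[n]] += 1
        best = max((p for p in range(k) if sizes[p] < cap),
                   key=lambda p: scores[p])
        part[v] = best
        sizes[best] += 1
    return part

def edge_cut(edges, part):
    """Number of edges crossing partitions, a proxy for communication overhead."""
    return sum(1 for u, v in edges if part[u] != part[v])
```

On a graph of two triangles joined by one bridge edge, the greedy pass keeps each triangle together, leaving a cut of one edge.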
-
Publication number: US20240330369A1
Publication date: 2024-10-03
Application number: US18610495
Filing date: 2024-03-20
Inventor: Long ZHENG , Haiheng HE , Xiaofei LIAO , Hai JIN , Dan CHEN , Yu HUANG
IPC: G06F16/901 , G06F40/30
CPC classification number: G06F16/9024 , G06F40/30
Abstract: A method for incremental metapath storage and dynamic maintenance is provided, which includes: reformatting metapath instances of a designated metapath type, drawn from a designated heterogeneous graph, into path graphs; executing graph-update tasks and dynamically maintaining the updated path graphs, traversing each path graph to locate metapath updates and apply them; for metapaths of length greater than 2 with a symmetrical central portion, performing a central merge operation to simplify the path graph before a subsequent restoration operation; and performing the restoration operation directly on path graphs that do not meet the merging conditions. The present disclosure exploits the locality of metapath updates exposed by graph updates, combined with the internal relationship characteristics of metapath instances, to greatly speed up metapath generation and achieve real-time inference for dynamic heterogeneous graph models.
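A toy software sketch of the locality idea for the simplest symmetric case, a length-2 Author-Paper-Author metapath: when an (author, paper) edge is updated, only instances passing through that paper are affected, so maintenance touches a small neighborhood instead of regenerating every instance. The class, its fields, and the A-P-A schema are illustrative assumptions, not the disclosed implementation.

```python
# Illustrative sketch only: incremental maintenance of A-P-A metapath instances.
from collections import defaultdict

class APAIndex:
    def __init__(self):
        self.paper_authors = defaultdict(set)  # paper -> authors linked to it
        self.instances = set()                 # (author1, paper, author2) instances

    def add_edge(self, author, paper):
        # Locality of the update: only pairs through `paper` can change.
        for other in self.paper_authors[paper]:
            if other != author:
                self.instances.add((min(author, other), paper, max(author, other)))
        self.paper_authors[paper].add(author)

    def remove_edge(self, author, paper):
        self.paper_authors[paper].discard(author)
        self.instances = {t for t in self.instances
                          if not (t[1] == paper and author in (t[0], t[2]))}
```

Each edge update costs work proportional to the touched paper's degree, rather than to the total number of metapath instances in the graph.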
-
Publication number: US20200242072A1
Publication date: 2020-07-30
Application number: US16722082
Filing date: 2019-12-20
Inventor: Xiaofei LIAO , Hai JIN , Long ZHENG , Chengbo YANG
IPC: G06F15/76 , G06F16/901
Abstract: An FPGA-based graph data processing method is provided for executing graph traversals on a graph having the characteristics of a small-world network, using a first processor that is a CPU and a second processor that is an FPGA in communicative connection with the first processor. The first processor sends the graph data to be traversed to the second processor and, after the second processor has completed the graph traversals by executing level traversals, obtains the result data of the traversals from the second processor for output. The second processor comprises a sparsity processing module and a density processing module: the sparsity processing module operates in the beginning stage and/or ending stage of the graph traversals, while the density processing module, which has a higher degree of parallelism than the sparsity processing module, operates in the intermediate stage.
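The sparse/dense split can be sketched in software as a direction-switching level traversal: small frontiers (the beginning and ending stages) expand frontier vertices directly, while large intermediate frontiers let every unvisited vertex probe its neighbors, a pattern that parallelizes well. The `switch_ratio` threshold and all names are assumptions; the actual modules are FPGA hardware, not Python.

```python
# Illustrative sketch only: level traversal with a sparse phase and a dense phase.
def hybrid_bfs(adj, source, switch_ratio=0.05):
    n = len(adj)
    level = [-1] * n
    level[source] = 0
    frontier = {source}
    depth = 0
    while frontier:
        depth += 1
        if len(frontier) < switch_ratio * n:
            # "Sparsity" phase: expand only the frontier vertices.
            nxt = {v for u in frontier for v in adj[u] if level[v] == -1}
        else:
            # "Density" phase: every unvisited vertex checks its neighbors,
            # a layout with a higher degree of parallelism in hardware.
            nxt = {v for v in range(n) if level[v] == -1
                   and any(level[u] == depth - 1 for u in adj[v])}
        for v in nxt:
            level[v] = depth
        frontier = nxt
    return level
```

Both phases compute the same levels; the point of the split is that each phase matches the frontier size it is given.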
-
Publication number: US20240220541A1
Publication date: 2024-07-04
Application number: US18497233
Filing date: 2023-10-30
Inventor: Long ZHENG , Chaoqiang LIU , Xiaofei LIAO , Hai JIN , Yu HUANG , Zhaozeng AN
IPC: G06F16/901
CPC classification number: G06F16/9024
Abstract: An FPGA-based method and system for accelerating graph construction is provided, the method including: sampling the neighborhood of each vertex in the stored data and recording a traversal order of the vertices; grouping the vertices into blocks according to that traversal order and processing them at block granularity, so as to obtain at least the distance values between each pair of sampled neighbors in each block; updating the neighborhoods of the two vertices concerned according to those distance values; and, after all blocks have been processed, starting a new iteration, until a satisfactory precision or a predetermined limit on the number of iterations has been reached. The present disclosure exploits the advantages of the FPGA platform, including flexibility, low power consumption, and high parallelism, combined with the characteristics of graph construction algorithms, thereby greatly improving construction speed and reducing processing power consumption, so as to enable large-scale graph construction tasks in the datacenter.
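A rough software analogue of the iteration the abstract describes, in the spirit of NN-Descent-style k-NN graph construction: sample each vertex's current neighbors, compute distances between sampled neighbor pairs, and use them to refine both vertices' neighbor lists until the graph stops improving. The sampling policy, parameters, and names are assumptions, not the disclosed FPGA design.

```python
# Illustrative sketch only: iterative k-NN graph construction via neighbor sampling.
import random

def knn_construct(points, k, iters=20,
                  dist=lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))):
    n = len(points)
    # Start from random neighbor lists, kept sorted as (distance, neighbor) pairs.
    nbrs = [sorted((dist(points[i], points[j]), j)
                   for j in random.sample([x for x in range(n) if x != i], k))
            for i in range(n)]
    for _ in range(iters):
        changed = False
        for i in range(n):
            sample = [j for _, j in nbrs[i]]
            for a in sample:
                for b in sample:
                    if a >= b:
                        continue
                    d = dist(points[a], points[b])  # distance between a neighbor pair
                    for u, v in ((a, b), (b, a)):
                        # Replace u's worst neighbor if v is closer and not present yet.
                        if d < nbrs[u][-1][0] and all(j != v for _, j in nbrs[u]):
                            nbrs[u][-1] = (d, v)
                            nbrs[u].sort()
                            changed = True
        if not changed:
            break  # the analogue of reaching a satisfactory precision
    return [[j for _, j in lst] for lst in nbrs]
```

Processing vertices in blocks, as the abstract describes, maps the inner pairwise-distance loop onto the FPGA's parallel compute units.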
-
Publication number: US20200272907A1
Publication date: 2020-08-27
Application number: US16748284
Filing date: 2020-01-21
Inventor: Hai JIN , Xiaofei LIAO , Long ZHENG , Haikun LIU , Xi GE
Abstract: A deep learning heterogeneous computing method based on layer-wide memory allocation, comprising at least the steps of: traversing a neural network model so as to acquire its training operational sequence and its number of layers L; calculating the memory space Ri required by the data involved in the operation of the i-th layer of the neural network model under a double-buffer configuration, where 1≤i≤L; altering the layer structure of the i-th layer and updating the training operational sequence; distributing all the data across the memory space of the CPU and the memory space of the GPU according to a data placement method; and performing iterative computation at each said layer in turn, following the training operational sequence, so as to complete neural network training.
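A minimal sketch of the per-layer accounting and placement steps, assuming a simple doubling rule for the double-buffer requirement and a greedy GPU-first policy: both are illustrative stand-ins, not the patented data placement method.

```python
# Illustrative sketch only: per-layer memory accounting with CPU/GPU placement.
def plan_placement(layer_sizes_bytes, gpu_budget_bytes):
    """Return (layer, device, R_i) for each layer and the GPU bytes consumed."""
    plans = []
    gpu_used = 0
    for i, size in enumerate(layer_sizes_bytes):
        r_i = 2 * size  # double-buffer: one buffer computes while the other transfers
        if gpu_used + r_i <= gpu_budget_bytes:
            plans.append((i, "GPU", r_i))
            gpu_used += r_i
        else:
            plans.append((i, "CPU", r_i))  # spill to host memory
    return plans, gpu_used
```

The double-buffer configuration costs twice the raw tensor size per layer, which is exactly what makes a placement plan across CPU and GPU memory necessary.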
-
Publication number: US20240061779A1
Publication date: 2024-02-22
Application number: US18145565
Filing date: 2022-12-22
Inventor: Long ZHENG , Qinggang WANG , Xiaofei LIAO , Ao HU , Hai JIN
IPC: G06F12/0806 , G06F12/10
CPC classification number: G06F12/0806 , G06F12/10 , G06F2212/1016
Abstract: The present invention relates to a hardware accelerator for hypergraph processing and an operating method thereof, the hardware accelerator comprising: a data loader, for reading hypergraph partition data from an off-chip memory successively, under a data-centric load-trigger-reduce execution model, according to the hypergraph data structure and the order of hypergraph partitions; an address translator, for deploying the hypergraph data into a private register of a processor and/or into a buffer memory according to the priority level of the loaded data, and recording the corresponding offset information; a task trigger, for generating computing tasks from the loaded data and scheduling them onto the processor; the processor, for receiving and executing the computing tasks; and a reducer, for scheduling intermediate results into a first-priority-data reducer unit or a second-priority-data reducer unit, depending on the priority level of the data, so as to execute a reducing operation on the intermediate results. Addressing the shortcomings of task-centric hardware accelerators, the present invention can prevent data conflicts during the parallel execution of multiple processing units.
-
Publication number: US20240053892A1
Publication date: 2024-02-15
Application number: US18145552
Filing date: 2022-12-22
Inventor: Long ZHENG , Qinggang WANG , Xiaofei LIAO , Zhaozeng AN , Hai JIN
IPC: G06F3/06
CPC classification number: G06F3/061 , G06F3/0673 , G06F3/0656
Abstract: The present invention relates to a dynamic memory management apparatus and method for HLS, the apparatus at least comprising several searching-and-caching modules and several modifying-and-writing-back modules, wherein the searching-and-caching modules and the modifying-and-writing-back modules are each in connection with a DRAM storing module and a BRAM buffer. The BRAM buffer caches information about the nodes on a search path and registers information about modifications made to those nodes; a searching-and-caching module reads node data from the DRAM storing module according to the received operators and node addresses and writes the node data into the BRAM buffer; and a modifying-and-writing-back module reads the node data from the BRAM buffer and writes it back into the DRAM storing module. Addressing the low execution efficiency of directly transplanting a traditional operating system onto the FPGA, the present invention utilizes the large capacity of the DRAM on the FPGA to realize efficient dynamic memory allocation and deallocation, and improves the usability and code reusability of HLS.
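The allocate/deallocate behavior such an apparatus provides can be illustrated by a minimal software free-list allocator over a fixed DRAM region; the BRAM caching layer, the node search path, and all hardware detail are omitted, and this is not the patented design.

```python
# Illustrative sketch only: first-fit free-list allocation with hole coalescing.
class FreeListAllocator:
    def __init__(self, capacity):
        self.free = [(0, capacity)]  # sorted list of (offset, size) free holes

    def alloc(self, size):
        for idx, (off, sz) in enumerate(self.free):
            if sz >= size:  # first fit
                if sz == size:
                    self.free.pop(idx)
                else:
                    self.free[idx] = (off + size, sz - size)
                return off
        raise MemoryError("out of DRAM region")

    def free_block(self, off, size):
        self.free.append((off, size))
        self.free.sort()
        # Coalesce adjacent holes so the region does not fragment permanently.
        merged = [self.free[0]]
        for o, s in self.free[1:]:
            po, ps = merged[-1]
            if po + ps == o:
                merged[-1] = (po, ps + s)
            else:
                merged.append((o, s))
        self.free = merged
```

In the hardware apparatus, the analogous metadata walk is what the BRAM buffer caches so that repeated DRAM round trips are avoided.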
-
Publication number: US20210182200A1
Publication date: 2021-06-17
Application number: US16933357
Filing date: 2020-07-20
Inventor: Xiaofei LIAO , Yu HUANG , Long ZHENG , Hai JIN
IPC: G06F12/0862
Abstract: The present invention relates to a graph-computing-oriented heterogeneous in-memory computing apparatus, comprising a memory control unit, a digital signal processing unit, and a plurality of analog signal processing units using the memory control unit.