-
Publication No.: US20190102154A1
Publication Date: 2019-04-04
Application No.: US15721848
Filing Date: 2017-09-30
Applicant: Oracle International Corporation
Inventor: Petr Koupy , Thomas Manhardt , Siegfried Depner , Sungpack Hong , Hassan Chafi
Abstract: Techniques herein minimally communicate between computers to repartition a graph. In embodiments, each computer receives a partition of edges and vertices of the graph. For each of its edges or vertices, each computer stores an intermediate representation into an edge table (ET) or vertex table. Different edges of a vertex may be loaded by different computers, which may cause a conflict. Each computer announces that a vertex resides on the computer to a respective tracking computer. Each tracking computer makes assignments of vertices to computers and publicizes those assignments. Each computer that loaded conflicted vertices transfers those vertices to computers of the respective assignments. Each computer stores a materialized representation of a partition based on: the ET and vertex table of the computer, and the vertices and edges that were transferred to the computer. Edges stored in the materialized representation are stored differently than edges stored in the ET.
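The announce, assign, and transfer steps in this abstract can be sketched in a single process as follows. This is only an illustration under stated assumptions, not the patented method: the function name `resolve_vertex_owners`, the choice of tracker (`vertex id % number of computers`), and the tie-break rule (lowest computer id wins) are all hypothetical.

```python
from collections import defaultdict


def resolve_vertex_owners(loaded, num_computers):
    """loaded maps computer id -> set of vertex ids that computer loaded."""
    # Step 1: each computer announces its vertices to a tracking computer.
    # Assumption: the tracker for vertex v is v % num_computers.
    announcements = defaultdict(lambda: defaultdict(set))
    for computer, vertices in loaded.items():
        for v in vertices:
            announcements[v % num_computers][v].add(computer)

    # Step 2: each tracker assigns every vertex it tracks to one computer.
    # Assumption: a conflicted vertex goes to the lowest computer id.
    assignments = {}
    for per_vertex in announcements.values():
        for v, holders in per_vertex.items():
            assignments[v] = min(holders)

    # Step 3: a computer holding a vertex that was assigned elsewhere
    # must transfer it: (vertex, from_computer, to_computer).
    transfers = []
    for computer, vertices in loaded.items():
        for v in vertices:
            if assignments[v] != computer:
                transfers.append((v, computer, assignments[v]))
    return assignments, sorted(transfers)
```

For example, if computer 0 loaded vertices {1, 2} and computer 1 loaded {2, 3}, only vertex 2 is conflicted, so only one transfer is needed.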
-
Publication No.: US20170147706A1
Publication Date: 2017-05-25
Application No.: US14947382
Filing Date: 2015-11-20
Applicant: Oracle International Corporation
Inventor: Nicholas Roth , Sungpack Hong , Siegfried Depner , Thomas Manhardt , Hassan Chafi
IPC: G06F17/30
CPC classification number: G06F17/30958 , G06F17/30584
Abstract: Techniques herein index data transferred during distributed graph processing. In an embodiment, a system of computers divides a directed graph into partitions. The system creates one partition per computer and distributes each partition to a computer. Each computer builds four edge lists that enumerate edges that connect the partition of the computer with a partition of a neighbor computer. Each of the four edge lists has edges of a direction, which may be inbound or outbound from the partition. Edge lists are sorted by identifier of the vertex that terminates or originates each edge. Each iteration of distributed graph analysis involves each computer processing its partition and exchanging edge data or vertex data with neighbor computers. Each computer uses an edge list to build a compactly described range of edges that connect to another partition. The computers exchange described ranges with their neighbors during each iteration.
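One way to picture a sorted edge list and a compactly described range is the sketch below. It builds only one of the four lists (outbound edges, keyed by source vertex) and assumes that a "described range" is a pair of offsets into that list. The names `outbound_edge_list`, `describe_range`, and the `owner` map are hypothetical and not taken from the patent.

```python
from bisect import bisect_left, bisect_right


def outbound_edge_list(edges, owner, local, remote):
    """Edges from partition `local` to partition `remote`, sorted by source id."""
    selected = [(src, dst) for src, dst in edges
                if owner[src] == local and owner[dst] == remote]
    return sorted(selected)


def describe_range(edge_list, first_vertex, last_vertex):
    """Return (start, end) offsets of edges whose source id lies in
    [first_vertex, last_vertex]. Because the list is sorted, two integers
    describe the whole run of edges."""
    keys = [src for src, _ in edge_list]
    return bisect_left(keys, first_vertex), bisect_right(keys, last_vertex)
```

Because the list is sorted by vertex identifier, a neighbor needs to receive only the two offsets rather than every edge in the run.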
-
Publication No.: US11363093B2
Publication Date: 2022-06-14
Application No.: US15968637
Filing Date: 2018-05-01
Applicant: Oracle International Corporation
Inventor: Jinsu Lee , Thomas Manhardt , Sungpack Hong , Petr Koupy , Hassan Chafi , Vasileios Trigonakis
IPC: H04L29/08 , G06F16/901 , H04L67/10
Abstract: Techniques are described herein for evaluating graph processing tasks using a multi-stage pipelining communication mechanism. In a multi-node system comprising a plurality of nodes, each node of said plurality of nodes executes a respective communication agent object. The respective communication agent object comprises: a sender lambda function configured to perform sending operations and generate source messages based on the sending operations; an intermediate lambda function configured to read source messages marked for a node, perform intermediate operations based on the source messages, and generate intermediate messages based on the intermediate operations; and a final receiver lambda function configured to read intermediate messages marked for said each node, perform final operations based on the intermediate messages, and generate a final result based on the final operations.
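The three-stage flow of sender, intermediate, and final receiver functions can be simulated in one process as below. This is a minimal sketch that assumes a message is a `(destination node, payload)` pair and that stages run one after another. The `CommunicationAgent` class layout and `run_pipeline` are hypothetical names, not the patent's API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass
class CommunicationAgent:
    node_id: int
    sender: Callable        # node_id -> [(dest, payload)]
    intermediate: Callable  # (node_id, source messages) -> [(dest, payload)]
    final: Callable         # (node_id, intermediate messages) -> result


def run_pipeline(agents):
    # Stage 1: every sender generates source messages marked for a node.
    source = defaultdict(list)
    for a in agents:
        for dest, payload in a.sender(a.node_id):
            source[dest].append(payload)

    # Stage 2: every node reads the source messages marked for it and
    # generates intermediate messages.
    inter = defaultdict(list)
    for a in agents:
        for dest, payload in a.intermediate(a.node_id, source[a.node_id]):
            inter[dest].append(payload)

    # Stage 3: every node reduces its intermediate messages to a final result.
    return {a.node_id: a.final(a.node_id, inter[a.node_id]) for a in agents}


def demo():
    agents = [
        CommunicationAgent(
            node_id=n,
            sender=lambda n: [(1 - n, n + 1)],                # send to the other node
            intermediate=lambda n, msgs: [(0, sum(msgs) * 10)],  # forward to node 0
            final=lambda n, msgs: sum(msgs),
        )
        for n in range(2)
    ]
    return run_pipeline(agents)
```

In `demo`, each of two nodes sends a value to its peer, both intermediate stages forward a scaled sum to node 0, and node 0 adds up what it receives.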
-
Publication No.: US20210042102A1
Publication Date: 2021-02-11
Application No.: US17069104
Filing Date: 2020-10-13
Applicant: Oracle International Corporation
Inventor: Petr Koupy , Thomas Manhardt , Siegfried Depner , Sungpack Hong , Hassan Chafi
Abstract: Techniques herein minimally communicate between computers to repartition a graph. In embodiments, each computer receives a partition of edges and vertices of the graph. For each of its edges or vertices, each computer stores an intermediate representation into an edge table (ET) or vertex table. Different edges of a vertex may be loaded by different computers, which may cause a conflict. Each computer announces that a vertex resides on the computer to a respective tracking computer. Each tracking computer makes assignments of vertices to computers and publicizes those assignments. Each computer that loaded conflicted vertices transfers those vertices to computers of the respective assignments. Each computer stores a materialized representation of a partition based on: the ET and vertex table of the computer, and the vertices and edges that were transferred to the computer. Edges stored in the materialized representation are stored differently than edges stored in the ET.
-
Publication No.: US10002205B2
Publication Date: 2018-06-19
Application No.: US14947382
Filing Date: 2015-11-20
Applicant: Oracle International Corporation
Inventor: Nicholas Roth , Sungpack Hong , Siegfried Depner , Thomas Manhardt , Hassan Chafi
IPC: G06F17/30
CPC classification number: G06F16/9024 , G06F16/278
Abstract: Techniques herein index data transferred during distributed graph processing. In an embodiment, a system of computers divides a directed graph into partitions. The system creates one partition per computer and distributes each partition to a computer. Each computer builds four edge lists that enumerate edges that connect the partition of the computer with a partition of a neighbor computer. Each of the four edge lists has edges of a direction, which may be inbound or outbound from the partition. Edge lists are sorted by identifier of the vertex that terminates or originates each edge. Each iteration of distributed graph analysis involves each computer processing its partition and exchanging edge data or vertex data with neighbor computers. Each computer uses an edge list to build a compactly described range of edges that connect to another partition. The computers exchange described ranges with their neighbors during each iteration.
-
Publication No.: US20190205178A1
Publication Date: 2019-07-04
Application No.: US16353050
Filing Date: 2019-03-14
Applicant: ORACLE INTERNATIONAL CORPORATION
Inventor: Jinsu Lee , Sungpack Hong , Siegfried Depner , Nicholas Roth , Thomas Manhardt , Hassan Chafi
CPC classification number: G06F9/5066 , G06F9/546
Abstract: Techniques herein provide job control and synchronization of distributed graph-processing jobs. In an embodiment, a computer system maintains an input queue of graph processing jobs. In response to de-queuing a graph processing job, a master thread partitions the graph processing job into distributed jobs. Each distributed job has a sequence of processing phases. The master thread sends each distributed job to a distributed processor. Each distributed job executes a first processing phase of its sequence of processing phases. To the master thread, the distributed job announces completion of its first processing phase. The master thread detects that all distributed jobs have announced finishing their first processing phase. The master thread broadcasts a notification to the distributed jobs that indicates that all distributed jobs have finished their first processing phase. Receiving that notification causes the distributed jobs to execute their second processing phase. Queues and barriers provide for faults and cancellation.
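The announce-then-broadcast barrier between the first and second processing phases might look like the sketch below. It assumes threads stand in for distributed processors, a `queue.Queue` carries completion announcements to the master, and a `threading.Event` plays the role of the broadcast notification. `run_job` is a hypothetical name, and fault handling and cancellation are omitted.

```python
import queue
import threading


def run_job(num_workers):
    announcements = queue.Queue()        # distributed jobs -> master thread
    phase_one_done = threading.Event()   # master thread -> all distributed jobs
    log, log_lock = [], threading.Lock()

    def distributed_job(worker_id):
        with log_lock:
            log.append(("phase1", worker_id))   # first processing phase
        announcements.put(worker_id)            # announce completion to master
        phase_one_done.wait()                   # wait for the broadcast
        with log_lock:
            log.append(("phase2", worker_id))   # second processing phase

    workers = [threading.Thread(target=distributed_job, args=(i,))
               for i in range(num_workers)]
    for w in workers:
        w.start()

    # Master: detect that every distributed job has announced, then broadcast.
    for _ in range(num_workers):
        announcements.get()
    phase_one_done.set()

    for w in workers:
        w.join()
    return log
```

Since the event is set only after all announcements have arrived, no job can begin its second phase before every job has finished its first.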
-
Publication No.: US20190171490A1
Publication Date: 2019-06-06
Application No.: US16270135
Filing Date: 2019-02-07
Applicant: Oracle International Corporation
Inventor: Thomas Manhardt , Sungpack Hong , Siegfried Depner , Jinsu Lee , Nicholas Roth , Hassan Chafi
Abstract: Techniques are provided for dynamically self-balancing communication and computation. In an embodiment, each partition of application data is stored on a respective computer of a cluster. The application is divided into distributed jobs, each of which corresponds to a partition. Each distributed job is hosted on the computer that hosts the corresponding data partition. Each computer divides its distributed job into computation tasks. Each computer has a pool of threads that execute the computation tasks. During execution, one computer receives a data access request from another computer. The data access request is executed by a thread of the pool. Threads of the pool are bimodal and may be repurposed between communication and computation, depending on workload. Each computer individually detects completion of its computation tasks. Each computer informs a central computer that its distributed job has finished. The central computer detects when all distributed jobs of the application have terminated.
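The idea of bimodal threads can be approximated with a single pool whose threads take whichever kind of work is queued next. The sketch below is a simplification under stated assumptions: one shared queue holds both computation tasks and incoming data access requests, `queue.Queue.join()` stands in for local completion detection, and `run_bimodal_pool` is a hypothetical name. Workload-driven repurposing and the report to a central computer are not modeled.

```python
import queue
import threading


def run_bimodal_pool(compute_tasks, data_requests, num_threads=2):
    work = queue.Queue()
    for task in compute_tasks:
        work.put(("compute", task))
    for request in data_requests:
        work.put(("communicate", request))

    results = {"compute": [], "communicate": []}
    lock = threading.Lock()

    def worker():
        while True:
            item = work.get()
            if item is None:            # shutdown signal
                work.task_done()
                return
            mode, fn = item             # the same thread serves either mode
            value = fn()
            with lock:
                results[mode].append(value)
            work.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()

    work.join()                         # local detection: all queued work done
    for _ in threads:
        work.put(None)
    for t in threads:
        t.join()
    return results
```

Calling it with two computation callables and one data access callable returns the results grouped by mode, regardless of which thread ran which item.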
-
Publication No.: US20170351551A1
Publication Date: 2017-12-07
Application No.: US15175920
Filing Date: 2016-06-07
Applicant: Oracle International Corporation
Inventor: Thomas Manhardt , Sungpack Hong , Siegfried Depner , Jinsu Lee , Nicholas Roth , Hassan Chafi
IPC: G06F9/50
CPC classification number: G06F9/5061 , G06F9/5038 , G06F9/5066 , G06F9/5083 , G06F9/522 , H04L67/10
Abstract: Techniques are provided for dynamically self-balancing communication and computation. In an embodiment, each partition of application data is stored on a respective computer of a cluster. The application is divided into distributed jobs, each of which corresponds to a partition. Each distributed job is hosted on the computer that hosts the corresponding data partition. Each computer divides its distributed job into computation tasks. Each computer has a pool of threads that execute the computation tasks. During execution, one computer receives a data access request from another computer. The data access request is executed by a thread of the pool. Threads of the pool are bimodal and may be repurposed between communication and computation, depending on workload. Each computer individually detects completion of its computation tasks. Each computer informs a central computer that its distributed job has finished. The central computer detects when all distributed jobs of the application have terminated.
-
Publication No.: US11030014B2
Publication Date: 2021-06-08
Application No.: US16270135
Filing Date: 2019-02-07
Applicant: Oracle International Corporation
Inventor: Thomas Manhardt , Sungpack Hong , Siegfried Depner , Jinsu Lee , Nicholas Roth , Hassan Chafi
IPC: G06F9/50 , H04L29/08 , G06F9/52 , G06F16/901 , G06F11/34
Abstract: Techniques are provided for dynamically self-balancing communication and computation. In an embodiment, each partition of application data is stored on a respective computer of a cluster. The application is divided into distributed jobs, each of which corresponds to a partition. Each distributed job is hosted on the computer that hosts the corresponding data partition. Each computer divides its distributed job into computation tasks. Each computer has a pool of threads that execute the computation tasks. During execution, one computer receives a data access request from another computer. The data access request is executed by a thread of the pool. Threads of the pool are bimodal and may be repurposed between communication and computation, depending on workload. Each computer individually detects completion of its computation tasks. Each computer informs a central computer that its distributed job has finished. The central computer detects when all distributed jobs of the application have terminated.
-
Publication No.: US10754700B2
Publication Date: 2020-08-25
Application No.: US16353050
Filing Date: 2019-03-14
Applicant: ORACLE INTERNATIONAL CORPORATION
Inventor: Jinsu Lee , Sungpack Hong , Siegfried Depner , Nicholas Roth , Thomas Manhardt , Hassan Chafi
Abstract: Techniques herein provide job control and synchronization of distributed graph-processing jobs. In an embodiment, a computer system maintains an input queue of graph processing jobs. In response to de-queuing a graph processing job, a master thread partitions the graph processing job into distributed jobs. Each distributed job has a sequence of processing phases. The master thread sends each distributed job to a distributed processor. Each distributed job executes a first processing phase of its sequence of processing phases. To the master thread, the distributed job announces completion of its first processing phase. The master thread detects that all distributed jobs have announced finishing their first processing phase. The master thread broadcasts a notification to the distributed jobs that indicates that all distributed jobs have finished their first processing phase. Receiving that notification causes the distributed jobs to execute their second processing phase. Queues and barriers provide for faults and cancellation.