Efficient method for indexing data transferred between machines in distributed graph processing systems

    Publication No.: US10002205B2

    Publication Date: 2018-06-19

    Application No.: US14947382

    Filing Date: 2015-11-20

    CPC classification number: G06F16/9024 G06F16/278

    Abstract: Techniques herein index data transferred during distributed graph processing. In an embodiment, a system of computers divides a directed graph into partitions. The system creates one partition per computer and distributes each partition to a computer. Each computer builds four edge lists that enumerate edges that connect the partition of the computer with a partition of a neighbor computer. Each of the four edge lists has edges of a direction, which may be inbound or outbound from the partition. Edge lists are sorted by identifier of the vertex that terminates or originates each edge. Each iteration of distributed graph analysis involves each computer processing its partition and exchanging edge data or vertex data with neighbor computers. Each computer uses an edge list to build a compactly described range of edges that connect to another partition. The computers exchange described ranges with their neighbors during each iteration.
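    The boundary-edge indexing described in this abstract can be sketched as follows. This is an illustrative sketch only, not the patented implementation; the function and variable names (`build_edge_lists`, `describe_range`, `owner`) are invented for the example. Edges that cross a partition boundary are collected per neighbor, sorted by the vertex that terminates (outbound) or originates (inbound) the edge, so that a block of transferred edges can be described compactly as a (start, count) range into the shared sorted list rather than enumerated one by one.

```python
from collections import defaultdict

def build_edge_lists(edges, owner):
    """Build per-neighbor boundary edge lists (illustrative names).

    `edges` is a list of (src, dst) pairs; `owner` maps a vertex id to
    the partition that holds it. Outbound lists are sorted by the
    terminating vertex, inbound lists by the originating vertex.
    """
    outbound = defaultdict(list)  # (partition, neighbor) -> [(src, dst)]
    inbound = defaultdict(list)
    for src, dst in edges:
        ps, pd = owner[src], owner[dst]
        if ps != pd:  # edge crosses a partition boundary
            outbound[(ps, pd)].append((src, dst))
            inbound[(pd, ps)].append((src, dst))
    for lst in outbound.values():
        lst.sort(key=lambda e: e[1])  # sort by terminating vertex id
    for lst in inbound.values():
        lst.sort(key=lambda e: e[0])  # sort by originating vertex id
    return outbound, inbound

def describe_range(lo, hi):
    """Compactly describe edges lo..hi-1 of a shared sorted edge list
    as (start, count), instead of enumerating every edge."""
    return (lo, hi - lo)
```

    Because both sides hold identically sorted edge lists, exchanging only these small range descriptors is enough for a neighbor to know exactly which edges' data are in a message.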

    DISTRIBUTED GRAPH PROCESSING SYSTEM FEATURING INTERACTIVE REMOTE CONTROL MECHANISM INCLUDING TASK CANCELLATION

    Publication No.: US20190205178A1

    Publication Date: 2019-07-04

    Application No.: US16353050

    Filing Date: 2019-03-14

    CPC classification number: G06F9/5066 G06F9/546

    Abstract: Techniques herein provide job control and synchronization of distributed graph-processing jobs. In an embodiment, a computer system maintains an input queue of graph processing jobs. In response to de-queuing a graph processing job, a master thread partitions the graph processing job into distributed jobs. Each distributed job has a sequence of processing phases. The master thread sends each distributed job to a distributed processor. Each distributed job executes a first processing phase of its sequence of processing phases. To the master thread, the distributed job announces completion of its first processing phase. The master thread detects that all distributed jobs have announced finishing their first processing phase. The master thread broadcasts a notification to the distributed jobs that indicates that all distributed jobs have finished their first processing phase. Receiving that notification causes the distributed jobs to execute their second processing phase. Queues and barriers provide for faults and cancellation.
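    The phase barrier described in this abstract can be sketched with ordinary queues and events. This is a minimal illustration, not the patented implementation: workers announce completion of each phase on a queue, and the master broadcasts a go-ahead only after it has collected an announcement from every distributed job, so no job starts phase N+1 before all jobs finish phase N.

```python
import queue
import threading

def run_phased_jobs(num_jobs, num_phases):
    """Run `num_jobs` workers through `num_phases` lock-stepped phases
    (illustrative sketch; names are invented for the example)."""
    done_q = queue.Queue()  # workers announce phase completion here
    go = [threading.Event() for _ in range(num_phases)]  # master's broadcasts
    log = []  # (job_id, phase) in execution order

    def worker(job_id):
        for phase in range(num_phases):
            log.append((job_id, phase))  # "execute" this processing phase
            done_q.put((job_id, phase))  # announce completion to the master
            go[phase].wait()             # block until the master broadcasts

    threads = [threading.Thread(target=worker, args=(j,))
               for j in range(num_jobs)]
    for t in threads:
        t.start()
    for phase in range(num_phases):  # master thread's role
        for _ in range(num_jobs):    # collect every job's announcement
            done_q.get()
        go[phase].set()              # broadcast: all may enter next phase
    for t in threads:
        t.join()
    return log
```

    In the log produced by this sketch, every phase-0 entry precedes every phase-1 entry, which is exactly the barrier property the abstract describes.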

    CONCURRENT DISTRIBUTED GRAPH PROCESSING SYSTEM WITH SELF-BALANCE

    Publication No.: US20190171490A1

    Publication Date: 2019-06-06

    Application No.: US16270135

    Filing Date: 2019-02-07

    Abstract: Techniques are provided for dynamically self-balancing communication and computation. In an embodiment, each partition of application data is stored on a respective computer of a cluster. The application is divided into distributed jobs, each of which corresponds to a partition. Each distributed job is hosted on the computer that hosts the corresponding data partition. Each computer divides its distributed job into computation tasks. Each computer has a pool of threads that execute the computation tasks. During execution, one computer receives a data access request from another computer. The data access request is executed by a thread of the pool. Threads of the pool are bimodal and may be repurposed between communication and computation, depending on workload. Each computer individually detects completion of its computation tasks. Each computer informs a central computer that its distributed job has finished. The central computer detects when all distributed jobs of the application have terminated.
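    The bimodal thread pool in this abstract can be sketched as workers that prefer pending communication work (remote data-access requests) and otherwise pick up local computation tasks. This is an illustrative sketch under simplifying assumptions (all work is pre-queued; `run_cluster_node` and the queue names are invented), not the patented scheduler.

```python
import queue
import threading

def run_cluster_node(comp_tasks, remote_requests, num_threads=4):
    """Drain communication and computation work with one bimodal pool.

    Each thread services remote data-access requests first, then local
    computation tasks, so the pool rebalances itself by workload.
    """
    comm_q = queue.Queue()  # data-access requests from other computers
    comp_q = queue.Queue()  # this computer's computation tasks
    for r in remote_requests:
        comm_q.put(r)
    for t in comp_tasks:
        comp_q.put(t)
    results, lock = [], threading.Lock()

    def worker():
        while True:
            try:  # communication work takes priority
                kind, item = "comm", comm_q.get_nowait()
            except queue.Empty:
                try:  # otherwise repurpose the thread for computation
                    kind, item = "comp", comp_q.get_nowait()
                except queue.Empty:
                    return  # both queues drained: local job is finished
            with lock:
                results.append((kind, item))

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

    A real system would additionally report completion to the central computer once its pool drains; that step is omitted here for brevity.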

    CONCURRENT DISTRIBUTED GRAPH PROCESSING SYSTEM WITH SELF-BALANCE

    Publication No.: US20170351551A1

    Publication Date: 2017-12-07

    Application No.: US15175920

    Filing Date: 2016-06-07

    Abstract: Techniques are provided for dynamically self-balancing communication and computation. In an embodiment, each partition of application data is stored on a respective computer of a cluster. The application is divided into distributed jobs, each of which corresponds to a partition. Each distributed job is hosted on the computer that hosts the corresponding data partition. Each computer divides its distributed job into computation tasks. Each computer has a pool of threads that execute the computation tasks. During execution, one computer receives a data access request from another computer. The data access request is executed by a thread of the pool. Threads of the pool are bimodal and may be repurposed between communication and computation, depending on workload. Each computer individually detects completion of its computation tasks. Each computer informs a central computer that its distributed job has finished. The central computer detects when all distributed jobs of the application have terminated.

    EFFICIENT METHOD FOR INDEXING DATA TRANSFERRED BETWEEN MACHINES IN DISTRIBUTED GRAPH PROCESSING SYSTEMS

    Publication No.: US20170147706A1

    Publication Date: 2017-05-25

    Application No.: US14947382

    Filing Date: 2015-11-20

    CPC classification number: G06F17/30958 G06F17/30584

    Abstract: Techniques herein index data transferred during distributed graph processing. In an embodiment, a system of computers divides a directed graph into partitions. The system creates one partition per computer and distributes each partition to a computer. Each computer builds four edge lists that enumerate edges that connect the partition of the computer with a partition of a neighbor computer. Each of the four edge lists has edges of a direction, which may be inbound or outbound from the partition. Edge lists are sorted by identifier of the vertex that terminates or originates each edge. Each iteration of distributed graph analysis involves each computer processing its partition and exchanging edge data or vertex data with neighbor computers. Each computer uses an edge list to build a compactly described range of edges that connect to another partition. The computers exchange described ranges with their neighbors during each iteration.

    Latency-hiding context management for concurrent distributed tasks in a distributed system
    Type: Invention Grant (in force)

    Publication No.: US09535756B2

    Publication Date: 2017-01-03

    Application No.: US14619414

    Filing Date: 2015-02-11

    CPC classification number: G06F9/5016 G06F9/546 G06F9/547 G06F2209/548

    Abstract: Techniques are provided for latency-hiding context management for concurrent distributed tasks. A plurality of task objects is processed, including a first task object corresponding to a first task that includes access to first data residing on a remote machine. A first access request is added to a request buffer. A first task reference identifying the first task object is added to a companion buffer. A request message including the request buffer is sent to the remote machine. A response message is received, including first response data responsive to the first access request. For each response of one or more responses of the response message, the response is read from the response message, a next task reference is read from the companion buffer, and a next task corresponding to the next task reference is continued based on the response. The first task is identified and continued.
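    The paired request buffer and companion buffer from this abstract can be sketched as two lists kept in lockstep: access requests go into the message sent over the network, while the matching task references stay local so each response can resume the task that asked for it. This is a simplified single-round-trip sketch; `batch_remote_accesses` and `remote_fetch` are invented stand-ins, the latter for the actual remote call.

```python
def batch_remote_accesses(tasks, remote_fetch):
    """Batch many tasks' remote reads into one message (illustrative).

    `tasks` are dicts with a "task_id" and the "remote_key" they need;
    `remote_fetch` sends a list of keys and returns values in order.
    """
    request_buffer = []    # access requests, shipped to the remote machine
    companion_buffer = []  # task references, kept locally, same order
    for task in tasks:
        request_buffer.append(task["remote_key"])
        companion_buffer.append(task["task_id"])
    responses = remote_fetch(request_buffer)  # one round trip for all tasks
    resumed = []
    for response, task_ref in zip(responses, companion_buffer):
        resumed.append((task_ref, response))  # continue task with its data
    return resumed
```

    Because the companion buffer never leaves the local machine, the remote side handles plain data requests while the requester still knows, position by position, which suspended task each response continues; that is what hides the network latency behind useful work.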


    LATENCY-HIDING CONTEXT MANAGEMENT FOR CONCURRENT DISTRIBUTED TASKS
    Type: Invention Application (in force)

    Publication No.: US20160232037A1

    Publication Date: 2016-08-11

    Application No.: US14619414

    Filing Date: 2015-02-11

    CPC classification number: G06F9/5016 G06F9/546 G06F9/547 G06F2209/548

    Abstract: Techniques are provided for latency-hiding context management for concurrent distributed tasks. A plurality of task objects is processed, including a first task object corresponding to a first task that includes access to first data residing on a remote machine. A first access request is added to a request buffer. A first task reference identifying the first task object is added to a companion buffer. A request message including the request buffer is sent to the remote machine. A response message is received, including first response data responsive to the first access request. For each response of one or more responses of the response message, the response is read from the response message, a next task reference is read from the companion buffer, and a next task corresponding to the next task reference is continued based on the response. The first task is identified and continued.


    Concurrent distributed graph processing system with self-balance

    Publication No.: US11030014B2

    Publication Date: 2021-06-08

    Application No.: US16270135

    Filing Date: 2019-02-07

    Abstract: Techniques are provided for dynamically self-balancing communication and computation. In an embodiment, each partition of application data is stored on a respective computer of a cluster. The application is divided into distributed jobs, each of which corresponds to a partition. Each distributed job is hosted on the computer that hosts the corresponding data partition. Each computer divides its distributed job into computation tasks. Each computer has a pool of threads that execute the computation tasks. During execution, one computer receives a data access request from another computer. The data access request is executed by a thread of the pool. Threads of the pool are bimodal and may be repurposed between communication and computation, depending on workload. Each computer individually detects completion of its computation tasks. Each computer informs a central computer that its distributed job has finished. The central computer detects when all distributed jobs of the application have terminated.
