Abstract:
Techniques herein are for navigation data structures for graph traversal. In an embodiment, navigation data structures that a computer stores include: a source vertex array of vertices; a neighbor array of dense identifiers of target vertices terminating edges; a bidirectional map associating, for each vertex, a sparse identifier of the vertex with a dense identifier of the vertex; and a vertex array containing, when a dense identifier of a source vertex is used as an offset, a pair of offsets defining an offset range, for use with the neighbor array. The source vertex array, using the dense identifier of a particular vertex as an offset, contains an offset, into a neighbor array, of a target vertex terminating an edge originating at the particular vertex. The neighbor array contiguously stores dense identifiers of target vertices terminating edges originating from a same source vertex.
Abstract:
Embodiments perform real-time vertex connectivity checks in graph data representations via a multi-phase search process. This process includes an efficient first search phase using landmark connectivity data that is generated during a preprocessing phase. Landmark connectivity data maps the connectivity of a set of identified landmarks in a graph to other vertices in the graph. Upon determining that the subject vertices are not closely related via landmarks, embodiments implement a second search phase that performs a brute-force search for connectivity, between the subject vertices, among the graph's non-landmark vertices. This brute-force search prevents exploration of cyclical paths by recording the vertices on a currently-explored path in a stack data structure. The second search phase is automatically aborted upon detecting that the non-landmark vertices in the graph are over a threshold density. In this case, embodiments perform a third search phase involving either a modified breadth-first search or modified bidirectional search.
Abstract:
Techniques for identifying common neighbors of two nodes in a graph are provided. One technique involves performing a binary split search and/or a linear search. Another technique involves creating a segmenting index for a first neighbor list. A second neighbor list is scanned and, for each node indicated in the second neighbor list, the segmenting index is used to determine whether the node is also indicated in the first neighbor list. Techniques are also provided for counting the number of triangles. One technique involves pruning nodes from neighbor lists based on the node values of the nodes whose neighbor lists are being pruned. Another technique involves sorting the nodes in a node array (and, thus, their respective neighbor lists) based on the nodes' respective degrees prior to identifying common neighbors. In this way, when pruning the neighbor lists, the neighbor lists of the highly connected nodes are significantly reduced.
Abstract:
Techniques are described herein for automatic generation of multi-source breadth-first search (MS-BFS) from high-level graph processing language. In an embodiment, a method involves a computer analyzing original software instructions. The original software instructions are configured to perform multiple breadth-first searches to determine a particular result. Each breadth-first search originates at each of a subset of vertices of a graph. Each breadth-first search is encoded for independent execution. Based on the analyzing, the computer generates transformed software instructions configured to perform a MS-BFS to determine the particular result. Each of the subset of vertices is a source of the MS-BFS. In an embodiment, parallel execution of the MS-BFS is regulated with batches of vertices. In an embodiment, the original software instructions are expressed in Green-Marl graph analysis language. In an embodiment, the transformed software instructions are expressed in a general purpose programing language such as C, C++, Python, or Java.
Abstract:
Techniques for storing and processing graph data in a database system are provided. Graph data (or a portion thereof) that is stored in persistent storage is loaded into memory to generate an instance of a particular graph. The instance is consistent as of a particular point in time. Graph analysis operations are performed on the instance. The instance may be used by multiple users to perform graph analysis operations. Subsequent changes to the graph are stored separate from the instance. Later, the changes may be applied to the instance (or a copy thereof) to refresh the instance.
Abstract:
Techniques are described for a vectorized queue, which implements a vectorized ‘contains’ function that determines whether a value is in the queue. A three-phase vectorized shortest-path graph search splits each expanding and probing iteration into three phases that utilize vectorized instructions: (1) The neighbors of nodes that are in a next queue are fetched and written into a current queue. (2) It is determined whether the destination node is among the fetched neighbor nodes in the current queue. (3) The fetched neighbor nodes that have not yet been visited are put into the next queue. According to an embodiment, a vectorized copy operation performs vector-based data copying using vectorized load and store instructions. Specifically, vectors of data are copied from a source to a destination. Any invalid data copied to the destination is overwritten, either with a vector of additional valid data or with a vector of nonce data.
Abstract:
Techniques herein minimize memory needed to store distances between vertices of a graph for use during a multi-source breadth-first search (MS-BFS). In an embodiment, during each iteration of a first sequence of iterations of a MS-BFS, a computer updates a first matrix that contains elements that use a first primitive integer type having a first width to record a distance from a source vertex of a graph to another vertex. The computer detects that a count of iterations of the first sequence of iterations exceeds a threshold. Responsively, the computer creates a second matrix that contains elements that use a second primitive integer type having a second width that is larger than the first width to record a distance from a source vertex of the graph to another vertex. During each iteration of a second sequence of iterations of the MS-BFS, the computer updates the second matrix.
Abstract:
Techniques herein minimize memory needed to store distances between vertices of a graph for use during a multi-source breadth-first search (MS-BFS). In an embodiment, during each iteration of a first sequence of iterations of a MS-BFS, a computer updates a first matrix that contains elements that use a first primitive integer type having a first width to record a distance from a source vertex of a graph to another vertex. The computer detects that a count of iterations of the first sequence of iterations exceeds a threshold. Responsively, the computer creates a second matrix that contains elements that use a second primitive integer type having a second width that is larger than the first width to record a distance from a source vertex of the graph to another vertex. During each iteration of a second sequence of iterations of the MS-BFS, the computer updates the second matrix.
Abstract:
Techniques herein are for navigation data structures for graph traversal. In an embodiment, navigation data structures that a computer stores include: a source vertex array of vertices; a neighbor array of dense identifiers of target vertices terminating edges; a bidirectional map associating, for each vertex, a sparse identifier of the vertex with a dense identifier of the vertex; and a vertex array containing, when a dense identifier of a source vertex is used as an offset, a pair of offsets defining an offset range, for use with the neighbor array. The source vertex array, using the dense identifier of a particular vertex as an offset, contains an offset, into a neighbor array, of a target vertex terminating an edge originating at the particular vertex. The neighbor array contiguously stores dense identifiers of target vertices terminating edges originating from a same source vertex.
Abstract:
Techniques herein generate, such as during compilation, polymorphic dispatch logic (PDL) to switch between specialized implementations of a polymorphic graph algorithm. In an embodiment, a computer detects, within source logic of a graph algorithm, that the algorithm processes an instance of a generic graph type. The computer generates several alternative implementations of the algorithm. Each implementation is specialized to process the graph instance as an instance of a respective graph subtype. The computer generates PDL that performs dynamic dispatch as follows. At runtime, the PDL receives a graph instance of the generic graph type. The PDL detects which particular graph subtype is the graph instance. The PDL then invokes whichever alternative implementation that is specialized to process the graph instance as an instance of the detected particular graph subtype. In embodiments, the source logic is expressed in a domain specific language (DSL), e.g. for analysis, traversal, or querying of graphs.