Abstract:
Techniques are described for supporting graph pattern matching queries inside a relational database management system (RDBMS) that supports SQL execution. The techniques compile a graph pattern matching query that includes a bounded recursive pattern query into a SQL query that can then be executed by the relational engine. As a result, the techniques enable execution of graph pattern matching queries that include bounded recursive patterns on top of the relational engine without requiring any change to the existing SQL engine.
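For intuition, the following is a minimal sketch (not the patented implementation) of how a bounded recursive pattern such as (a)-[:knows*1..3]->(b) could be unrolled into a UNION ALL of fixed-length joins; the EDGES table and its SRC/DST columns are illustrative assumptions.

```python
# Minimal sketch: unroll a bounded recursive edge pattern (min..max hops)
# into a UNION ALL of fixed-length self-joins over an assumed EDGES table
# with columns SRC and DST. Table and column names are illustrative only.

def compile_bounded_recursion(min_hops: int, max_hops: int,
                              edges_table: str = "EDGES") -> str:
    branches = []
    for hops in range(min_hops, max_hops + 1):
        froms = [f"{edges_table} e{i}" for i in range(hops)]
        joins = [f"e{i}.DST = e{i + 1}.SRC" for i in range(hops - 1)]
        where = f" WHERE {' AND '.join(joins)}" if joins else ""
        branches.append(
            f"SELECT e0.SRC AS a, e{hops - 1}.DST AS b "
            f"FROM {', '.join(froms)}{where}"
        )
    # Each branch matches paths of one exact length; UNION ALL combines them.
    return "\nUNION ALL\n".join(branches)

if __name__ == "__main__":
    # Example: compile the pattern (a)-[:knows*1..3]->(b).
    print(compile_bounded_recursion(1, 3))
```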
Abstract:
Techniques are described herein for early pruning of potential graph query results. Specifically, based on determining that property values of a path through graph data cannot affect results of a query, the path is pruned from a set of potential query solutions prior to fully exploring the path. Early solution pruning is performed on prunable queries that project prunable functions including MIN, MAX, SUM, and DISTINCT, the results of which are not tied to the number of paths explored for query execution. A database system implements early solution pruning for a prunable query based on intermediate results maintained for the query during query execution. Specifically, when the system determines that property values of a given potential solution path cannot affect the query results reflected in the intermediate results maintained for the query, the path is discarded from the set of possible query solutions without further exploration of the path.
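As an illustration only, the sketch below shows how such pruning could look for a MIN aggregate over path cost: a partial path is discarded as soon as its accumulated value can no longer improve the intermediate minimum. The graph representation and the non-negative-weight assumption are illustrative, not the described system.

```python
# Illustrative sketch: early pruning for a MIN aggregate over path cost.
# Assumes non-negative edge weights so a partial sum can only grow,
# which is what makes discarding a partial path safe for MIN.

def min_path_cost(adj, source, target):
    best = [float("inf")]  # intermediate result maintained during execution

    def explore(vertex, cost_so_far, visited):
        if cost_so_far >= best[0]:
            return  # prune: this path can no longer affect the MIN result
        if vertex == target:
            best[0] = cost_so_far
            return
        for neighbor, weight in adj.get(vertex, []):
            if neighbor not in visited:
                explore(neighbor, cost_so_far + weight, visited | {neighbor})

    explore(source, 0, {source})
    return best[0]

if __name__ == "__main__":
    graph = {"a": [("b", 1), ("c", 5)], "b": [("c", 1)], "c": []}
    print(min_path_cost(graph, "a", "c"))  # 2; the direct 5-cost path is pruned
```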
Abstract:
Techniques are disclosed for efficiently assigning available workers to execute multiple graph queries concurrently on a distributed graph database. The techniques comprise a runtime engine assigning multiple workers to execute portions of multiple graph queries, each worker in each assignment asynchronously executing a portion of a graph query within a parallel-while construct that includes return statements at different locations, and the runtime engine reassigning a worker to execute another portion of the same or a different graph query to optimize the overall performance of all workers.
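For illustration, the following simplified sketch (using assumed names and a shared work queue, not the disclosed runtime engine) shows workers being reassigned to pending portions of the same or different queries as soon as they return from a portion.

```python
# Simplified sketch of worker reassignment across concurrent graph queries.
# Each "portion" is modeled as a callable; a worker that finishes (returns
# from) one portion is immediately reassigned to the next pending portion,
# which may belong to the same query or a different one. Names are illustrative.

import queue
import threading

def run_portions(portions, num_workers=4):
    pending = queue.Queue()
    for query_id, portion in portions:
        pending.put((query_id, portion))

    def worker(worker_id):
        while True:
            try:
                query_id, portion = pending.get_nowait()
            except queue.Empty:
                return  # no more work: the worker exits
            portion()  # execute one portion of a graph query
            print(f"worker {worker_id} finished a portion of query {query_id}")

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

if __name__ == "__main__":
    # Six toy portions spread across two queries, run by two workers.
    work = [(q, (lambda q=q, p=p: None)) for q in ("Q1", "Q2") for p in range(3)]
    run_portions(work, num_workers=2)
```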
Abstract:
Techniques are provided for mapping tables and columns of a legacy relational schema into synthetic tables that are dedicated to graph analysis. In an embodiment, a computer receives a mapping of relational tables to node tables and edge tables. The node tables and the edge tables each contain columns and rows. The rows of the node tables and the rows of the edge tables define a graph. Based on the mapping and the relational tables, the computer calculates a value of at least one column of at least one row of the node tables. Based on an execution of a query of the graph, the computer returns the value.
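The following is a minimal sketch, under assumed table, column, and mapping names, of how a synthetic node table could be materialized from a relational table while computing the value of a derived column.

```python
# Toy sketch: derive a synthetic node table from relational rows according
# to a mapping, computing one column value per node row. The persons source
# table, its columns, and the mapping itself are illustrative assumptions.

persons = [  # a legacy relational table
    {"id": 1, "first_name": "Ada", "last_name": "Lovelace"},
    {"id": 2, "first_name": "Alan", "last_name": "Turing"},
]

mapping = {
    "node_table": "PERSON_NODES",
    "key": "id",
    # computed column: value calculated from the source relational columns
    "columns": {"full_name": lambda row: f"{row['first_name']} {row['last_name']}"},
}

def build_node_table(source_rows, mapping):
    node_rows = []
    for row in source_rows:
        node = {"key": row[mapping["key"]]}
        for col, compute in mapping["columns"].items():
            node[col] = compute(row)  # calculate the value of the node column
        node_rows.append(node)
    return node_rows

if __name__ == "__main__":
    # A graph query over the synthetic node table could now return the value.
    for node in build_node_table(persons, mapping):
        print(node)
```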
Abstract:
A storage manager maintains metadata for a plurality of graph components including, for each given graph component, a memory-state indicator that indicates whether the given graph component is stored in memory. The storage manager identifies a set of graph components required to execute a graph processing operation and identifies, based on the metadata, a first subset of the set of graph components that are stored in memory and a second subset of the set of graph components that are not stored in memory. The storage manager loads the second subset of graph components into memory and initiates execution of the graph processing operation using the set of graph components in memory.
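A minimal sketch of this bookkeeping, with illustrative names for the loading and execution callbacks, might look as follows.

```python
# Minimal sketch: metadata maps each graph component to a memory-state
# indicator; required components are split into an in-memory subset and a
# subset that must first be loaded. All names (load_from_storage,
# run_operation) are illustrative placeholders.

class StorageManager:
    def __init__(self):
        self.metadata = {}   # component id -> True if resident in memory
        self.memory = {}     # component id -> loaded component data

    def register(self, component_id, in_memory=False, data=None):
        self.metadata[component_id] = in_memory
        if in_memory:
            self.memory[component_id] = data

    def execute(self, required_components, load_from_storage, run_operation):
        missing = [c for c in required_components if not self.metadata.get(c)]
        for component_id in missing:          # load the second subset
            self.memory[component_id] = load_from_storage(component_id)
            self.metadata[component_id] = True
        # initiate the graph processing operation on all required components
        return run_operation({c: self.memory[c] for c in required_components})

if __name__ == "__main__":
    sm = StorageManager()
    sm.register("vertex_props", in_memory=True, data=[1, 2, 3])
    sm.register("edge_index", in_memory=False)
    print(sm.execute(["vertex_props", "edge_index"],
                     load_from_storage=lambda cid: f"loaded:{cid}",
                     run_operation=lambda comps: sorted(comps)))
```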
Abstract:
A graph rebalancing approach is provided that allows a distributed graph system to effectively support elasticity by incrementally balancing distributed in-memory graphs, either uniformly or in a custom manner, across a given set of machines. Performing the incremental rebalancing operation comprises selecting a chunk on a source machine in the cluster having a surplus of chunks, selecting a target machine in the cluster having a deficit of chunks, transferring the selected chunk from the source machine to the target machine, and updating metadata in each machine in the cluster to reflect the location of the graph data elements of the selected chunk on the target machine.
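For intuition, a single incremental rebalancing step could be sketched as follows; the in-process dictionaries stand in for actual machines in the cluster, and all names are illustrative.

```python
# Illustrative sketch of one incremental rebalancing step: pick a chunk on a
# machine holding more than its target share, move it to a machine holding
# fewer, and update every machine's chunk-location metadata.

def rebalance_step(machines, chunk_locations, target_per_machine):
    # machines: machine id -> set of chunk ids held on that machine
    # chunk_locations: per-machine copy of chunk id -> machine id metadata
    source = next(m for m, chunks in machines.items()
                  if len(chunks) > target_per_machine[m])       # surplus
    target = next(m for m, chunks in machines.items()
                  if len(chunks) < target_per_machine[m])       # deficit
    chunk = next(iter(machines[source]))                        # select a chunk
    machines[source].remove(chunk)                              # transfer it
    machines[target].add(chunk)
    for metadata in chunk_locations.values():                   # update metadata
        metadata[chunk] = target
    return chunk, source, target

if __name__ == "__main__":
    machines = {"m1": {"c1", "c2", "c3"}, "m2": {"c4"}}
    locations = {m: {"c1": "m1", "c2": "m1", "c3": "m1", "c4": "m2"}
                 for m in machines}
    print(rebalance_step(machines, locations, {"m1": 2, "m2": 2}))
```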
Abstract:
A graph processing system that supports automatic data model conversion from the Resource Description Framework (RDF) to the Property Graph (PG) model is provided. Rather than using a naive conversion approach that creates PG nodes and edges without properties, a set of conversion rules is evaluated to automatically convert RDF triples into PG nodes and edges with properties, as appropriate. Accordingly, the converted PG data takes full advantage of the PG format while advantageously avoiding the creation of extraneous nodes and edges, allowing queries on the PG data to be efficiently executed on any database supporting the PG data model. The plurality of rules categorizes each triple into one of three cases, depending on whether or not the predicate is “rdf:type” and whether or not the object is a literal value, and generates graph entities as appropriate for each case. Optionally, user-defined rules may override the automatic rules.
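The three-way classification could be sketched as follows; the literal-versus-IRI test and the rule bodies are simplified illustrative assumptions rather than the actual conversion rules.

```python
# Sketch of the three-way triple classification described above. Whether the
# object is a literal is approximated here by checking for a non-IRI string;
# real RDF term typing would be richer. Rule bodies are illustrative.

RDF_TYPE = "rdf:type"

def convert(triples):
    nodes, edges = {}, []
    for subject, predicate, obj in triples:
        node = nodes.setdefault(subject, {"labels": set(), "properties": {}})
        if predicate == RDF_TYPE:
            node["labels"].add(obj)              # case 1: type becomes a label
        elif not (isinstance(obj, str) and obj.startswith(("http://", "https://"))):
            node["properties"][predicate] = obj  # case 2: literal becomes a property
        else:
            nodes.setdefault(obj, {"labels": set(), "properties": {}})
            edges.append((subject, predicate, obj))  # case 3: IRI object becomes an edge
    return nodes, edges

if __name__ == "__main__":
    data = [("http://ex/ada", RDF_TYPE, "Person"),
            ("http://ex/ada", "name", "Ada"),
            ("http://ex/ada", "knows", "http://ex/alan")]
    print(convert(data))
```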
Abstract:
A graph processing system is provided that executes scouting queries to improve query planning. A query planner creates a plurality of scouting queries, each scouting query in the plurality corresponding to a query plan for a graph query and having an associated confidence value. The graph processing system performs limited execution of the plurality of scouting queries and determines a metric value for each scouting query based on its execution. The system determines a score for each scouting query based on its metric value and the confidence value of the corresponding query plan, and selects a query plan based on the scores of the plurality of scouting queries. The system then executes the graph query based on the selected query plan.
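A simplified sketch of the scoring and selection step, with an assumed scoring formula and an assumed limited_execute callback, might look like this.

```python
# Simplified sketch of plan selection via scouting queries: each candidate
# plan gets a limited (budgeted) execution, yields a metric, and is scored by
# combining that metric with the plan's confidence value. The scoring formula
# and the limited_execute callback are illustrative assumptions.

def select_plan(candidate_plans, limited_execute):
    scored = []
    for plan in candidate_plans:
        # limited execution of the scouting query for this plan
        metric = limited_execute(plan["scouting_query"])
        # assumed scoring: lower metric (e.g., elapsed time) and higher
        # confidence both increase the score
        score = plan["confidence"] / (1.0 + metric)
        scored.append((score, plan))
    return max(scored, key=lambda pair: pair[0])[1]

if __name__ == "__main__":
    plans = [{"name": "hash-join-first", "confidence": 0.8, "scouting_query": "q1"},
             {"name": "expand-from-anchor", "confidence": 0.6, "scouting_query": "q2"}]
    fake_runtimes = {"q1": 4.0, "q2": 1.0}
    print(select_plan(plans, lambda q: fake_runtimes[q])["name"])
```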
Abstract:
A storage manager is provided for offloading graph components to persistent storage to reduce resident memory in a distributed graph processing engine. The storage manager identifies a set of graph components required to execute a graph processing operation on a graph in a graph processing engine of a database system and reserves the amount of memory needed to load the set of graph components into memory. The storage manager loads the set of graph components into memory and initiates execution of the graph processing operation using the set of graph components in memory. The storage manager evicts one or more unused graph components from memory in response to receiving a request to free a requested amount of memory.
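A minimal sketch of the reserve/load/evict cycle, using arbitrary size units and an assumed load_component callback in place of persistent storage I/O, might look as follows.

```python
# Minimal sketch of the reserve / load / evict cycle. Sizes are in arbitrary
# units, eviction picks unused components in insertion order, and the
# load_component callback is an assumed placeholder for storage I/O.

class OffloadingStorageManager:
    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.resident = {}   # component id -> (size, in_use flag)

    def free(self, requested):
        # evict unused components until the requested amount is available
        for component_id, (size, in_use) in list(self.resident.items()):
            if self.capacity - self.used >= requested:
                break
            if not in_use:
                del self.resident[component_id]
                self.used -= size
        if self.capacity - self.used < requested:
            raise MemoryError("cannot free the requested amount of memory")

    def load(self, components, load_component):
        needed = sum(size for _, size in components)
        self.free(needed)                       # reserve memory for the set
        for component_id, size in components:
            load_component(component_id)        # read from persistent storage
            self.resident[component_id] = (size, True)
            self.used += size
```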
Abstract:
Techniques herein decouple result availability from graph analysis execution to adapt to various deployment configurations. In an embodiment, a graph engine is deployed that has multiple mutually exclusive configuration modes that include being embedded within a software application, centrally serving software applications, or being distributed amongst a cluster of computers. Based on the current configuration mode of the graph engine, a software application receives or generates an analysis request to process a graph. The software application provides the analysis request to the graph engine in exchange for access to a computational future, of the graph engine, that is based on the analysis request and the graph. Based on a proxy of said computational future, the software application accesses a result of the analysis request. In an embodiment, a remote proxy exchanges representational state transfer (REST) messages. Network mechanisms, such as the Transmission Control Protocol (TCP) and the Hypertext Transfer Protocol (HTTP), provide enhanced remoting.
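For illustration, accessing a result through a proxy of a computational future could be sketched as follows; the local thread pool stands in for whichever deployment mode actually runs the analysis, and a remote proxy would exchange REST messages instead of holding the future directly.

```python
# Conceptual sketch of accessing an analysis result through a proxy of a
# computational future. The names FutureProxy and submit_analysis are
# illustrative, not part of any particular graph engine API.

from concurrent.futures import Future, ThreadPoolExecutor

class FutureProxy:
    def __init__(self, future: Future):
        self._future = future

    def result(self, timeout=None):
        # the application blocks (or polls) only when it needs the value
        return self._future.result(timeout)

def submit_analysis(engine_pool: ThreadPoolExecutor, analyze, graph) -> FutureProxy:
    return FutureProxy(engine_pool.submit(analyze, graph))

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=1) as pool:
        proxy = submit_analysis(pool,
                                lambda g: sum(len(v) for v in g.values()),
                                {"a": ["b"], "b": []})
        print(proxy.result())  # number of edges in the toy graph
```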