Abstract:
Techniques are presented herein for efficient query processing and data change propagation at a secondary database system. The techniques involve determining execution costs for executing a query at a primary DBMS and for executing the query at an offload DBMS. The cost for executing the query at the offload DBMS includes the cost of propagating, to the offload DBMS, changes to the database objects required by the query. Based on a comparison of these execution costs, the query is sent to either the primary DBMS or the offload DBMS.
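By way of illustration, a minimal Python sketch of the routing decision described above; the DBMS objects and cost-estimation methods (estimate_execution_cost, has_pending_changes, estimate_propagation_cost, referenced_objects) are hypothetical stand-ins, not an actual implementation:

# Hypothetical sketch of cost-based query routing between a primary DBMS
# and an offload DBMS; all names and cost formulas are illustrative.
def route_query(query, primary, offload):
    primary_cost = primary.estimate_execution_cost(query)

    # The offload cost includes propagating pending changes for the
    # database objects the query touches.
    propagation_cost = sum(
        offload.estimate_propagation_cost(obj)
        for obj in query.referenced_objects()
        if offload.has_pending_changes(obj)
    )
    offload_cost = offload.estimate_execution_cost(query) + propagation_cost

    return offload if offload_cost < primary_cost else primary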
Abstract:
A system and method for processing a group and aggregate query on a relation are disclosed. A database system determines whether assistance of a heterogeneous system (HS) of compute nodes is beneficial in performing the query. Assuming that the relation has been partitioned and loaded into the HS, the database system determines, in a compile phase, whether the HS has the functional capabilities to assist, and whether the cost and benefit favor performing the operation with the assistance of the HS. If the cost and benefit favor using the assistance of the HS, then the system enters the execution phase. The database system starts, in the execution phase, an optimal number of parallel processes to produce and consume the results from the compute nodes of the HS. After any needed transaction consistency checks, the results of the query are returned by the database system.
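The two-phase decision can be pictured roughly as follows; the HS and DBMS interfaces used here (supports, estimated_cost, compute_nodes, max_parallel_degree, merge_and_check_consistency) are assumed names for this sketch only:

# Illustrative sketch of the compile-phase check and execution-phase fan-out;
# the heterogeneous-system (HS) interface is an assumption.
from concurrent.futures import ThreadPoolExecutor

def execute_group_aggregate(query, dbms, hs):
    # Compile phase: can the HS assist, and do cost and benefit favor it?
    if hs.supports(query.operations()) and hs.estimated_cost(query) < dbms.estimated_cost(query):
        # Execution phase: start parallel processes to produce and consume
        # results from the HS compute nodes.
        workers = min(len(hs.compute_nodes), dbms.max_parallel_degree)
        with ThreadPoolExecutor(max_workers=workers) as pool:
            partials = pool.map(lambda node: node.run_partition(query), hs.compute_nodes)
        # Transaction-consistency checks before returning results.
        return dbms.merge_and_check_consistency(partials, query)
    return dbms.execute_locally(query)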
Abstract:
Techniques are described for efficient query processing and data change propagation to a secondary database system. The secondary database system may execute queries received at a primary database system. Database changes made at the primary system are copied to the secondary system. The primary system receives a query to be executed on either the primary system or the secondary system. The primary system determines whether to send the query to the secondary system based upon whether data objects stored within the secondary system have pending changes that still need to be applied to them. The pending changes are stored in in-memory journals within the primary system. The primary system scans the journals for pending changes to the data objects and sends those changes to the secondary system. The secondary system receives and applies the pending changes to its copies of the data objects and, once the changes are applied, executes the query.
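A hedged sketch of the propagation step described above, assuming hypothetical journal and system interfaces (referenced_objects, in_memory_journal, pending_changes, apply_changes):

# Illustrative journal-based change propagation before offloading a query;
# the journal structure and method names are not from the actual system.
def offload_with_propagation(query, primary, secondary):
    for obj in query.referenced_objects():
        # Scan the primary's in-memory journal for pending changes to this object.
        pending = primary.in_memory_journal.pending_changes(obj)
        if pending:
            secondary.apply_changes(obj, pending)
    # Only after all referenced objects are consistent does the secondary execute.
    return secondary.execute(query)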
Abstract:
Techniques related to version control based on a dual-range validity model are disclosed. In an embodiment, an online analytical processing (OLAP) server stores a plurality of version records describing versions of a data item. A version record may describe any open transactions for a version of the data item. The version record may specify a commit timestamp for the data item at a database and a valid timestamp at least as great as the commit timestamp. The commit timestamp and the valid timestamp may specify a validity range. The version record may also specify an expiration timestamp, which along with the valid timestamp may specify an unresolved range. The OLAP server may also identify a valid version of the data item for a query timestamp that corresponds to a query for particular data in the data item and that falls within either the validity range or the unresolved range.
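A simplified Python illustration of the dual-range check; the record layout and selection rule below are inferred from the abstract and are not the exact patented scheme:

# Dual-range validity sketch: [commit_ts, valid_ts] is the validity range,
# (valid_ts, expiration_ts] is the unresolved range.
from dataclasses import dataclass
from typing import Optional

@dataclass
class VersionRecord:
    commit_ts: int                 # commit timestamp at the database
    valid_ts: int                  # known-valid up to here (>= commit_ts)
    expiration_ts: Optional[int]   # end of the unresolved range, if any
    value: object

    def covers(self, query_ts: int) -> bool:
        in_validity = self.commit_ts <= query_ts <= self.valid_ts
        in_unresolved = (self.expiration_ts is not None
                         and self.valid_ts < query_ts <= self.expiration_ts)
        return in_validity or in_unresolved

def pick_version(records, query_ts):
    # Return a version whose validity or unresolved range contains query_ts.
    return next((r for r in records if r.covers(query_ts)), None)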
Abstract:
Techniques for performing database operations using vectorized instructions are provided. In one technique, a hash table probe phase involves executing vectorized instructions to determine where in a bucket a particular key is located. This determination may be preceded by one or more vectorized instructions that are used to determine whether the bucket contains the particular key.
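A rough NumPy analogue of the vectorized probe; a real implementation would use SIMD intrinsics, and the bucket width and layout here are purely illustrative:

# One vectorized compare answers "does the bucket contain the key?";
# a second vectorized step locates the slot within the bucket.
import numpy as np

def probe_bucket(bucket_keys: np.ndarray, key: int):
    matches = bucket_keys == key      # element-wise compare, akin to a SIMD compare
    if not matches.any():
        return None                   # key not present in this bucket
    return int(np.argmax(matches))    # position of the first match

bucket = np.array([11, 42, 7, 0, 93, 0, 0, 0])
print(probe_bucket(bucket, 93))  # -> 4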
Abstract:
A system and method for allocating join processing between an RDBMS and an assisting cluster are disclosed. In one embodiment, the method estimates a cost of performing the join completely in the RDBMS and the cost of performing the join with the assistance of a cluster coupled to the RDBMS. The cost of performing the join with the assistance of the cluster includes estimating a cost of a broadcast join or a partition join depending on the sizes of the tables. Additional costs are incurred when there is a blocking operation, which prevents the cluster from being able to process portions of the join. The RDBMS also maintains transactional consistency when the cluster performs some or all of the join processing.
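An illustrative cost comparison for the decision above; the threshold, cost model, and interfaces are assumptions, not the patented formulas:

# Choose between an in-RDBMS join and a cluster-assisted join; the cluster
# cost depends on whether a broadcast or partition join applies.
def plan_join(left, right, rdbms, cluster, broadcast_threshold_rows=100_000):
    rdbms_cost = rdbms.estimate_join_cost(left, right)

    if min(left.row_count, right.row_count) <= broadcast_threshold_rows:
        # Small table: broadcast it to every cluster node.
        cluster_cost = cluster.estimate_broadcast_join_cost(left, right)
    else:
        # Both large: repartition both sides on the join key.
        cluster_cost = cluster.estimate_partition_join_cost(left, right)

    if rdbms.plan_has_blocking_operation(left, right):
        # Blocking operations keep the cluster from processing portions of the join.
        cluster_cost += cluster.estimate_blocking_penalty(left, right)

    return "cluster" if cluster_cost < rdbms_cost else "rdbms"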
Abstract:
Techniques described herein allow a user of an RDBMS to specify a graph algorithm function (GAF) that takes a graph object as input and returns a logical graph object as output. GAFs are used within graph queries to compute temporary and output properties (“GAF-computed properties”), which are live for the duration of the query cursor execution. GAF-computed output properties are accessible in the enclosing graph pattern matching query as though they were part of the input graph object of the GAF. Temporary cursor-duration tables are generated for the query cursor during compilation of a graph query that includes a GAF, and are used to store the GAF-computed properties. Each temporary table corresponds to one of the primary tables of the input graph, and includes, as a foreign key, primary key information from the corresponding primary table. Thus, the input graph of a GAF may be a “heterogeneous” graph.
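A hypothetical sketch of how a compiler might create the cursor-duration temporary tables; the table-naming scheme, the generic DDL, and the graph/GAF interfaces are assumptions for illustration only:

# One temporary table per primary table of the input graph; the primary key
# of that table is carried as a foreign key so GAF-computed properties can
# be joined back as though they were part of the input graph.
def create_gaf_temp_tables(graph, gaf, cursor):
    temp_tables = {}
    for primary_table in graph.primary_tables():
        temp_name = f"gaf_tmp_{primary_table.name}"
        columns = [f"{name} {sql_type}"
                   for name, sql_type in primary_table.primary_key_columns()]
        columns += [f"{prop.name} {prop.sql_type}"
                    for prop in gaf.computed_properties(primary_table)]
        cursor.execute(f"CREATE TEMPORARY TABLE {temp_name} ({', '.join(columns)})")
        temp_tables[primary_table.name] = temp_name
    return temp_tables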
Abstract:
Techniques are described for evaluating an operation from the execution plan of a query to determine whether offloading the operation to another database management system would be less costly. In an embodiment, the execution plan is determined based on characteristics of the database management system that received the query for execution. One or more operations in the execution plan are then evaluated for offloading to another heterogeneous database management system. In a related embodiment, the offloading cost for each operation may also include communication cost between the database management systems. The operations that are estimated to be less costly to execute on the other database management system are then identified for offloading to the other database management system. In an alternative embodiment, the database management system generates permutations of execution plans for the same query, and similarly evaluates each permutation of the execution plans for offloading its one or more operations. Based on the total cost of each permutation, which may include offloading cost for one or more operations to another database management system, the least costly plan is selected for query execution.
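A sketch of permutation-by-permutation evaluation with per-operation offload costs; the plan and cost interfaces are assumed for illustration:

# For each plan permutation, cost every operation at the local system and at
# the remote system (including communication), take the cheaper site, and
# keep the plan with the lowest total cost.
def choose_plan(query, local_dbms, remote_dbms):
    best_plan, best_cost = None, float("inf")
    for plan in local_dbms.enumerate_plans(query):
        total = 0
        for op in plan.operations():
            local_cost = local_dbms.estimate_cost(op)
            remote_cost = (remote_dbms.estimate_cost(op)
                           + remote_dbms.estimate_communication_cost(op))
            total += min(local_cost, remote_cost)
        if total < best_cost:
            best_plan, best_cost = plan, total
    return best_plan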
Abstract:
Techniques related to distributed relational dictionaries are disclosed. In some embodiments, one or more non-transitory storage media store a sequence of instructions which, when executed by one or more computing devices, cause performance of a method. The method involves generating, by a query optimizer at a distributed database system (DDS), a query execution plan (QEP) for generating a code dictionary and a column of encoded database data. The QEP specifies a sequence of operations for generating the code dictionary. The code dictionary is a database table. The method further involves receiving, at the DDS, a column of unencoded database data from a data source that is external to the DDS. The DDS generates the code dictionary according to the QEP. Furthermore, based on joining the column of unencoded database data with the code dictionary, the DDS generates the column of encoded database data according to the QEP.
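A toy pandas illustration of encoding a column by joining it with a code dictionary that is itself a table; the distributed query-execution-plan machinery is omitted:

# Build a code dictionary from the distinct values of an unencoded column,
# then produce the encoded column by joining the two.
import pandas as pd

# Unencoded column arriving from an external data source.
unencoded = pd.DataFrame({"city": ["Oslo", "Lima", "Oslo", "Kyiv", "Lima"]})

# The code dictionary is a table: one row per distinct value, with its code.
dictionary = (unencoded.drop_duplicates()
              .reset_index(drop=True)
              .rename_axis("code")
              .reset_index())

# Encoding is a join of the unencoded column with the code dictionary.
encoded = unencoded.merge(dictionary, on="city", how="left")[["code"]]
print(dictionary)
print(encoded)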