Abstract:
A system and method for performing a query operation on a pair of relations in a database system coupled to a heterogeneous system (HS) is disclosed. Assuming that that pair of relations is partitioned and already loaded into the HS, the database system receives a query on the pair of relations and based on the type of query operation computes the cost of performing the query operation on the database alone or the costs of performing the query operation with the assistance of the HS, each of the costs corresponding to a particular algorithm. If the costs indicate that the HS improves the performance of the query operation, then the HS computes portions of the operation, and returns the results back to the database system. If any parts of the relation are out of sync with the database system, the database system performs operations to maintain transactional consistency.
Abstract:
Disclosed herein are techniques for storing, within a database system, metadata that indicates an intended usage (IU). Once created, an IU may be assigned to a column to (a) indicate how the column is intended to be used, and (b) affect how the database server behaves when database operations involve values from the column. The IU assigned to a column supplements, but does not replace, the datatype definition for the column. Each IU may have an IU-bundle. The IU-bundle of an IU indicates how the database server behaves with respect to any column that is assigned the IU. For example, the IU-bundle may indicate constraints that the database server must validate during operations on values from columns assigned to the IU. Techniques are also described for implementing multi-column IUs and flexible IUs.
Abstract:
Techniques are described for storing and maintaining, in a materialized view, bitmap data that represents a bitmap of each possible distinct value of an expression and rewriting a query for a count of distinct values of the expression using the materialized view. The materialized view contains bitmap data that represents a bitmap of each possible distinct value of a first expression, and aggregate values of additional expressions, and is stored in memory or on disk by a database system. The database system receives a query that requests a number of distinct values, of the first expression, and an aggregate value for an additional expression. In response, the database system, rewrites the query to: compute the number of distinct values by counting the bits in the bitmap data of the materialized view that are set to the first value, and obtains the aggregate value for the additional expression in the materialized view.
Abstract:
Techniques related to distributed relational dictionaries are disclosed. In some embodiments, one or more non-transitory storage media store a sequence of instructions which, when executed by one or more computing devices, cause performance of a method. The method involves generating, by a query optimizer at a distributed database system (DDS), a query execution plan (QEP) for generating a code dictionary and a column of encoded database data. The QEP specifies a sequence of operations for generating the code dictionary. The code dictionary is a database table. The method further involves receiving, at the DDS, a column of unencoded database data from a data source that is external to the DDS. The DDS generates the code dictionary according to the QEP. Furthermore, based on joining the column of unencoded database data with the code dictionary, the DDS generates the column of encoded database data according to the QEP.
Abstract:
Techniques related to a sparse dictionary tree are disclosed. In some embodiments, computing device(s) execute instructions, which are stored on non-transitory storage media, for performing a method. The method comprises storing an encoding dictionary as a token-ordered tree comprising a first node and a second node, which are adjacent nodes. The token-ordered tree maps ordered tokens to ordered codes. The ordered tokens include a first token and a second token. The ordered codes include a first code and a second code, which are non-consecutive codes. The first node maps the first token to the first code. The second node maps the second token to the second code. The encoding dictionary is updated based on inserting a third node between the first node and the second node. The third node maps a third token to a third code that is greater than the first code and less than the second code.
Abstract:
Data recovery for a compute node in a heterogeneous database system is provided. A failure is detected of a particular compute node of a compute cluster comprising a plurality of compute nodes. The compute cluster is configured to store, in memory, data stored by a RDBMS. Particular data of the data stored by the RDBMS is identified that is assigned to the particular compute node. The particular compute node is restored. After restoring the particular compute node, the particular data assigned to the particular compute node is reloaded without taking the particular data offline. During reloading, the particular compute node receives pending modified data comprising data of the particular data that was modified during said reloading.
Abstract:
A two-level cache to facilitate resolving resource path expressions for a hierarchy of resources is described, which includes a system-wide shared cache and a session-level cache. The shared cache is organized as a hierarchy of hash tables that mirrors the structure of a repository hierarchy. A particular hash table in a shared cache includes information for the child resources of a particular resource. A database management system that manages a shared cache may control the amount of memory used by the cache by implementing a replacement policy for the cache based on one or more characteristics of the resources in the repository. The session-level cache is a single level cache in which information for target resources of resolved path expressions may be tracked. In the session-level cache, the resource information is associated with the entire path expression of the associated resource.
Abstract:
Techniques for performing database operations using vectorized instructions are provided. In one technique, an aggregation operation involves executing vectorized instructions to update a data value that corresponds to a particular key. The aggregation operation may be one of count, sum, minimum, maximum, or average.
Abstract:
Techniques for performing database operations using vectorized instructions are provided. In one technique, a hash table probe phase involves executing vectorized instructions to determine where in a bucket a particular key is located. This determination may be preceded by one or more vectorized instructions that are used to determine whether the bucket contains the particular key.
Abstract:
An RDBMS specifies a graph algorithm function (GAF) that takes a graph object as input and returns a logical graph object as output. GAFs are used within graph queries to compute temporary and output properties (“GAF-computed properties”), which are live for the duration of the query cursor execution. GAF-computed output properties are accessible in the enclosing graph pattern matching query as though they were part of the input graph object of the GAF. Temporary cursor-duration tables are generated for the query cursor during compilation of a graph query that includes a GAF, and are used to store the GAF-computed properties. Each temporary table corresponds to one of the primary tables of the input graph, and includes, as a foreign key, primary key information from the corresponding primary table.