Abstract:
According to one aspect of the invention, for a database statement that specifies evaluating reporting window functions, a computation-pushdown execution plan may be used for the database statement. The computation-pushdown execution plan includes producer operators and consolidation operators. Each producer operator computes a respective partial aggregation for each reporting window function based on a subset of rows, and broadcasts the respective partial aggregation. Each consolidation operator fully aggregates all partial aggregations broadcast from the producer operators. Alternatively, an extended-data-distribution-key execution plan may be used. Each producer operator sends rows based on hash keys to sort operators for computing partial aggregations for at least one reporting window function based on a subset of rows. Each consolidation operator receives and fully aggregates all partial aggregations broadcast from the sort operators.
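The computation-pushdown idea can be illustrated with a minimal Python sketch. It assumes a single reporting window function of the form SUM(value) OVER (PARTITION BY key); the abstract does not name a specific function, and the helper names (`producer_partial_sums`, `consolidate`, `reporting_sum`) are hypothetical. A plain list stands in for the broadcast between operators.

```python
from collections import defaultdict

def producer_partial_sums(rows):
    """Producer: partial SUM per partition key over this producer's subset of rows."""
    partial = defaultdict(int)
    for key, value in rows:
        partial[key] += value
    return dict(partial)

def consolidate(partials):
    """Consolidation: fully aggregate the partial sums received from all producers."""
    total = defaultdict(int)
    for partial in partials:
        for key, value in partial.items():
            total[key] += value
    return dict(total)

def reporting_sum(row_subsets):
    # Each producer computes its partial aggregation; the list models the broadcast.
    partials = [producer_partial_sums(rows) for rows in row_subsets]
    totals = consolidate(partials)
    # A reporting window function attaches the full partition total to every row.
    return [[(key, value, totals[key]) for key, value in rows] for rows in row_subsets]
```

Only one partial result per key leaves each producer, rather than every row, which is where the saving in data movement would come from under these assumptions.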
Abstract:
Disclosed is a system, method, and computer program product to efficiently process multi-set operations in a database system. An approach is described to perform a group-by operation with a counter to efficiently process such queries. Techniques are described to optimize multi-set operations into regular-set operations.
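One plausible reading of "a group-by operation with a counter" is sketched below in Python: each input is grouped by row value with an occurrence count, and the multi-set result is derived from the counts. The abstract does not spell out the exact algorithm, so the function names and the choice of INTERSECT ALL and EXCEPT ALL as examples are assumptions.

```python
from collections import Counter

def intersect_all(left, right):
    """INTERSECT ALL: each row appears min(left count, right count) times."""
    left_counts, right_counts = Counter(left), Counter(right)
    out = []
    for row, n in left_counts.items():
        out.extend([row] * min(n, right_counts[row]))
    return out

def except_all(left, right):
    """EXCEPT ALL: each row appears (left count - right count) times, if positive."""
    left_counts, right_counts = Counter(left), Counter(right)
    out = []
    for row, n in left_counts.items():
        out.extend([row] * max(n - right_counts[row], 0))
    return out
```

Grouping first means each distinct row is compared once, no matter how many duplicates either input holds.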
Abstract:
An approach for implementing function semantic based partition-wise SQL execution and partition pruning in a data processing system is provided. The system receives a query directed to a range-partitioned table and determines if operation key(s) of the query include(s) function(s) over the table partitioning key(s). If so, the system obtains a set of values corresponding to each partition by evaluating the function(s) on a low bound and/or a high bound table partitioning key value corresponding to the partition. The system may then compare the sets of values corresponding to different partitions and determine whether to aggregate results obtained by executing the query over the partitions based on the comparison. The system may also determine whether to prune any partitions from processing based on a set of correlations between the set of values for each partition and predicate(s) of the query including function(s) over the table partitioning key(s).
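The pruning step can be sketched in Python under some loudly stated assumptions: the table is range-partitioned on a date column, each partition has an inclusive low bound and an exclusive high bound at day granularity, and the function applied to the partitioning key (here a hypothetical `year_of`) is monotonic non-decreasing. None of these specifics come from the abstract.

```python
from datetime import date, timedelta

def year_of(d):
    """Example function over the partitioning key; monotonic in the date."""
    return d.year

def partition_value_range(low, high_exclusive, fn):
    """Evaluate fn on the low and high bounds to get the range of values a partition can produce."""
    last_day = high_exclusive - timedelta(days=1)  # assumes day granularity
    return fn(low), fn(last_day)

def prune(partitions, fn, wanted):
    """Keep only partitions whose value range can satisfy the predicate fn(key) = wanted."""
    kept = []
    for name, low, high_exclusive in partitions:
        lo, hi = partition_value_range(low, high_exclusive, fn)
        if lo <= wanted <= hi:
            kept.append(name)
    return kept

partitions = [
    ("p2021h1", date(2021, 1, 1), date(2021, 7, 1)),
    ("p2021h2", date(2021, 7, 1), date(2022, 1, 1)),
    ("p2022", date(2022, 1, 1), date(2023, 1, 1)),
]
```

With these hypothetical partitions, `prune(partitions, year_of, 2021)` keeps the two 2021 partitions and skips `p2022` without scanning it.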
Abstract:
Techniques herein improve computational efficiency for parallel queries with run-time data pruning by using adaptive granule generation. In an embodiment, an execution plan is generated for a query to be executed by a plurality of slave processes, the execution plan comprising a plurality of plan operators. For a first plan operator of the plurality of plan operators, a first set of work granules is generated, and for a second plan operator of the plurality of plan operators, a second set of work granules is generated. A first subset of slave processes of the plurality of slave processes is assigned the first set of work granules. Based on the execution of the first set of work granules by the first subset of slave processes, a bloom filter is generated that specifies for which of said first set of work granules no output rows were generated. Based on the bloom filter, the second set of work granules is modified and the modified second set of work granules is assigned to a second subset of slave processes and executed.
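A rough Python sketch of the idea follows. It assumes that each granule of the second set can be matched to a granule of the first set through a shared identifier, and that the filter records the identifiers of first-set granules that did produce rows, so anything the filter rejects is known to have produced none. The `BloomFilter` class, its sizing, and the function names are hypothetical and greatly simplified.

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits=256, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

def run_first_granules(granules, predicate):
    """Execute the first set of granules; record which ones produced output rows."""
    bloom = BloomFilter()
    for granule_id, rows in granules.items():
        if any(predicate(row) for row in rows):
            bloom.add(granule_id)
    return bloom

def modify_second_granules(second_granule_ids, bloom):
    """Drop second-set granules whose first-set counterpart produced no rows."""
    return [g for g in second_granule_ids if bloom.might_contain(g)]
```

A false positive only means a granule is processed unnecessarily; it cannot cause a needed granule to be skipped.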
Abstract:
Execution plans generated for multiple analytic queries incorporate two new kinds of plan operators, a partition creator and partition iterator. The partition creator and partition iterator operate as a pair. A partition creator operator creates partitions of rows and a partitioning descriptor describing the partitions created. A partition iterator iterates through the partitions based on the partitioning descriptor. For each partition, multiple analytic operators are executed serially, one after the other, on the same rows in the partition. According to an embodiment, partitioning is based on a common grouping or subgrouping of the multiple aggregate functions or operators. Columns in the grouping or subgrouping may be ignored when executing each of the multiple analytic operators. Forming execution plans that include partition creator and partition iterator in this way is referred to herein as partitioning injection.
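The creator/iterator pairing might look like the following Python sketch. It assumes the partitioning descriptor is a list of (key, start, end) offsets into a sorted row buffer and that analytic operators are plain callables over a partition's rows; both choices, and all names, are illustrative rather than taken from the abstract.

```python
from itertools import groupby
from operator import itemgetter

def partition_creator(rows, key_index):
    """Create partitions of rows plus a descriptor of (key, start, end) offsets."""
    ordered = sorted(rows, key=itemgetter(key_index))
    descriptor = []
    start = 0
    for key, group in groupby(ordered, key=itemgetter(key_index)):
        size = len(list(group))
        descriptor.append((key, start, start + size))
        start += size
    return ordered, descriptor

def partition_iterator(ordered, descriptor, analytic_ops):
    """Visit each partition and run every analytic operator serially on the same rows."""
    results = {}
    for key, start, end in descriptor:
        partition = ordered[start:end]
        results[key] = [op(partition) for op in analytic_ops]
    return results

def total_amount(partition):
    return sum(amount for _, amount in partition)

def max_amount(partition):
    return max(amount for _, amount in partition)
```

For example, `partition_iterator(*partition_creator(rows, 0), [total_amount, max_amount])` partitions the rows once and reuses each partition for both operators.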
Abstract:
Techniques are described herein for efficient set operation execution. According to some embodiments, a request is received to perform a set operation with respect to a first data set and a second data set. The request may identify the first data set, the second data set, and a type of set operation to perform. In response to receiving the request, a hash table is generated in memory from a first set of records in the first data set, and a second set of records from the second data set is probed against the hash table. Based on probing the hash table and the type of set operation identified in the request, records that satisfy the set operation are identified and output from the hash table.
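The build-and-probe flow can be sketched in Python as follows. This is a simplified illustration that assumes set (distinct) semantics, hashable records, and a per-entry "matched" flag in the hash table; the operation names UNION, INTERSECT, and MINUS and the function name are assumptions, not details from the abstract.

```python
def hash_set_operation(first, second, op):
    """Build a hash table from `first`, probe it with `second`, output from the table."""
    if op not in ("UNION", "INTERSECT", "MINUS"):
        raise ValueError(f"unsupported set operation: {op}")

    # Build phase: one entry per distinct record, flag = "matched by a probe".
    table = {}
    for record in first:
        table.setdefault(record, False)

    # Probe phase: mark matches; for UNION, also add records not yet present.
    for record in second:
        if record in table:
            table[record] = True
        elif op == "UNION":
            table[record] = False

    # Output phase: the operation type decides which entries are emitted.
    if op == "UNION":
        return list(table)
    if op == "INTERSECT":
        return [record for record, matched in table.items() if matched]
    return [record for record, matched in table.items() if not matched]
```

Because results are read back from the hash table, duplicates in either input are eliminated without a separate sort step.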
Abstract:
A method, apparatus, and system for dynamic parallel aggregation with hybrid batch flushing are provided. Record sources of an aggregation operator in a query execution plan may dynamically aggregate using the same aggregation operator. The dynamic aggregation creates a batch of aggregation records from an input source, which are then used to aggregate further records from the input source. If a record from the input source is not matched to an aggregation record in the batch, then the record is passed to the next operator. In this manner, records are aggregated ahead of time at a record source to reduce the number of records passed between operators, reducing the impact of network I/O between nodes of a parallel processing system. By adjusting the contents of the batch according to aggregation performance monitored during run-time, hybrid batch flushing can be implemented to adapt to changing data patterns and skewed values.
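The pre-aggregation at a record source can be sketched in Python. This minimal version assumes a SUM aggregate, a fixed batch capacity, and a batch seeded with the first distinct keys seen; it leaves out the run-time monitoring and batch adjustment that the abstract calls hybrid batch flushing. The function names are hypothetical.

```python
def pre_aggregate(records, batch_size):
    """Aggregate at the record source; pass unmatched records on unchanged."""
    batch = {}           # aggregation records held at the source
    passed_through = []  # records that did not match any aggregation record
    for key, value in records:
        if key in batch:
            batch[key] += value          # matched: aggregate ahead of time
        elif len(batch) < batch_size:
            batch[key] = value           # batch still has room: new aggregation record
        else:
            passed_through.append((key, value))
    # At end of input, the batch itself is flushed to the next operator.
    return passed_through + list(batch.items())

def final_aggregate(records):
    """Next operator: complete the aggregation over everything it receives."""
    totals = {}
    for key, value in records:
        totals[key] = totals.get(key, 0) + value
    return totals
```

With six input records and a batch of two keys, only four records would cross to the next operator in the example below, while the final totals are unchanged.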
Abstract:
Techniques are described herein for subquery removal given two set operation-based subqueries in a query, where one subquery contains the result of the other. The described optimization technique of subquery removal is enabled by join and set operation-based containment of the set operation-based subqueries, where semantic equivalence can be established for a given pair of set operation-based subqueries when some table(s), along with their associated join condition(s), correlation condition(s), and/or filter predicate(s), in one subquery are not considered. Subquery removal reduces the multiple accesses to the same table and the multiple evaluations of the same join conditions otherwise required to evaluate the query. When a subquery is removed from a disjunction, this may lead to other optimizations such as subquery unnesting, e.g., when the original query configuration would not permit subquery unnesting and the rewritten query (with one or more removed subqueries) permits it.