Abstract:
Techniques for subsumption of inline views and subqueries in a query are described. An optimization technique of subsumption is enabled by inline views having identical tables and identical join conditions and having aggregation functions but no group-by clauses. When subsumption takes place, a single query block replaces the inline views (or subqueries) with a single inline view query block. Subsumption reduces multiple access to the same table and multiple evaluations of the same join conditions required to evaluate the query. The single query block includes factored out filter predicates and unified predicates that originate from the subsumed inline views (or subqueries). Based on similarities among the aggregation functions and filter predicates in the subsumed inline views, pre-computation of common aggregates may be performed in a new group-by view in the subsuming view.
Abstract:
Techniques are described for storing and maintaining, in a materialized view, bitmap data that represents a bitmap of each possible distinct value of an expression and rewriting a query for a count of distinct values of the expression using the materialized view. The materialized view contains bitmap data that represents a bitmap of each possible distinct value of a first expression, and aggregate values of additional expressions, and is stored in memory or on disk by a database system. The database system receives a query that requests a number of distinct values, of the first expression, and an aggregate value for an additional expression. In response, the database system, rewrites the query to: compute the number of distinct values by counting the bits in the bitmap data of the materialized view that are set to the first value, and obtains the aggregate value for the additional expression in the materialized view.
Abstract:
A method and system for processing database queries containing aggregate functions. The query may specify fewer groups than there are processes available to process the queries. Further, the queries may target a set of rows and specify a sort-by key and a group-by key. The method and system further includes determining that the queries specify application of the aggregate function to each of a plurality of groups that may correspond to a plurality of distinct values of the group-by key and determining that plurality of processes are available to process the queries. The method and system also includes determining the plurality of ranges of a composite key that may be formed by combining the group-by key and the sort-by key and assigning each range of the plurality ranges to a corresponding process to calculate the aggregate function.
Abstract:
Techniques are provided herein for processing a query using in-memory cursor duration temporary tables. The techniques involve storing a part of the temporary table in memory of nodes in a database cluster. A part of the temporary table may be stored in disk segments of nodes in the database cluster. Writer threads running on a particular node writes data for the temporary table to the memory of the particular node. Excess data may be written to the disk segment of the particular node. Reader threads running on the particular node reads data for the temporary table from the memory of the particular node and the disk segment of the particular node.
Abstract:
A method and system for processing database queries containing aggregate functions. The query may specify fewer groups than there are processes available to process the queries. Further, the queries may target a set of rows and specify a sort-by key and a group-by key. The method and system further includes determining that the queries specify application of the aggregate function to each of a plurality of groups that may correspond to a plurality of distinct values of the group-by key and determining that plurality of processes are available to process the queries. The method and system also includes determining the plurality of ranges of a composite key that may be formed by combining the group-by key and the sort-by key and assigning each range of the plurality ranges to a corresponding process to calculate the aggregate function.
Abstract:
According to one aspect of the invention, for a database statement that specifies evaluating reporting window functions, a computation-pushdown execution strategy may be used for the database statement. The computation-pushdown execution plan includes producer operators and consolidation operators. Each producer operator computes a respective partial aggregation for each reporting window function based on a subset of rows, and broadcasts the respective partial aggregation. Each consolidation operator fully aggregates all partial aggregations broadcasted from the producer operators. Alternatively, an extended-data-distribution-key execution plan may be used. Each producer operator sends rows based on hash keys to sort operators for computing partial aggregations for at least one reporting window function based on a subset of rows. Each consolidation operator receives and fully aggregates all partial aggregations broadcasted from the sort operators.
Abstract:
According to one aspect of the invention, for a database statement that specifies evaluating reporting window functions, a computation-pushdown execution strategy may be used for the database statement. The computation-pushdown execution plan includes producer operators and consolidation operators. Each producer operator computes a respective partial aggregation for each reporting window function based on a subset of rows, and broadcasts the respective partial aggregation. Each consolidation operator fully aggregates all partial aggregations broadcasted from the producer operators. Alternatively, an extended-data-distribution-key execution plan may be used. Each producer operator sends rows based on hash keys to sort operators for computing partial aggregations for at least one reporting window function based on a subset of rows. Each consolidation operator receives and fully aggregates all partial aggregations broadcasted from the sort operators.
Abstract:
Techniques for the automatic creation and maintenance of zone maps are provided. In one technique, a set of data sets is identified. For each data set, a data set width is determined based on a maximum value in the data set and a minimum value in the data set. One or more zones within the data set are identified. For each zone, a zone width is determined based on a difference between a maximum value in that zone and a minimum value in that zone. An aggregate zone width is generated that is based on the zone width of each zone. Based on the data set width and the aggregate zone width, it is determined whether to automatically generate a zone map for the data set.
Abstract:
Attributes and semantics of duplicate insignificance that are inherent or inferred in a database language statement are detected. Also, a join operation that is inherent or inferred in the database language statement is detected and examined for join semantics. The join semantics specifies or refers to a driving table to be subjected to a hash join operation that may populate one or more hash buckets. The optimizer and the execution layers may use cost estimation or heuristics to assign the left and right table roles to the tables involved in the join. The hash join operation removes left table duplicates during population of the hash buckets, resulting in full or partial duplicate elimination that occurs during the hash join operation.
Abstract:
A method, apparatus, and stored instructions are provided for the removal of redundant GROUP BY and/or DISTINCT. Every table in the FROM clause of the query block must be a qualified table for the GROUP-BY clause or the DISTINCT keyword in the SELECT clause of the query block to be removed. A table Tx that satisfies at least one of the following two conditions is referred to as a qualified table: (1) Tx has a non-null unique column Tx.u that appears on the GROUP BY clause or the SELECT clause that contains a DISTINCT keyword and (2) There is a qualified table Ty and Ty has a filtering join with Tx.