Query optimization for group-by extensions and distinct aggregate functions

    Publication No.: US10007700B2

    Publication date: 2018-06-26

    Application No.: US14753590

    Application date: 2015-06-29

    CPC classification number: G06F16/24542 G06F16/244 G06F16/24537 G06F16/24556

    Abstract: Techniques for query optimization for group-by extensions and distinct aggregate functions are provided. A query has an extended group-by clause with an extended group-by operator and a first set of group-by columns. The query has one or more distinct aggregate functions and one or more non-distinct aggregate functions. An initial subquery is constructed that generates a partially aggregated initial temporary (PAIT) table when executed. The initial subquery includes a GROUP BY clause with a second set of group-by columns that includes the first set of group-by columns of the extended group-by clause of the query and one or more columns specified by the one or more distinct aggregate functions. One or more subqueries are constructed that compute the groupings indicated by the extended group-by operator based on the PAIT table generated by the initial subquery.
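
The two-stage rewrite described in this abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the query shape (SUM(x) and COUNT(DISTINCT d) with GROUP BY ROLLUP(a, b)), the column names, and the helper names `build_pait` and `rollup_from_pait` are all hypothetical.

```python
from collections import defaultdict

def build_pait(rows):
    """Initial subquery: GROUP BY (a, b, d), partially aggregating SUM(x).

    The grouping columns are the ROLLUP columns (a, b) plus the column d
    used by the distinct aggregate, as the abstract describes.
    """
    pait = defaultdict(int)
    for a, b, d, x in rows:
        pait[(a, b, d)] += x
    return pait

def rollup_from_pait(pait):
    """Compute the ROLLUP(a, b) groupings from the PAIT rows only."""
    results = {}
    for keep in (2, 1, 0):  # groupings (a, b), (a), and the grand total ()
        sums = defaultdict(int)
        distinct = defaultdict(set)
        for (a, b, d), partial_sum in pait.items():
            key = (a, b)[:keep]
            sums[key] += partial_sum   # finish the non-distinct aggregate
            distinct[key].add(d)       # d is already de-duplicated per (a, b)
        for key in sums:
            results[key] = (sums[key], len(distinct[key]))
    return results

rows = [
    ("east", "pens", "c1", 10),
    ("east", "pens", "c1", 5),
    ("east", "ink", "c2", 7),
    ("west", "pens", "c1", 3),
]
result = rollup_from_pait(build_pait(rows))
```

Each result key is a tuple whose length identifies the grouping level, so `result[("east",)]` holds the subtotal for `a = "east"` and `result[()]` holds the grand total.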

N-WAY HASH JOIN

    Type: Patent application (under examination, published)

    Publication No.: US20180075101A1

    Publication date: 2018-03-15

    Application No.: US15266751

    Application date: 2016-09-15

    Abstract: Techniques are described herein for processing queries comprising joins specifying a plurality of tables. The techniques involve partitioning the tables by assigning rows to partitions. One or more partition maps may be generated to indicate the partitions. Subsequent tables may be partitioned based on the generated partition maps. The partitions may be used to determine results for the joins.
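
One possible reading of the partition-map idea is sketched below in Python. The abstract does not say how partition maps are built or used, so everything here is an assumption for illustration: a three-table join (t1.a = t2.a and t2.b = t3.b), hash partitioning on `a`, a map from each `b` value to the partitions that need it, and the function name `n_way_join`.

```python
from collections import defaultdict

def n_way_join(t1, t2, t3, n_parts=4):
    """t1 rows: (a, x); t2 rows: (a, b); t3 rows: (b, y)."""
    parts1, parts2, parts3 = defaultdict(list), defaultdict(list), defaultdict(list)

    # Partition the first table by hashing its join key.
    for a, x in t1:
        parts1[hash(a) % n_parts].append((a, x))

    # Partition the second table the same way and record, for each value of
    # its other join key b, which partitions contain it (the partition map).
    partition_map = defaultdict(set)
    for a, b in t2:
        p = hash(a) % n_parts
        parts2[p].append((a, b))
        partition_map[b].add(p)

    # Partition the third table using the map instead of a fresh hash.
    for b, y in t3:
        for p in partition_map.get(b, ()):
            parts3[p].append((b, y))

    # Each partition can now be joined independently of the others.
    result = []
    for p in range(n_parts):
        by_a, by_b = defaultdict(list), defaultdict(list)
        for a, x in parts1[p]:
            by_a[a].append(x)
        for b, y in parts3[p]:
            by_b[b].append(y)
        for a, b in parts2[p]:
            for x in by_a.get(a, ()):
                for y in by_b.get(b, ()):
                    result.append((a, x, b, y))
    return result

joined = sorted(n_way_join(
    [(1, "x1"), (2, "x2")],
    [(1, "b1"), (2, "b1"), (3, "b2")],
    [("b1", "y1"), ("b2", "y2")],
))
```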

    Scalable and adaptive evaluation of reporting window functions

    Publication No.: US09183252B2

    Publication date: 2015-11-10

    Application No.: US13754687

    Application date: 2013-01-30

    Abstract: According to one aspect of the invention, for a database statement that specifies evaluating reporting window functions, a computation-pushdown execution strategy may be used for the database statement. The computation-pushdown execution plan includes producer operators and consolidation operators. Each producer operator computes a respective partial aggregation for each reporting window function based on a subset of rows, and broadcasts the respective partial aggregation. Each consolidation operator fully aggregates all partial aggregations broadcasted from the producer operators. Alternatively, an extended-data-distribution-key execution plan may be used. Each producer operator sends rows based on hash keys to sort operators for computing partial aggregations for at least one reporting window function based on a subset of rows. Each consolidation operator receives and fully aggregates all partial aggregations broadcasted from the sort operators.
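
The computation-pushdown strategy can be illustrated with a small single-process Python sketch. It assumes one reporting window function, SUM(amount) OVER (PARTITION BY key), and models the broadcast as a shared list; the names `producer` and `consolidate` are hypothetical and the parallel execution machinery is omitted.

```python
from collections import Counter

def producer(rows):
    """Partial SUM(amount) per partition key over this producer's row subset."""
    partial = Counter()
    for key, amount in rows:
        partial[key] += amount
    return partial

def consolidate(local_rows, broadcast_partials):
    """Fully aggregate every broadcast partial, then attach totals to rows."""
    total = Counter()
    for partial in broadcast_partials:
        total.update(partial)  # Counter.update adds values key by key
    return [(key, amount, total[key]) for key, amount in local_rows]

subsets = [
    [("a", 1), ("b", 10)],
    [("a", 2), ("b", 20), ("a", 3)],
]
partials = [producer(s) for s in subsets]          # one partial per producer
output = [consolidate(s, partials) for s in subsets]
```

Every output row carries the full-partition total even though no operator ever saw all of the rows, which is the point of pushing the partial aggregation down to the producers.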

    Bitmap-based count distinct query rewrite in a relational SQL algebra

    Publication No.: US11379476B2

    Publication date: 2022-07-05

    Application No.: US16653639

    Application date: 2019-10-15

    Abstract: Techniques are described for storing and maintaining, in a materialized view, bitmap data that represents a bitmap of each possible distinct value of an expression, and for rewriting a query for a count of distinct values of the expression to use the materialized view. The materialized view contains bitmap data that represents a bitmap of each possible distinct value of a first expression, together with aggregate values of additional expressions, and is stored in memory or on disk by a database system. The database system receives a query that requests a number of distinct values of the first expression and an aggregate value for an additional expression. In response, the database system rewrites the query to compute the number of distinct values by counting the bits in the bitmap data of the materialized view that are set to the first value, and to obtain the aggregate value for the additional expression from the materialized view.
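
A minimal Python sketch of the bitmap idea follows. It assumes the distinct expression is a small non-negative integer `customer_id`, uses a Python integer as the bitmap, and rolls the view up from city to region at query time; that roll-up, the column names, and the function names are illustrative assumptions rather than details from the abstract.

```python
from collections import defaultdict

def build_view(rows):
    """Materialized view per (region, city): [bitmap of customer ids, SUM(amount)]."""
    view = defaultdict(lambda: [0, 0])
    for region, city, customer_id, amount in rows:
        entry = view[(region, city)]
        entry[0] |= 1 << customer_id   # set the bit for this distinct value
        entry[1] += amount             # maintain the additional aggregate
    return view

def query_by_region(view):
    """Rewritten query: COUNT(DISTINCT customer_id), SUM(amount) per region."""
    merged = defaultdict(lambda: [0, 0])
    for (region, _city), (bitmap, total) in view.items():
        merged[region][0] |= bitmap    # OR-ing bitmaps merges distinct sets
        merged[region][1] += total
    # Counting the set bits gives the number of distinct values.
    return {r: (bin(b).count("1"), t) for r, (b, t) in merged.items()}

rows = [
    ("eu", "paris", 1, 10),
    ("eu", "paris", 2, 5),
    ("eu", "rome", 2, 7),
    ("us", "nyc", 3, 1),
    ("us", "nyc", 3, 4),
]
answer = query_by_region(build_view(rows))
```

The base rows are never rescanned: customer 2 appears in both Paris and Rome, yet the OR of the two bitmaps counts that customer once.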

    Leveraging columnar encoding for query operations

    Publication No.: US10572475B2

    Publication date: 2020-02-25

    Application No.: US15713365

    Application date: 2017-09-22

    Abstract: Techniques are described for leveraging column dictionaries of tables for join, group-by and expression evaluation operations. In an embodiment, a table is stored in one or more data units, each data unit's metadata containing dictionaries for stored columns. Rather than storing unencoded column values, the data units may store columns as column vectors of dictionary-encoded values, in an embodiment. When performing a join operation, a matching of values may be performed on the build-side table using the unencoded values stored in the join-key dictionary(s) of the probe-side table, thus significantly reducing the number of searching and matching operations. In an embodiment, a group-by operation may be executed by performing partial aggregations based on unique group-by key values as stored in the one or more group-by key dictionaries. For an expression evaluation, only a single evaluation may be performed for each unique combination of expression-key values in a data unit by leveraging the one or more expression-key dictionaries.
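
The group-by case can be sketched in Python as follows. This is a simplified single-data-unit illustration; the sorted dictionary layout and the helper names `encode` and `group_sum` are assumptions, not details from the abstract.

```python
def encode(values):
    """Build a column dictionary and the vector of dictionary codes."""
    dictionary = sorted(set(values))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[value] for value in values]

def group_sum(dictionary, codes, measures):
    """SUM(measure) GROUP BY key, aggregating on codes rather than values."""
    partial = [0] * len(dictionary)        # one slot per unique key value
    for code, measure in zip(codes, measures):
        partial[code] += measure           # array index, no hashing of values
    # Decode each key once per unique value instead of once per row.
    return {dictionary[code]: total for code, total in enumerate(partial)}

dictionary, codes = encode(["red", "blue", "red", "green", "blue"])
totals = group_sum(dictionary, codes, [1, 2, 3, 4, 5])
```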

    In-memory cursor duration temp tables

    Publication No.: US10452655B2

    Publication date: 2019-10-22

    Application No.: US15268519

    Application date: 2016-09-16

    Abstract: Techniques are provided herein for processing a query using in-memory cursor duration temporary tables. The techniques involve storing a part of the temporary table in memory of nodes in a database cluster. A part of the temporary table may be stored in disk segments of nodes in the database cluster. Writer threads running on a particular node write data for the temporary table to the memory of the particular node. Excess data may be written to the disk segment of the particular node. Reader threads running on the particular node read data for the temporary table from the memory of the particular node and the disk segment of the particular node.

REDUNDANT GROUP BY AND DISTINCT REMOVAL

    Type: Patent application

    Publication No.: US20190026332A1

    Publication date: 2019-01-24

    Application No.: US15658249

    Application date: 2017-07-24

    Abstract: A method, apparatus, and stored instructions are provided for the removal of redundant GROUP BY and/or DISTINCT. Every table in the FROM clause of the query block must be a qualified table for the GROUP BY clause or the DISTINCT keyword in the SELECT clause of the query block to be removed. A table Tx that satisfies at least one of the following two conditions is referred to as a qualified table: (1) Tx has a non-null unique column Tx.u that appears in the GROUP BY clause or in the SELECT clause that contains a DISTINCT keyword; or (2) there is a qualified table Ty, and Ty has a filtering join with Tx.

SORT-MERGE BAND JOIN OPTIMIZATION

    Type: Patent application

    Publication No.: US20180101573A1

    Publication date: 2018-04-12

    Application No.: US15726030

    Application date: 2017-10-05

    CPC classification number: G06F16/24544 G06F16/24537

    Abstract: Techniques herein optimize the sort-merge join method for a band join. In an embodiment, for a query comprising a query block specifying a join between a first table and a second table, a band join condition is detected between the first table and the second table. Once the band join condition is detected, an execution plan is generated and executed. The execution of the execution plan includes: for a first row of at least a subset of first sorted rows, scanning second rows from a set of second sorted rows, joining each of said second rows with said first row, and ceasing to scan when encountering a row from the second sorted rows that falls outside a bound of said band join condition. Techniques also include parallelizing a workload by overlapping the distribution of rows to the same slave process and computing cost and cardinality estimation for enhanced band join.
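
The early-termination scan described above can be sketched in Python. The sketch assumes a band condition of the form `a - lower <= b <= a + upper` on single numeric join keys; the function name `band_join` and its parameters are hypothetical, and parallelization and costing are left out.

```python
def band_join(first, second, lower, upper):
    """Return pairs (a, b) with a - lower <= b <= a + upper."""
    first_sorted, second_sorted = sorted(first), sorted(second)
    result = []
    start = 0  # first position in second_sorted that can still match
    for a in first_sorted:
        # Rows below the band for this a are below it for every later a too,
        # so the start position only ever moves forward.
        while start < len(second_sorted) and second_sorted[start] < a - lower:
            start += 1
        # Scan forward and stop at the first row above the upper bound.
        i = start
        while i < len(second_sorted) and second_sorted[i] <= a + upper:
            result.append((a, second_sorted[i]))
            i += 1
    return result

pairs = band_join([1, 5, 10], [0, 2, 4, 6, 11, 20], lower=1, upper=1)
```

Because both inputs are sorted, the scan for each first row touches only the rows inside its band plus one row past it, rather than the whole second input.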

AUTOMATIC CREATION AND MAINTENANCE OF ZONE MAPS

    Publication No.: US20220114195A1

    Publication date: 2022-04-14

    Application No.: US17068357

    Application date: 2020-10-12

    Abstract: Techniques for the automatic creation and maintenance of zone maps are provided. In one technique, a set of data sets is identified. For each data set, a data set width is determined based on a maximum value in the data set and a minimum value in the data set. One or more zones within the data set are identified. For each zone, a zone width is determined based on a difference between a maximum value in that zone and a minimum value in that zone. An aggregate zone width is generated that is based on the zone width of each zone. Based on the data set width and the aggregate zone width, it is determined whether to automatically generate a zone map for the data set.
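
The width comparison can be sketched in Python. The abstract does not say how zone widths are aggregated or how the final decision is made, so this sketch assumes fixed-size zones, the mean as the aggregate, and a ratio threshold of 0.5; the function name, `zone_size`, and `threshold` are all hypothetical. It also assumes a non-empty list of values.

```python
def should_create_zone_map(values, zone_size, threshold=0.5):
    """Decide whether a zone map is likely to prune well for this data set."""
    data_set_width = max(values) - min(values)
    if data_set_width == 0:
        return False  # every value is identical, so nothing can be pruned

    # Split the stored order of the data into fixed-size zones.
    zones = [values[i:i + zone_size] for i in range(0, len(values), zone_size)]
    zone_widths = [max(zone) - min(zone) for zone in zones]

    # Assumed aggregate: the mean zone width.
    aggregate_zone_width = sum(zone_widths) / len(zone_widths)

    # Narrow zones relative to the whole data set mean the data is clustered.
    return aggregate_zone_width / data_set_width < threshold

clustered = list(range(100))   # stored in sorted order
scattered = [0, 99] * 50       # every zone spans the full range
decisions = (
    should_create_zone_map(clustered, zone_size=10),
    should_create_zone_map(scattered, zone_size=10),
)
```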
