Abstract:
Techniques are provided that address the problems associated with prior approaches for clustering a fact table in a relational database management system. According to one aspect of the invention, a database server clusters a fact table in a database based on one or more dimension tables. More specifically, rows are stored in the fact table in a sorted order, and the order in which the rows are sorted is based on values in one or more columns of one or more of the dimension tables. A user specifies, in “clustering criteria”, the columns of the dimension tables on which the sorted order is based. The database server uses the clustering criteria to automatically store the rows in the fact table in the sorted order in response to certain user-initiated database operations on the fact table.
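As a rough illustration of the idea, the sketch below orders fact rows by values looked up in a dimension table. The table contents, the column names (`store_id`, `region`, `city`), and the helper `cluster_fact_rows` are all hypothetical; a real database server would do this during load or reorganization, not in application code.

```python
# Hypothetical dimension table keyed by store_id.
stores = {
    1: {"region": "West", "city": "Reno"},
    2: {"region": "East", "city": "Albany"},
    3: {"region": "East", "city": "Boston"},
}

# Hypothetical fact rows, each referencing the dimension by foreign key.
sales = [
    {"sale_id": 10, "store_id": 1, "amount": 5.0},
    {"sale_id": 11, "store_id": 3, "amount": 2.5},
    {"sale_id": 12, "store_id": 2, "amount": 9.0},
    {"sale_id": 13, "store_id": 1, "amount": 1.0},
    {"sale_id": 14, "store_id": 2, "amount": 4.0},
]


def cluster_fact_rows(fact_rows, dimension, fk, clustering_columns):
    """Return fact rows sorted by the dimension columns named in the criteria."""
    def sort_key(row):
        dim_row = dimension[row[fk]]  # follow the foreign key to the dimension
        return tuple(dim_row[col] for col in clustering_columns)

    # sorted() is stable, so rows with equal keys keep their original order.
    return sorted(fact_rows, key=sort_key)


clustered = cluster_fact_rows(sales, stores, "store_id", ["region", "city"])
clustered_ids = [row["sale_id"] for row in clustered]
```

Rows for the same region and city end up adjacent, even though neither column is stored in the fact rows themselves.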
Abstract:
A histogram-augmented dynamic sampling approach is provided for determining cardinality of a two-table join. The approach has a pre-processing phase in which data structures are created that will be used during a compilation phase for cardinality estimation. These data structures include a row histogram and a key histogram, which are created for selected columns of a first table. A cardinality estimation phase uses the data structures to estimate the cardinality of various joins at the time of query compilation. In this phase, the system executes queries that join the histograms with a second table, to perform the cardinality estimation.
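The abstract does not define the row histogram or the key histogram, so the following sketch only illustrates the general shape of the approach under a loud assumption: a single frequency histogram (join-key value to row count) is built ahead of time on the first table, and at estimation time it is joined against a sample of the second table and scaled up. The function names and the `sample_every` parameter are hypothetical.

```python
from collections import Counter


def build_frequency_histogram(rows, column):
    """Pre-processing phase: count rows of the first table per join-key value."""
    return Counter(row[column] for row in rows)


def estimate_join_cardinality(histogram, second_rows, column, sample_every=1):
    """Estimation phase: join the histogram with a sample of the second table."""
    sample = second_rows[::sample_every]  # systematic sample (an assumption)
    if not sample:
        return 0.0
    # Each sampled row joins with as many first-table rows as the histogram records.
    matches = sum(histogram.get(row[column], 0) for row in sample)
    return matches * len(second_rows) / len(sample)  # scale up to the full table


orders = [{"cust_id": c} for c in (1, 1, 2, 3, 3, 3)]
customers = [{"cust_id": c} for c in (1, 2, 3, 4)]

hist = build_frequency_histogram(orders, "cust_id")
full_estimate = estimate_join_cardinality(hist, customers, "cust_id")
sampled_estimate = estimate_join_cardinality(hist, customers, "cust_id", sample_every=2)
```

With `sample_every=1` the whole second table is scanned and the estimate equals the exact join size; a sparser sample trades accuracy for speed.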
Abstract:
Techniques for automatic error mitigation in database systems using alternate plans are provided. After receiving a database statement, an error is detected as a result of compiling the database statement. In response to detecting the error, one or more alternate plans that were used to process the database statement or another database statement that is similar to the database statement are identified. A particular alternate plan of the one or more alternate plans is selected. A result of the database statement is generated based on processing the particular alternate plan.
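The control flow described above can be sketched as follows. Everything here is hypothetical: the `CompileError` type, the plan records, treating statements as "similar" when they match after numeric literals are replaced, and choosing the alternate plan with the lowest recorded cost. The abstract does not specify how similarity or selection is determined.

```python
import re


class CompileError(Exception):
    """Hypothetical error raised when compiling a statement fails."""


def normalize(sql):
    """Assumed similarity rule: ignore numeric literals, case, and extra spaces."""
    without_literals = re.sub(r"\b\d+\b", "?", sql)
    return re.sub(r"\s+", " ", without_literals).strip().lower()


def run_with_fallback(sql, compile_fn, alternate_plans):
    """Compile and run sql; on a compile error, fall back to a stored plan."""
    try:
        plan = compile_fn(sql)
    except CompileError:
        candidates = alternate_plans.get(normalize(sql), [])
        if not candidates:
            raise  # nothing to fall back to, so surface the original error
        plan = min(candidates, key=lambda p: p["cost"])  # assumed selection rule
    return plan["execute"]()


def failing_compiler(sql):
    raise CompileError("simulated compile failure")


stored_plans = {
    "select * from t where id = ?": [
        {"cost": 9, "execute": lambda: "rows via full scan"},
        {"cost": 3, "execute": lambda: "rows via index scan"},
    ]
}

result = run_with_fallback("SELECT * FROM t WHERE id = 42", failing_compiler, stored_plans)
```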
Abstract:
Techniques for subsumption of inline views and subqueries in a query are described. The optimization technique of subsumption is enabled by inline views that have identical tables and identical join conditions and that contain aggregation functions but no group-by clauses. When subsumption takes place, the inline views (or subqueries) are replaced with a single inline view query block. Subsumption reduces the repeated accesses to the same table and the repeated evaluations of the same join conditions that would otherwise be required to evaluate the query. The single query block includes factored-out filter predicates and unified predicates that originate from the subsumed inline views (or subqueries). Based on similarities among the aggregation functions and filter predicates in the subsumed inline views, pre-computation of common aggregates may be performed in a new group-by view in the subsuming view.
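To make the rewrite concrete, the sketch below runs a hand-written before-and-after pair against an in-memory SQLite database through Python's `sqlite3` module. The schema, data, and both queries are hypothetical, and the merged form is only one plausible shape of a subsuming view (conditional aggregates plus a unified `IN` predicate), not the output of any particular optimizer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stores (store_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE sales (store_id INTEGER, amount INTEGER);
    INSERT INTO stores VALUES (1, 'East'), (2, 'West'), (3, 'North');
    INSERT INTO sales VALUES (1, 10), (1, 5), (2, 7), (3, 100);
""")

# Two inline views over the same tables and join condition, each with an
# aggregate and no GROUP BY; they differ only in their filter predicate.
original = """
    SELECT v1.east_total, v2.west_total
    FROM (SELECT SUM(s.amount) AS east_total
          FROM sales s JOIN stores d ON s.store_id = d.store_id
          WHERE d.region = 'East') v1,
         (SELECT SUM(s.amount) AS west_total
          FROM sales s JOIN stores d ON s.store_id = d.store_id
          WHERE d.region = 'West') v2
"""

# One subsuming view: the join is evaluated once, the two filters are unified
# into a single IN predicate, and each aggregate keeps its own condition.
subsumed = """
    SELECT v.east_total, v.west_total
    FROM (SELECT SUM(CASE WHEN d.region = 'East' THEN s.amount END) AS east_total,
                 SUM(CASE WHEN d.region = 'West' THEN s.amount END) AS west_total
          FROM sales s JOIN stores d ON s.store_id = d.store_id
          WHERE d.region IN ('East', 'West')) v
"""

original_result = conn.execute(original).fetchall()
subsumed_result = conn.execute(subsumed).fetchall()
```

Both statements return the same single row, but the second reads `sales` and `stores` once instead of twice.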
Abstract:
Embodiments generate random walks through a directed graph that is represented in a relational database table. Each row of the graph table represents a directed edge in the graph and includes a source vertex and a destination vertex. Each row is further augmented to (a) indicate the number of outbound edges starting from the destination vertex in the row and (b) include an identifier that distinguishes the edge from other outbound edges starting from the same source vertex. An SQL query may be executed on the augmented graph table. Starting from a source vertex (either the starting vertex or the destination vertex of the previously selected hop), the query randomly selects a row of the graph table representing one of the outbound edges from the source vertex and adds the selected outbound edge, as the next hop in the random walk, to a random walk table.
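The value of the two augmentations is easier to see in code. The sketch below is plain Python, not the SQL query the abstract refers to, and it assumes edge identifiers are numbered 0 to out-degree minus 1 per source vertex; the helper names are hypothetical. Because each row carries the out-degree of its destination, picking the next hop needs only a random integer and one keyed lookup.

```python
import random


def augment(edges):
    """Build the augmented edge table: (src, edge_id) -> (dst, out-degree of dst)."""
    out_degree = {}
    for src, _ in edges:
        out_degree[src] = out_degree.get(src, 0) + 1

    table, next_id = {}, {}
    for src, dst in edges:
        edge_id = next_id.get(src, 0)  # distinguishes edges sharing a source
        next_id[src] = edge_id + 1
        table[(src, edge_id)] = (dst, out_degree.get(dst, 0))
    return table, out_degree


def random_walk(table, start, start_out_degree, max_hops, rng):
    """Return the walk as a list of (src, dst) hops."""
    walk = []
    src, degree = start, start_out_degree
    for _ in range(max_hops):
        if degree == 0:  # dead end: no outbound edges to follow
            break
        edge_id = rng.randrange(degree)  # uniform choice among outbound edges
        dst, dst_degree = table[(src, edge_id)]
        walk.append((src, dst))
        src, degree = dst, dst_degree  # the row already holds the next out-degree
    return walk


edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]
table, out_degree = augment(edges)
walk = random_walk(table, "a", out_degree["a"], max_hops=5, rng=random.Random(0))
```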
Abstract:
Techniques are described for storing and maintaining, in a materialized view, bitmap data that represents a bitmap of each possible distinct value of an expression, and for rewriting a query for a count of distinct values of the expression to use the materialized view. The materialized view contains bitmap data that represents a bitmap of each possible distinct value of a first expression, together with aggregate values of additional expressions, and is stored in memory or on disk by a database system. The database system receives a query that requests a number of distinct values of the first expression and an aggregate value for an additional expression. In response, the database system rewrites the query to compute the number of distinct values by counting the bits in the bitmap data of the materialized view that are set to the first value, and to obtain the aggregate value for the additional expression from the materialized view.
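A minimal sketch of the bitmap idea follows, using Python integers as bitmaps. It assumes the expression values are small non-negative integers that can serve directly as bit positions and that a set bit means "value present"; the grouping column, the data, and the function names are hypothetical.

```python
def build_bitmap_view(rows):
    """Materialize, per region, a bitmap of customer ids and the summed amount."""
    view = {}
    for region, customer_id, amount in rows:
        bitmap, total = view.get(region, (0, 0))
        # Setting a bit is idempotent, so repeated values are counted only once.
        view[region] = (bitmap | (1 << customer_id), total + amount)
    return view


def count_distinct_and_sum(view, regions):
    """Answer COUNT(DISTINCT customer_id) and SUM(amount) from the view alone."""
    merged, total = 0, 0
    for region in regions:
        bitmap, subtotal = view[region]
        merged |= bitmap  # OR merges bitmaps without double counting
        total += subtotal
    return bin(merged).count("1"), total  # population count = distinct values


rows = [
    ("east", 1, 10),
    ("east", 2, 5),
    ("east", 1, 7),
    ("west", 2, 3),
    ("west", 3, 4),
]
view = build_bitmap_view(rows)
east_only = count_distinct_and_sum(view, ["east"])
both_regions = count_distinct_and_sum(view, ["east", "west"])
```

Unlike a stored distinct count, the bitmaps can be combined across groups: customer 2 appears in both regions but is counted once in the rolled-up answer.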
Abstract:
Computer systems, machine-implemented methods, and stored instructions are provided herein for maintaining information that describes aggregate characteristics of data within zones. Stored data may be separated into defined zone(s). Data structure(s), such as zone map(s), may store, for each of the zone(s), aggregate characteristic(s) of data in the zone, and a stored indication of whether or not the zone is stale. When a change is made to data in a particular zone that was not stale, a zone manager causes the particular zone to become stale if the change can result in the particular zone having data that is not included in the particular zone's stored aggregate characteristic(s). On the other hand, if the change cannot result in the particular zone having data that is not included in the particular zone's stored aggregate characteristic(s), then the zone manager does not cause the particular zone to become stale.
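The staleness rule can be sketched with a minimum and maximum as the stored aggregate characteristics. This is an illustrative assumption: the abstract does not say which aggregates are kept, and the `Zone` class and function names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    min_value: int  # stored aggregate: smallest value when last computed
    max_value: int  # stored aggregate: largest value when last computed
    stale: bool = False


def record_insert(zone, value):
    """Mark the zone stale only if the new value escapes the stored range."""
    if zone.stale:
        return  # already stale; nothing more to track
    if not (zone.min_value <= value <= zone.max_value):
        zone.stale = True


def record_delete(zone, value):
    """A delete cannot add data outside the stored range, so nothing changes."""
    # The stored min/max may become loose bounds, but they remain valid bounds.


zone = Zone(min_value=10, max_value=20)
record_insert(zone, 15)  # inside [10, 20]: the aggregates still cover the data
stale_after_inside_insert = zone.stale
record_delete(zone, 10)  # removing data never invalidates the bounds
stale_after_delete = zone.stale
record_insert(zone, 25)  # outside [10, 20]: the aggregates no longer cover it
stale_after_outside_insert = zone.stale
```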
Abstract:
Techniques are provided for generating a “dimensional zonemap” that allows a database server to avoid scanning disk blocks of a fact table based on filter predicates in a query that qualify one or more dimension tables. The zonemap divides the fact table into sets of contiguous disk blocks referred to as “zones”. For each zone, a minimum value and a maximum value for each of one or more “zoned” columns of the dimension tables is determined and maintained in the zonemap. For a query that contains a filter predicate on a zoned column, the predicate value can be compared to the minimum value and maximum value maintained for a zone for that zoned column to determine whether a scan of the disk blocks of the zone can be skipped.
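The pruning step can be illustrated with a small sketch in which a list of fact rows stands in for disk blocks and fixed-size slices stand in for zones. The data, the zoned column `state`, and the helper names are hypothetical, and the example handles only an equality predicate.

```python
# Hypothetical dimension table and fact rows (stored roughly in store order).
stores = {1: {"state": "CA"}, 2: {"state": "NY"}, 3: {"state": "TX"}}
sales = [{"store_id": s} for s in (1, 1, 2, 2, 3, 3)]


def build_zonemap(fact_rows, dimension, fk, zoned_column, zone_size):
    """Record, per zone, the min and max of a dimension column reached by join."""
    zonemap = []
    for start in range(0, len(fact_rows), zone_size):
        zone = fact_rows[start:start + zone_size]
        values = [dimension[row[fk]][zoned_column] for row in zone]
        zonemap.append((start, start + len(zone), min(values), max(values)))
    return zonemap


def zones_to_scan(zonemap, predicate_value):
    """Keep only zones whose [min, max] range could contain the predicate value."""
    return [
        (start, end)
        for start, end, low, high in zonemap
        if low <= predicate_value <= high
    ]


zonemap = build_zonemap(sales, stores, "store_id", "state", zone_size=2)
ny_zones = zones_to_scan(zonemap, "NY")  # filter: stores.state = 'NY'
fl_zones = zones_to_scan(zonemap, "FL")  # no zone range contains 'FL'
```

Pruning works well here because the fact rows are ordered by store; with randomly ordered rows, most zones would span a wide range of states and few could be skipped.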