Abstract:
A method and apparatus for optimizing SQL queries in a relational database management system uses early-out join transformations. An early-out join comprises a many-to-one existential join, wherein the join scans an inner table for a match for each row of the outer table and terminates the scan for each row of the outer table when a single match is found in the inner table. To transform a many-to-many join to an early-out join, the query must include a requirement for distinctiveness, either explicitly or implicitly, in one or more result columns for the join operation. Distinctiveness can be specified using the DISTINCT keyword in the SELECT clause or can be implied from the predicates present in the query. The early-out join transformation also requires that no columns of the inner table be referenced after the join, or if an inner table column is referenced after the join, that each referenced column be "bound". A referenced column can be bound in one of three ways: (1) an inner table column can be bound to a constant through an equality predicate, (2) an inner table column can be bound to an outer table column, or (3) an inner table column can be bound to a correlated value, wherein the correlated value originates outside the query block. In all three cases, an inner table column can be bound through the transitivity of equality predicates.
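A minimal Python sketch of why the transformation is safe. It assumes a hypothetical query `SELECT DISTINCT o.k, o.name FROM outer_t o, inner_t i WHERE o.k = i.k`, in which no inner column is referenced after the join. The early-out join stops probing the inner table at the first match for each outer row. After duplicate elimination, it still returns the same rows as the full many-to-many join.

```python
# Sketch: early-out (existential) join vs. many-to-many join + DISTINCT.
# Hypothetical query: SELECT DISTINCT o.k, o.name FROM outer_t o, inner_t i
#                     WHERE o.k = i.k
# No inner column is referenced after the join, so the rewrite is valid.

Outer = tuple[int, str]  # (k, name)
Inner = tuple[int, int]  # (k, v)


def join_then_distinct(outer: list[Outer], inner: list[Inner]) -> list[Outer]:
    """Many-to-many join that scans all of inner, then removes duplicates."""
    joined = [o for o in outer for i in inner if o[0] == i[0]]
    return sorted(set(joined))


def early_out_join(outer: list[Outer], inner: list[Inner]) -> tuple[list[Outer], int]:
    """Stop scanning inner for an outer row as soon as one match is found.

    Returns the joined outer rows and the number of inner rows probed.
    """
    result: list[Outer] = []
    probes = 0
    for o in outer:
        for i in inner:
            probes += 1
            if o[0] == i[0]:
                result.append(o)
                break  # early out: a single match is enough
    return result, probes


outer = [(1, "a"), (2, "b"), (3, "c")]
inner = [(1, 10), (1, 11), (2, 20)]

rows, probes = early_out_join(outer, inner)
print(rows, probes)  # [(1, 'a'), (2, 'b')] 7
assert sorted(set(rows)) == join_then_distinct(outer, inner)
```

Suppose the query also selected `i.v`. The early-out join would then keep only `(1, 'a', 10)` and silently drop `(1, 'a', 11)`. This is why any inner column referenced after the join must be bound, for example by a predicate such as `i.v = 10`.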
Abstract:
The present invention optimizes SQL queries by exploiting uniqueness properties. To identify whether the generalized 1-tuple condition exists, the query is first analyzed to determine whether any columns referenced in a predicate of the query are bound. According to the present invention, columns may be bound to constant values, to correlated columns, or to columns that are already bound. The bound columns, if any, are then analyzed to determine whether they comprise a key for their associated table. If both conditions hold, the query satisfies the 1-tuple condition, in that it returns at most one tuple. Once the generalized 1-tuple condition has been identified for the query, important query transformations can be performed for optimization purposes. These transformations include converting scalar subqueries into joins and eliminating distinctiveness requirements (i.e., DISTINCT keywords) from SELECT clauses.
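The bound-column test can be sketched as a fixpoint over equality predicates. The following Python sketch makes several assumptions for illustration. The predicate encoding (`Const`, `Correlated`, and qualified column strings) is invented here. The rule that every table in the block needs a fully bound key is also an assumption, since the abstract only says the bound columns must comprise a key.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Const:
    """A literal value, e.g. the 5 in t.a = 5."""
    value: object


@dataclass(frozen=True)
class Correlated:
    """A column of an enclosing query block (a correlated reference)."""
    name: str


# Right-hand side of an equality predicate "column = operand";
# a plain str names another column in this query block.
Operand = Union[str, Const, Correlated]


def bound_columns(equalities: list[tuple[str, Operand]]) -> set[str]:
    """Columns bound to a constant, a correlated value, or a bound column."""
    bound: set[str] = set()
    changed = True
    while changed:  # iterate to a fixpoint (transitivity of equality)
        changed = False
        for left, right in equalities:
            pairs = [(left, right)]
            if isinstance(right, str):
                pairs.append((right, left))  # col1 = col2 binds either side
            for col, other in pairs:
                if col in bound:
                    continue
                if isinstance(other, (Const, Correlated)) or other in bound:
                    bound.add(col)
                    changed = True
    return bound


def at_most_one_tuple(keys: dict[str, list[set[str]]],
                      equalities: list[tuple[str, Operand]]) -> bool:
    """True if every table in the block has some key made only of bound columns."""
    bound = bound_columns(equalities)
    return all(any(key <= bound for key in table_keys)
               for table_keys in keys.values())


keys = {"t": [{"t.a", "t.b"}], "s": [{"s.id"}]}
preds = [("t.a", Const(5)), ("t.b", "s.id"), ("s.id", Correlated("q.x"))]
print(at_most_one_tuple(keys, preds))      # True
print(at_most_one_tuple(keys, preds[1:]))  # False: t.a is no longer bound
```

When the test succeeds, a DISTINCT on the block is redundant. A scalar subquery over the block also becomes a candidate for conversion into a join.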
Abstract:
A method and apparatus for optimizing SQL queries by propagating and exploiting column nullability. Column nullability is identified and propagated using a three-valued logic, wherein a column of a table can be identified […]. The nullability information is then exploited to optimize query operations through transformations. In one aspect of the present invention, quantified predicates (such as ">ALL") are transformed into simple predicates involving singleton subqueries so that indexing can be exploited. In another aspect, "is not null" predicates are generated and pushed down for certain aggregate queries. In still another aspect, intersect operations are transformed into joins. As a result, the present invention can significantly improve query performance.
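A familiar way to turn a `>ALL` predicate into a simple predicate over a singleton subquery is to rewrite `x > ALL (SELECT y ...)` as `x > (SELECT MAX(y) ...)`, handling the empty-subquery case separately. The abstract does not spell out its exact rewrite, so treat this form as an illustrative assumption. In the Python sketch, None stands for NULL or UNKNOWN. It shows why nullability matters. The rewrite matches SQL semantics when `y` is known to be non-nullable. It does not match when `y` may contain NULLs, because MAX ignores them.

```python
from typing import Optional

Value = Optional[int]   # None models SQL NULL
Truth = Optional[bool]  # None models UNKNOWN


def gt_all(x: Value, ys: list[Value]) -> Truth:
    """Reference semantics of x > ALL (SELECT y ...)."""
    if not ys:
        return True  # ALL over an empty set is TRUE
    if x is None:
        return None
    if any(y is not None and not x > y for y in ys):
        return False  # some comparison is definitely FALSE
    if any(y is None for y in ys):
        return None  # remaining comparisons include UNKNOWN
    return True


def gt_max(x: Value, ys: list[Value]) -> Truth:
    """Rewrite: x > (SELECT MAX(y) ...), TRUE if the subquery is empty."""
    if not ys:
        return True
    non_null = [y for y in ys if y is not None]
    m = max(non_null) if non_null else None  # SQL MAX skips NULLs
    if x is None or m is None:
        return None
    return x > m


# With a non-nullable y the rewrite is exact ...
for x in (None, 0, 2, 5):
    for ys in ([], [1, 2, 3], [4]):
        assert gt_all(x, ys) == gt_max(x, ys)

# ... but with NULLs present it can turn UNKNOWN into TRUE.
print(gt_all(5, [1, 2, None]), gt_max(5, [1, 2, None]))  # None True
```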
Abstract:
The invention provides a method and apparatus, including software, for determining a set of materialized views or indices over the contents, or a subset of the contents, of a database in a data processing system, to be created for one or more users of the database. The method and apparatus evaluate a workload presented by a user to the database, evaluate the characteristics of the data processing system, evaluate the characteristics of the database, and use these evaluations to recommend a set of suitable materialized views or indices to the user. In another aspect of the invention, applicable to a workload presented by a user of a database in a data processing system, the set of materialized views or indices is determined by: generating a plurality of materialized view candidates by evaluating the workload, the data processing system characteristics, and the database characteristics; estimating statistics for the materialized view candidates, such as the number of rows, row size, and column statistics; generating a plurality of index candidates by evaluating the workload, the data processing system characteristics, the database characteristics, and the materialized view candidates; and selecting, from the materialized view candidates and index candidates, a set of suitable materialized views and/or indices for submission to the user.
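The abstract leaves the final selection step open. A common approach, assumed here rather than taken from the source, is a greedy pick by estimated benefit per unit of storage under a space budget. An index on a candidate view becomes eligible only after that view has been chosen. Candidate names, sizes, and benefit figures are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    size_mb: float   # estimated from row count and row size
    benefit: float   # estimated reduction in workload cost
    requires: Optional[str] = None  # e.g. an index defined on a candidate view


def recommend(candidates: list[Candidate], budget_mb: float) -> list[str]:
    """Greedily pick the best benefit-per-MB candidate that still fits."""
    chosen: list[str] = []
    used = 0.0
    remaining = list(candidates)
    while True:
        eligible = [c for c in remaining
                    if c.size_mb > 0 and c.benefit > 0
                    and (c.requires is None or c.requires in chosen)
                    and used + c.size_mb <= budget_mb]
        if not eligible:
            return chosen
        best = max(eligible, key=lambda c: c.benefit / c.size_mb)
        chosen.append(best.name)
        used += best.size_mb
        remaining.remove(best)


candidates = [
    Candidate("mv_sales_by_region", 400, 900),
    Candidate("ix_mv_region", 50, 300, requires="mv_sales_by_region"),
    Candidate("ix_orders_date", 100, 250),
    Candidate("mv_sales_by_day", 800, 1000),
]
print(recommend(candidates, budget_mb=600))
# ['ix_orders_date', 'mv_sales_by_region', 'ix_mv_region']
```

Treating benefits as independent is a simplification. A fuller advisor would re-estimate workload cost after each pick, since a view and an index can serve the same query.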
Abstract:
A cost-based optimizer optimizes access to at least a portion of hierarchically organized documents, such as those formatted in eXtensible Markup Language (XML), by estimating the number of results produced by accessing those documents. Estimating the number of results comprises computing the cardinality of each operator executing query language expressions and computing the sequence size of the sequences of hierarchically organized nodes produced by those expressions. Access to the documents is optimized using the structure of the query expression and/or path statistics on the hierarchically organized data. The cardinality and sequence size are used to estimate the cost of executing alternative query execution plans. Based on these estimates, an optimal query execution plan is selected from among the alternatives.
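A toy Python sketch that estimates cardinality and sequence size from path statistics, then uses the estimates to choose between two alternative plans. The path counts, the document count, and both cost formulas (`navigation_cost`, `index_cost`) are hypothetical assumptions, not the source's cost model.

```python
# Hypothetical path statistics: node counts per root-to-node path,
# collected over a collection of DOC_COUNT documents.
DOC_COUNT = 10
PATH_COUNTS = {
    "/lib": 10,
    "/lib/book": 2_000,
    "/lib/book/author": 5_000,
    "/lib/book/title": 2_000,
}


def step_estimates(path: str) -> list[tuple[str, int, float]]:
    """Return (prefix, cardinality, sequence size) for each location step.

    cardinality   = total nodes the step produces
    sequence size = average nodes produced per context node
    """
    estimates = []
    prefix = ""
    context = DOC_COUNT  # the first step runs once per document
    for step in path.strip("/").split("/"):
        prefix += "/" + step
        card = PATH_COUNTS.get(prefix, 0)
        seq = card / context if context else 0.0
        estimates.append((prefix, card, seq))
        context = card
    return estimates


def navigation_cost(path: str) -> float:
    # touch every node produced by every step
    return float(sum(card for _, card, _ in step_estimates(path)))


def index_cost(path: str, probe_cost: float = 50.0) -> float:
    # hypothetical path index: one probe, then read the final step's nodes
    return probe_cost + step_estimates(path)[-1][1]


def choose_plan(path: str) -> str:
    costs = {"navigate": navigation_cost(path), "path-index": index_cost(path)}
    return min(costs, key=lambda plan: costs[plan])


print(step_estimates("/lib/book/author"))
# [('/lib', 10, 1.0), ('/lib/book', 2000, 200.0), ('/lib/book/author', 5000, 2.5)]
print(choose_plan("/lib/book/author"))  # path-index
print(choose_plan("/lib"))              # navigate
```

In this toy model the sequence size is reported but does not enter the cost. In a real optimizer, it would feed per-context-node costs, such as the cost of a nested predicate evaluated once per context node.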
Abstract:
A procedure for detecting a reordering requirement in a directed record stream during query execution in a relational database processing system. The query compiler component of a relational database processing system includes procedures for building query execution plans (QEPs) for evaluation preparatory to selecting an optimal plan for execution. These plans are constructed from the bottom up using an internal graphical representation for the user query that has a number of relation nodes interconnected by directed record streams (data flows). A relational operation within each node imposes an "order requirement" on the outflow stream represented by an order requirement vector O_R. The records within each directed record stream have an "order property" represented by an order property vector O_P. Order detection occurs when these two vectors are compared to determine whether the order property satisfies the order requirement. Order detection by normalization (ODN) according to this invention first normalizes the two order specification vectors to remove all attributes made redundant by the effects of predicates and functional dependencies. Query execution plans constructed using ODN are found to execute an order of magnitude faster than those constructed using order detection without normalization.
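Order detection by normalization can be sketched in Python under a few simplifying assumptions. Every attribute is ascending. Predicates are given as a set of constant-bound columns plus a map from each column to its equivalence-class head. Functional dependencies are expressed in terms of class heads. An order requirement counts as satisfied when its normalized form is a prefix of the normalized order property.

```python
FD = tuple[frozenset[str], str]  # determinant columns -> dependent column


def closure(cols: set[str], constants: set[str], fds: list[FD]) -> set[str]:
    """All columns whose value is fixed once `cols` are fixed."""
    closed = set(cols) | constants  # constant-bound columns are always fixed
    changed = True
    while changed:
        changed = False
        for det, dep in fds:
            if det <= closed and dep not in closed:
                closed.add(dep)
                changed = True
    return closed


def normalize(order: list[str], constants: set[str],
              equiv: dict[str, str], fds: list[FD]) -> list[str]:
    """Drop order attributes made redundant by predicates and FDs."""
    consts = {equiv.get(c, c) for c in constants}
    result: list[str] = []
    for col in order:
        col = equiv.get(col, col)  # use the equivalence-class head
        if col in closure(set(result), consts, fds):
            continue  # constant, duplicate, or determined by a prefix
        result.append(col)
    return result


def satisfies(requirement: list[str], prop: list[str], constants: set[str],
              equiv: dict[str, str], fds: list[FD]) -> bool:
    """The requirement holds if its normal form is a prefix of the property's."""
    r = normalize(requirement, constants, equiv, fds)
    p = normalize(prop, constants, equiv, fds)
    return p[:len(r)] == r


# Hypothetical query: ... WHERE x.a = 5 AND x.b = y.b ORDER BY x.a, y.b, x.c
# over a stream already ordered on (x.b, x.d); x.b functionally determines x.c.
constants = {"x.a"}
equiv = {"y.b": "x.b"}
fds = [(frozenset({"x.b"}), "x.c")]
o_r = ["x.a", "y.b", "x.c"]
o_p = ["x.b", "x.d"]

print(normalize(o_r, constants, equiv, fds))       # ['x.b']
print(satisfies(o_r, o_p, constants, equiv, fds))  # True
print(o_p[:len(o_r)] == o_r)  # False: without normalization a sort is added
```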
Abstract:
A method, apparatus, and article of manufacture for optimizing database queries using a materialized view for a table referenced in the query, wherein the materialized view has different properties than the referenced table. The materialized view may be replicated across multiple processors of the computer system, so that some or all of the query can be executed locally, with no data movement required to perform the operations. The materialized view may also be partitioned across multiple processors of the computer system using a different partitioning key than the referenced table. The materialized view may be a vertical and/or horizontal subset of the table, so that only selected columns and/or tuples of the table are present in it. Columns may be added to the materialized view to hold pre-computed results of complex expressions, and indices may be created on the columns of the materialized view.
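A small Python simulation of the replication case, under hypothetical assumptions. There are three nodes with modulo hash partitioning. An `orders` table is partitioned on `order_id`, and a `customer` table is partitioned on `cust_id`. Joining `orders` to `customer` on `cust_id` using only local data misses rows. A vertical-subset materialized view of `customer` replicated to every node, by contrast, lets each node finish its share of the join with no data movement.

```python
NODES = 3


def partition(rows, key_index):
    """Hash-partition rows across NODES on the column at key_index."""
    parts = [[] for _ in range(NODES)]
    for row in rows:
        parts[row[key_index] % NODES].append(row)
    return parts


def local_join(order_parts, customer_parts):
    """Join each node's local data only; nothing is shipped between nodes."""
    out = []
    for node in range(NODES):
        for order_id, cust_id in order_parts[node]:
            for cust in customer_parts[node]:
                if cust[0] == cust_id:
                    out.append((order_id, cust[1]))
    return sorted(out)


orders = [(101, 1), (102, 2), (103, 3), (104, 1), (105, 4)]  # (order_id, cust_id)
customer = [(1, "EU", "alice"), (2, "US", "bob"),
            (3, "EU", "carol"), (4, "APAC", "dave")]          # (cust_id, region, name)

order_parts = partition(orders, 0)  # partitioned on order_id

# Base table partitioned on cust_id: matching rows usually live on other nodes.
base_parts = partition(customer, 0)

# Materialized view: vertical subset (cust_id, region), replicated to every node.
mv = [(cust_id, region) for cust_id, region, _ in customer]
mv_replicas = [list(mv) for _ in range(NODES)]

expected = sorted((o, c[1]) for o, cid in orders for c in customer if c[0] == cid)
print(local_join(order_parts, base_parts))               # [] : local-only join misses rows
print(local_join(order_parts, mv_replicas) == expected)  # True
```

Partitioning a materialized view of `orders` on `cust_id` would achieve the same collocation without replicating to every node. The cost is storing a second copy of the data under a different partitioning key.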
Abstract:
A relational database management system includes a query processor that uses a query operator partition property to perform query execution plan (QEP) pruning and to ensure that the data input to a query operator is partitioned appropriately for the operation. The partition property indicates the group of network nodes across which a table is distributed. The query processor also uses partition classes designated as "interesting classes" to perform pre-optimization planning and query pruning. It also uses them to perform look-ahead partitioning based on partition classes identified as being of interest to future operations. Together, these evaluate complex query statements more efficiently in an MPP, shared-nothing environment.
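Partition-aware pruning can be sketched in the spirit of interesting orders. Keep the cheapest plan overall. Also keep the cheapest plan for each interesting partition property, but only if it beats repartitioning the cheapest plan. In this Python sketch, the plans, the flat repartitioning cost, and the (node group, partitioning column) encoding of the partition property are hypothetical assumptions.

```python
from dataclasses import dataclass

Partitioning = tuple[str, str]  # (node group, partitioning column)


@dataclass(frozen=True)
class Plan:
    name: str
    cost: float
    partitioning: Partitioning


def prune(plans: list[Plan], interesting: list[Partitioning],
          repartition_cost: float) -> list[Plan]:
    """Keep the cheapest plan plus any worthwhile interesting-partitioned plans."""
    cheapest = min(plans, key=lambda p: p.cost)
    kept = [cheapest]
    for prop in interesting:
        if prop == cheapest.partitioning:
            continue  # already covered by the cheapest plan
        matching = [p for p in plans if p.partitioning == prop]
        if not matching:
            continue
        best = min(matching, key=lambda p: p.cost)
        # Worth keeping only if cheaper than repartitioning `cheapest` later.
        if best.cost < cheapest.cost + repartition_cost:
            kept.append(best)
    return kept


plans = [
    Plan("P1", 100, ("ng1", "orders.order_id")),
    Plan("P2", 120, ("ng1", "orders.cust_id")),
    Plan("P3", 300, ("ng1", "orders.order_date")),
    Plan("P4", 130, ("ng1", "orders.cust_id")),
]
# A later join with customer on cust_id makes this partitioning interesting.
interesting = [("ng1", "orders.cust_id")]
print([p.name for p in prune(plans, interesting, repartition_cost=50)])  # ['P1', 'P2']
print([p.name for p in prune(plans, interesting, repartition_cost=10)])  # ['P1']
```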
Abstract:
Data in materialized query tables (MQTs) are used as statistics for determining the optimal execution plan for a query. When an MQT is defined, it is examined to determine whether its data provides statistics for determining an optimal execution plan for a query. If so, then the MQT is identified, in the RDBMS, as a source for statistics. Information needed to exploit the MQT data as statistics is cataloged in the RDBMS. This information includes a characterization of the type of statistics provided by the MQT, the table and column distributions represented by those statistics, and a query for later retrieving relevant data from the MQT during the query optimization process. When a query is accepted for execution, the cataloged relevant information about MQTs is examined to determine whether an MQT exists that provides statistics relevant to optimization of the query. If such an MQT exists, then the relevant data is retrieved from the MQT using the cataloged query. Using the retrieved statistics, an optimal execution plan may be determined for the query.
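A minimal Python sketch using the standard `sqlite3` module. SQLite has no materialized query tables, so a `CREATE TABLE ... AS` snapshot stands in for an MQT that holds a frequency distribution. The catalog layout, table and column names, and the fallback selectivity of 0.1 are hypothetical assumptions. The optimizer looks up a matching catalog entry and runs its stored query to estimate the selectivity of an equality predicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("EU", 10), ("EU", 20), ("US", 5), ("EU", 7), ("APAC", 3)])

# SQLite has no MQTs; a CREATE TABLE ... AS snapshot stands in for one.
conn.execute("CREATE TABLE mqt_region_counts AS "
             "SELECT region, COUNT(*) AS cnt FROM sales GROUP BY region")

# Hypothetical catalog entry recorded when the MQT was defined: the kind of
# statistic it provides, the column it describes, and a retrieval query.
STATS_CATALOG = [{
    "stat_type": "frequency",
    "table": "sales",
    "column": "region",
    "query": ("SELECT COALESCE((SELECT cnt FROM mqt_region_counts "
              "WHERE region = ?), 0), "
              "(SELECT SUM(cnt) FROM mqt_region_counts)"),
}]


def equality_selectivity(table: str, column: str, value,
                         default: float = 0.1) -> float:
    """Estimate the selectivity of `table.column = value` for the optimizer."""
    for entry in STATS_CATALOG:
        if (entry["stat_type"] == "frequency"
                and entry["table"] == table and entry["column"] == column):
            matched, total = conn.execute(entry["query"], (value,)).fetchone()
            return matched / total if total else 0.0
    return default  # no relevant MQT: fall back to a default guess


print(equality_selectivity("sales", "region", "EU"))    # 0.6
print(equality_selectivity("sales", "region", "MARS"))  # 0.0
print(equality_selectivity("sales", "amount", 10))      # 0.1 (no MQT)
```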