Abstract:
A novel method collects optimizer statistics for database query optimization by gathering feedback from the query execution engine about the observed cardinalities of predicates, and by using that feedback to construct and maintain multidimensional histograms. This approach exploits correlations between data columns without requiring an inefficient data scan. The maximum entropy principle is used to approximate the true data distribution by a histogram distribution that is as “simple” as possible while remaining consistent with the observed predicate cardinalities. The method readily adapts to changes in the underlying data by automatically and efficiently detecting and eliminating inconsistent feedback information. The size of the histogram is controlled by retaining only the most “important” feedback.
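To make the maximum entropy step concrete, here is a minimal sketch that fits bucket frequencies to observed predicate cardinalities using iterative scaling, a standard way of computing maximum entropy solutions under linear constraints. The fixed two-dimensional grid of buckets, the rectangle encoding of feedback, and the function name `fit_max_entropy_grid` are assumptions made for illustration; the abstract does not specify the bucket layout, and the sketch omits inconsistency detection and feedback retention.

```python
from itertools import product


def fit_max_entropy_grid(n_x, n_y, total, feedback, iterations=500):
    """Fit bucket frequencies on an n_x-by-n_y grid by iterative scaling.

    feedback is a list of ((x_lo, x_hi, y_lo, y_hi), observed_count) pairs, where
    the rectangle covers bucket indices x_lo <= i < x_hi and y_lo <= j < y_hi.
    The table's total row count is enforced as one more constraint.
    """
    cells = list(product(range(n_x), range(n_y)))
    # Uniform start: the maximum entropy answer when only the total is known.
    freq = {c: total / len(cells) for c in cells}
    constraints = [((0, n_x, 0, n_y), total)] + list(feedback)
    # Precompute which buckets each feedback rectangle covers.
    regions = [
        ([c for c in cells if x_lo <= c[0] < x_hi and y_lo <= c[1] < y_hi], observed)
        for (x_lo, x_hi, y_lo, y_hi), observed in constraints
    ]
    for _ in range(iterations):
        for members, observed in regions:
            current = sum(freq[c] for c in members)
            if current > 0:
                # Rescale only the covered buckets so this rectangle matches its feedback.
                factor = observed / current
                for c in members:
                    freq[c] *= factor
    return freq


# Feedback: 60 rows fall in x-bucket 0, 30 rows fall in y-bucket 0, 100 rows in total.
hist = fit_max_entropy_grid(2, 2, 100, [((0, 1, 0, 2), 60), ((0, 2, 0, 1), 30)])
```

With only these two one-dimensional observations, the fitted buckets approach the independence solution (18, 42, 12 and 28 rows), which is what maximum entropy yields when no correlation has been observed. Feedback on a two-dimensional rectangle would pull the buckets away from independence and capture the correlation.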
Abstract:
The present invention provides a method of selectivity estimation in which preprocessing steps improve the feasibility and efficiency of the estimation. The preprocessing steps are: partitioning (so that iterative scaling estimation terminates in a reasonable time even for large sets of predicates); forced partitioning (to enable partitioning when there are no “natural” partitions, by finding the subsets of predicates whose partitions have the least impact on the overall solution); inconsistency resolution (to ensure that a correct and feasible solution always exists); and implied zero elimination (to ensure that the iterative scaling computation converges under all circumstances). Together, these preprocessing steps ensure that a maximum entropy method of selectivity estimation produces a correct cardinality model for any query consisting of conjuncts of predicates. The preprocessing steps can also be used in conjunction with prior art methods for building a cardinality model.
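The "natural" partitioning step can be pictured as finding connected components: two predicates land in the same partition when some conjunct with known selectivity mentions both. The following is a minimal union-find sketch under that assumption; the function name and input encoding are hypothetical, and forced partitioning, inconsistency resolution, and implied zero elimination are not shown.

```python
def partition_predicates(predicates, known_conjuncts):
    """Group predicates into partitions that can be estimated independently.

    predicates: iterable of predicate ids.
    known_conjuncts: iterable of frozensets of predicate ids whose joint
    selectivity is known (for example, from multivariate statistics).
    Every id in a conjunct must also appear in predicates.
    """
    parent = {p: p for p in predicates}

    def find(p):
        # Walk to the root, halving the path as we go.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    # Link all predicates that appear together in a known conjunct.
    for conjunct in known_conjuncts:
        members = list(conjunct)
        for other in members[1:]:
            root_a, root_b = find(members[0]), find(other)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = {}
    for p in parent:
        groups.setdefault(find(p), set()).add(p)
    # Deterministic output: each partition sorted, partitions ordered by first member.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


parts = partition_predicates(
    ["p1", "p2", "p3", "p4"], [frozenset({"p1", "p2"}), frozenset({"p3"})]
)
```

Here `parts` is `[["p1", "p2"], ["p3"], ["p4"]]`. Because no known statistic links the partitions, each can be solved on its own, which keeps the number of terms in each iterative scaling problem small.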
Abstract:
A method and system for automatically and adaptively determining query execution plans for parametric queries. A first classifier trained by an initial set of training points is generated. A query workload and/or database statistics are dynamically updated. A new set of training points is collected off-line. Using the new set of training points, the first classifier is modified into a second classifier. A database query is received at a runtime subsequent to the off-line phase. The query includes predicates having parameter markers bound to actual values. The predicates are associated with selectivities. A mapping of the selectivities into a plan determines the query execution plan. The determined query execution plan is included in an augmented set of training points, where the augmented set includes the initial set and the new set.
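The off-line retraining and runtime lookup described above can be sketched as follows. The abstract does not name the classifier, so a 1-nearest-neighbour rule over selectivity vectors is used purely as a stand-in; the class name, method names, and plan labels are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class PlanClassifier:
    """Hypothetical stand-in: a 1-nearest-neighbour classifier over selectivity vectors."""

    training_points: list = field(default_factory=list)  # [(selectivities, plan_label)]

    def retrain(self, new_points):
        # Off-line phase: derive the second classifier from the initial and new points.
        return PlanClassifier(self.training_points + list(new_points))

    def choose_plan(self, selectivities):
        # Runtime: map the bound predicates' selectivities to the nearest known plan.
        _, plan = min(self.training_points, key=lambda tp: math.dist(tp[0], selectivities))
        return plan


first = PlanClassifier([((0.01, 0.5), "index_plan"), ((0.9, 0.5), "scan_plan")])
second = first.retrain([((0.4, 0.1), "hash_join_plan")])
query_selectivities = (0.35, 0.12)
plan = second.choose_plan(query_selectivities)
# The chosen plan becomes one more training point in the augmented set.
augmented = second.training_points + [(query_selectivities, plan)]
```

The query's selectivities lie closest to the newly added point, so `plan` is `"hash_join_plan"`. A real optimizer would obtain the training labels by actually optimizing sample parameter bindings; that step is outside this sketch.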
Abstract:
A method for consistent selectivity estimation based on the principle of maximum entropy (ME) is provided. The method efficiently exploits all available information and avoids the bias problem. In the absence of detailed knowledge, the ME approach reduces to standard uniformity and independence assumptions. The disclosed method, based on the principle of ME, is used to improve the optimizer's cardinality estimates by orders of magnitude, resulting in better plan quality and significantly reduced query execution times.
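The claim that maximum entropy reduces to independence when nothing more is known can be checked with a small sketch. It computes a maximum entropy distribution over "atoms" (truth assignments of the predicates) by iterative scaling, which is one common way to solve such problems; it is not presented as the disclosed implementation, and the function names are hypothetical.

```python
from itertools import product


def max_entropy_atoms(n_preds, knowledge, iterations=1000):
    """Maximum entropy distribution over the 2**n_preds truth assignments ("atoms").

    knowledge maps a frozenset of predicate indices to the known selectivity of
    their conjunction. The atoms' probabilities are also forced to sum to 1.
    """
    atoms = list(product((False, True), repeat=n_preds))
    prob = {a: 1.0 / len(atoms) for a in atoms}  # uniform start
    constraints = [(frozenset(), 1.0)] + list(knowledge.items())
    # For each constraint, the atoms in which every predicate of the conjunct holds.
    regions = [([a for a in atoms if all(a[i] for i in conj)], s) for conj, s in constraints]
    for _ in range(iterations):
        for members, target in regions:
            current = sum(prob[a] for a in members)
            if current > 0:
                factor = target / current
                for a in members:
                    prob[a] *= factor
    return prob


def selectivity(prob, conjunct):
    """Estimated selectivity of a conjunct: total probability of atoms satisfying it."""
    return sum(p for a, p in prob.items() if all(a[i] for i in conjunct))


# Only single-predicate selectivities are known: ME falls back to independence.
only_marginals = max_entropy_atoms(2, {frozenset({0}): 0.1, frozenset({1}): 0.2})
# A multivariate statistic on the pair is also known: ME uses it consistently.
with_joint = max_entropy_atoms(
    2, {frozenset({0}): 0.1, frozenset({1}): 0.2, frozenset({0, 1}): 0.08}
)
```

`selectivity(only_marginals, {0, 1})` approaches 0.1 * 0.2 = 0.02, the independence estimate, while `selectivity(with_joint, {0, 1})` approaches the known 0.08 and the single-predicate selectivities stay at 0.1 and 0.2, so every estimate agrees with all available statistics.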
Abstract:
A method for automatically and adaptively determining query execution plans for parametric queries. A first classifier trained by an initial set of training points is generated using a set of random decision trees (RDTs). A query workload and/or database statistics are dynamically updated. A new set of training points collected off-line is used to modify the first classifier into a second classifier. A database query is received at a runtime subsequent to the off-line phase. The query includes predicates having parameter markers bound to actual values. The predicates are associated with selectivities. The query execution plan is determined by identifying an optimal average of posterior probabilities obtained across a set of RDTs and mapping the selectivities to a plan. The determined query execution plan is included in an augmented set of training points that includes the initial set and the new set.
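Averaging posterior probabilities across random decision trees can be sketched as below. This follows the general RDT idea (split features and thresholds chosen at random, only leaf class counts learned from data); the fixed tree depth, the uniform threshold range, and all names are assumptions for illustration rather than details taken from the abstract.

```python
import random
from collections import Counter


class RandomDecisionTree:
    """One random decision tree: split features and thresholds are drawn at random,
    without looking at the training data; only the leaf statistics are learned."""

    def __init__(self, n_features, depth, rng):
        # Complete binary tree in array form: node k has children 2k+1 and 2k+2.
        self.depth = depth
        n_internal = 2 ** depth - 1
        self.feature = [rng.randrange(n_features) for _ in range(n_internal)]
        # Selectivities lie in [0, 1], so thresholds are drawn from the same range.
        self.threshold = [rng.random() for _ in range(n_internal)]
        self.leaf_counts = {}  # leaf index -> Counter of plan labels

    def _leaf(self, x):
        node = 0
        for _ in range(self.depth):
            go_right = x[self.feature[node]] > self.threshold[node]
            node = 2 * node + (2 if go_right else 1)
        return node

    def fit(self, points):
        for x, plan in points:
            self.leaf_counts.setdefault(self._leaf(x), Counter())[plan] += 1

    def posterior(self, x):
        """P(plan | leaf) from this tree; empty if the leaf saw no training points."""
        counts = self.leaf_counts.get(self._leaf(x))
        if not counts:
            return {}
        total = sum(counts.values())
        return {plan: c / total for plan, c in counts.items()}


def choose_plan(trees, x):
    """Average each plan's posterior over all trees and return the highest-scoring plan."""
    scores = Counter()
    for tree in trees:
        for plan, p in tree.posterior(x).items():
            scores[plan] += p / len(trees)
    return max(scores, key=scores.get) if scores else None


rng = random.Random(0)
training = [((0.01, 0.02), "index_plan"), ((0.05, 0.01), "index_plan"),
            ((0.90, 0.80), "scan_plan"), ((0.95, 0.70), "scan_plan")]
trees = [RandomDecisionTree(n_features=2, depth=3, rng=rng) for _ in range(30)]
for tree in trees:
    tree.fit(training)
plan = choose_plan(trees, (0.03, 0.02))
```

Because each tree's splits are arbitrary, a single tree can be a poor classifier; averaging the posteriors over many trees smooths out those arbitrary boundaries, and retraining only has to refresh the leaf counts.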
Abstract:
Disclosed are a method implemented by a data processing system, a data processing system, and an article of manufacture for directing a data processing system to maintain a database table associated with an initial maintenance scheduling interval. The method includes selecting a randomizing factor and selecting a new maintenance scheduling interval for the database table based on the initial maintenance scheduling interval and the selected randomizing factor.
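A minimal sketch of deriving a new interval from the initial interval and a randomizing factor follows. The abstract does not say how the two are combined, so the rule used here (scale the interval by 1 + f, with f drawn uniformly from a symmetric range) and the `max_jitter` parameter are hypothetical. One plausible motivation is that tables sharing the same initial interval are then not all maintained at the same moment.

```python
import random


def next_maintenance_interval(initial_interval, max_jitter=0.25, rng=random):
    """Choose a new maintenance interval from the initial one and a randomizing factor.

    Hypothetical rule: the randomizing factor f is drawn uniformly from
    [-max_jitter, +max_jitter] and the new interval is initial_interval * (1 + f).
    """
    if not 0 <= max_jitter < 1:
        raise ValueError("max_jitter must be in [0, 1)")
    factor = rng.uniform(-max_jitter, max_jitter)  # the randomizing factor
    return initial_interval * (1 + factor)


# Ten tables sharing an initial interval of 24 (units assumed to be hours)
# receive staggered new intervals.
rng = random.Random(42)
new_intervals = [next_maintenance_interval(24.0, rng=rng) for _ in range(10)]
```

Keeping `max_jitter` below 1 guarantees the new interval stays positive; passing a seeded `random.Random` makes the schedule reproducible for testing.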