摘要:
There is disclosed a data processing system implemented method, a data processing system, and an article of manufacture for directing a data processing system to maintain a database table associated with an initial maintenance scheduling interval. The data processing system implemented method includes selecting a randomizing factor, and selecting a new maintenance scheduling interval for the database table based on the initial maintenance scheduling interval and the selected randomizing factor.
摘要:
A method for predictable query execution through early materialization is provided. The method deals with the problem of cardinality misestimation in query execution plans, by pre-executing sub-plans on a query execution plan that have questionable estimates and collecting statistics on the output of these sub-plans. If needed, the overall query execution plan is changed in light of these statistics, before optimizing and executing the remainder of the query.
摘要:
Disclosed are a method and a system for executing a query that requires an expensive process, such as a join, between two or more datasets. If each dataset has multiple partitions that are located at multiple sources, then each of the multiple partitions for each dataset must be unioned prior to completing execution of the query. The method and system develop both a query execution plan and at least one alternative query execution plan to indicate when the process should be pushed down below the unions and when the process should be pulled up above the unions based on collocation of partitions. The query execution plan and the alternative query execution plan(s) are embedded in a composite query execution plan which is evaluated and re-evaluated at run time to determine which of the query execution plan and the alternative query execution plan is currently the most efficient plan and the query is executed, accordingly.
摘要:
“Determining Validity Ranges of Query Plans Based on Suboptimality” A method for approximating a validity range for a domain of cardinalities of input to an optimal query plan is provided. Such a validity range is iteratively approximated using a modified Newton-Raphson method to find roots of cost functions for optimal and alternative query plans, respectively. The Newton-Raphson method is combined with a method of incrementing roots of cost functions, known as input cardinalities, such that discontinuous and non-differentiable points in cost functions are avoided. In this manner, input cardinalities remain within a domain for which a valid range can be specified. Additionally, a robustness measure is determined by a sensitivity analysis performed on an approximated validity range. Using a robustness measure provided by a sensitivity analysis and resultant validity range and, query plan sub-optimality detection is simplified, re-optimization is selectively triggered, and robustness information is provided to a system or user performing corrective actions.
摘要:
An approach is provided in which a processor receives a scan request to scan data included in a data table. The processor selects a column in the data table corresponding to the scan request and retrieves column data entries from the selected column. In addition, the processor identifies the width of the selected column and selects a scan algorithm based upon the identified column width. In turn, the processor loads the column data entries into column data vectors and computes scan results from the column data vectors using the selected scan algorithm.
摘要:
A cell-specific dictionary is applied adaptively to adequate cells, where the cell-specific dictionary subsequently optimizes the handling of frequency-partitioned multi-dimensional data. This includes improved data partitioning with super cells or adjusting resulting cells by sub-dividing very large cells and merging multiple small cells, both of which avoid the highly skewed data distribution in cells and improve the query processing. In addition, more efficient encoding is taught within a cell in case the distinct values that actually appear in that cell are much smaller than the size of the column dictionary.
摘要:
According to one embodiment of the present invention, a method for dictionary encoding data without using three-valued logic is provided. According to one embodiment of the invention, a method includes encoding data in a database table using a dictionary, wherein the data includes values representing NULLs. A query having a predicate is received and the predicate is evaluated on the encoded data, whereby the predicate is evaluated on both the encoded data and on the encoded NULLs.
摘要:
A method is disclosed that places data-intensive subprocesses in close physical and logical proximity to the facility responsible for storing the data, so that high efficiencies at reduced cost are achieved. In one specific example, new computer programs, termed adjuncts, are added and placed in a logical partition on a storage facility so that they can be invoked using appropriate commands issued on the I/O channel. Further, programs or changes are added to existing programs on the host machine, wherein such programs or changes discover the function extensions and invoke them to perform data processing.
摘要:
A way for progressively refining a query execution plan during query execution in a federated data system is provided. Re-optimization constraints are placed in the query execution plan during query compilation. When a re-optimization constraint is violated during query execution, a model of the query execution plan is refined using a partially executed query to form a new query execution plan. The new query execution plan is compiled. The compiled new query execution plan is executed.
摘要:
A system, method, and program storage device implementing the method, for integrating data in a database management system, wherein the method comprises grouping data sources and replicas of the data sources that provide analogous data into a common logical domain; writing application queries against the common logical domain; selecting a correct set of replicas of the data sources and a query-execution strategy for combining a content of the correct set of replicas of the data sources in order to answer the application queries according to query-cost-based optimization; selecting a correct set of data sources according to run-time constraints; shielding the application queries from changes to the data sources by dynamically binding the application queries against the correct sets of data sources and replicas of the data sources; and processing the application queries by generating an optimum query result based on the steps of grouping and shielding.