Abstract:
An automated database tuning tool may include a user interface component and a tuning engine. The user interface may be a graphical component that interacts with a user to collect configuration parameters for a tuning session for a specified database. The configuration parameters may be stored in a tuning database. The tuning engine, which performs the actual tuning process, may generate physical design recommendations and reports. The recommendations and reports may be stored in the tuning database, enabling the tuning tool to be run offline or in the background. Communication between components of the tuning tool may occur via stored procedures.
Abstract:
Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data.
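The steps in this abstract can be illustrated with a small sketch for a SUM query. It is not the patented implementation. The function names, the fixed outlier budget `num_outliers`, and the use of uniform random sampling are assumptions made for illustration only.

```python
import random


def split_outliers(values, num_outliers):
    """Split values into (outliers, rest).

    Assumption: the retained window is the contiguous run of sorted
    values, of length len(values) - num_outliers, with the lowest variance.
    """
    ordered = sorted(values)
    keep = len(ordered) - num_outliers
    if keep <= 0:
        return ordered, []
    # Prefix sums let each window's variance be computed in O(1).
    prefix, prefix_sq = [0.0], [0.0]
    for v in ordered:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)
    best_start, best_var = 0, float("inf")
    for start in range(num_outliers + 1):
        s = prefix[start + keep] - prefix[start]
        sq = prefix_sq[start + keep] - prefix_sq[start]
        var = sq / keep - (s / keep) ** 2
        if var < best_var:
            best_start, best_var = start, var
    outliers = ordered[:best_start] + ordered[best_start + keep:]
    rest = ordered[best_start:best_start + keep]
    return outliers, rest


def estimate_sum(values, num_outliers, sample_size, rng=None):
    """Exact sum over the outliers plus an extrapolated sample of the rest."""
    rng = rng or random.Random(0)
    outliers, rest = split_outliers(values, num_outliers)
    outlier_sum = sum(outliers)  # outliers are aggregated exactly
    if not rest:
        return outlier_sum
    k = min(sample_size, len(rest))
    sample = rng.sample(rest, k)
    # Scale the sample sum up to the size of the non-outlier data.
    return outlier_sum + sum(sample) * len(rest) / k
```

Because the extreme values are summed exactly, the sample only has to represent the low-variance remainder, which is what keeps the extrapolation error small.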
Abstract:
Relational database applications such as index selection, histogram tuning, approximate query processing, and statistics selection have recognized the importance of leveraging workloads. Often these applications are presented with large workloads, i.e., a set of SQL DML statements, as input. A key factor affecting the scalability of such applications is the size of the workload. The invention concerns workload compression which helps improve the scalability of such applications. The exemplary embodiment is broadly applicable to a variety of workload-driven applications, while allowing for incorporation of application specific knowledge. The process is described in detail in the context of two workload-driven applications: index selection and approximate query processing.
Abstract:
An index merge tool helps form, for use by a database server in accessing a database in accordance with a workload of queries, an index configuration or set of indexes that consumes relatively less storage space. The index merge tool identifies from an initial set of indexes one or more combinations of two or more indexes on the same table of the database and merges each identified combination of indexes to form a merged set of indexes. The index merge tool identifies and merges each combination of indexes by identifying and merging one pair of indexes at a time. The index merge tool uses the merged set of indexes as the index configuration for use in executing queries against the database so long as the storage saved by the merged set of indexes exceeds a threshold amount and so long as any increase in the cost to execute queries against the database using the merged set of indexes is limited. Otherwise, the index merge tool uses the initial set of indexes as the index configuration.
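A minimal sketch of the pairwise merge loop and the final accept-or-fall-back test follows. Several parts are assumptions rather than details from the abstract: an index is modeled as a `(table, columns)` tuple, a merged index keeps one parent's columns as its prefix, storage is the sum of hypothetical column widths, and `cost_fn` stands in for a call to the query optimizer.

```python
from itertools import combinations


def merge_pair(first, second):
    """Columns of `first`, followed by the columns of `second` not already present."""
    return first + tuple(c for c in second if c not in first)


def total_size(config, widths):
    """Hypothetical storage model: sum of column widths over all indexes."""
    return sum(widths[col] for _table, cols in config for col in cols)


def merge_indexes(initial, widths, cost_fn, min_saving, max_cost_ratio):
    """Merge one pair of same-table indexes at a time, then accept or fall back."""
    base_cost = cost_fn(initial)
    current = list(initial)
    while True:
        best = None  # (saving, candidate configuration)
        for i, j in combinations(range(len(current)), 2):
            (table_a, cols_a), (table_b, cols_b) = current[i], current[j]
            if table_a != table_b:
                continue  # only indexes on the same table may be merged
            others = [ix for k, ix in enumerate(current) if k not in (i, j)]
            for cols in (merge_pair(cols_a, cols_b), merge_pair(cols_b, cols_a)):
                candidate = others + [(table_a, cols)]
                saving = total_size(current, widths) - total_size(candidate, widths)
                # Skip merges that save nothing or push cost past the limit.
                if saving <= 0 or cost_fn(candidate) > base_cost * max_cost_ratio:
                    continue
                if best is None or saving > best[0]:
                    best = (saving, candidate)
        if best is None:
            break
        current = best[1]
    saved = total_size(initial, widths) - total_size(current, widths)
    # Use the merged set only if the storage saved exceeds the threshold.
    return current if saved > min_saving else list(initial)
```

Each accepted merge removes one index, so the loop ends after at most `len(initial) - 1` rounds.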
Abstract:
An index selection tool helps reduce costs in time and memory in selecting an index configuration or set of indexes for use by a database server in accessing a database in accordance with a workload of queries. The index selection tool attempts to reduce the number of indexes to be considered, the number of index configurations to be enumerated, and the number of invocations of a query optimizer in selecting an index configuration for the workload.
Abstract:
A plurality of indicators representing a plurality of respective candidate database configurations may be obtained, each of the candidate database configurations including a plurality of database queries and a plurality of candidate database indexes associated with a database table. A portion of the candidate database indexes included in the plurality of database indexes may be selected based on skyline selection. An enumeration of the portion of the plurality of the candidate database indexes may be determined based on a greedy algorithm.
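The two selection steps named here, skyline selection followed by greedy enumeration, might be sketched as below. The abstract does not say which dimensions the skyline uses or how the greedy step ranks candidates. This sketch assumes each candidate index carries a hypothetical `(name, benefit, size)` triple, that higher benefit and smaller size are preferred, and that the greedy step ranks by benefit per unit of size under a storage budget.

```python
def skyline(candidates):
    """Keep candidates that no other candidate dominates.

    A candidate dominates another if it is at least as good on both
    benefit (higher) and size (lower) and strictly better on one.
    """
    def dominates(p, q):
        return (p[1] >= q[1] and p[2] <= q[2]
                and (p[1] > q[1] or p[2] < q[2]))

    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


def greedy_pick(candidates, budget):
    """Add indexes in order of benefit per unit size while they fit the budget."""
    chosen, used = [], 0
    ranked = sorted(candidates, key=lambda c: c[1] / c[2], reverse=True)
    for name, _benefit, size in ranked:
        if used + size <= budget:
            chosen.append(name)
            used += size
    return chosen
```

Running `skyline` first shrinks the pool that `greedy_pick` has to consider, which is the point of pruning dominated candidates before enumeration.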
Abstract:
A database server may be configured to compute distinct page counts of pages accessed to execute operands of respective queries. The queries may be executed against a table comprised of the pages and having an index managed by the database server. The distinct page counts may be obtained by counting, as a part of the executing of the queries, distinct pages accessed during the execution of the queries.
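As a rough illustration of counting distinct pages as a side effect of execution, the sketch below scans rows tagged with a page identifier and records the pages that hold rows satisfying a predicate. The row representation and the function name are assumptions. A real server would gather this inside its execution operators.

```python
def execute_with_page_count(rows, predicate):
    """Return (matching records, number of distinct pages holding a match).

    rows: iterable of (page_id, record) pairs, a hypothetical stand-in
    for the rows a scan operator would produce.
    """
    pages = set()
    matches = []
    for page_id, record in rows:
        if predicate(record):
            matches.append(record)
            pages.add(page_id)  # a set counts each page only once
    return matches, len(pages)
```

The count comes from the same pass that answers the query, so no separate scan of the table is needed to obtain it.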
Abstract:
A database server supports weighted and unweighted sampling of records or tuples in accordance with desired sampling semantics such as with replacement (WR), without replacement (WoR), or independent coin flips (CF) semantics, for example. The database server may perform such sampling sequentially not only to sample non-materialized records, such as those produced as a stream by a pipeline in a query tree for example, but also to sample records, whether materialized or not, in a single pass. The database server also supports sampling over a join of two relations of records or tuples without requiring the computation of the full join and without requiring the materialization of both relations and/or indexes on the join attribute values of both relations.
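The three sampling semantics can each be implemented in a single sequential pass over a stream. The sketch below covers only the unweighted case and uses standard textbook techniques (Bernoulli trials and reservoir sampling), which are not necessarily the methods claimed in the patent. All function names are illustrative.

```python
import random


def sample_cf(stream, p, rng):
    """Coin flips: keep each record independently with probability p."""
    return [x for x in stream if rng.random() < p]


def sample_wor(stream, r, rng):
    """Without replacement: classic reservoir sampling of r records."""
    reservoir = []
    for n, x in enumerate(stream, start=1):
        if n <= r:
            reservoir.append(x)
        else:
            j = rng.randrange(n)  # uniform in [0, n)
            if j < r:
                reservoir[j] = x
    return reservoir


def sample_wr(stream, r, rng):
    """With replacement: r independent size-1 reservoirs.

    The n-th record replaces each slot with probability 1/n, so every
    slot ends up holding a uniform draw from the whole stream.
    """
    slots = [None] * r
    for n, x in enumerate(stream, start=1):
        for i in range(r):
            if rng.random() < 1.0 / n:
                slots[i] = x
    return slots
```

None of the three functions needs to know the stream length in advance or revisit a record, so they work on non-materialized input such as a generator.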
Abstract:
Integrating the partitioning of physical design structures with the physical design process can result in more efficient query execution. When candidate structures are evaluated for their relative benefit, one or more partitioning methods are associated with each structure so that the benefits of various partitioning methods are taken into consideration when the structures are selected for use by the database. A pool of partitioned candidate structures is formed by proposing and evaluating the benefit of candidate structures with associated partitioning on a per-query basis. The selected partitioned candidates are then used to construct generalized structures with associated partitioning methods that are evaluated for their benefit over the workload. Those generalized structures are added to the pool of partitioned candidate structures. From this augmented pool of partitioned candidate structures, an optimal set of partitioned structures is enumerated for use by the database system.
Abstract:
Layout in a database system is performed using workload information. Execution information for a workload is obtained. Cumulative access and co-access information for database objects is then assembled. A cost model is developed for quantitatively capturing the value of different layouts, and a search is performed for a recommended database layout. In one embodiment, a greedy search is performed that initially attempts to provide a layout that minimizes co-location of objects on storage objects, and then attempts to improve that layout via a greedy search.