摘要:
A database object summarization tool is provided that selects a subset of database objects subject to filtering constraints such as a partial order or optimization of some attribute. A dominance primitive filters out tuples that are dominated according to a partial order constraint by another tuple. A representation primitive selects a representative subset of tuples such than an optimization criteria is met.
摘要:
A method of estimating the Results of a database query are estimated by performing a sampling of weighted tuples in a database based on a probability of usage of tuples required in executing a workload. A probability is associated with each tuple sampled. And, can aggregate is computed over values in each sampled tuple while multiplying by the inverses of the probabilities associated with each tuple sampled.
摘要:
A method for estimating the result of a query on a database having data records arranged in tables. The database has an expected workload that includes a set of queries that can be executed on the database. An expected workload is derived comprising a set of queries that can be executed on the database. A sample is constructed by selecting data records for inclusion in the sample in a manner that minimizes an estimation error when the data records are acted upon by a query in the expected workload to provide an expected workload to provide an expected result. The query accesses the sample and is executed on the sample, returning an estimated query result. The expected workload can be constructed by specifying a degree of overlap between records selected by queries in the given workload and records selected by queries in the expected workload.
摘要:
An index selection tool helps reduce costs in time and memory in selecting an index configuration or set of indexes for use by a database server in accessing a database in accordance with a workload of queries. The index selection tool attempts to reduce the number of indexes to be considered, the number of index configurations to be enumerated, and the number of invocations of a query optimizer in selecting an index configuration for the workload.
摘要:
A database server may be configured to compute distinct page counts of pages accessed to execute operands of respective queries. The queries may be executed against a table comprised of the pages and having an index managed by the database server. The distinct page counts may be obtained by counting, as a part of the executing of the queries, distinct pages accessed during the execution of the queries.
摘要:
A database server supports weighted and unweighted sampling of records or tuples in accordance with desired sampling semantics such as with replacement (WR), without replacement (WoR), or independent coin flips (CF) semantics, for example. The database server may perform such sampling sequentially not only to sample non-materialized records, such as those produced as a stream by a pipeline in a query tree for example, but also to sample records, whether materialized or not, in a single pass. The database server also supports sampling over a join of two relations of records or tuples without requiring the computation of the full join and without requiring the materialization of both relations and/or indexes on the join attribute values of both relations.
摘要:
An outlier index for a database and a given workload is generated by identifying sub-relations of tuples in the database induced by selection and group by conditions in queries in the workload. A variance is then generated for values in each sub-relation. Sub-relations having higher variances are selected, and outliers from such sub-relations having higher variances are generated.
摘要:
Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data.
摘要:
A query progress indicator that provides an indication to a user of the progress of a query being executed on a database. The indication of the progress of the query allows the user to decide whether the query should be allowed to complete or should be aborted. One method that may be used to estimate the progress of a query that is being executed on a database defines a model of work performed during execution of a query. The total amount of work that will be performed during execution of the query is estimated according to the model. The amount of work performed at a given point during execution of the query is estimated according to the model. The progress of the query is estimated using the estimated amount of work at the given point in time and the estimated total amount of work. This estimated progress of query execution may be provided to the user.
摘要:
A method for estimating the result of a query on a database having data records arranged in tables. The database has an expected workload that includes a set of queries that can be executed on the database. An expected workload is derived comprising a set of queries that can be executed on the database. A sample is constructed by selecting data records for inclusion in the sample in a manner that minimizes an estimation error when the data records are acted upon by a query in the expected workload to provide an expected workload to provide an expected result. The query accesses the sample and is executed on the sample, returning an estimated query result. The expected workload can be constructed by specifying a degree of overlap between records selected by queries in the given workload and records selected by queries in the expected workload.