摘要:
Exemplary embodiments of the present invention relate to enhanced faceted search support for OLAP queries over unstructured text as well as structured dimensions by the dynamic and automatic discovery of dimensions that are determined to be most “interesting” to a user based upon the data. Within the exemplary embodiments “interestingness” is defined as how surprising a summary along some dimensions is from a user's expectation. Further, multi-attribute facets are determined and a user is optionally permitted to specify the distribution of values that she expects, and/or the distance metric by which actual and expected distributions are to be compared.
摘要:
Exemplary embodiments of the present invention relate to enhanced faceted search support for OLAP queries over unstructured text as well as structured dimensions by the dynamic and automatic discovery of dimensions that are determined to be most “interesting” to a user based upon the data. Within the exemplary embodiments “interestingness” is defined as how surprising a summary along some dimensions is from a user's expectation. Further, multi-attribute facets are determined and a user is optionally permitted to specify the distribution of values that she expects, and/or the distance metric by which actual and expected distributions are to be compared.
摘要:
Exemplary embodiments of the present invention relate to enhanced faceted search support for OLAP queries over unstructured text as well as structured dimensions by the dynamic and automatic discovery of dimensions that are determined to be most “interesting” to a user based upon the data. Within the exemplary embodiments “interestingness” is defined as how surprising a summary along some dimensions is from a user's expectation. Further, multi-attribute facets are determined and a user is optionally permitted to specify the distribution of values that she expects, and/or the distance metric by which actual and expected distributions are to be compared.
摘要:
A system for automating data partitioning in a parallel database includes plural nodes connected in parallel. Each node includes a database server and two databases connected thereto. Each database server includes a query optimizer. Moreover, a partitioning advisor communicates with the database server and the query optimizer. The query optimizer and the partitioning advisor include a program for recommending and evaluating data table partitions that are useful for processing a workload of query statements. The data table partitions are recommended and evaluated without requiring the data tables to be physically repartitioned.
摘要:
A method and system for automatically and adaptively determining query execution plans for parametric queries. A first classifier trained by an initial set of training points is generated. A query workload and/or database statistics are dynamically updated. A new set of training points is collected off-line. Using the new set of training points, the first classifier is modified into a second classifier. A database query is received at a runtime subsequent to the off-line phase. The query includes predicates having parameter markers bound to actual values. The predicates are associated with selectivities. A mapping of the selectivities into a plan determines the query execution plan. The determined query execution plan is included in an augmented set of training points, where the augmented set includes the initial set and the new set.
摘要:
A system for automating data partitioning in a parallel database includes plural nodes connected in parallel. Each node includes a database server and two databases connected thereto. Each database server includes a query optimizer. Moreover, a partitioning advisor communicates with the database server and the query optimizer. The query optimizer and the partitioning advisor include a program for recommending and evaluating data table partitions that are useful for processing a workload of query statements. The data table partitions are recommended and evaluated without requiring the data tables to be physically repartitioned.
摘要:
A method for automatically and adaptively determining query execution plans for parametric queries. A first classifier trained by an initial set of training points is generated using a set of random decision trees (RDTs). A query workload and/or database statistics are dynamically updated. A new set of training points collected off-line is used to modify the first classifier into a second classifier. A database query is received at a runtime subsequent to the off line phase. The query includes predicates having parameter markers bound to actual values. The predicates are associated with selectivities. The query execution plan is determined by identifying an optimal average of posterior probabilities obtained across a set of RDTs and mapping the selectivities to a plan. The determined query execution plan is included in an augmented set of training points that includes the initial set and the new set.
摘要:
A system for automating data partitioning in a parallel database includes plural nodes connected in parallel. Each node includes a database server and two databases connected thereto. Each database server includes a query optimizer. Moreover, a partitioning advisor communicates with the database server and the query optimizer. The query optimizer and the partitioning advisor include a program for recommending and evaluating data table partitions that are useful for processing a workload of query statements. The data table partitions are recommended and evaluated without requiring the data tables to be physically repartitioned.
摘要:
A method and system for automatically and adaptively determining query execution plans for parametric queries. A first classifier trained by an initial set of training points is generated. A query workload and/or database statistics are dynamically updated. A new set of training points is collected off-line. Using the new set of training points, the first classifier is modified into a second classifier. A database query is received at a runtime subsequent to the off-line phase. The query includes predicates having parameter markers bound to actual values. The predicates are associated with selectivities. A mapping of the selectivities into a plan determines the query execution plan. The determined query execution plan is included in an augmented set of training points, where the augmented set includes the initial set and the new set.
摘要:
The present invention provides a method, system and program for optimizing compression of a workload processed by a database management system. In an embodiment of the present invention a method of optimizing the compression of database workloads is provided. Initially, an estimate of a cost of execution for each query according to a defined metric such as execution time or memory consumption is determined. A sub-set of queries is then selected from the workload in order of the most costly to least costly relative to the defined metric for compression according to either a predetermined compression threshold percentage or a threshold percentage derived from an allotted workload execution time. Compression is then performed on the selected sub-set of queries (i.e. those that will benefit the most from the compression) to achieve a net beneficial trade-off between the cost of workload compression and the cost of workload execution.