Abstract:
A database object summarization tool is provided that selects a subset of database objects subject to filtering constraints such as a partial order or optimization of some attribute. A dominance primitive filters out tuples that are dominated, according to a partial order constraint, by another tuple. A representation primitive selects a representative subset of tuples such that an optimization criterion is met.
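The dominance primitive can be illustrated with a minimal Python sketch. It assumes tuples of numeric attributes where smaller values are preferred on every attribute; the names `dominates` and `dominance_filter` and the sample data are hypothetical and are not taken from the abstract.

```python
from typing import List, Sequence, Tuple

Row = Tuple[float, ...]


def dominates(a: Row, b: Row) -> bool:
    """Return True if a dominates b under the assumed partial order:
    a is no worse than b on every attribute and strictly better on one
    (smaller is better here, which is an assumption of this sketch)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def dominance_filter(rows: Sequence[Row]) -> List[Row]:
    """Keep only the rows that no other row dominates (quadratic scan)."""
    return [r for r in rows if not any(dominates(o, r) for o in rows)]


# Hypothetical (price, distance) tuples; (90, 4) is dominated by (50, 3).
hotels = [(50.0, 3.0), (80.0, 1.0), (90.0, 4.0), (60.0, 2.0)]
survivors = dominance_filter(hotels)
```

The quadratic scan is chosen for clarity only; the representation primitive, which selects a subset against an optimization criterion, is not shown.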
Abstract:
A method is provided for evaluating a user query on a relational database having records stored therein, a workload made up of a set of queries that have been executed on the database, and a query optimizer that generates a query execution plan for the user query. Each query plan includes a plurality of intermediate query plan components that verify a subset of records from the database meeting query criteria. The method accesses the query plan and a set of stored intermediate statistics for records verified by query components, such as histograms that summarize the cardinality of the records that verify the query component. The method forms a transformed query plan based on the selected intermediate statistics (possibly by rewriting the query plan) and estimates the cardinality of the transformed query plan to arrive at a more accurate cardinality estimate for the query. If additional intermediate statistics are necessary, a pool of intermediate statistics may be generated based on the queries in the workload by evaluating the benefit of a given statistic over the workload and adding to the pool those intermediate statistics that provide a relatively large benefit.
Abstract:
A framework is provided within a database system for specifying database monitoring rules that will be evaluated as part of the execution code path of database events being monitored. The occurrence of a selected database event triggers a rule that evaluates some parameter of an object related to the event against a condition in the rule. If the condition is met, a specified action is taken that can alter the execution of the database event or database system performance. Lightweight aggregation tables are utilized to enable aggregation of object parameter values so that presently occurring events can be compared to a summary of the object parameter values from previously occurring database events. Signatures are assigned to queries based on the structure of the query plan so that information in the lightweight aggregation tables can be grouped according to query signature.
Abstract:
Relational database applications such as index selection, histogram tuning, approximate query processing, and statistics selection have recognized the importance of leveraging workloads. Often these applications are presented with large workloads, i.e., a set of SQL DML statements, as input. A key factor affecting the scalability of such applications is the size of the workload. The invention concerns workload compression which helps improve the scalability of such applications. The exemplary embodiment is broadly applicable to a variety of workload-driven applications, while allowing for incorporation of application specific knowledge. The process is described in detail in the context of two workload-driven applications: index selection and approximate query processing.
Abstract:
Searching by keywords and providing generalized matching capabilities on a relational database is enabled by performing preprocessing operations to construct inverted list lookup tables based on data record components at an interim level of granularity, such as column location. Prefix information is stored in the inverted list for each keyword, keyword sub-string, or stemmed version of the keyword. A keyword search is performed on the lookup tables rather than the database tables to determine database column locations of the keyword. The lookup tables are scanned to identify each prefix associated with the search term. Schema information about the database is used to link the column locations to form database subgraphs that span the keywords. Join tables are generated based on the subgraphs consisting of columns containing the keywords. A query on the database is generated to join the tables and retrieve database rows that contain the keyword and the prefixes associated with the keyword. The retrieved rows are ranked in order of relevance before being output. By preprocessing a relational database to form lookup tables, and initially searching the lookup tables to obtain a targeted subset of the database upon which SQL queries can be performed to collect data records, keyword searching on a relational database is made efficient.
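The preprocessing step that maps keywords to column locations can be sketched as follows. This is a simplified illustration under stated assumptions: tables are held as in-memory lists of dictionaries, keywords are whitespace-separated lowercase tokens, and the names `build_lookup` and `column_locations` are hypothetical. Prefix, sub-string, and stemming information, the schema-based subgraph step, and ranking are all omitted.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Table = List[Dict[str, object]]
Location = Tuple[str, str]  # (table name, column name)


def build_lookup(tables: Dict[str, Table]) -> Dict[str, Set[Location]]:
    """Preprocess the tables into an inverted list keyed by keyword,
    recording only the column-level location of each keyword."""
    lookup: Dict[str, Set[Location]] = defaultdict(set)
    for table_name, rows in tables.items():
        for row in rows:
            for column, value in row.items():
                for token in str(value).lower().split():
                    lookup[token].add((table_name, column))
    return lookup


def column_locations(lookup: Dict[str, Set[Location]], keyword: str) -> List[Location]:
    """Answer a keyword search from the lookup table alone."""
    return sorted(lookup.get(keyword.lower(), set()))


# Hypothetical sample database.
tables = {
    "authors": [{"id": 1, "name": "Jim Gray"}],
    "papers": [{"id": 10, "title": "Gray codes", "author_id": 1}],
}
lookup = build_lookup(tables)
locations = column_locations(lookup, "Gray")
```

Only the columns returned in `locations` would then need to be queried with SQL to fetch matching rows, which is the source of the efficiency the abstract describes.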
Abstract:
Building histograms by using feedback information about the execution of a query workload rather than by examining the data helps reduce the cost of building and maintaining histograms. A method of maintaining self-tuning histograms updates histograms based on feedback about the execution of a user query. A histogram may be initialized using an assumption of uniform distribution of data or by combining existing histograms. A histogram tuner accesses an estimated result, generated by using the histogram, in response to a user query. The histogram tuner calculates an estimation error based on the result of the user query and the estimated result. The frequencies of histogram buckets are refined based on the estimation error. The bucket bounds of the histogram are restructured based on the refined frequencies. The method may be performed on-line after a user query or off-line by accessing a workload log. By updating a histogram without accessing the database, the cost of building and maintaining histograms is significantly reduced.
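The frequency refinement step can be sketched in Python as below. The abstract does not say how the estimation error is distributed, so this sketch assumes the error is split across the buckets overlapping the query range in proportion to their current contribution and scaled by a damping factor. The names `Bucket`, `estimate`, `refine`, and `damping` are hypothetical, and bucket restructuring is not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Bucket:
    low: float   # inclusive lower bound
    high: float  # exclusive upper bound
    freq: float  # estimated number of rows in [low, high)


def overlap_fraction(b: Bucket, lo: float, hi: float) -> float:
    """Fraction of the bucket's width covered by the range [lo, hi)."""
    width = b.high - b.low
    overlap = max(0.0, min(b.high, hi) - max(b.low, lo))
    return overlap / width if width > 0 else 0.0


def estimate(buckets: List[Bucket], lo: float, hi: float) -> float:
    """Estimated result size, assuming uniformity inside each bucket."""
    return sum(b.freq * overlap_fraction(b, lo, hi) for b in buckets)


def refine(buckets: List[Bucket], lo: float, hi: float,
           actual: float, damping: float = 0.5) -> float:
    """Adjust bucket frequencies from query feedback; returns the error."""
    contributions = [b.freq * overlap_fraction(b, lo, hi) for b in buckets]
    est = sum(contributions)
    error = actual - est
    touched = [i for i, c in enumerate(buckets) if overlap_fraction(c, lo, hi) > 0]
    for i in touched:
        # Assumed policy: proportional share, or an even split if est is 0.
        share = contributions[i] / est if est > 0 else 1.0 / len(touched)
        buckets[i].freq = max(0.0, buckets[i].freq + damping * error * share)
    return error


# Uniform initialization: 1000 rows over four equal-width buckets.
hist = [Bucket(0, 25, 250.0), Bucket(25, 50, 250.0),
        Bucket(50, 75, 250.0), Bucket(75, 100, 250.0)]
err = refine(hist, 0, 50, actual=800.0)
```

Here the histogram estimated 500 rows for the range while the query returned 800, so the two overlapping buckets each absorb half of the damped error. No table data is read during the update, only the query's observed result size.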
Abstract:
A what-if index analysis utility provides the ability to analyze the performance of the existing configuration of a database system with respect to one or more workloads of queries and to propose a hypothetical configuration for the database system to analyze its potential impact on the performance of the database system. The utility may be used, for example, to perform an impact analysis of the set of indexes selected by an index selection tool with respect to a workload of queries, and may also be used to explore what-if scenarios for the database system by analyzing the impact of hypothetical sets of indexes with respect to the execution of various workloads over projected sizes of a database. The utility may be used to perform summarizations of workloads, configurations, and the performance of workloads with respect to the existing configuration and hypothetical configurations. The what-if index analysis utility may be used, for example, by a database administrator or a physical database design tool to help improve performance of a database system.
Abstract:
Database applications typically need to invoke foreign functions or to access data that is not stored in the database. The invention provides a comprehensive approach to cost-based optimization of relational queries in the presence of such foreign functions. The optimization takes into account semantic information about foreign functions using a declarative rule language (e.g., SQL) to express such semantics. Procedures for applying the rewrite rules and for generating the execution space of equivalent queries are described. Procedures to obtain an optimal plan from this enriched execution space are also described. Moreover, necessary extensions to the cost model that are needed in the presence of foreign functions are described.
Abstract:
A plurality of description phrases associated with a first domain may be determined, based on an analysis of a first plurality of documents to determine co-occurrences of the description phrases with one or more name labels associated with the first domain. An entity associated with the first domain may be obtained. An analysis of a second plurality of documents may be initiated to identify co-occurrences of mentions of the obtained entity and one or more of the plurality of description phrases, and contexts associated with each of the co-occurrences of the mentions and description phrases, in each one of the second plurality of documents. A description tag association between the obtained entity and one of the description phrases may be determined, based on an analysis of the identified contexts.
Abstract:
A set of documents is filtered for entity extraction. A list of entity strings is received. A set of token sets that covers the entity strings in the list is determined. An inverted index generated on a first set of documents is queried using the set of token sets to determine a set of document identifiers for a subset of the documents in the first set. A second set of documents identified by the set of document identifiers is retrieved from the first set of documents. The second set of documents is filtered to include one or more documents of the second set that each includes a match with at least one entity string of the list of entity strings. Entity recognition may be performed on the filtered second set of documents.
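The filtering pipeline above can be sketched in Python under simplifying assumptions: tokens are lowercase whitespace-separated words, the covering set of token sets is taken to be one token set per entity string (the abstract does not specify how the cover is computed), and a match means the entity's tokens appear contiguously in the document. All function names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Set


def tokenize(text: str) -> List[str]:
    return text.lower().split()


def build_inverted_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each token to the identifiers of the documents containing it."""
    index: Dict[str, Set[int]] = {}
    for doc_id, text in docs.items():
        for token in set(tokenize(text)):
            index.setdefault(token, set()).add(doc_id)
    return index


def candidate_ids(index: Dict[str, Set[int]],
                  token_sets: Iterable[FrozenSet[str]]) -> Set[int]:
    """Documents containing every token of at least one token set."""
    ids: Set[int] = set()
    for token_set in token_sets:
        postings = [index.get(t, set()) for t in token_set]
        if postings:
            ids |= set.intersection(*postings)
    return ids


def filter_documents(docs: Dict[int, str], entities: List[str]) -> List[int]:
    """Return ids of documents that truly contain some entity string."""
    index = build_inverted_index(docs)
    token_sets = {frozenset(tokenize(e)) for e in entities}
    matched = []
    for doc_id in sorted(candidate_ids(index, token_sets)):
        padded = " " + " ".join(tokenize(docs[doc_id])) + " "
        if any(" " + " ".join(tokenize(e)) + " " in padded for e in entities):
            matched.append(doc_id)
    return matched


# Hypothetical corpus: document 2 has both tokens but not the phrase.
docs = {
    1: "Microsoft Research released a new paper",
    2: "Research at Microsoft is broad",
    3: "Nothing relevant here",
}
entities = ["Microsoft Research"]
kept = filter_documents(docs, entities)
```

The inverted index narrows the corpus to a small candidate set cheaply, and the exact match check then removes false positives such as document 2 before entity recognition would run.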