Abstract:
Online physical design tuning is constantly monitoring database indexes and can effectively react to changes in a workload by modifying the physical design as needed. Algorithms can be utilized that take into account various criteria including storage constraints, update statements, and the cost of temporarily creating physical structures.
Abstract:
A system and methods rank results of database queries. An automated approach for ranking database query results is disclosed that leverages data and workload statistics and associations. Ranking functions are based upon the principles of probabilistic models from Information Retrieval that are adapted for structured data. The ranking functions are encoded into an intermediate knowledge representation layer. The system is generic, as the ranking functions can be further customized for different applications. Benefits of the disclosed system and methods include the use of adapted probabilistic information retrieval (PIR) techniques that leverage relational/structured data, such as columns, to provide natural groupings of data values. This permits the inference and use of pair-wise associations between data values across columns, which are usually not possible with text data.
Abstract:
A system that can analyze a multi-dimensional input thereafter establishing a search query based upon extracted features from the input. In a particular example, an image can be used as an input to a search mechanism. Pattern recognition and image analysis can be applied to the image thereafter establishing a search query that corresponds to features extracted from the image input. The system can also facilitate indexing multi-dimensional searchable items thereby making them available to be retrieved as results to a search query. More particularly, the system can employ text analysis, pattern and/or speech recognition mechanisms to extract features from searchable items. These extracted features can be employed to index the searchable items.
Abstract:
Relational database applications such as index selection, histogram tuning, approximate query processing, and statistics selection have recognized the importance of leveraging workloads. Often these applications are presented with large workloads, i.e., a set of SQL DML statements, as input. A key factor affecting the scalability of such applications is the size of the workload. The invention concerns workload compression which helps improve the scalability of such applications. The exemplary embodiment is broadly applicable to a variety of workload-driven applications, while allowing for incorporation of application specific knowledge. The process is described in detail in the context of two workload-driven applications: index selection and approximate query processing.
Abstract:
A method for estimating the result of a query on a database having data records arranged in tables. The database has an expected workload that includes a set of queries that can be executed on the database. An expected workload is derived including a set of queries that can be executed on the database. A sample is constructed by selecting data records for inclusion in the sample in a manner that minimizes an estimation error when the data records are acted upon by a query in the expected workload to provide an expected workload to provide an expected result. The query accesses the sample and is executed on the sample, returning an estimated query result. The expected workload can be constructed by specifying a degree of overlap between records selected by queries in the given workload and records selected by queries in the expected workload.
Abstract:
The claimed subject matter relates to incorporating a skyline operator within a relational database engine, and more particularly to a database engine that utilizes novel techniques to determine the lowest cost of generating the skyline produced by the skyline operator. The database engine receives queries and associated preferences and based on a cardinality estimate and a cost estimate an appropriate skyline generating technique is utilized to produce a skyline representative of the received queries and its associated preferences.
Abstract:
A query generation using cardinality constraints process including choosing a first set of parameters for a query, calculating an additional set of parameters based on the first set of parameters, executing the query using additional set of parameters, evaluating the cardinality error the additional set of parameters, and refining the additional set of parameters to meet the desired cardinality constraint. Creating a query and selecting parameters for the query to meet a desired cardinality constraint or set of cardinality constraints when the query is executed against a database may be difficult. A query generation using cardinality constraints process may create a set of parameters for a query which satisfies a desired cardinality constraint or set of cardinality constraints. An application of such a query generation using cardinality constraints process may be database component and code testing.
Abstract:
A method for automatically ranking database records by relevance to a given query. A similarity function is derived from data in the database and/or queries in a workload. The derrived similarity function is applied to a given query and records it in the database to rank the records. The records are returned in a ranked order.
Abstract:
Database system query optimizers use several techniques such as histograms and sampling to estimate the result sizes of operators and sub-plans (operator trees) and the number of distinct values in their outputs. Instead of estimates, the invention uses the exact actual values of the result sizes and the number of distinct values in the outputs of sub-plans encountered by the optimizer. This is achieved by optimizing the query in phases. In each phase, newly encountered sub-plans are recorded for which result size and/or distinct value estimates are required. These sub-plans are executed at the end of the phase to determine their actual result sizes and the actual number of distinct values in their outputs. In subsequent phases, the optimizer uses these actual values when it encounters the same sub-plan again.
Abstract:
A method is provided for tuning a database to recommend a set of physical design structures for the database that optimize database performance for a given workload given a total time bound that defines a maximum amount of time that can be spent tuning the database. A cumulative set of recommended structures is maintained and incrementally updated based on tuning that is performed in intervals over portions of the workload. The cumulative set of recommended structures is updated by tuning the database by examining a predetermined portion of the workload during a time slice that is a fraction of the total time bound. At the end of the time slice, a set of recommended structures has been enumerated that is based on the workload portions that have been examined thus far. The set of recommended structures is updated until all queries in the workload have been examined or until the time bound is reached.