Abstract:
A method for evaluating a user query on a database having a mining model that classifies records contained in the database into classes, where the query comprises at least one mining predicate that refers to a class of database records. An upper envelope is derived for the class referred to by the mining predicate; the upper envelope corresponds to a query that returns a set of database records that includes all of the database records belonging to the class. The upper envelope is included in the user query for query evaluation. The method may be practiced during a preprocessing phase by evaluating the mining model to extract a set of classes of the database records and deriving an upper envelope for each class. These upper envelopes are stored for access during user query evaluation.
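The upper-envelope idea can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy decision-tree model `classify`, the attribute names `age` and `income`, and the hand-derived envelope `envelope_high_risk` are hypothetical and do not come from the abstract. The sketch only shows that filtering with a cheap predicate satisfied by every record of the class, and then applying the model, yields the same result as applying the model to every record.

```python
def classify(record):
    """Hypothetical mining model: a tiny decision tree on age and income."""
    if record["age"] < 30:
        return "high_risk" if record["income"] < 20000 else "low_risk"
    return "high_risk" if record["income"] < 10000 else "low_risk"


def envelope_high_risk(record):
    """Hand-derived upper envelope for the class "high_risk".

    Both tree paths that lead to "high_risk" require income < 20000, so
    every high_risk record satisfies this predicate. Some low_risk
    records satisfy it too, which is allowed for an upper envelope.
    """
    return record["income"] < 20000


def query_class(records, target, envelope):
    """Evaluate the mining predicate class == target using the envelope first."""
    candidates = [r for r in records if envelope(r)]           # cheap filter
    return [r for r in candidates if classify(r) == target]   # exact check


records = [
    {"age": 25, "income": 15000},
    {"age": 25, "income": 50000},
    {"age": 45, "income": 15000},
    {"age": 45, "income": 8000},
    {"age": 60, "income": 90000},
]

with_envelope = query_class(records, "high_risk", envelope_high_risk)
without_envelope = [r for r in records if classify(r) == "high_risk"]
```

In a database setting the envelope would be an ordinary predicate added to the query, so that indexes on the underlying columns could be used before the model is consulted.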
Abstract:
A system and method for explaining why an exceptional element in a multidimensional database is exceptional by presenting the element using at least two of the dimensions responsible for the exception. Maximal terms are identified in the monolithic equation that is used to identify exceptions, and based on the maximal terms the dimensions that are to be displayed are selected as a visual indication of why a displayed element is exceptional.
Abstract:
Disclosed is a system and method for performing database queries including GROUP-BY operations, in which aggregate values for attributes are desired for distinct, partitioned subsets of tuples satisfying a query. A special case of the aggregation problem is addressed, employing a structure, called the data cube operator, which provides information useful for expediting execution of GROUP-BY operations in queries. Algorithms are provided for constructing the data cube by efficiently computing a collection of GROUP-BYs on the attributes of the relation. Decision support systems often require computation of multiple GROUP-BY operations on a given set of attributes, the GROUP-BYs being related in the sense that their attributes are subsets or supersets of each other. The invention extends hash-based and sort-based grouping methods with optimizations, including combining common operations across multiple GROUP-BYs and using pre-computed GROUP-BYs for computing other GROUP-BYs. An extension of the cube algorithms handles any given collection of aggregates.
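As a rough illustration of computing a collection of related GROUP-BYs, the following Python sketch builds every GROUP-BY over a set of attributes and derives each coarser GROUP-BY from an already computed finer one rather than from the base data. It is a simplified in-memory sketch, not the hash-based or sort-based algorithms of the disclosure; the function names, the choice of SUM as the aggregate, and the sample attributes `product`, `region`, and `sales` are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def group_by(rows, src_attrs, dst_attrs):
    """Roll up a {key_tuple: total} GROUP-BY on src_attrs to dst_attrs."""
    idx = [src_attrs.index(a) for a in dst_attrs]
    out = defaultdict(int)
    for key, total in rows.items():
        out[tuple(key[i] for i in idx)] += total
    return dict(out)


def data_cube(records, attrs, measure):
    """Compute SUM(measure) for every subset of attrs."""
    # Finest GROUP-BY: computed once from the base records.
    base = defaultdict(int)
    for r in records:
        base[tuple(r[a] for a in attrs)] += r[measure]
    cube = {tuple(attrs): dict(base)}

    # Coarser GROUP-BYs: each is derived from the smallest already
    # computed GROUP-BY that has exactly one more attribute.
    for size in range(len(attrs) - 1, -1, -1):
        for subset in combinations(attrs, size):
            parent = min(
                (p for p in cube
                 if len(p) == size + 1 and set(subset) <= set(p)),
                key=lambda p: len(cube[p]),
            )
            cube[subset] = group_by(cube[parent], parent, subset)
    return cube


records = [
    {"product": "pen", "region": "east", "sales": 10},
    {"product": "pen", "region": "west", "sales": 5},
    {"product": "ink", "region": "east", "sales": 7},
]
cube = data_cube(records, ["product", "region"], "sales")
```

Reusing a parent result this way is valid for SUM because partial sums can be added together; an aggregate without that property would need a different treatment.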
Abstract:
A user can easily organize computerized document folders by associating a few sample documents in the document database with each folder. The present invention learns folder profiles based on the sample documents and moves the remaining documents into the folders accordingly. In this way, the user can construct new folders, or rearrange existing folders, or cause the computer to automatically rearrange and maintain the folders. This is particularly useful for managing a database of perhaps thousands of emails.
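The abstract does not say how folder profiles are learned, so the following Python sketch swaps in one simple possibility purely for illustration: each folder profile is a bag-of-words count over its sample documents, and a remaining document is placed in the folder whose profile has the highest cosine similarity. The function names and sample texts are hypothetical.

```python
import math
from collections import Counter


def build_profile(sample_docs):
    """Folder profile: word counts over the folder's sample documents."""
    profile = Counter()
    for doc in sample_docs:
        profile.update(doc.lower().split())
    return profile


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign_folder(doc, profiles):
    """Return the folder whose profile is most similar to the document."""
    vec = Counter(doc.lower().split())
    return max(profiles, key=lambda folder: cosine(vec, profiles[folder]))


samples = {
    "travel": ["flight booking confirmation", "hotel reservation details"],
    "billing": ["invoice payment due", "payment receipt attached"],
}
profiles = {folder: build_profile(docs) for folder, docs in samples.items()}

folder_a = assign_folder("your flight and hotel itinerary", profiles)
folder_b = assign_folder("invoice for last month payment", profiles)
```

Rearranging folders then amounts to changing the sample documents, rebuilding the profiles, and reassigning the remaining documents.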
Abstract:
A system and method for data mining are provided in which temporal patterns of itemsets in transactions having unexpected support values are identified. A surprising temporal pattern is an itemset whose support changes over time. The method may use a minimum description length formulation to discover these surprising temporal patterns.
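To make "support that changes over time" concrete, the Python sketch below computes the support of an itemset separately for each time period. It does not implement the minimum description length formulation, which the abstract does not detail; the function name and the `(period, items)` input layout are assumptions made for illustration.

```python
from collections import defaultdict


def support_by_period(transactions, itemset):
    """Return {period: fraction of that period's transactions containing itemset}.

    transactions is an iterable of (period, items) pairs.
    """
    itemset = frozenset(itemset)
    total = defaultdict(int)
    hits = defaultdict(int)
    for period, items in transactions:
        total[period] += 1
        if itemset <= set(items):
            hits[period] += 1
    return {p: hits[p] / total[p] for p in sorted(total)}


transactions = [
    (1, {"milk", "bread"}),
    (1, {"milk"}),
    (2, {"milk", "bread"}),
    (2, {"milk", "bread"}),
]
supports = support_by_period(transactions, {"milk", "bread"})
```

A per-period series like this is the raw material a surprise measure would work on; how the change is scored is left open here.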
Abstract:
A method and apparatus for mining data relationships from an integrated database and data-mining system are disclosed. A set of frequent 1-itemsets is generated using a group-by query on data transactions. From these frequent 1-itemsets and the transactions, frequent 2-itemsets are determined. A candidate set of (n+2)-itemsets is generated from the frequent 2-itemsets, where n=1. Frequent (n+2)-itemsets are determined from the candidate set and the transaction table using a query operation. The candidate set and the frequent (n+2)-itemsets are then generated again for n+1, and so on, until the candidate set is empty. Rules are then extracted from the union of the determined frequent itemsets.
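The level-wise loop described above can be sketched in Python as follows. This is an in-memory illustration rather than the query-based implementation of the disclosure: where the abstract uses a group-by query and further query operations against a transaction table, the sketch simply counts over a list of sets. The function name, the subset-based pruning step, and the sample transactions are assumptions, and rule extraction is omitted.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items (the role of the group-by query).
    counts = Counter(item for t in transactions for item in t)
    result = {frozenset([i]): c for i, c in counts.items() if c >= min_support}
    current = set(result)

    k = 1
    while current:
        k += 1
        # Candidates: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Count support of each surviving candidate against the transactions.
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t)
            if support >= min_support:
                level[c] = support
        result.update(level)
        current = set(level)  # the loop ends once no itemset survives
    return result


transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]
frequent = frequent_itemsets(transactions, min_support=3)
```

The union of the itemsets collected in `frequent` is what a rule-extraction step would then work from.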
Abstract:
A method for locating data anomalies in a k-dimensional data cube that includes the steps of associating a surprise value with each cell of a data cube, and indicating a data anomaly when the surprise value associated with a cell exceeds a predetermined exception threshold. According to one aspect of the invention, the surprise value associated with each cell is a composite value that is based on at least one of a Self-Exp value for the cell, an In-Exp value for the cell and a Path-Exp value for the cell. Preferably, the step of associating the surprise value with each cell includes the steps of determining a Self-Exp value for the cell, determining an In-Exp value for the cell, determining a Path-Exp value for the cell, and then generating the surprise value for the cell based on the Self-Exp value, the In-Exp value and the Path-Exp value.
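The thresholding step can be sketched in Python as below. The abstract does not state how the three values are combined into the composite surprise value, so taking their maximum is an assumption made purely for illustration, as are the function names, the cell keys, and the numeric values. How Self-Exp, In-Exp, and Path-Exp are themselves computed is outside the sketch.

```python
def surprise(self_exp, in_exp, path_exp):
    """Composite surprise value for one cell.

    Assumption: the composite is the maximum of the three values; the
    abstract only says the composite is based on them.
    """
    return max(self_exp, in_exp, path_exp)


def find_anomalies(cells, threshold):
    """Return the cells whose surprise value exceeds the exception threshold.

    cells maps a cell key to a (Self-Exp, In-Exp, Path-Exp) tuple.
    """
    return [
        cell
        for cell, (self_exp, in_exp, path_exp) in cells.items()
        if surprise(self_exp, in_exp, path_exp) > threshold
    ]


# Hypothetical cells of a 2-dimensional cube keyed by (product, region).
cells = {
    ("pen", "east"): (0.4, 0.9, 0.2),
    ("pen", "west"): (3.1, 0.0, 0.0),
    ("ink", "east"): (1.0, 2.8, 0.5),
}
anomalies = find_anomalies(cells, threshold=2.5)
```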