摘要:
The present invention leverages machine learning techniques to provide automatic generation of conditioning variables for constructing a data perspective for a given target variable. The present invention determines and analyzes the best target variable predictors for a given target variable, employing them to facilitate the conveying of information about the target variable to a user. It automatically discretizes continuous and discrete variables utilized as target variable predictors to establish their granularity. In other instances of the present invention, a complexity and/or utility parameter can be specified to facilitate generation of the data perspective via analyzing a best target variable predictor versus the complexity of the conditioning variable(s) and/or utility. The present invention can also adjust the conditioning variables (i.e., target variable predictors) of the data perspective to provide an optimum view and/or accept control inputs from a user to guide/control the generation of the data perspective.
摘要:
The present invention leverages curve fitting data techniques to provide automatic detection of data anomalies in a “data tube” from a data perspective, allowing, for example, detection of data anomalies such as on-screen, drill down, and drill across data anomalies in, for example, pivot tables and/or OLAP cubes. It determines if data substantially deviates from a predicted value established by a curve fitting process such as, for example, a piece-wise linear function applied to the data tube. A threshold value can also be employed by the present invention to facilitate in determining a degree of deviation necessary before a data value is considered anomalous. The threshold value can be supplied dynamically and/or statically by a system and/or a user via a user interface. Additionally, the present invention provides an indication to a user of the type and location of a detected anomaly from a top level data perspective.
摘要:
The present invention leverages curve fitting data techniques to provide automatic detection of data anomalies in a “data tube” from a data perspective, allowing, for example, detection of data anomalies such as on-screen, drill down, and drill across data anomalies in, for example, pivot tables and/or OLAP cubes. It determines if data substantially deviates from a predicted value established by a curve fitting process such as, for example, a piece-wise linear function applied to the data tube. A threshold value can also be employed by the present invention to facilitate in determining a degree of deviation necessary before a data value is considered anomalous. The threshold value can be supplied dynamically and/or statically by a system and/or a user via a user interface. Additionally, the present invention provides an indication to a user of the type and location of a detected anomaly from a top level data perspective.
摘要:
The present invention utilizes a cross-prediction scheme to predict values of discrete and continuous time observation data, wherein conditional variance of each continuous time tube variable is fixed to a small positive value. By allowing cross-predictions in an ARMA based model, values of continuous and discrete observations in a time series are accurately predicted. The present invention accomplishes this by extending an ARMA model such that a first time series “tube” is utilized to facilitate or “cross-predict” values in a second time series tube to form an “ARMAxp” model. In general, in the ARMAxp model, the distribution of each continuous variable is a decision graph having splits only on discrete variables and having linear regressions with continuous regressors at all leaves, and the distribution of each discrete variable is a decision graph having splits only on discrete variables and having additional distributions at all leaves.
摘要:
A system that incorporates an interactive graphical user interface for visualizing clusters (categories) and segments (summarized clusters) of data. Specifically, the system automatically categorizes incoming case data into clusters, summarizes those clusters into segments, determines similarity measures for the segments, scores the selected segments through the similarity measures, and then forms and visually depicts hierarchical organizations of those selected clusters. The system also automatically and dynamically reduces, as necessary, a depth of the hierarchical organization, through elimination of unnecessary hierarchical levels and inter-nodal links, based on similarity measures of segments or segment groups. Attribute/value data that tends to meaningfully characterize each segment is also scored, rank ordered based on normalized scores, and then graphically displayed. The system permits a user to browse through the hierarchy, and, to readily comprehend segment inter-relationships, selectively expand and contract the displayed hierarchy, as desired, as well as to compare two selected segments or segment groups together and graphically display the results of that comparison. An alternative discriminant-based cluster scoring technique is also presented.
摘要:
A system that incorporates an interactive graphical user interface for visualizing clusters (categories) and segments (summarized clusters) of data. Specifically, the system automatically categorizes incoming case data into clusters, summarizes those clusters into segments, determines similarity measures for the segments, scores the selected segments through the similarity measures, and then forms and visually depicts hierarchical organizations of those selected clusters. The system also automatically and dynamically reduces, as necessary, a depth of the hierarchical organization, through elimination of unnecessary hierarchical levels and inter-nodal links, based on similarity measures of segments or segment groups. Attribute/value data that tends to meaningfully characterize each segment is also scored, rank ordered based on normalized scores, and then graphically displayed. The system permits a user to browse through the hierarchy, and, to readily comprehend segment inter-relationships, selectively expand and contract the displayed hierarchy, as desired, as well as to compare two selected segments or segment groups together and graphically display the results of that comparison. An alternative discriminant-based cluster scoring technique is also presented.
摘要:
Systems and methods for determining the value of bids placed by content providers for placement positions on a page, e.g., a web page, rendered according to a given context, for instance, the search results listing for a particular query initiated on a search engine web site, are provided. Additionally, systems and methods are provided for determining placement of content items, e.g., advertisements and/or images, on a rendered page relative to other content items on the page based upon bid value.
摘要:
Decision trees populated with classifier models are leveraged to provide enhanced spam detection utilizing separate email classifiers for each feature of an email. This provides a higher probability of spam detection through tailoring of each classifier model to facilitate in more accurately determining spam on a feature-by-feature basis. Classifiers can be constructed based on linear models such as, for example, logistic-regression models and/or support vector machines (SVM) and the like. The classifiers can also be constructed based on decision trees. “Compound features” based on internal and/or external nodes of a decision tree can be utilized to provide linear classifier models as well. Smoothing of the spam detection results can be achieved by utilizing classifier models from other nodes within the decision tree if training data is sparse. This forms a base model for branches of a decision tree that may not have received substantial training data.
摘要:
The present invention leverages approximations of distributions to provide tractable variational approximations, based on at least one continuous variable, for inference utilization in Bayesian networks where local distributions are decision-graphs. These tractable approximations are employed in lieu of exact inferences that are normally NP-hard to solve. By utilizing Jensen's inequality applied to logarithmic distributions composed of a generalized sum including an introduced arbitrary conditional distribution, a means is acquired to resolve a tightly bound likelihood distribution. The means includes application of Mean-Field Theory, approximations of conditional probability distributions, and/or other means that allow for a tractable variational approximation to be achieved.
摘要:
Systems and methods for determining the value of bids placed by content providers for placement positions on a page, e.g., a web page, rendered according to a given context, for instance, the search results listing for a particular query initiated on a search engine web site, are provided. Additionally, systems and methods are provided for determining placement of content items, e.g., advertisements and/or images, on a rendered page relative to other content items on the page based upon bid value.