Automatic data perspective generation for a target variable
    1.
    Granted invention patent (status: in force)

    Publication No.: US07225200B2

    Publication date: 2007-05-29

    Application No.: US10824108

    Filing date: 2004-04-14

    IPC classification: G06F17/00 G06F7/00

    Abstract: The present invention leverages machine learning techniques to provide automatic generation of conditioning variables for constructing a data perspective for a given target variable. The present invention determines and analyzes the best target variable predictors for a given target variable, employing them to facilitate the conveying of information about the target variable to a user. It automatically discretizes continuous and discrete variables utilized as target variable predictors to establish their granularity. In other instances of the present invention, a complexity and/or utility parameter can be specified to facilitate generation of the data perspective via analyzing a best target variable predictor versus the complexity of the conditioning variable(s) and/or utility. The present invention can also adjust the conditioning variables (i.e., target variable predictors) of the data perspective to provide an optimum view and/or accept control inputs from a user to guide/control the generation of the data perspective.
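
As a rough illustration of the predictor-selection idea in the abstract above, the sketch below discretizes numeric columns and ranks candidate conditioning variables by their mutual information with the target. Equal-width binning, mutual information as the score, the sample data, and the names `discretize`, `mutual_information`, and `rank_predictors` are all assumptions made for this example; the abstract does not specify them.

```python
import math
from collections import Counter


def discretize(values, bins=3):
    """Equal-width binning of a numeric column into integer bin labels."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def rank_predictors(columns, target, bins=3):
    """Return candidate column names, best predictor of the target first."""
    scores = {}
    for name, values in columns.items():
        # Numeric columns are binned first (assumed discretization step).
        if all(isinstance(v, (int, float)) for v in values):
            values = discretize(values, bins)
        scores[name] = mutual_information(values, target)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical data: "region" and "units" determine the target, "weekday" does not.
columns = {
    "region": ["N", "N", "S", "S", "N", "S"],
    "units": [1.0, 2.0, 9.0, 10.0, 1.5, 9.5],
    "weekday": ["Mon", "Tue", "Mon", "Tue", "Wed", "Wed"],
}
target = ["low", "low", "high", "high", "low", "high"]
ranking = rank_predictors(columns, target)
```

A complexity parameter, as mentioned in the abstract, could plausibly be modeled by penalizing the score with the number of bins, but that is left out here.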

    Anomaly detection in data perspectives
    2.
    Granted invention patent (status: lapsed)

    Publication No.: US07065534B2

    Publication date: 2006-06-20

    Application No.: US10874956

    Filing date: 2004-06-23

    IPC classification: G06F7/00 G06F17/00

    Abstract: The present invention leverages curve fitting data techniques to provide automatic detection of data anomalies in a “data tube” from a data perspective, allowing, for example, detection of data anomalies such as on-screen, drill down, and drill across data anomalies in, for example, pivot tables and/or OLAP cubes. It determines if data substantially deviates from a predicted value established by a curve fitting process such as, for example, a piece-wise linear function applied to the data tube. A threshold value can also be employed by the present invention to facilitate in determining a degree of deviation necessary before a data value is considered anomalous. The threshold value can be supplied dynamically and/or statically by a system and/or a user via a user interface. Additionally, the present invention provides an indication to a user of the type and location of a detected anomaly from a top level data perspective.
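
The thresholded-deviation test described in the abstract above can be sketched as follows. This is a minimal example under stated assumptions, not the patented method: the "data tube" is taken to be a plain list of numbers, the piecewise linear prediction for each point is the straight line through its nearest neighbours, and `find_anomalies` and the sales figures are hypothetical.

```python
def find_anomalies(tube, threshold):
    """Flag values that deviate from a piecewise linear prediction.

    Interior points are predicted by linear interpolation between their two
    neighbours; endpoints are extrapolated from the two nearest points.
    Returns a list of (index, actual, predicted) tuples.
    """
    n = len(tube)
    if n < 3:
        return []
    anomalies = []
    for i, actual in enumerate(tube):
        if i == 0:
            predicted = 2 * tube[1] - tube[2]
        elif i == n - 1:
            predicted = 2 * tube[n - 2] - tube[n - 3]
        else:
            predicted = (tube[i - 1] + tube[i + 1]) / 2
        # A value is anomalous only if it deviates by more than the threshold.
        if abs(actual - predicted) > threshold:
            anomalies.append((i, actual, predicted))
    return anomalies


# Hypothetical data tube: a steady trend with one spike at index 3.
monthly_sales = [100, 104, 108, 160, 116, 120, 124]
flagged = find_anomalies(monthly_sales, threshold=40)
```

Note that a spike also distorts the predictions for its neighbours (here by 24 units), so the threshold has to be chosen above that secondary deviation; a fit over the whole tube would avoid this at the cost of more code.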

    Anomaly detection in data perspectives
    3.
    Granted invention patent (status: in force)

    Publication No.: US07162489B2

    Publication date: 2007-01-09

    Application No.: US11299539

    Filing date: 2005-12-12

    IPC classification: G06F7/00

    Abstract: The present invention leverages curve fitting data techniques to provide automatic detection of data anomalies in a “data tube” from a data perspective, allowing, for example, detection of data anomalies such as on-screen, drill down, and drill across data anomalies in, for example, pivot tables and/or OLAP cubes. It determines if data substantially deviates from a predicted value established by a curve fitting process such as, for example, a piece-wise linear function applied to the data tube. A threshold value can also be employed by the present invention to facilitate in determining a degree of deviation necessary before a data value is considered anomalous. The threshold value can be supplied dynamically and/or statically by a system and/or a user via a user interface. Additionally, the present invention provides an indication to a user of the type and location of a detected anomaly from a top level data perspective.

    Systems and methods for new time series model probabilistic ARMA
    4.
    Granted invention patent (status: in force)

    Publication No.: US07580813B2

    Publication date: 2009-08-25

    Application No.: US10463145

    Filing date: 2003-06-17

    IPC classification: G06F17/50 G05B23/02

    CPC classification: G06F17/18

    Abstract: The present invention utilizes a cross-prediction scheme to predict values of discrete and continuous time observation data, wherein conditional variance of each continuous time tube variable is fixed to a small positive value. By allowing cross-predictions in an ARMA based model, values of continuous and discrete observations in a time series are accurately predicted. The present invention accomplishes this by extending an ARMA model such that a first time series “tube” is utilized to facilitate or “cross-predict” values in a second time series tube to form an “ARMAxp” model. In general, in the ARMAxp model, the distribution of each continuous variable is a decision graph having splits only on discrete variables and having linear regressions with continuous regressors at all leaves, and the distribution of each discrete variable is a decision graph having splits only on discrete variables and having additional distributions at all leaves.
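
To make the cross-prediction idea in the abstract above concrete, the sketch below regresses one series on its own previous value and on the previous value of a second series. It is a deliberately reduced stand-in, not the ARMAxp model: there is no moving-average term and no decision graph, and the model form `y[t] = a*y[t-1] + b*x[t-1]`, the function name, and the data are assumptions for illustration.

```python
def fit_cross_predictor(y, x):
    """Least-squares fit of y[t] = a*y[t-1] + b*x[t-1] (no intercept).

    Solves the 2x2 normal equations directly with Cramer's rule.
    """
    rows = [(y[t - 1], x[t - 1], y[t]) for t in range(1, len(y))]
    syy = sum(p * p for p, _, _ in rows)
    sxx = sum(q * q for _, q, _ in rows)
    sxy = sum(p * q for p, q, _ in rows)
    syt = sum(p * r for p, _, r in rows)
    sxt = sum(q * r for _, q, r in rows)
    det = syy * sxx - sxy * sxy
    if det == 0:
        raise ValueError("regressors are collinear")
    a = (syt * sxx - sxt * sxy) / det
    b = (sxt * syy - syt * sxy) / det
    return a, b


# Hypothetical series generated with a = 0.5 and b = 2.0.
x = [1.0, 0.0, 2.0, 1.0, 3.0, 0.0]
y = [1.0, 2.5, 1.25, 4.625, 4.3125, 8.15625]
a, b = fit_cross_predictor(y, x)

# One-step-ahead forecast of y using the latest values of both series.
forecast = a * y[-1] + b * x[-1]
```

The second series `x` plays the role of the first "tube" that helps predict the second one; in the model described by the abstract, such regressions would sit at the leaves of a decision graph that splits on discrete variables.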

    Apparatus and accompanying methods for visualizing clusters of data and hierarchical cluster classifications
    5.
    Granted invention patent (status: in force)

    Publication No.: US06742003B2

    Publication date: 2004-05-25

    Application No.: US09845151

    Filing date: 2001-04-30

    IPC classification: G06F17/30

    Abstract: A system that incorporates an interactive graphical user interface for visualizing clusters (categories) and segments (summarized clusters) of data. Specifically, the system automatically categorizes incoming case data into clusters, summarizes those clusters into segments, determines similarity measures for the segments, scores the selected segments through the similarity measures, and then forms and visually depicts hierarchical organizations of those selected clusters. The system also automatically and dynamically reduces, as necessary, a depth of the hierarchical organization, through elimination of unnecessary hierarchical levels and inter-nodal links, based on similarity measures of segments or segment groups. Attribute/value data that tends to meaningfully characterize each segment is also scored, rank ordered based on normalized scores, and then graphically displayed. The system permits a user to browse through the hierarchy, and, to readily comprehend segment inter-relationships, selectively expand and contract the displayed hierarchy, as desired, as well as to compare two selected segments or segment groups together and graphically display the results of that comparison. An alternative discriminant-based cluster scoring technique is also presented.
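
One way to picture the attribute/value scoring step from the abstract above is the sketch below, which ranks (attribute, value) pairs by how much more often they occur inside a segment than outside it. The scoring formula (a difference of relative frequencies), the normalization by the largest absolute score, the function name, and the sample cases are assumptions for this example; the abstract does not state the actual scoring rule.

```python
from collections import Counter


def rank_attribute_values(segment_cases, other_cases):
    """Rank (attribute, value) pairs by how strongly they characterize a segment.

    Raw score: relative frequency inside the segment minus relative frequency
    in the remaining cases. Scores are divided by the largest absolute raw
    score so that the strongest pair has magnitude 1.
    """
    inside = Counter(pair for case in segment_cases for pair in case.items())
    outside = Counter(pair for case in other_cases for pair in case.items())
    raw = {}
    for pair, count in inside.items():
        p_in = count / len(segment_cases)
        p_out = outside[pair] / len(other_cases) if other_cases else 0.0
        raw[pair] = p_in - p_out
    scale = max((abs(s) for s in raw.values()), default=0.0) or 1.0
    ranked = sorted(raw.items(), key=lambda item: item[1], reverse=True)
    return [(pair, s / scale) for pair, s in ranked]


# Hypothetical cases: the segment is dominated by the 18-25 age band.
segment = [
    {"age": "18-25", "plan": "basic"},
    {"age": "18-25", "plan": "pro"},
    {"age": "18-25", "plan": "basic"},
]
others = [
    {"age": "40+", "plan": "basic"},
    {"age": "18-25", "plan": "pro"},
    {"age": "40+", "plan": "pro"},
    {"age": "40+", "plan": "basic"},
]
ranked = rank_attribute_values(segment, others)
```

The top-ranked pairs are the ones a display would show first when summarizing what distinguishes the segment.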

    Apparatus and accompanying methods for visualizing clusters of data and hierarchical cluster classifications
    6.
    Granted invention patent (status: in force)

    Publication No.: US07333998B2

    Publication date: 2008-02-19

    Application No.: US10808064

    Filing date: 2004-03-24

    IPC classification: G06F17/30

    Abstract: A system that incorporates an interactive graphical user interface for visualizing clusters (categories) and segments (summarized clusters) of data. Specifically, the system automatically categorizes incoming case data into clusters, summarizes those clusters into segments, determines similarity measures for the segments, scores the selected segments through the similarity measures, and then forms and visually depicts hierarchical organizations of those selected clusters. The system also automatically and dynamically reduces, as necessary, a depth of the hierarchical organization, through elimination of unnecessary hierarchical levels and inter-nodal links, based on similarity measures of segments or segment groups. Attribute/value data that tends to meaningfully characterize each segment is also scored, rank ordered based on normalized scores, and then graphically displayed. The system permits a user to browse through the hierarchy, and, to readily comprehend segment inter-relationships, selectively expand and contract the displayed hierarchy, as desired, as well as to compare two selected segments or segment groups together and graphically display the results of that comparison. An alternative discriminant-based cluster scoring technique is also presented.

    Trees of classifiers for detecting email spam
    8.
    Granted invention patent (status: in force)

    Publication No.: US07930353B2

    Publication date: 2011-04-19

    Application No.: US11193691

    Filing date: 2005-07-29

    IPC classification: G06F15/16

    CPC classification: H04L51/12

    Abstract: Decision trees populated with classifier models are leveraged to provide enhanced spam detection utilizing separate email classifiers for each feature of an email. This provides a higher probability of spam detection through tailoring of each classifier model to facilitate in more accurately determining spam on a feature-by-feature basis. Classifiers can be constructed based on linear models such as, for example, logistic-regression models and/or support vector machines (SVM) and the like. The classifiers can also be constructed based on decision trees. “Compound features” based on internal and/or external nodes of a decision tree can be utilized to provide linear classifier models as well. Smoothing of the spam detection results can be achieved by utilizing classifier models from other nodes within the decision tree if training data is sparse. This forms a base model for branches of a decision tree that may not have received substantial training data.
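
The structure described in the abstract above, a decision tree whose nodes each hold a linear classifier, can be sketched as follows. This is an illustrative reading only: the `Node` layout, the `min_train` cutoff, the feature names, and the weights are hypothetical, and the handling of sparse leaves is simplified to a back-off to the nearest well-trained ancestor rather than a true blend of models.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    weights: dict              # linear model: feature name -> weight
    bias: float = 0.0
    n_train: int = 0           # training examples that reached this node
    split_feature: str = ""    # empty string marks a leaf
    children: dict = field(default_factory=dict)  # split value -> Node


def score(node, features):
    """Logistic-regression score of a single node's linear model."""
    z = node.bias + sum(
        w * features.get(name, 0.0) for name, w in node.weights.items()
    )
    return 1.0 / (1.0 + math.exp(-z))


def spam_probability(root, features, min_train=50):
    """Route the email down the tree and score it with the deepest node
    on its path that saw at least `min_train` training examples."""
    node, best = root, root
    while node.split_feature:
        child = node.children.get(features.get(node.split_feature))
        if child is None:
            break
        node = child
        if node.n_train >= min_train:
            best = node
    return score(best, features)


# Hypothetical tree: split on attachments; the no-attachment leaf is sparse.
root = Node(
    weights={"num_links": 0.2}, bias=-1.0, n_train=1000,
    split_feature="has_attachment",
    children={
        1: Node(weights={"num_links": 0.5}, bias=0.0, n_train=400),
        0: Node(weights={"num_links": 0.1}, bias=-2.0, n_train=5),
    },
)
with_attachment = spam_probability(root, {"has_attachment": 1, "num_links": 4})
without_attachment = spam_probability(root, {"has_attachment": 0, "num_links": 4})
```

In this example the second email falls into a leaf with only five training examples, so the root model acts as the base model for that branch.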

    Systems and methods for tractable variational approximation for interference in decision-graph Bayesian networks
    9.
    Granted invention patent (status: lapsed)

    Publication No.: US07184993B2

    Publication date: 2007-02-27

    Application No.: US10458166

    Filing date: 2003-06-10

    IPC classification: G06F17/00 G06N5/02

    CPC classification: G06N7/005 G06N5/04

    Abstract: The present invention leverages approximations of distributions to provide tractable variational approximations, based on at least one continuous variable, for inference utilization in Bayesian networks where local distributions are decision-graphs. These tractable approximations are employed in lieu of exact inferences that are normally NP-hard to solve. By utilizing Jensen's inequality applied to logarithmic distributions composed of a generalized sum including an introduced arbitrary conditional distribution, a means is acquired to resolve a tightly bound likelihood distribution. The means includes application of Mean-Field Theory, approximations of conditional probability distributions, and/or other means that allow for a tractable variational approximation to be achieved.
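
The Jensen's-inequality step mentioned in the abstract above is the standard variational lower bound, written out below. The symbols are assumptions for this note and do not come from the patent: x denotes the observed variables, h the unobserved ones, and q(h | x) the introduced arbitrary conditional distribution. The sum stands for the generalized sum, that is, a sum over discrete variables and an integral over continuous ones.

```latex
\begin{align}
\log p(x)
  &= \log \sum_{h} q(h \mid x)\, \frac{p(x, h)}{q(h \mid x)} \\
  &\ge \sum_{h} q(h \mid x)\, \log \frac{p(x, h)}{q(h \mid x)}
     && \text{(Jensen's inequality, since $\log$ is concave)}
\end{align}
% Mean-field choice: q factorizes over the unobserved variables,
% q(h \mid x) = \prod_{i} q_i(h_i \mid x),
% which is what makes the bound tractable to optimize term by term.
```

The bound is tight when q(h | x) equals the exact posterior p(h | x); the mean-field factorization trades some of that tightness for tractability.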
