Systems and methods for new time series model probabilistic ARMA
    3.
    Granted invention patent
    Legal status: in force

    Publication number: US07580813B2

    Publication date: 2009-08-25

    Application number: US10463145

    Application date: 2003-06-17

    IPC classification: G06F17/50 G05B23/02

    CPC classification: G06F17/18

    Abstract: The present invention utilizes a cross-prediction scheme to predict values of discrete and continuous time observation data, wherein the conditional variance of each continuous time tube variable is fixed to a small positive value. By allowing cross-predictions in an ARMA-based model, values of continuous and discrete observations in a time series are accurately predicted. The present invention accomplishes this by extending an ARMA model such that a first time series “tube” is utilized to facilitate or “cross-predict” values in a second time series tube to form an “ARMAxp” model. In general, in the ARMAxp model, the distribution of each continuous variable is a decision graph having splits only on discrete variables and having linear regressions with continuous regressors at all leaves, and the distribution of each discrete variable is a decision graph having splits only on discrete variables and having additional distributions at all leaves.
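
    As a rough illustration of cross-prediction, the Python sketch below (an assumption for illustration, not the patented ARMAxp method) fits an autoregressive model for one series y that also uses one lag of a second series x as a linear regressor, by ordinary least squares. Moving-average terms, discrete variables and decision graphs are omitted, and the names `fit_cross_ar` and `predict_next` are hypothetical.

```python
# Hypothetical sketch of cross-prediction between two series ("tubes"):
# y[t] = a*y[t-1] + b*x[t-1] + c, fitted by ordinary least squares.
# MA terms, discrete variables and decision graphs of ARMAxp are omitted.

def solve(A, b):
    """Solve the square system A w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_cross_ar(y, x):
    """Return [a, b, c] minimising sum_t (y[t] - a*y[t-1] - b*x[t-1] - c)**2."""
    rows = [[y[t - 1], x[t - 1], 1.0] for t in range(1, len(y))]
    targets = y[1:]
    # Normal equations (R^T R) w = R^T y.
    rtr = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    rty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(3)]
    return solve(rtr, rty)


def predict_next(coef, y_last, x_last):
    """One-step-ahead prediction of y from the last y and the last x."""
    a, b, c = coef
    return a * y_last + b * x_last + c
```

    The lagged value of x plays the role of the "first tube" that helps predict the "second tube" y; a fitted coefficient b far from zero indicates that x carries predictive information about y.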

    Apparatus and accompanying methods for visualizing clusters of data and hierarchical cluster classifications
    4.
    Granted invention patent
    Legal status: in force

    Publication number: US06742003B2

    Publication date: 2004-05-25

    Application number: US09845151

    Application date: 2001-04-30

    IPC classification: G06F17/30

    Abstract: A system that incorporates an interactive graphical user interface for visualizing clusters (categories) and segments (summarized clusters) of data. Specifically, the system automatically categorizes incoming case data into clusters, summarizes those clusters into segments, determines similarity measures for the segments, scores the selected segments through the similarity measures, and then forms and visually depicts hierarchical organizations of those selected clusters. The system also automatically and dynamically reduces, as necessary, a depth of the hierarchical organization, through elimination of unnecessary hierarchical levels and inter-nodal links, based on similarity measures of segments or segment groups. Attribute/value data that tends to meaningfully characterize each segment is also scored, rank ordered based on normalized scores, and then graphically displayed. The system permits a user to browse through the hierarchy, and, to readily comprehend segment inter-relationships, selectively expand and contract the displayed hierarchy, as desired, as well as to compare two selected segments or segment groups together and graphically display the results of that comparison. An alternative discriminant-based cluster scoring technique is also presented.
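
    The hierarchy-building and depth-reduction steps can be pictured with a small sketch built on assumptions the abstract does not state: each segment is summarized as a numeric vector, cosine similarity is the similarity measure, segments are merged greedily (agglomerative clustering), and a level counts as unnecessary when its merge similarity lies within a tolerance `tol` of its parent's. The functions `build_hierarchy` and `collapse` are hypothetical.

```python
import math

# Hypothetical sketch: segments are numeric vectors and cosine similarity is
# the similarity measure. Neither choice comes from the patent abstract.

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def build_hierarchy(segments):
    """Greedy agglomerative merging of segments (dict name -> vector).

    Returns a binary tree: leaves are segment names, internal nodes are
    tuples (similarity, left, right).
    """
    items = [(name, vec, 1) for name, vec in segments.items()]  # (tree, centroid, size)
    while len(items) > 1:
        best = None
        for i in range(len(items)):
            for j in range(i + 1, len(items)):
                s = cosine(items[i][1], items[j][1])
                if best is None or s > best[0]:
                    best = (s, i, j)
        s, i, j = best
        (ti, ci, ni), (tj, cj, nj) = items[i], items[j]
        # Size-weighted centroid summarizes the merged group.
        centroid = [(a * ni + b * nj) / (ni + nj) for a, b in zip(ci, cj)]
        rest = [it for k, it in enumerate(items) if k not in (i, j)]
        items = rest + [((s, ti, tj), centroid, ni + nj)]
    return items[0][0]


def collapse(tree, tol):
    """Reduce depth: splice out an internal child whose merge similarity is
    within `tol` of its parent's, promoting its children one level up."""
    if isinstance(tree, str):
        return tree
    sim, *kids = tree
    children = []
    for kid in kids:
        kid = collapse(kid, tol)
        if not isinstance(kid, str) and abs(kid[0] - sim) <= tol:
            children.extend(kid[1:])
        else:
            children.append(kid)
    return (sim, *children)
```

    A small `tol` keeps most of the binary tree; a large one flattens it toward a single level, which is one way to trade depth for readability in a browsable display.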

    Apparatus and accompanying methods for visualizing clusters of data and hierarchical cluster classifications
    5.
    Granted invention patent
    Legal status: in force

    Publication number: US07333998B2

    Publication date: 2008-02-19

    Application number: US10808064

    Application date: 2004-03-24

    IPC classification: G06F17/30

    Abstract: A system that incorporates an interactive graphical user interface for visualizing clusters (categories) and segments (summarized clusters) of data. Specifically, the system automatically categorizes incoming case data into clusters, summarizes those clusters into segments, determines similarity measures for the segments, scores the selected segments through the similarity measures, and then forms and visually depicts hierarchical organizations of those selected clusters. The system also automatically and dynamically reduces, as necessary, a depth of the hierarchical organization, through elimination of unnecessary hierarchical levels and inter-nodal links, based on similarity measures of segments or segment groups. Attribute/value data that tends to meaningfully characterize each segment is also scored, rank ordered based on normalized scores, and then graphically displayed. The system permits a user to browse through the hierarchy, and, to readily comprehend segment inter-relationships, selectively expand and contract the displayed hierarchy, as desired, as well as to compare two selected segments or segment groups together and graphically display the results of that comparison. An alternative discriminant-based cluster scoring technique is also presented.

    Trees of classifiers for detecting email spam
    6.
    Granted invention patent
    Legal status: in force

    Publication number: US07930353B2

    Publication date: 2011-04-19

    Application number: US11193691

    Application date: 2005-07-29

    IPC classification: G06F15/16

    CPC classification: H04L51/12

    Abstract: Decision trees populated with classifier models are leveraged to provide enhanced spam detection utilizing separate email classifiers for each feature of an email. This provides a higher probability of spam detection through tailoring of each classifier model to facilitate more accurate determination of spam on a feature-by-feature basis. Classifiers can be constructed based on linear models such as, for example, logistic-regression models and/or support vector machines (SVM) and the like. The classifiers can also be constructed based on decision trees. “Compound features” based on internal and/or external nodes of a decision tree can be utilized to provide linear classifier models as well. Smoothing of the spam detection results can be achieved by utilizing classifier models from other nodes within the decision tree if training data is sparse. This forms a base model for branches of a decision tree that may not have received substantial training data.
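
    A minimal sketch of a decision tree whose nodes each hold a logistic-regression model, assuming boolean split features and a smoothing rule chosen for illustration (not taken from the patent): a node trained on fewer than `min_count` emails has its score linearly blended with its parent's. `Node`, `spam_probability` and the feature names `has_link` and `free` are hypothetical.

```python
import math

# Hypothetical sketch: a decision tree whose every node holds a
# logistic-regression model. The split features, weights and the linear
# smoothing rule are illustrative assumptions, not the patented design.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class Node:
    def __init__(self, weights, bias, n_train, split=None, children=None):
        self.weights = weights          # feature name -> logistic-regression weight
        self.bias = bias
        self.n_train = n_train          # training emails that reached this node
        self.split = split              # boolean feature to branch on (None = leaf)
        self.children = children or {}  # {True: Node, False: Node}

    def score(self, email):
        z = self.bias + sum(w * email.get(f, 0.0) for f, w in self.weights.items())
        return sigmoid(z)


def spam_probability(node, email, min_count=50, parent_p=None):
    """Route the email down the tree, scoring it with each node's classifier.

    A node trained on fewer than `min_count` emails is smoothed toward its
    parent's (already smoothed) probability.
    """
    p = node.score(email)
    if parent_p is not None and node.n_train < min_count:
        lam = node.n_train / min_count
        p = lam * p + (1 - lam) * parent_p
    child = node.children.get(bool(email.get(node.split, 0))) if node.split else None
    if child is None:
        return p
    return spam_probability(child, email, min_count, p)


tree = Node({}, 0.0, 1000, split="has_link", children={
    True: Node({"free": 3.0}, -1.0, 1000),
    False: Node({}, -2.0, 10),  # sparse branch: leans on the parent's model
})
```

    In this example, an email without a link reaches a leaf seen by only 10 training emails, so its score is pulled 80% of the way toward the root's estimate.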

    Systems and methods for tractable variational approximation for interference in decision-graph Bayesian networks
    7.
    Granted invention patent
    Legal status: lapsed

    Publication number: US07184993B2

    Publication date: 2007-02-27

    Application number: US10458166

    Application date: 2003-06-10

    IPC classification: G06F17/00 G06N5/02

    CPC classification: G06N7/005 G06N5/04

    Abstract: The present invention leverages approximations of distributions to provide tractable variational approximations, based on at least one continuous variable, for inference utilization in Bayesian networks where local distributions are decision-graphs. These tractable approximations are employed in lieu of exact inferences that are normally NP-hard to solve. By utilizing Jensen's inequality applied to logarithmic distributions composed of a generalized sum including an introduced arbitrary conditional distribution, a means is acquired to resolve a tightly bound likelihood distribution. The means includes application of Mean-Field Theory, approximations of conditional probability distributions, and/or other means that allow for a tractable variational approximation to be achieved.
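
    The Jensen's-inequality step the abstract refers to is the standard variational lower bound. Written out for observed variables x, hidden variables h and an arbitrary distribution q (textbook material, not the patent's specific derivation for decision-graph local distributions):

```latex
\log p(x) = \log \sum_{h} q(h)\,\frac{p(x,h)}{q(h)}
  \;\ge\; \sum_{h} q(h)\,\log\frac{p(x,h)}{q(h)}
  = \mathbb{E}_{q}\!\left[\log p(x,h)\right] + H(q),
\qquad q(h) = \prod_{i} q_i(h_i)
```

    The gap between the two sides is the Kullback-Leibler divergence KL(q(h) || p(h | x)), so the bound is tight when q equals the posterior. Mean-field theory restricts q to the fully factorized form on the right, which is what makes maximizing the bound tractable; for continuous variables the sums become integrals.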

    Automatic data perspective generation for a target variable
    8.
    Granted invention patent
    Legal status: in force

    Publication number: US07225200B2

    Publication date: 2007-05-29

    Application number: US10824108

    Application date: 2004-04-14

    IPC classification: G06F17/00 G06F7/00

    Abstract: The present invention leverages machine learning techniques to provide automatic generation of conditioning variables for constructing a data perspective for a given target variable. The present invention determines and analyzes the best target variable predictors for a given target variable, employing them to facilitate the conveying of information about the target variable to a user. It automatically discretizes continuous and discrete variables utilized as target variable predictors to establish their granularity. In other instances of the present invention, a complexity and/or utility parameter can be specified to facilitate generation of the data perspective via analyzing a best target variable predictor versus the complexity of the conditioning variable(s) and/or utility. The present invention can also adjust the conditioning variables (i.e., target variable predictors) of the data perspective to provide an optimum view and/or accept control inputs from a user to guide/control the generation of the data perspective.
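
    One plausible way to rank candidate predictors for a target is sketched below under assumptions made for illustration (neither is stated in the abstract): all-numeric columns are discretized into equal-width bins, and predictors are ranked by empirical mutual information with the target. `discretize`, `mutual_information` and `best_predictors` are hypothetical names.

```python
import math
from collections import Counter

# Hypothetical sketch: rank candidate predictors of a discrete target by
# mutual information, binning all-numeric columns into equal-width bins first.

def discretize(values, bins):
    """Map numeric values to equal-width bin indices 0..bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against a constant column
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def best_predictors(columns, target, bins=3, k=1):
    """Return the k column names with the highest mutual information with target."""
    scores = {}
    for name, values in columns.items():
        if all(isinstance(v, (int, float)) for v in values):
            values = discretize(values, bins)  # numeric columns are binned
        scores[name] = mutual_information(values, target)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

    The `bins` argument is a crude stand-in for the granularity choice the abstract mentions; a complexity penalty could be subtracted from each score to favor coarser conditioning variables.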

    Dependency network based model (or pattern)
    9.
    Granted invention patent
    Legal status: in force

    Publication number: US08140569B2

    Publication date: 2012-03-20

    Application number: US10447462

    Application date: 2003-05-29

    IPC classification: G06F17/30 G06F7/00

    Abstract: A dependency network is created from a training data set utilizing a scalable method. A statistical model (or pattern), such as for example a Bayesian network, is then constructed to allow more convenient inferencing. The model (or pattern) is employed in lieu of the training data set for data access. The computational complexity of the method that produces the model (or pattern) is independent of the size of the original data set. The dependency network directly returns explicitly encoded data in the conditional probability distributions of the dependency network. Non-explicitly encoded data is generated via Gibbs sampling, approximated, or ignored.
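
    The Gibbs-sampling step can be illustrated with a toy dependency network of two binary variables A and B. For illustration only, its local conditionals are derived from the joint P(A, B) = 0.3, 0.2, 0.1, 0.4 for (A, B) = (0, 0), (0, 1), (1, 0), (1, 1); an ordered Gibbs sampler then estimates marginals that the network does not encode explicitly. `gibbs_marginal` is a hypothetical helper, not an interface described in the patent.

```python
import random

# Hypothetical sketch: ordered Gibbs sampling in a two-variable dependency
# network. Each entry of `conditionals` maps the current state to
# P(variable = 1 | all other variables).

def gibbs_marginal(conditionals, order, query, n_samples=20000, burn_in=500, seed=0):
    rng = random.Random(seed)
    state = {v: 0 for v in order}
    hits = 0
    for step in range(burn_in + n_samples):
        # Resample each variable in turn from its local conditional.
        for v in order:
            state[v] = 1 if rng.random() < conditionals[v](state) else 0
        if step >= burn_in and state[query] == 1:
            hits += 1
    return hits / n_samples


# Local conditionals derived from the illustrative joint P(A, B).
conditionals = {
    "A": lambda s: 2 / 3 if s["B"] else 0.25,  # P(A=1 | B)
    "B": lambda s: 0.8 if s["A"] else 0.4,     # P(B=1 | A)
}
```

    Because these conditionals come from a single joint distribution, the estimates should approach P(A=1) = 0.5 and P(B=1) = 0.6 as the number of samples grows.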

    Generating improved belief networks
    10.
    Granted invention patent
    Legal status: lapsed

    Publication number: US06529888B1

    Publication date: 2003-03-04

    Application number: US08739200

    Application date: 1996-10-30

    IPC classification: G06N5/04

    CPC classification: G06N5/022

    Abstract: An improved belief network generator is provided. A belief network is generated utilizing expert knowledge retrieved from an expert in a given field of expertise and empirical data reflecting observations made in the given field of the expert. In addition to utilizing expert knowledge and empirical data, the belief network generator provides for the use of continuous variables in the generated belief network and missing data in the empirical data.
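
    A common way to combine expert knowledge with data when scoring candidate structures is a Bayesian Dirichlet (BDe-style) score, in which the expert's prior probabilities, scaled by an equivalent sample size, act as pseudo-counts. The sketch below assumes binary variables and simply skips records with missing values, a simplification; continuous variables and a proper treatment of missing data are not shown. `family_score` and `uniform_prior` are hypothetical.

```python
import math
from collections import Counter
from itertools import product

def family_score(records, child, parents, prior, ess=4.0):
    """Log marginal likelihood of binary `child` given binary `parents`.

    prior(k, j): expert's joint probability that child == k and the parents
    take the value tuple j; multiplied by the equivalent sample size `ess`
    it gives the Dirichlet pseudo-counts. Records with a missing (None)
    value in the family are skipped, a simplification for this sketch.
    """
    counts = Counter()
    for r in records:
        family = [r[child]] + [r[p] for p in parents]
        if None in family:
            continue
        counts[(r[child], tuple(r[p] for p in parents))] += 1

    score = 0.0
    for j in product((0, 1), repeat=len(parents)):
        alphas = [ess * prior(k, j) for k in (0, 1)]
        n = [counts[(k, j)] for k in (0, 1)]
        score += math.lgamma(sum(alphas)) - math.lgamma(sum(alphas) + sum(n))
        for a, c in zip(alphas, n):
            score += math.lgamma(a + c) - math.lgamma(a)
    return score


def uniform_prior(k, j):
    # Hypothetical expert prior: every joint configuration equally likely.
    return 0.5 ** (1 + len(j))
```

    Comparing `family_score(data, "X", ["Y"], uniform_prior)` with `family_score(data, "X", [], uniform_prior)` indicates whether the data support adding the edge Y → X; a more informed expert prior would shift the pseudo-counts accordingly.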