Progress notification supporting data mining
    2.
    Granted invention patent
    Legal status: in force

    Publication No.: US07031978B1

    Publication date: 2006-04-18

    Application No.: US10146821

    Filing date: 2002-05-17

    IPC classification: G06F17/30

    Abstract: The present invention relates to progress notification systems, computer program products, and methods of operation thereof that report the processing progress of data mining operations at regular periodic intervals. The system comprises: an input/output interface for exchanging information with a network; a memory for storing updated progress objects associated with the data mining operation as a set of data mining algorithms progress in processing; and a processor coupled to the input/output interface and the memory, the processor for performing the data mining operation, the data mining operation implementing the set of data mining algorithms; and generating a notification object for the data mining operation at a pre-determined interval, the notification object based on the progress objects at each of the pre-determined intervals.
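
The interval-based notification described in the abstract can be illustrated with a minimal sketch. All names here (`ProgressObject`, `NotificationObject`, `ProgressNotifier`) are hypothetical and not taken from the patent, and the sketch checks the clock whenever progress is updated rather than using a real timer, which is an assumption made to keep the example deterministic.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ProgressObject:
    """Latest progress reported by one algorithm (hypothetical shape)."""
    algorithm: str
    rows_done: int
    rows_total: int


@dataclass(frozen=True)
class NotificationObject:
    """Summary generated at each interval from the stored progress objects."""
    elapsed_seconds: float
    percent_complete: float


class ProgressNotifier:
    def __init__(self, interval_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.interval = interval_seconds
        self.clock = clock
        self.start = clock()
        self.next_due = self.start + interval_seconds
        self.latest: Dict[str, ProgressObject] = {}   # the "memory" of progress objects
        self.sent: List[NotificationObject] = []      # stands in for the network interface

    def update(self, progress: ProgressObject) -> None:
        """Store the newest progress object, then emit any notification that is due."""
        self.latest[progress.algorithm] = progress
        now = self.clock()
        while now >= self.next_due:
            done = sum(p.rows_done for p in self.latest.values())
            total = sum(p.rows_total for p in self.latest.values())
            percent = 100.0 * done / total if total else 0.0
            self.sent.append(NotificationObject(self.next_due - self.start, percent))
            self.next_due += self.interval


# Usage with a fake clock so the run is repeatable.
fake_times = iter([0.0, 0.4, 1.1, 1.5, 2.2])
notifier = ProgressNotifier(1.0, clock=lambda: next(fake_times))
for done in (25, 50, 75, 100):
    notifier.update(ProgressObject("clustering", done, 100))
print([n.percent_complete for n in notifier.sent])  # prints [50.0, 100.0]
```

Passing the clock in as a parameter is a design choice for testability; a production system would more likely drive notifications from a timer thread or scheduler.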

    Data mining model building using attribute importance
    4.
    Granted invention patent
    Legal status: in force

    Publication No.: US07219099B2

    Publication date: 2007-05-15

    Application No.: US10409082

    Filing date: 2003-04-09

    IPC classification: G06F7/00

    Abstract: A system, method, and computer program product that uses attribute importance (AI) to reduce the time and computation resources required to build data mining models, and which provides a corresponding reduction in the cost of data mining. Attribute importance (AI) involves a process of choosing a subset of the original predictive attributes by eliminating redundant, irrelevant, or uninformative ones and identifying those predictor attributes that may be most helpful in making predictions. A new algorithm, Predictor Variance, is proposed, and a method of selecting predictive attributes for a data mining model comprises the steps of: receiving a dataset having a plurality of predictor attributes; for each predictor attribute, determining a predictive quality of the predictor attribute; selecting at least one predictor attribute based on the determined predictive quality of the predictor attribute; and building a data mining model including only the selected at least one predictor attribute.
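
The select-then-build steps listed in the abstract can be sketched as follows. The abstract does not define the Predictor Variance algorithm, so the scoring function below is a stand-in chosen for illustration (the between-group variance of the mean target value across an attribute's values), and every function name is hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


def predictive_quality(rows: Sequence[Row], attribute: str, target: str) -> float:
    """Stand-in score, NOT the patent's Predictor Variance: how much the mean
    of a numeric target varies between the groups formed by the attribute."""
    overall = sum(r[target] for r in rows) / len(rows)
    groups: Dict[Any, List[float]] = defaultdict(list)
    for r in rows:
        groups[r[attribute]].append(r[target])
    return sum(
        len(g) * (sum(g) / len(g) - overall) ** 2 for g in groups.values()
    ) / len(rows)


def select_attributes(rows: Sequence[Row], predictors: Sequence[str],
                      target: str, top_k: int) -> List[str]:
    """Score every predictor and keep the top_k highest-scoring ones."""
    scores = {a: predictive_quality(rows, a, target) for a in predictors}
    return sorted(predictors, key=lambda a: scores[a], reverse=True)[:top_k]


def project(rows: Sequence[Row], keep: Sequence[str], target: str) -> List[Row]:
    """Reduce the dataset to the selected predictors plus the target,
    ready to hand to whatever model builder is in use."""
    return [{k: r[k] for k in (*keep, target)} for r in rows]


rows = [
    {"color": "red", "noise": "a", "buy": 1},
    {"color": "red", "noise": "b", "buy": 1},
    {"color": "blue", "noise": "a", "buy": 0},
    {"color": "blue", "noise": "b", "buy": 0},
]
selected = select_attributes(rows, ["noise", "color"], "buy", top_k=1)
print(selected)  # prints ['color']
```

In this toy dataset `color` determines `buy` exactly while `noise` carries no information, so only `color` survives selection and the model would be built on a narrower table.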

    Cross-validation for naive bayes data mining model
    5.
    Granted invention patent
    Legal status: in force

    Publication No.: US07299215B2

    Publication date: 2007-11-20

    Application No.: US10419761

    Filing date: 2003-04-22

    IPC classification: G06F17/00 G06N5/00

    CPC classification: G06K9/6296 G06K9/6256

    Abstract: A system, method, and computer program product provide a useful measure of the accuracy of a Naïve Bayes predictive model and reduced computational expense relative to conventional techniques. A method for measuring accuracy of a Naïve Bayes predictive model comprises the steps of: receiving a training dataset comprising a plurality of rows of data; building a Naïve Bayes predictive model using the training dataset; for each of at least a portion of the plurality of rows of data in the training dataset, incrementally untraining the Naïve Bayes predictive model using the row of data and determining an accuracy of the incrementally untrained Naïve Bayes predictive model; and determining an aggregate accuracy of the Naïve Bayes predictive model.
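
One way to picture incremental untraining is with a count-based categorical Naïve Bayes model, where removing a row amounts to subtracting its counts. The sketch below makes that assumption; the abstract does not spell out the untraining procedure, and the class name, the add-one smoothing, and the restore step after each prediction are illustrative choices rather than the patented method.

```python
import math
from collections import Counter
from typing import Hashable, Sequence, Tuple

RowT = Tuple[Hashable, ...]


class CountingNaiveBayes:
    """Categorical Naive Bayes kept as raw counts so rows can be added or removed."""

    def __init__(self, rows: Sequence[RowT], labels: Sequence[str]):
        self.class_counts: Counter = Counter()
        self.cell_counts: Counter = Counter()  # (column, value, label) -> count
        # Number of distinct values per column, used for add-one smoothing.
        self.n_values = [len({r[j] for r in rows}) for j in range(len(rows[0]))]
        for row, label in zip(rows, labels):
            self.update(row, label, +1)

    def update(self, row: RowT, label: str, delta: int) -> None:
        """delta=+1 trains on the row, delta=-1 untrains it."""
        self.class_counts[label] += delta
        for j, value in enumerate(row):
            self.cell_counts[(j, value, label)] += delta

    def predict(self, row: RowT) -> str:
        total = sum(self.class_counts.values())

        def log_score(label: str) -> float:
            n = self.class_counts[label]
            score = math.log((n + 1) / (total + len(self.class_counts)))
            for j, value in enumerate(row):
                score += math.log(
                    (self.cell_counts[(j, value, label)] + 1) / (n + self.n_values[j])
                )
            return score

        return max(sorted(self.class_counts), key=log_score)


def leave_one_out_accuracy(rows: Sequence[RowT], labels: Sequence[str]) -> float:
    """Build once, then untrain, test, and restore each row in turn."""
    model = CountingNaiveBayes(rows, labels)
    correct = 0
    for row, label in zip(rows, labels):
        model.update(row, label, -1)            # incrementally untrain
        correct += model.predict(row) == label  # score the held-out row
        model.update(row, label, +1)            # restore for the next row
    return correct / len(rows)


rows = [("sunny",), ("sunny",), ("sunny",), ("rainy",), ("rainy",), ("rainy",)]
labels = ["out", "out", "out", "in", "in", "in"]
print(leave_one_out_accuracy(rows, labels))  # prints 1.0
```

The saving relative to conventional cross-validation comes from building the model once: each held-out evaluation costs one count subtraction and one prediction instead of a full rebuild.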

    Checkpoint model building for data mining
    6.
    Granted invention patent
    Legal status: in force

    Publication No.: US07117391B1

    Publication date: 2006-10-03

    Application No.: US10284088

    Filing date: 2002-10-31

    IPC classification: G06F11/00

    CPC classification: G06F11/1438

    Abstract: The notion of checkpointing model building is introduced to safeguard against the loss of computational results due to abnormal termination of long-running computation algorithms. A checkpoint manager initiates a checkpoint, wherein initiation occurs by various scenarios, including dynamically, automatically, and manually. A computation module executes the checkpoint on an in-progress data computation. The checkpoint manager also monitors the execution of the checkpoint for abnormal termination. Upon a determination of abnormal checkpoint termination, the inability to resume model building from a checkpoint, the checkpoint manager either resumes model build from the checkpoint or aborts the model build.
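
The checkpoint-and-resume idea can be sketched in a few lines. This is only an illustration under stated assumptions: the "model" is a running sum standing in for a long computation, the checkpoint is a JSON file, a checkpoint is taken every `every` rows, and the `fail_at` parameter simulates abnormal termination. None of these names or choices come from the patent.

```python
import json
import os
from pathlib import Path
from typing import Optional, Sequence


def save_checkpoint(path: Path, state: dict) -> None:
    """Write to a temporary file first, then swap it in, so a crash during
    the write cannot leave a half-written checkpoint behind."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state))
    os.replace(tmp, path)


def load_checkpoint(path: Path) -> Optional[dict]:
    """Return the saved state, or None if no usable checkpoint exists."""
    try:
        return json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return None


def build_model(data: Sequence[int], path: Path, every: int = 2,
                fail_at: Optional[int] = None) -> int:
    """Resume from the last checkpoint if there is one, otherwise start fresh."""
    state = load_checkpoint(path) or {"next_index": 0, "running_sum": 0}
    for i in range(state["next_index"], len(data)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated abnormal termination")
        state["running_sum"] += data[i]   # one step of the stand-in "model build"
        state["next_index"] = i + 1
        if state["next_index"] % every == 0:
            save_checkpoint(path, state)  # checkpoint the in-progress computation
    return state["running_sum"]
```

A first call with `fail_at` set raises partway through; calling `build_model` again with the same checkpoint path picks up from the last saved index instead of reprocessing the whole dataset. Work done after the last checkpoint is repeated, which is the usual trade-off between checkpoint frequency and lost progress.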
