System and method for building decision trees in a database
    1.
    Patent application
    Status: Active

    Publication No.: US20070179966A1

    Publication date: 2007-08-02

    Application No.: US11344112

    Filing date: 2006-02-01

    IPC classification: G06F7/00

    CPC classification: G06F17/30539

    Abstract: Decision trees are efficiently represented in a relational database. A computer-implemented method of representing a decision tree model in relational form comprises providing a directed acyclic graph comprising a plurality of nodes and a plurality of links, each link connecting a plurality of nodes; encoding a tree structure by including in each node a parent-child relationship of the node with other nodes; encoding in each node information relating to a split represented by the node, the split information including a splitting predictor and a split value; and encoding in each node a target histogram.
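
The relational encoding described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `NodeRow` fields, the `branch` column, the "le"/"gt" branch labels, and the toy data are all assumptions made for illustration. Each row carries its parent link, its split predictor and split value, and a target histogram, so the whole tree fits in one flat table.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class NodeRow:
    """One row of a hypothetical node table."""
    node_id: int
    parent_id: Optional[int]        # parent-child relationship; None for the root
    branch: Optional[str]           # assumed column: "le" or "gt" side of the parent's split
    split_predictor: Optional[str]  # None for a leaf
    split_value: Optional[float]    # None for a leaf
    target_histogram: Dict[str, int]


# Toy tree (illustrative data): the root splits on "age" at 30.
TREE: List[NodeRow] = [
    NodeRow(0, None, None, "age", 30.0, {"yes": 6, "no": 4}),
    NodeRow(1, 0, "le", None, None, {"yes": 1, "no": 3}),
    NodeRow(2, 0, "gt", None, None, {"yes": 5, "no": 1}),
]


def children(tree: List[NodeRow], node_id: int) -> List[NodeRow]:
    """Recover the child rows of a node from the parent links alone."""
    return [n for n in tree if n.parent_id == node_id]


def predict(tree: List[NodeRow], record: Dict[str, float]) -> str:
    """Walk from the root to a leaf and return the leaf's majority target."""
    node = next(n for n in tree if n.parent_id is None)
    while node.split_predictor is not None:
        side = "le" if record[node.split_predictor] <= node.split_value else "gt"
        node = next(c for c in children(tree, node.node_id) if c.branch == side)
    return max(node.target_histogram, key=node.target_histogram.get)
```

Because every node stores its own histogram, a prediction or a class-probability estimate needs only the row reached at the end of the walk.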

    System and method for building decision tree classifiers using bitmap techniques
    2.
    Patent application
    Status: Active

    Publication No.: US20070192341A1

    Publication date: 2007-08-16

    Application No.: US11344193

    Filing date: 2006-02-01

    IPC classification: G06F7/00

    Abstract: A method, system, and computer program product for counting predictor-target pairs for a decision tree model provides the capability to generate count tables more quickly and efficiently than previous techniques. A method of counting predictor-target pairs for a decision tree model, the decision tree model based on data stored in a database, the data comprising a plurality of rows of data, at least one predictor and at least one target, comprises generating a bitmap for each split node of data stored in a database system by intersecting a parent node bitmap and a bitmap of a predictor that satisfies a condition of the node, intersecting each split node bitmap with each predictor bitmap and with each target bitmap to form intersected bitmaps, and counting bits of each intersected bitmap to generate a count of predictor-target pairs.
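
The bitmap counting step in this abstract can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the patented method: Python integers stand in for bitmaps (bit i represents row i), and the `bitmap`, `popcount`, and `count_pairs` helpers and the toy rows are hypothetical.

```python
def bitmap(rows, column, value):
    """Return an int whose bit i is set when rows[i][column] == value."""
    bits = 0
    for i, row in enumerate(rows):
        if row[column] == value:
            bits |= 1 << i
    return bits


def popcount(bits):
    """Count the set bits of a non-negative int."""
    return bin(bits).count("1")


def count_pairs(rows, node_bits, predictors, target):
    """Count (predictor value, target value) pairs among the rows in a node."""
    counts = {}
    target_values = sorted({r[target] for r in rows})
    for p in predictors:
        for pv in sorted({r[p] for r in rows}):
            p_bits = bitmap(rows, p, pv)
            for tv in target_values:
                t_bits = bitmap(rows, target, tv)
                # Intersect node, predictor, and target bitmaps, then count bits.
                counts[(p, pv, tv)] = popcount(node_bits & p_bits & t_bits)
    return counts


# Toy data (illustrative only).
ROWS = [
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "rain", "windy": "yes", "play": "no"},
    {"outlook": "rain", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "no", "play": "yes"},
]

root_bits = (1 << len(ROWS)) - 1                           # root node: every row
sunny_bits = root_bits & bitmap(ROWS, "outlook", "sunny")  # split node: parent AND condition
counts = count_pairs(ROWS, sunny_bits, ["windy"], "play")
```

The split-node bitmap is derived from the parent bitmap with a single AND, so the rows themselves are never rescanned to find which of them belong to the node.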

    System and method for building decision tree classifiers using bitmap techniques
    3.
    Granted patent
    Status: Active

    Publication No.: US07571159B2

    Publication date: 2009-08-04

    Application No.: US11344193

    Filing date: 2006-02-01

    IPC classification: G06F7/00 G06F17/30 G06F17/00

    Abstract: A method, system, and computer program product for counting predictor-target pairs for a decision tree model provides the capability to generate count tables more quickly and efficiently than previous techniques. A method of counting predictor-target pairs for a decision tree model, the decision tree model based on data stored in a database, the data comprising a plurality of rows of data, at least one predictor and at least one target, comprises generating a bitmap for each split node of data stored in a database system by intersecting a parent node bitmap and a bitmap of a predictor that satisfies a condition of the node, intersecting each split node bitmap with each predictor bitmap and with each target bitmap to form intersected bitmaps, and counting bits of each intersected bitmap to generate a count of predictor-target pairs.

    System and method for building decision trees in a database
    4.
    Granted patent
    Status: Active

    Publication No.: US09135309B2

    Publication date: 2015-09-15

    Application No.: US13300030

    Filing date: 2011-11-18

    IPC classification: G06F17/30

    CPC classification: G06F17/30539

    Abstract: A computer-implemented method of creating a data mining model in a database management system comprises accepting a database language statement at the database management system, the database language statement indicating a dataset and a data mining model to be created from the dataset, and creating, in the database management system, the indicated data mining model using the indicated dataset, wherein creation and application of the data mining model do not require moving data to a separate data mining engine.

    System and method for building decision trees in a database
    5.
    Granted patent
    Status: Active

    Publication No.: US08065326B2

    Publication date: 2011-11-22

    Application No.: US11344112

    Filing date: 2006-02-01

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30539

    Abstract: Decision trees are efficiently represented in a relational database. A computer-implemented method of representing a decision tree model in relational form comprises providing a directed acyclic graph comprising a plurality of nodes and a plurality of links, each link connecting a plurality of nodes; encoding a tree structure by including in each node a parent-child relationship of the node with other nodes; encoding in each node information relating to a split represented by the node, the split information including a splitting predictor and a split value; and encoding in each node a target histogram.

    Dynamic selection of frequent itemset counting technique
    6.
    Granted patent
    Status: Active

    Publication No.: US07720790B2

    Publication date: 2010-05-18

    Application No.: US10643563

    Filing date: 2003-08-18

    IPC classification: G06F17/30 G06F7/00

    Abstract: Techniques are provided for (1) extending SQL to support direct invocation of frequent itemset operations, (2) improving the performance of frequent itemset operations by clustering itemset combinations to more efficiently use previously produced results, and (3) making on-the-fly selection of the occurrence counting technique to use during each phase of a multiple-phase frequent itemset operation. When directly invoked in an SQL statement, a frequent itemset operation may receive input from results of operations specified in the SQL statement, and provide its results directly to other operations specified in the SQL statement. By clustering itemset combinations, resources may be used more efficiently by retaining intermediate information as long as it is useful, and then discarding it to free up volatile memory. Dynamically selecting an occurrence counting technique allows a single frequent itemset operation to change, midstream, the occurrence counting technique it is using, based on cost considerations and/or environmental conditions.
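
The idea of switching the occurrence counting technique between phases can be sketched as follows. The abstract does not name the techniques or the cost model, so everything here is an assumption made for illustration: the two counters (`count_by_scan`, `count_by_bitmap`), the `bitmap_threshold` rule in `choose_technique`, and the toy transactions are hypothetical.

```python
from itertools import combinations


def count_by_scan(candidates, transactions):
    """Count each candidate by testing it against every transaction."""
    return {c: sum(1 for t in transactions if set(c) <= t) for c in candidates}


def count_by_bitmap(candidates, transactions):
    """Count each candidate by intersecting per-item bitmaps (bit n = transaction n)."""
    items = {i for t in transactions for i in t}
    bitmaps = {i: sum(1 << n for n, t in enumerate(transactions) if i in t)
               for i in items}
    counts = {}
    for c in candidates:
        bits = (1 << len(transactions)) - 1
        for item in c:
            bits &= bitmaps[item]
        counts[c] = bin(bits).count("1")
    return counts


def choose_technique(candidates, bitmap_threshold):
    """Hypothetical cost rule: build bitmaps only when there are enough candidates."""
    return count_by_bitmap if len(candidates) >= bitmap_threshold else count_by_scan


def frequent_itemsets(transactions, min_support, bitmap_threshold):
    """Phase k counts k-itemsets; the counting technique is re-chosen in every phase."""
    items = sorted({i for t in transactions for i in t})
    result, chosen = {}, []
    k = 1
    candidates = [(i,) for i in items]
    while candidates:
        technique = choose_technique(candidates, bitmap_threshold)
        chosen.append(technique.__name__)
        counts = technique(candidates, transactions)
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
        survivors = sorted({i for c in frequent for i in c})
        # Keep a k-itemset only if all of its (k-1)-subsets were frequent.
        candidates = [c for c in combinations(survivors, k)
                      if all(s in frequent for s in combinations(c, k - 1))]
    return result, chosen


TRANSACTIONS = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
result, chosen = frequent_itemsets(TRANSACTIONS, min_support=3, bitmap_threshold=2)
```

Both counters return identical counts, so the choice affects only cost. In this toy run the third phase has a single candidate and therefore falls back to scanning.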

    Dynamic selection of frequent itemset counting technique
    7.
    Patent application
    Status: Active

    Publication No.: US20050044087A1

    Publication date: 2005-02-24

    Application No.: US10643563

    Filing date: 2003-08-18

    IPC classification: G06F7/00 G06F17/30

    Abstract: Techniques are provided for (1) extending SQL to support direct invocation of frequent itemset operations, (2) improving the performance of frequent itemset operations by clustering itemset combinations to more efficiently use previously produced results, and (3) making on-the-fly selection of the occurrence counting technique to use during each phase of a multiple-phase frequent itemset operation. When directly invoked in an SQL statement, a frequent itemset operation may receive input from results of operations specified in the SQL statement, and provide its results directly to other operations specified in the SQL statement. By clustering itemset combinations, resources may be used more efficiently by retaining intermediate information as long as it is useful, and then discarding it to free up volatile memory. Dynamically selecting an occurrence counting technique allows a single frequent itemset operation to change, midstream, the occurrence counting technique it is using, based on cost considerations and/or environmental conditions.

    System and method for sequence matching and alignment in a relational database management system
    8.
    Patent application
    Status: Under examination (published)

    Publication No.: US20050050033A1

    Publication date: 2005-03-03

    Application No.: US10916462

    Filing date: 2004-08-12

    IPC classification: G06F17/30 G06F19/00 G06F7/00

    Abstract: An integrated solution in which BLAST functionality is integrated into a DBMS provides improved performance and scalability over the conventional approach, in addition to reducing the required hardware resources and reducing the cost of the system. In a database management system, a system for sequence matching and alignment comprises a database table storing sequence information comprising target sequences, a query sequence, a table function operable to accept the query sequence and match the query sequence with at least one target sequence stored in the database table, and a structured query language query referencing a database table storing sequence information comprising target sequences, a query sequence, and a table function, the structured query language query evaluatable by the database management system.

    Integrated database and data-mining system
    9.
    Granted patent
    Status: Lapsed

    Publication No.: US06324533B1

    Publication date: 2001-11-27

    Application No.: US09087561

    Filing date: 1998-05-29

    IPC classification: G06F17/30

    Abstract: A method and apparatus for mining data relationships from an integrated database and data-mining system are disclosed. A set of frequent 1-itemsets is generated using a group-by query on data transactions. From these frequent 1-itemsets and the transactions, frequent 2-itemsets are determined. A candidate set of (n+2)-itemsets is generated from the frequent 2-itemsets, where n=1. Frequent (n+2)-itemsets are determined from the candidate set and the transaction table using a query operation. The candidate set and the frequent (n+2)-itemsets are then generated again with n incremented to n+1, repeating until the candidate set is empty. Rules are then extracted from the union of the determined frequent itemsets.
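
The first two steps of this abstract (frequent 1-itemsets from a group-by query, then frequent 2-itemsets from the transactions) can be sketched with Python's standard `sqlite3` module. This is an illustration, not the patented system: the `transactions` table layout, the `MIN_SUPPORT` threshold, the self-join formulation, and the toy baskets are assumptions, and each item is assumed to appear at most once per transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (tid INTEGER, item TEXT)")

# Toy baskets (illustrative data).
baskets = {
    1: ["bread", "milk"],
    2: ["bread", "butter", "milk"],
    3: ["butter", "milk"],
    4: ["bread", "milk"],
}
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [(tid, item) for tid, items in baskets.items() for item in items],
)

MIN_SUPPORT = 2  # assumed absolute support threshold

# Frequent 1-itemsets: a single group-by over the transaction table.
frequent_1 = conn.execute(
    """
    SELECT item, COUNT(*) AS support
    FROM transactions
    GROUP BY item
    HAVING COUNT(*) >= :s
    ORDER BY item
    """,
    {"s": MIN_SUPPORT},
).fetchall()

# Frequent 2-itemsets: self-join on tid, restricted to frequent single items.
frequent_2 = conn.execute(
    """
    SELECT a.item, b.item, COUNT(*) AS support
    FROM transactions a
    JOIN transactions b ON a.tid = b.tid AND a.item < b.item
    WHERE a.item IN (SELECT item FROM transactions
                     GROUP BY item HAVING COUNT(*) >= :s)
      AND b.item IN (SELECT item FROM transactions
                     GROUP BY item HAVING COUNT(*) >= :s)
    GROUP BY a.item, b.item
    HAVING COUNT(*) >= :s
    ORDER BY a.item, b.item
    """,
    {"s": MIN_SUPPORT},
).fetchall()
```

Larger itemsets would follow the same pattern: build a candidate table from the previous level, join it against `transactions`, and stop once no candidates remain.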

    SQL-based Naïve Bayes model building and scoring
    10.
    Granted patent
    Status: Active

    Publication No.: US07051037B1

    Publication date: 2006-05-23

    Application No.: US10156060

    Filing date: 2002-05-29

    IPC classification: G06F17/30

    Abstract: The present invention provides an efficient method and system of data mining using SQL queries for model building and scoring. The invention provides a database management system having a database containing data, a database engine operatively connected to process the data, a SQL server operatively connected to the database and a data mining tool, whereby the data mining tool is based on a Naïve Bayes model. The SQL server uses the data and the Naïve Bayes model to develop the data mining tool. The data mining tool is located in the database management system. The data mining tool has a model building system based on at least one SQL query and training data, and a scoring system based on SQL queries.
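
A SQL-driven Naïve Bayes build can be sketched with Python's standard `sqlite3` module. This is an illustration, not the patented system: the table names (`train`, `nb_prior`, `nb_cond`), the toy data, and the absence of smoothing are assumptions. For brevity the final multiplication of probabilities is done in Python, whereas the abstract describes scoring that is itself based on SQL queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE train (outlook TEXT, windy TEXT, play TEXT)")
conn.executemany("INSERT INTO train VALUES (?, ?, ?)", [
    ("sunny", "no", "yes"),
    ("sunny", "yes", "no"),
    ("rain", "no", "yes"),
    ("rain", "yes", "no"),
    ("sunny", "no", "yes"),
    ("rain", "no", "no"),
])

PREDICTORS = ["outlook", "windy"]  # fixed column names, safe to interpolate below

# Model building: prior and conditional counts come from group-by queries.
conn.execute(
    "CREATE TABLE nb_prior AS "
    "SELECT play AS target, COUNT(*) AS n FROM train GROUP BY play"
)
conn.execute(
    "CREATE TABLE nb_cond (predictor TEXT, value TEXT, target TEXT, n INTEGER)"
)
for p in PREDICTORS:
    conn.execute(
        f"INSERT INTO nb_cond "
        f"SELECT '{p}', {p}, play, COUNT(*) FROM train GROUP BY {p}, play"
    )


def score(record):
    """Return (best target, unnormalised score per target) for one record."""
    total = conn.execute("SELECT SUM(n) FROM nb_prior").fetchone()[0]
    priors = conn.execute("SELECT target, n FROM nb_prior").fetchall()
    scores = {}
    for target, n_target in priors:
        prob = n_target / total  # prior P(target)
        for p in PREDICTORS:
            row = conn.execute(
                "SELECT n FROM nb_cond "
                "WHERE predictor = ? AND value = ? AND target = ?",
                (p, record[p], target),
            ).fetchone()
            # P(value | target); an unseen combination contributes zero (no smoothing).
            prob *= (row[0] if row else 0) / n_target
        scores[target] = prob
    return max(scores, key=scores.get), scores


label, scores = score({"outlook": "sunny", "windy": "no"})
```

The model consists only of the two count tables, so rebuilding it after new training rows arrive means rerunning the group-by queries.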
