DISCOVERING INTERESTINGNESS IN FACETED SEARCH
    1.
    Type: Invention patent application
    Legal status: Under examination (published)

    Publication No.: US20090106244A1

    Publication date: 2009-04-23

    Application No.: US12200981

    Filing date: 2008-08-29

    IPC classification: G06F17/30

    Abstract: Exemplary embodiments of the present invention relate to enhanced faceted search support for OLAP queries over unstructured text as well as structured dimensions by the dynamic and automatic discovery of dimensions that are determined to be most “interesting” to a user based upon the data. Within the exemplary embodiments, “interestingness” is defined as how surprising a summary along some dimensions is relative to a user's expectation. Further, multi-attribute facets are determined and a user is optionally permitted to specify the distribution of values that she expects, and/or the distance metric by which actual and expected distributions are to be compared.

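The notion of interestingness in the abstract above can be sketched as a distance between the actual and the expected value distribution of each facet. The Python below is a minimal illustration and not the patented method: the function names (`distribution`, `kl_divergence`, `rank_facets`), the choice of KL divergence as the default metric, and the use of the whole collection as the default expectation are all assumptions made for this example. As the abstract describes, a caller may instead pass their own expected distributions or distance function.

```python
import math
from collections import Counter


def distribution(values):
    """Normalize raw facet values into a {value: probability} mapping."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def kl_divergence(actual, expected, eps=1e-9):
    """KL(actual || expected); eps guards against values the expectation lacks."""
    return sum(
        p * math.log(p / max(expected.get(value, 0.0), eps))
        for value, p in actual.items()
        if p > 0
    )


def rank_facets(all_rows, result_rows, facets, expected=None, distance=kl_divergence):
    """Rank facets by how far the query result departs from the expectation.

    Assumption: when the user supplies no expected distribution for a facet,
    the distribution over the whole collection (all_rows) is used instead.
    """
    expected = expected or {}
    scores = {}
    for facet in facets:
        if facet in expected:
            expected_dist = expected[facet]
        else:
            expected_dist = distribution(row[facet] for row in all_rows)
        actual_dist = distribution(row[facet] for row in result_rows)
        scores[facet] = distance(actual_dist, expected_dist)
    # Most surprising facet first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A facet whose values in the query result mirror the whole collection scores zero, while a facet that the query skews heavily rises to the top of the ranking.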

    Discovering interestingness in faceted search
    2.
    Type: Granted invention patent
    Legal status: In force

    Publication No.: US07392250B1

    Publication date: 2008-06-24

    Application No.: US11876042

    Filing date: 2007-10-22

    IPC classification: G06F17/30

    Abstract: Exemplary embodiments of the present invention relate to enhanced faceted search support for OLAP queries over unstructured text as well as structured dimensions by the dynamic and automatic discovery of dimensions that are determined to be most “interesting” to a user based upon the data. Within the exemplary embodiments, “interestingness” is defined as how surprising a summary along some dimensions is relative to a user's expectation. Further, multi-attribute facets are determined and a user is optionally permitted to specify the distribution of values that she expects, and/or the distance metric by which actual and expected distributions are to be compared.


    Computer automated discovery of interestingness in faceted search
    3.
    Type: Granted invention patent
    Legal status: In force

    Publication No.: US07493319B1

    Publication date: 2009-02-17

    Application No.: US12118206

    Filing date: 2008-05-09

    IPC classification: G06F17/30

    Abstract: Exemplary embodiments of the present invention relate to enhanced faceted search support for OLAP queries over unstructured text as well as structured dimensions by the dynamic and automatic discovery of dimensions that are determined to be most “interesting” to a user based upon the data. Within the exemplary embodiments, “interestingness” is defined as how surprising a summary along some dimensions is relative to a user's expectation. Further, multi-attribute facets are determined and a user is optionally permitted to specify the distribution of values that she expects, and/or the distance metric by which actual and expected distributions are to be compared.


    System and method for automating data partitioning in a parallel database
    4.
    Type: Granted invention patent
    Legal status: In force

    Publication No.: US07562090B2

    Publication date: 2009-07-14

    Application No.: US10324362

    Filing date: 2002-12-19

    IPC classification: G06F7/00 G06F12/00

    Abstract: A system for automating data partitioning in a parallel database includes plural nodes connected in parallel. Each node includes a database server and two databases connected thereto. Each database server includes a query optimizer. Moreover, a partitioning advisor communicates with the database server and the query optimizer. The query optimizer and the partitioning advisor include a program for recommending and evaluating data table partitions that are useful for processing a workload of query statements. The data table partitions are recommended and evaluated without requiring the data tables to be physically repartitioned.

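The idea of recommending and evaluating partitions without physically repartitioning any table can be sketched as a search over candidate partitioning keys, scored by a cost estimate alone. The Python below is a hedged illustration only: `estimate_cost` is a toy stand-in for a real optimizer's estimate, and the workload format, the cost formula, and every name in the block are assumptions made for this example rather than details taken from the patent.

```python
from itertools import product


def estimate_cost(query, partitioning, table_rows):
    """Toy cost estimate for one two-table join (hypothetical model).

    If both tables are partitioned on their join columns, the join is local.
    Otherwise the smaller table is assumed to be shipped between nodes.
    """
    (left, left_col), (right, right_col) = query["join"]
    cost = table_rows[left] + table_rows[right]
    if partitioning[left] != left_col or partitioning[right] != right_col:
        cost += min(table_rows[left], table_rows[right])
    return cost


def recommend(workload, candidates, table_rows):
    """Pick the partitioning with the lowest estimated workload cost.

    Only estimates are computed; no table is actually repartitioned.
    """
    tables = sorted(candidates)
    best, best_cost = None, float("inf")
    for keys in product(*(candidates[table] for table in tables)):
        partitioning = dict(zip(tables, keys))
        cost = sum(
            query["freq"] * estimate_cost(query, partitioning, table_rows)
            for query in workload
        )
        if cost < best_cost:
            best, best_cost = partitioning, cost
    return best, best_cost
```

Exhaustive enumeration is used here only because the example is tiny; the number of combinations grows multiplicatively with the number of tables.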

    AUTOMATICALLY AND ADAPTIVELY DETERMINING EXECUTION PLANS FOR QUERIES WITH PARAMETER MARKERS
    5.
    Type: Invention patent application
    Legal status: No longer in force

    Publication No.: US20080222093A1

    Publication date: 2008-09-11

    Application No.: US12125221

    Filing date: 2008-05-22

    IPC classification: G06F17/30

    CPC classification: G06F17/30469

    Abstract: A method and system for automatically and adaptively determining query execution plans for parametric queries. A first classifier trained by an initial set of training points is generated. A query workload and/or database statistics are dynamically updated. A new set of training points is collected off-line. Using the new set of training points, the first classifier is modified into a second classifier. A database query is received at a runtime subsequent to the off-line phase. The query includes predicates having parameter markers bound to actual values. The predicates are associated with selectivities. A mapping of the selectivities into a plan determines the query execution plan. The determined query execution plan is included in an augmented set of training points, where the augmented set includes the initial set and the new set.

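The lifecycle in the abstract above (an initial classifier, an off-line update into a second classifier, then a runtime mapping from selectivities to a plan) can be sketched as follows. This abstract does not name a classifier type, so a 1-nearest-neighbor rule is used here purely as a stand-in; the class name `NearestNeighborPlanner`, its methods, and the plan labels in the usage note are hypothetical.

```python
import math


class NearestNeighborPlanner:
    """Maps a vector of predicate selectivities to a plan label.

    Stand-in classifier (1-nearest neighbor); not taken from the patent.
    """

    def __init__(self, training_points):
        # Each training point is (selectivity_vector, plan_label).
        self.points = list(training_points)

    def updated(self, new_points):
        """Return a second classifier trained on the augmented set
        (the initial points plus the newly collected ones)."""
        return NearestNeighborPlanner(self.points + list(new_points))

    def plan_for(self, selectivities):
        """Pick the plan of the closest known selectivity vector."""
        nearest = min(
            self.points,
            key=lambda point: math.dist(point[0], selectivities),
        )
        return nearest[1]
```

For example, a first classifier built from `((0.01,), "index_scan")` and `((0.90,), "table_scan")` maps a selectivity of 0.2 to `"index_scan"`. After an off-line update that adds `((0.30,), "table_scan")`, the second classifier maps the same selectivity to `"table_scan"`.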

    SYSTEM AND METHOD FOR AUTOMATING DATA PARTITIONING IN A PARALLEL DATABASE
    6.
    Type: Invention patent application
    Legal status: No longer in force

    Publication No.: US20080263001A1

    Publication date: 2008-10-23

    Application No.: US12110674

    Filing date: 2008-04-28

    IPC classification: G06F17/30

    Abstract: A system for automating data partitioning in a parallel database includes plural nodes connected in parallel. Each node includes a database server and two databases connected thereto. Each database server includes a query optimizer. Moreover, a partitioning advisor communicates with the database server and the query optimizer. The query optimizer and the partitioning advisor include a program for recommending and evaluating data table partitions that are useful for processing a workload of query statements. The data table partitions are recommended and evaluated without requiring the data tables to be physically repartitioned.


    AUTOMATICALLY AND ADAPTIVELY DETERMINING EXECUTION PLANS FOR QUERIES WITH PARAMETER MARKERS
    7.
    Type: Invention patent application
    Legal status: Under examination (published)

    Publication No.: US20080195577A1

    Publication date: 2008-08-14

    Application No.: US11673091

    Filing date: 2007-02-09

    IPC classification: G06F17/30

    CPC classification: G06F16/24545

    Abstract: A method for automatically and adaptively determining query execution plans for parametric queries. A first classifier trained by an initial set of training points is generated using a set of random decision trees (RDTs). A query workload and/or database statistics are dynamically updated. A new set of training points collected off-line is used to modify the first classifier into a second classifier. A database query is received at a runtime subsequent to the off-line phase. The query includes predicates having parameter markers bound to actual values. The predicates are associated with selectivities. The query execution plan is determined by identifying an optimal average of posterior probabilities obtained across a set of RDTs and mapping the selectivities to a plan. The determined query execution plan is included in an augmented set of training points that includes the initial set and the new set.

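Averaging posterior probabilities across a set of random decision trees can be sketched in a few lines. The Python below is an illustrative reading of the abstract and not the patented implementation: each tree's splits are drawn at random without consulting the training data, training points only update plan counts at the leaves, and the chosen plan is the one with the highest average posterior. The class names, tree depth, number of trees, and seed are all assumptions made for this example.

```python
import random
from collections import Counter


class RandomTree:
    """One random decision tree over selectivity vectors in [0, 1]^n."""

    def __init__(self, depth, n_features, rng):
        self.counts = Counter()  # plan counts; only meaningful at leaves
        self.children = None
        if depth > 0:
            # The split is chosen at random, independent of training data.
            self.feature = rng.randrange(n_features)
            self.threshold = rng.random()
            self.children = (
                RandomTree(depth - 1, n_features, rng),
                RandomTree(depth - 1, n_features, rng),
            )

    def leaf_for(self, x):
        node = self
        while node.children is not None:
            node = node.children[int(x[node.feature] >= node.threshold)]
        return node

    def posterior(self, x):
        """Estimated P(plan | x) from the counts in x's leaf."""
        counts = self.leaf_for(x).counts
        total = sum(counts.values())
        return {plan: n / total for plan, n in counts.items()} if total else {}


class PlanClassifier:
    def __init__(self, n_features, n_trees=10, depth=3, seed=0):
        rng = random.Random(seed)
        self.trees = [RandomTree(depth, n_features, rng) for _ in range(n_trees)]
        self.training_points = []

    def train(self, points):
        """Add (selectivities, plan) points; later calls augment the set."""
        for selectivities, plan in points:
            self.training_points.append((selectivities, plan))
            for tree in self.trees:
                tree.leaf_for(selectivities).counts[plan] += 1

    def choose_plan(self, selectivities):
        """Return the plan with the highest posterior averaged over all trees."""
        average = Counter()
        for tree in self.trees:
            for plan, p in tree.posterior(selectivities).items():
                average[plan] += p / len(self.trees)
        return max(average, key=average.get) if average else None
```

Because the tree structure never depends on the data, folding in newly collected training points only requires updating leaf counts, which is one plausible reason such trees suit the off-line update step described in the abstract.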

    System and method for automating data partitioning in a parallel database
    8.
    Type: Granted invention patent
    Legal status: No longer in force

    Publication No.: US08001109B2

    Publication date: 2011-08-16

    Application No.: US12110674

    Filing date: 2008-04-28

    IPC classification: G06F7/00

    Abstract: A system for automating data partitioning in a parallel database includes plural nodes connected in parallel. Each node includes a database server and two databases connected thereto. Each database server includes a query optimizer. Moreover, a partitioning advisor communicates with the database server and the query optimizer. The query optimizer and the partitioning advisor include a program for recommending and evaluating data table partitions that are useful for processing a workload of query statements. The data table partitions are recommended and evaluated without requiring the data tables to be physically repartitioned.


    Automatically and adaptively determining execution plans for queries with parameter markers
    9.
    Type: Granted invention patent
    Legal status: No longer in force

    Publication No.: US07958113B2

    Publication date: 2011-06-07

    Application No.: US12125221

    Filing date: 2008-05-22

    IPC classification: G06F7/00 G06F17/30 G06F15/16

    CPC classification: G06F17/30469

    Abstract: A method and system for automatically and adaptively determining query execution plans for parametric queries. A first classifier trained by an initial set of training points is generated. A query workload and/or database statistics are dynamically updated. A new set of training points is collected off-line. Using the new set of training points, the first classifier is modified into a second classifier. A database query is received at a runtime subsequent to the off-line phase. The query includes predicates having parameter markers bound to actual values. The predicates are associated with selectivities. A mapping of the selectivities into a plan determines the query execution plan. The determined query execution plan is included in an augmented set of training points, where the augmented set includes the initial set and the new set.


    Method, system and program for optimizing compression of a workload processed by a database management system
    10.
    Type: Granted invention patent
    Legal status: In force

    Publication No.: US07281004B2

    Publication date: 2007-10-09

    Application No.: US10788583

    Filing date: 2004-02-27

    IPC classification: G06F17/30

    CPC classification: G06F17/3046 Y10S707/99934

    Abstract: The present invention provides a method, system and program for optimizing compression of a workload processed by a database management system. In an embodiment of the present invention, a method of optimizing the compression of database workloads is provided. Initially, an estimate of a cost of execution for each query according to a defined metric such as execution time or memory consumption is determined. A subset of queries is then selected from the workload for compression, in order from most costly to least costly relative to the defined metric, according to either a predetermined compression threshold percentage or a threshold percentage derived from an allotted workload execution time. Compression is then performed on the selected subset of queries (i.e., those that will benefit the most from the compression) to achieve a net beneficial trade-off between the cost of workload compression and the cost of workload execution.

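The selection step in the abstract above (estimate a cost per query, order from most to least costly, keep queries up to a threshold percentage) can be sketched as below. The abstract does not say what the percentage is measured against, so this example assumes it is a share of the total estimated workload cost. That interpretation, the function name `select_for_compression`, and the input format are assumptions, and deriving the threshold from an allotted execution time is not covered.

```python
def select_for_compression(costs, threshold_pct):
    """Select the costliest queries for compression.

    costs: {query_id: estimated cost under the chosen metric}.
    threshold_pct: share of total estimated cost to cover (assumed meaning).
    Returns query ids, costliest first, until the share is reached.
    """
    budget = sum(costs.values()) * threshold_pct / 100.0
    selected, covered = [], 0.0
    ranked = sorted(costs.items(), key=lambda item: item[1], reverse=True)
    for query_id, cost in ranked:
        if covered >= budget:
            break
        selected.append(query_id)
        covered += cost
    return selected
```

With estimated costs of 50, 30, 15 and 5 for four queries and a threshold of 80 percent, the first two queries are selected, since together they account for 80 of the 100 cost units.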