Set Similarity selection queries at interactive speeds
    1.
    Patent application
    Status: in force

    Publication No.: US20090171944A1

    Publication date: 2009-07-02

    Application No.: US12006332

    Filing date: 2008-01-02

    IPC classification: G06F17/30

    CPC classification: G06F17/30442

    Abstract: The similarity between a query set comprising query set tokens and a database set comprising database set tokens is determined by a similarity score. The database sets belong to a data collection set, which contains all database sets from which information may be retrieved. If the similarity score is greater than or equal to a user-defined threshold, the database set has information relevant to the query set. The similarity score is calculated with an inverse document frequency (IDF) similarity measure that is independent of term frequency. The document frequency is based at least in part on the number of database sets in the data collection set and the number of database sets which contain at least one query set token. The length of the query set and the length of the database set are normalized.
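
As a rough illustration of the measure this abstract describes, the Python sketch below scores a query set against database sets using token weights that depend only on document frequency, not on term frequency, and normalizes by the lengths of both sets. The abstract does not state the formula, so the weight log2(1 + N/df), the squared-weight overlap, and the names build_idf, norm_length, idf_similarity, and select are assumptions made for this example rather than the patented method.

```python
import math
from collections import Counter


def build_idf(collection):
    """Weight each token by log2(1 + N / df); the formula is an assumption."""
    n = len(collection)
    # Each set counts a token once, so the weight ignores term frequency.
    df = Counter(tok for s in collection for tok in set(s))
    return {tok: math.log2(1 + n / d) for tok, d in df.items()}


def norm_length(tokens, idf):
    """Normalized length of a set: Euclidean norm of its token weights."""
    # Tokens unseen in the collection get weight 0 (a simplification).
    return math.sqrt(sum(idf.get(t, 0.0) ** 2 for t in set(tokens)))


def idf_similarity(query, db_set, idf):
    """Squared-weight overlap divided by the product of both lengths."""
    q, s = set(query), set(db_set)
    denom = norm_length(q, idf) * norm_length(s, idf)
    if denom == 0:
        return 0.0
    return sum(idf[t] ** 2 for t in q & s if t in idf) / denom


def select(query, collection, idf, threshold):
    """Return the database sets whose score reaches the user threshold."""
    return [s for s in collection if idf_similarity(query, s, idf) >= threshold]
```

This version scans every database set, which is only meant to make the scoring rule concrete; answering such queries at interactive speed would need an index rather than a linear scan.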

    Set similarity selection queries at interactive speeds
    2.
    Granted patent
    Status: in force

    Publication No.: US07921100B2

    Publication date: 2011-04-05

    Application No.: US12006332

    Filing date: 2008-01-02

    IPC classification: G06F17/30

    CPC classification: G06F17/30442

    Abstract: The similarity between a query set comprising query set tokens and a database set comprising database set tokens is determined by a similarity score. The database sets belong to a data collection set, which contains all database sets from which information may be retrieved. If the similarity score is greater than or equal to a user-defined threshold, the database set has information relevant to the query set. The similarity score is calculated with an inverse document frequency (IDF) similarity measure that is independent of term frequency. The document frequency is based at least in part on the number of database sets in the data collection set and the number of database sets which contain at least one query set token. The length of the query set and the length of the database set are normalized.

    Incremental Maintenance of Inverted Indexes for Approximate String Matching
    3.
    Patent application
    Status: in force

    Publication No.: US20120323870A1

    Publication date: 2012-12-20

    Application No.: US13595270

    Filing date: 2012-08-27

    IPC classification: G06F17/30

    Abstract: In embodiments of the disclosed technology, indexes, such as inverted indexes, are updated only as necessary to guarantee answer precision within predefined thresholds, which are determined at little cost in comparison to the updates of the indexes themselves. With the present technology, a batch of daily updates can be processed in a matter of minutes, rather than the few hours needed to rebuild an index, and a query may be answered with assurances that the results are accurate or within a threshold of accuracy.
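
The idea of updating an index only when precision would otherwise drift past a threshold can be sketched in Python as follows. This is a toy model under stated assumptions, not the method claimed in the application: the class LazyIdfIndex, its tolerance parameter, the log2(1 + N/df) weight, and the relative-drift test are all invented for the example. Exact document-frequency counters are cheap to keep current, while the stored weights (standing in for the expensive part of an inverted index) are rewritten only when they deviate too far from the exact value.

```python
import math
from collections import defaultdict


class LazyIdfIndex:
    """Toy inverted index that refreshes token weights only on large drift."""

    def __init__(self, tolerance=0.1):
        self.tolerance = tolerance          # allowed relative error (assumed)
        self.n = 0                          # number of indexed sets
        self.df = defaultdict(int)          # exact counters, cheap to update
        self.postings = defaultdict(set)    # token -> ids of sets containing it
        self.stored_idf = {}                # possibly stale weights

    def _exact_idf(self, tok):
        return math.log2(1 + self.n / self.df[tok])

    def apply_batch(self, batch):
        """Insert (set_id, tokens) pairs, then refresh only stale weights."""
        for set_id, tokens in batch:
            self.n += 1
            for tok in set(tokens):
                self.df[tok] += 1
                self.postings[tok].add(set_id)
        return self._refresh_stale()

    def _refresh_stale(self):
        """Rewrite a stored weight only if it drifted beyond the tolerance."""
        refreshed = []
        for tok in self.df:
            exact = self._exact_idf(tok)
            stored = self.stored_idf.get(tok)
            if stored is None or abs(exact - stored) > self.tolerance * stored:
                self.stored_idf[tok] = exact
                refreshed.append(tok)
        return sorted(refreshed)
```

With a loose tolerance, a small batch typically leaves most stored weights untouched, which is the cost saving the abstract points to; the returned list shows which tokens actually had to be rewritten.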

    Incremental Maintenance of Inverted Indexes for Approximate String Matching
    4.
    Patent application
    Status: lapsed

    Publication No.: US20100318519A1

    Publication date: 2010-12-16

    Application No.: US12481693

    Filing date: 2009-06-10

    IPC classification: G06F17/30

    Abstract: In embodiments of the disclosed technology, indexes, such as inverted indexes, are updated only as necessary to guarantee answer precision within predefined thresholds, which are determined at little cost in comparison to the updates of the indexes themselves. With the present technology, a batch of daily updates can be processed in a matter of minutes, rather than the few hours needed to rebuild an index, and a query may be answered with assurances that the results are accurate or within a threshold of accuracy.

    Selectivity estimation of set similarity selection queries
    5.
    Granted patent
    Status: lapsed

    Publication No.: US08161046B2

    Publication date: 2012-04-17

    Application No.: US12274546

    Filing date: 2008-11-20

    IPC classification: G06F17/30 G06F7/00

    CPC classification: G06F17/30469

    Abstract: The invention relates to a system and/or methodology for selectivity estimation of set similarity queries. More specifically, the invention relates to a selectivity estimation technique employing hashed sampling. The invention provides samples, constructed a priori, that can efficiently and quickly provide accurate estimates for arbitrary queries and can be updated efficiently as well.
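
A minimal Python sketch of sample-based selectivity estimation follows, to make the abstract's terms concrete. It is an illustration under assumptions, not the patented technique: the abstract does not say what is hashed or which similarity measure is used, so this example hashes set identifiers, uses Jaccard similarity, and introduces the hypothetical names hash_unit, jaccard, and HashedSample. Because membership in the sample depends only on the hash of the identifier, the sample can be built ahead of time and each insert or delete is handled independently.

```python
import hashlib


def hash_unit(key):
    """Map a key deterministically to a float in [0, 1)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    # 48 bits fit exactly in a double, so the result stays below 1.0.
    return int.from_bytes(digest[:6], "big") / 2**48


def jaccard(a, b):
    """Jaccard similarity, used here only as an example measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class HashedSample:
    """Keeps the sets whose hashed id falls below the sampling rate."""

    def __init__(self, rate):
        self.rate = rate        # expected fraction of sets kept, in (0, 1]
        self.sample = {}        # set_id -> frozenset of tokens

    def insert(self, set_id, tokens):
        if hash_unit(set_id) < self.rate:
            self.sample[set_id] = frozenset(tokens)

    def delete(self, set_id):
        self.sample.pop(set_id, None)

    def estimate(self, query, threshold):
        """Scale the number of matching sampled sets by 1 / rate."""
        q = set(query)
        hits = sum(1 for s in self.sample.values() if jaccard(q, s) >= threshold)
        return hits / self.rate
```

The estimate is the number of sampled sets that satisfy the query, scaled up by the inverse sampling rate; a rate of 1.0 keeps every set and makes the estimate exact, which is convenient for checking the logic.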

    SELECTIVITY ESTIMATION OF SET SIMILARITY SELECTION QUERIES
    6.
    Patent application
    Status: lapsed

    Publication No.: US20100125559A1

    Publication date: 2010-05-20

    Application No.: US12274546

    Filing date: 2008-11-20

    IPC classification: G06F7/06 G06F17/30

    CPC classification: G06F17/30469

    Abstract: The invention relates to a system and/or methodology for selectivity estimation of set similarity queries. More specifically, the invention relates to a selectivity estimation technique employing hashed sampling. The invention provides samples, constructed a priori, that can efficiently and quickly provide accurate estimates for arbitrary queries and can be updated efficiently as well.

    Incremental maintenance of inverted indexes for approximate string matching
    7.
    Granted patent
    Status: in force

    Publication No.: US09514172B2

    Publication date: 2016-12-06

    Application No.: US13595270

    Filing date: 2012-08-27

    IPC classification: G06F17/30

    Abstract: In embodiments of the disclosed technology, indexes, such as inverted indexes, are updated only as necessary to guarantee answer precision within predefined thresholds, which are determined at little cost in comparison to the updates of the indexes themselves. With the present technology, a batch of daily updates can be processed in a matter of minutes, rather than the few hours needed to rebuild an index, and a query may be answered with assurances that the results are accurate or within a threshold of accuracy.

    Incremental maintenance of inverted indexes for approximate string matching
    8.
    Granted patent
    Status: lapsed

    Publication No.: US08271499B2

    Publication date: 2012-09-18

    Application No.: US12481693

    Filing date: 2009-06-10

    IPC classification: G06F7/00

    Abstract: In embodiments of the disclosed technology, indexes, such as inverted indexes, are updated only as necessary to guarantee answer precision within predefined thresholds, which are determined at little cost in comparison to the updates of the indexes themselves. With the present technology, a batch of daily updates can be processed in a matter of minutes, rather than the few hours needed to rebuild an index, and a query may be answered with assurances that the results are accurate or within a threshold of accuracy.

    System and method for managing data streams
    9.
    Granted patent
    Status: lapsed

    Publication No.: US08117307B2

    Publication date: 2012-02-14

    Application No.: US12605033

    Filing date: 2009-10-23

    IPC classification: G06F13/00

    Abstract: A system for a data stream management system includes a filter transport aggregate for a high speed input data stream with a plurality of packets, each packet comprising attributes. The system includes an evaluation system that evaluates the high speed input data stream and partitions the packets into groups by their attributes, and a table that stores the attributes of each packet using a hash function. A phantom query is used to define partitioned groups of packets using attributes other than those used to group the packets for the user queries, so that the user queries can be solved without performing them on the high speed input data stream.
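
One way to read the phantom query idea is sketched below in Python: the stream is grouped once, in a hash table, on a finer set of attributes than any single user query needs, and each user query is then answered from that table instead of from the raw stream. This reading, the packet attributes src and dst, the use of counts as the aggregate, and the function names phantom_aggregate and answer_from_phantom are assumptions made for the example; the abstract does not specify them.

```python
from collections import Counter


def phantom_aggregate(packets, phantom_attrs):
    """Single pass over the stream: count packets per phantom group."""
    table = Counter()  # hash table keyed by the phantom's attribute values
    for pkt in packets:
        table[tuple(pkt[a] for a in phantom_attrs)] += 1
    return table


def answer_from_phantom(table, phantom_attrs, query_attrs):
    """Roll the phantom groups up to a user query over fewer attributes."""
    idx = [phantom_attrs.index(a) for a in query_attrs]
    result = Counter()
    for key, count in table.items():
        result[tuple(key[i] for i in idx)] += count
    return result
```

For example, a phantom grouped on ("src", "dst") can serve both a per-source and a per-destination count, so the packets are read once rather than once per user query.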

    System and method for managing data streams
    10.
    Granted patent
    Status: in force

    Publication No.: US07631074B1

    Publication date: 2009-12-08

    Application No.: US11240518

    Filing date: 2005-09-30

    IPC classification: G06F13/00

    Abstract: A system for a data stream management system includes a filter transport aggregate for a high speed input data stream with a plurality of packets, each packet comprising attributes. The system includes an evaluation system that evaluates the high speed input data stream and partitions the packets into groups by their attributes, and a table that stores the attributes of each packet using a hash function. A phantom query is used to define partitioned groups of packets using attributes other than those used to group the packets for the user queries, so that the user queries can be solved without performing them on the high speed input data stream.
