Selectivity estimation of set similarity selection queries
    1.
    Granted patent
    Status: Lapsed

    Publication No.: US08161046B2

    Publication date: 2012-04-17

    Application No.: US12274546

    Filing date: 2008-11-20

    IPC classification: G06F17/30 G06F7/00

    CPC classification: G06F17/30469

    Abstract: The invention relates to a system and/or methodology for selectivity estimation of set similarity queries. More specifically, the invention relates to a selectivity estimation technique employing hashed sampling. The invention provides for samples constructed a priori that can efficiently and quickly provide accurate estimates for arbitrary queries and can be updated efficiently as well.
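
    Illustration: the abstract names hashed sampling but does not spell out the construction. The following is a minimal Python sketch of one simple reading, in which a database set is kept in the sample when a hash of its identifier falls below a fixed rate, and a query's selectivity is estimated by scaling up the number of sampled matches. The class HashedSample, the helper unit_hash, the choice of Jaccard similarity, and the sampling rule itself are assumptions made for illustration, not the patented method.

```python
import hashlib


def unit_hash(key: str) -> float:
    """Map a key deterministically to a float in [0, 1)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Keep the top 53 bits so the quotient is exact and strictly below 1.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity; assumed here as the set similarity measure."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


class HashedSample:
    """Hypothetical sample built ahead of query time by hashing set ids."""

    def __init__(self, rate: float):
        if not 0.0 < rate <= 1.0:
            raise ValueError("rate must be in (0, 1]")
        self.rate = rate
        self.sample = {}  # set id -> token set, only for sampled ids

    def insert(self, set_id: str, tokens) -> None:
        # Membership depends only on the id's hash, so an update is O(1).
        if unit_hash(set_id) < self.rate:
            self.sample[set_id] = set(tokens)

    def delete(self, set_id: str) -> None:
        self.sample.pop(set_id, None)

    def estimate(self, query, threshold: float) -> float:
        """Estimate how many database sets have similarity >= threshold."""
        q = set(query)
        hits = sum(1 for t in self.sample.values() if jaccard(q, t) >= threshold)
        return hits / self.rate  # scale the sampled count up by 1/rate


sample = HashedSample(rate=0.1)
for i in range(1000):
    sample.insert(f"set-{i}", {"a", "b", f"tok-{i % 7}"})
estimated_matches = sample.estimate({"a", "b"}, threshold=0.5)
```

    Because membership depends only on the hash of the identifier, inserting or deleting a set touches at most one sample entry, which is one way such a sample could be kept up to date cheaply.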

    SELECTIVITY ESTIMATION OF SET SIMILARITY SELECTION QUERIES
    2.
    Patent application
    Status: Lapsed

    Publication No.: US20100125559A1

    Publication date: 2010-05-20

    Application No.: US12274546

    Filing date: 2008-11-20

    IPC classification: G06F7/06 G06F17/30

    CPC classification: G06F17/30469

    Abstract: The invention relates to a system and/or methodology for selectivity estimation of set similarity queries. More specifically, the invention relates to a selectivity estimation technique employing hashed sampling. The invention provides for samples constructed a priori that can efficiently and quickly provide accurate estimates for arbitrary queries and can be updated efficiently as well.

    Set Similarity selection queries at interactive speeds
    3.
    Patent application
    Status: In force

    Publication No.: US20090171944A1

    Publication date: 2009-07-02

    Application No.: US12006332

    Filing date: 2008-01-02

    IPC classification: G06F17/30

    CPC classification: G06F17/30442

    Abstract: The similarity between a query set comprising query set tokens and a database set comprising database set tokens is determined by a similarity score. The database sets belong to a data collection set, which contains all database sets from which information may be retrieved. If the similarity score is greater than or equal to a user-defined threshold, the database set has information relevant to the query set. The similarity score is calculated with an inverse document frequency (IDF) similarity measure independent of term frequency. The document frequency is based at least in part on the number of database sets in the data collection set and the number of database sets which contain at least one query set token. The length of the query set and the length of the database set are normalized.
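
    Illustration: the abstract describes an IDF score that ignores term frequency and normalizes set lengths, without giving the formula. The sketch below assumes one common formulation: idf(t) = log2(1 + N / df(t)), length(s) = sqrt(sum of idf(t)^2 over tokens of s), and score(q, s) = sum of idf(t)^2 over shared tokens, divided by length(q) * length(s). The function names, the per-token document frequency, and the treatment of unseen tokens (counted as df = 1) are assumptions for illustration and may differ from the claimed measure.

```python
import math
from collections import Counter


def build_idf(collection):
    """Return an idf(token) function for a collection of token sets."""
    n = len(collection)
    # Document frequency: number of database sets containing the token.
    df = Counter(tok for s in collection for tok in set(s))

    def idf(tok):
        # Unseen tokens are treated as df = 1 (an assumption of this sketch).
        return math.log2(1 + n / max(df[tok], 1))

    return idf


def norm_length(tokens, idf):
    """Length used for normalization: L2 norm of the idf weights."""
    return math.sqrt(sum(idf(t) ** 2 for t in set(tokens)))


def idf_similarity(query, db_set, idf):
    """IDF similarity without term frequency: each token counts once."""
    q, s = set(query), set(db_set)
    denom = norm_length(q, idf) * norm_length(s, idf)
    if denom == 0:
        return 0.0
    return sum(idf(t) ** 2 for t in q & s) / denom


def select(query, collection, threshold):
    """Return the database sets whose score meets the user threshold."""
    idf = build_idf(collection)
    return [s for s in collection if idf_similarity(query, s, idf) >= threshold]


streets = [{"main", "st"}, {"main", "ave"}, {"main", "rd"}]
matches = select({"main", "st"}, streets, threshold=0.5)
```

    In this toy collection the token "main" occurs in every set and so carries little weight, while "st" is rare and dominates the score, which is the usual motivation for IDF weighting.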

    Incremental Maintenance of Inverted Indexes for Approximate String Matching
    4.
    Patent application
    Status: In force

    Publication No.: US20120323870A1

    Publication date: 2012-12-20

    Application No.: US13595270

    Filing date: 2012-08-27

    IPC classification: G06F17/30

    Abstract: In embodiments of the disclosed technology, indexes, such as inverted indexes, are updated only as necessary to guarantee answer precision within predefined thresholds which are determined with little cost in comparison to the updates of the indexes themselves. With the present technology, a batch of daily updates can be processed in a matter of minutes, rather than a few hours for rebuilding an index, and a query may be answered with assurances that the results are accurate or within a threshold of accuracy.
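
    Illustration: the abstract says index entries are refreshed only when needed to stay within a precision threshold, and that checking the threshold is cheap relative to the refresh. The sketch below assumes one way this could look: the index stores an IDF weight per token, a batch of insertions updates cheap counters, and a token's stored weight is rewritten only when it has drifted from the fresh value by more than a relative tolerance. The class LazyIdfIndex, the drift test, and the idf formula are hypothetical and are not taken from the patent text.

```python
import math
from collections import defaultdict


class LazyIdfIndex:
    """Hypothetical inverted index whose stored weights are refreshed lazily."""

    def __init__(self, tolerance: float):
        self.tolerance = tolerance          # allowed relative drift per token
        self.num_sets = 0
        self.postings = defaultdict(set)    # token -> ids of sets containing it
        self.stored_idf = {}                # token -> weight last written to the index

    def _fresh_idf(self, tok: str) -> float:
        return math.log2(1 + self.num_sets / len(self.postings[tok]))

    def apply_batch(self, batch):
        """Apply (set_id, tokens) insertions; return the tokens that were refreshed.

        Set ids are assumed to be unique across batches.
        """
        for set_id, tokens in batch:
            self.num_sets += 1
            for tok in set(tokens):
                self.postings[tok].add(set_id)

        refreshed = []
        for tok in self.postings:
            fresh = self._fresh_idf(tok)    # cheap check: counters only
            stale = self.stored_idf.get(tok)
            if stale is None or abs(fresh - stale) > self.tolerance * stale:
                # Stand-in for the expensive step of rewriting the token's list.
                self.stored_idf[tok] = fresh
                refreshed.append(tok)
        return sorted(refreshed)


index = LazyIdfIndex(tolerance=0.5)
first = index.apply_batch([(1, {"main", "st"}), (2, {"main", "ave"})])
second = index.apply_batch([(3, {"main", "rd"})])
```

    In this run the second batch refreshes only the new token "rd"; the weights of the existing tokens have drifted by less than the tolerance and are left alone.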

    Incremental Maintenance of Inverted Indexes for Approximate String Matching
    5.
    Patent application
    Status: Lapsed

    Publication No.: US20100318519A1

    Publication date: 2010-12-16

    Application No.: US12481693

    Filing date: 2009-06-10

    IPC classification: G06F17/30

    Abstract: In embodiments of the disclosed technology, indexes, such as inverted indexes, are updated only as necessary to guarantee answer precision within predefined thresholds which are determined with little cost in comparison to the updates of the indexes themselves. With the present technology, a batch of daily updates can be processed in a matter of minutes, rather than a few hours for rebuilding an index, and a query may be answered with assurances that the results are accurate or within a threshold of accuracy.

    Incremental maintenance of inverted indexes for approximate string matching
    6.
    Granted patent
    Status: In force

    Publication No.: US09514172B2

    Publication date: 2016-12-06

    Application No.: US13595270

    Filing date: 2012-08-27

    IPC classification: G06F17/30

    Abstract: In embodiments of the disclosed technology, indexes, such as inverted indexes, are updated only as necessary to guarantee answer precision within predefined thresholds which are determined with little cost in comparison to the updates of the indexes themselves. With the present technology, a batch of daily updates can be processed in a matter of minutes, rather than a few hours for rebuilding an index, and a query may be answered with assurances that the results are accurate or within a threshold of accuracy.

    Incremental maintenance of inverted indexes for approximate string matching
    7.
    Granted patent
    Status: Lapsed

    Publication No.: US08271499B2

    Publication date: 2012-09-18

    Application No.: US12481693

    Filing date: 2009-06-10

    IPC classification: G06F7/00

    Abstract: In embodiments of the disclosed technology, indexes, such as inverted indexes, are updated only as necessary to guarantee answer precision within predefined thresholds which are determined with little cost in comparison to the updates of the indexes themselves. With the present technology, a batch of daily updates can be processed in a matter of minutes, rather than a few hours for rebuilding an index, and a query may be answered with assurances that the results are accurate or within a threshold of accuracy.

    Set similarity selection queries at interactive speeds
    8.
    Granted patent
    Status: In force

    Publication No.: US07921100B2

    Publication date: 2011-04-05

    Application No.: US12006332

    Filing date: 2008-01-02

    IPC classification: G06F17/30

    CPC classification: G06F17/30442

    Abstract: The similarity between a query set comprising query set tokens and a database set comprising database set tokens is determined by a similarity score. The database sets belong to a data collection set, which contains all database sets from which information may be retrieved. If the similarity score is greater than or equal to a user-defined threshold, the database set has information relevant to the query set. The similarity score is calculated with an inverse document frequency (IDF) similarity measure independent of term frequency. The document frequency is based at least in part on the number of database sets in the data collection set and the number of database sets which contain at least one query set token. The length of the query set and the length of the database set are normalized.

    VERIFICATION OF OUTSOURCED DATA STREAMS
    9.
    Patent application
    Status: In force

    Publication No.: US20100132036A1

    Publication date: 2010-05-27

    Application No.: US12275879

    Filing date: 2008-11-21

    IPC classification: G06F21/00 G06F17/30

    Abstract: Embodiments disclosed herein are directed to verifying query results of an untrusted server. A data owner outsources a data stream to the untrusted server, which is configured to respond to a query from a client with the query result, which is returned to the client. The data owner can maintain a vector associated with query results returned by the server and can generate a verification synopsis using the vector and a seed. The verification synopsis includes a polynomial, where coefficients of the polynomial are determined based on the seed. The data owner outputs the verification synopsis and the seed to a client for verification of the query results.
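
    Illustration: the abstract mentions a verification synopsis built from a result vector and a seed, involving a polynomial, but gives no construction. The sketch below assumes a standard polynomial fingerprint: the seed selects a secret point alpha, and the synopsis of a count vector v is the product of (alpha - i)^v[i] modulo a prime. The owner updates it one stream item at a time, and a client holding the seed and the synopsis recomputes it from the vector the server reports. The names Synopsis, synopsis_of and verify, the prime, and the fingerprint itself are assumptions for illustration, not the claimed scheme.

```python
import random

PRIME = 2**61 - 1  # a Mersenne prime; the modulus is an assumption of this sketch


class Synopsis:
    """Hypothetical constant-size fingerprint of a vector of non-negative counts."""

    def __init__(self, seed: int):
        # The seed determines the secret evaluation point alpha.
        self.alpha = random.Random(seed).randrange(PRIME)
        self.value = 1

    def update(self, index: int, count: int = 1) -> None:
        """Account for `count` more stream items that fall into group `index`."""
        if count < 0:
            raise ValueError("this sketch supports non-negative counts only")
        factor = pow(self.alpha - index, count, PRIME)
        self.value = self.value * factor % PRIME


def synopsis_of(vector, seed: int) -> int:
    """Recompute the synopsis from a full vector, as a client would."""
    syn = Synopsis(seed)
    for index, count in enumerate(vector):
        syn.update(index, count)
    return syn.value


def verify(reported_vector, seed: int, expected: int) -> bool:
    """Client-side check of the server's reported vector against the synopsis."""
    return synopsis_of(reported_vector, seed) == expected


# The owner sees the stream item by item and keeps only the synopsis.
owner = Synopsis(seed=42)
for group in [2, 0, 2, 1, 2]:
    owner.update(group)

honest = verify([1, 1, 3], seed=42, expected=owner.value)
tampered = verify([1, 2, 2], seed=42, expected=owner.value)
```

    Two different vectors produce the same fingerprint only if alpha happens to be a root of the difference of their polynomials, so with a large prime modulus a tampered result passes the check only with very small probability.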

    Verification of outsourced data streams
    10.
    Granted patent
    Status: In force

    Publication No.: US08112802B2

    Publication date: 2012-02-07

    Application No.: US12275879

    Filing date: 2008-11-21

    IPC classification: G06F11/00

    Abstract: Embodiments disclosed herein are directed to verifying query results of an untrusted server. A data owner outsources a data stream to the untrusted server, which is configured to respond to a query from a client with the query result, which is returned to the client. The data owner can maintain a vector associated with query results returned by the server and can generate a verification synopsis using the vector and a seed. The verification synopsis includes a polynomial, where coefficients of the polynomial are determined based on the seed. The data owner outputs the verification synopsis and the seed to a client for verification of the query results.