Information theory based result merging for searching hierarchical entities across heterogeneous data sources
    1.
    Granted invention patent
    Information theory based result merging for searching hierarchical entities across heterogeneous data sources (Lapsed)

    Publication No.: US08219552B2

    Publication date: 2012-07-10

    Application No.: US12575210

    Filing date: 2009-10-07

    IPC classification: G06F17/30

    Abstract: A method, system, and computer program product are disclosed for merging search results. In one embodiment, the method comprises identifying a query, splitting the query into sub-queries, and calculating information content for each of the sub-queries. This method also comprises executing each of the sub-queries to obtain a plurality of search results, and combining the search results based on the information content calculated for the sub-queries. In an embodiment, the execution of each of the sub-queries includes identifying a multitude of search results for at least one of the sub-queries; and the combining includes grouping said multitude of search results into a plurality of clusters, and computing a relevance score for each of said clusters. In the embodiment the combining further includes merging the clusters based on the relevance scores computed for the clusters as well as the information content calculated for the sub-queries.
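The steps named in this abstract (split the query, compute information content per sub-query, execute, cluster, merge) can be illustrated with a minimal Python sketch. The abstract does not define any of the formulas, so everything concrete here is an assumption made for illustration: the toy `CORPUS`, splitting on whitespace, information content as the negative log of the matching fraction, clustering by parent entity, and cluster relevance as the mean result score.

```python
import math
from collections import defaultdict

# Toy corpus of (doc_id, parent_entity, text). All names and data are hypothetical.
CORPUS = [
    ("d1", "projectA", "storage server failure report"),
    ("d2", "projectA", "server migration plan"),
    ("d3", "projectB", "storage capacity forecast"),
    ("d4", "projectB", "team meeting notes"),
]

def split_query(query):
    """Assumed splitting rule: one sub-query per whitespace-separated term."""
    return query.lower().split()

def information_content(sub_query, corpus):
    """Assumed measure: -log2 of the fraction of documents matching the sub-query."""
    matches = sum(1 for _, _, text in corpus if sub_query in text.split())
    if matches == 0:
        return 0.0  # a term that matches nothing contributes no evidence
    return -math.log2(matches / len(corpus))

def execute(sub_query, corpus):
    """Return (doc_id, parent, score) per match; score is normalized term frequency."""
    results = []
    for doc_id, parent, text in corpus:
        words = text.split()
        tf = words.count(sub_query)
        if tf:
            results.append((doc_id, parent, tf / len(words)))
    return results

def merge(query, corpus):
    """Cluster each sub-query's results by parent entity, then combine the clusters."""
    combined = defaultdict(float)
    for sub_query in split_query(query):
        ic = information_content(sub_query, corpus)
        clusters = defaultdict(list)
        for _, parent, score in execute(sub_query, corpus):
            clusters[parent].append(score)
        for parent, scores in clusters.items():
            relevance = sum(scores) / len(scores)  # assumed cluster relevance: mean score
            combined[parent] += ic * relevance     # weight by the sub-query's information content
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

ranking = merge("storage server", CORPUS)
print(ranking)
```

In this toy data, `projectA` ranks first because it has results for both sub-queries, while `projectB` only matches `storage`. A rarer term would receive a higher information content and so pull its clusters further up the merged list.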

    INFORMATION THEORY BASED RESULT MERGING FOR SEARCHING HIERARCHICAL ENTITIES ACROSS HETEROGENEOUS DATA SOURCES
    2.
    Invention patent application
    INFORMATION THEORY BASED RESULT MERGING FOR SEARCHING HIERARCHICAL ENTITIES ACROSS HETEROGENEOUS DATA SOURCES (Lapsed)

    Publication No.: US20110082859A1

    Publication date: 2011-04-07

    Application No.: US12575210

    Filing date: 2009-10-07

    IPC classification: G06F17/30

    Abstract: A method, system, and computer program product are disclosed for merging search results. In one embodiment, the method comprises identifying a query, splitting the query into sub-queries, and calculating information content for each of the sub-queries. This method also comprises executing each of the sub-queries to obtain a plurality of search results, and combining the search results based on the information content calculated for the sub-queries. In an embodiment, the execution of each of the sub-queries includes identifying a multitude of search results for at least one of the sub-queries; and the combining includes grouping said multitude of search results into a plurality of clusters, and computing a relevance score for each of said clusters. In the embodiment the combining further includes merging the clusters based on the relevance scores computed for the clusters as well as the information content calculated for the sub-queries.

    Information theory based result merging for searching hierarchical entities across heterogeneous data sources
    3.
    Granted invention patent
    Information theory based result merging for searching hierarchical entities across heterogeneous data sources (In force)

    Publication No.: US09251208B2

    Publication date: 2016-02-02

    Application No.: US13462995

    Filing date: 2012-05-03

    IPC classification: G06F17/30

    Abstract: A method, system, and computer program product are disclosed for merging search results. In one embodiment, the method comprises identifying a query, splitting the query into sub-queries, and calculating information content for each of the sub-queries. This method also comprises executing each of the sub-queries to obtain a plurality of search results, and combining the search results based on the information content calculated for the sub-queries. In an embodiment, the execution of each of the sub-queries includes identifying a multitude of search results for at least one of the sub-queries; and the combining includes grouping said multitude of search results into a plurality of clusters, and computing a relevance score for each of said clusters. In the embodiment the combining further includes merging the clusters based on the relevance scores computed for the clusters as well as the information content calculated for the sub-queries.

    INFORMATION THEORY BASED RESULT MERGING FOR SEARCHING HIERARCHICAL ENTITIES ACROSS HETEROGENEOUS DATA SOURCES
    4.
    Invention patent application
    INFORMATION THEORY BASED RESULT MERGING FOR SEARCHING HIERARCHICAL ENTITIES ACROSS HETEROGENEOUS DATA SOURCES (Under examination, published)

    Publication No.: US20120221542A1

    Publication date: 2012-08-30

    Application No.: US13462995

    Filing date: 2012-05-03

    IPC classification: G06F17/30

    Abstract: A method, system, and computer program product are disclosed for merging search results. In one embodiment, the method comprises identifying a query, splitting the query into sub-queries, and calculating information content for each of the sub-queries. This method also comprises executing each of the sub-queries to obtain a plurality of search results, and combining the search results based on the information content calculated for the sub-queries. In an embodiment, the execution of each of the sub-queries includes identifying a multitude of search results for at least one of the sub-queries; and the combining includes grouping said multitude of search results into a plurality of clusters, and computing a relevance score for each of said clusters. In the embodiment the combining further includes merging the clusters based on the relevance scores computed for the clusters as well as the information content calculated for the sub-queries.

    Using historical information to improve search across heterogeneous indices
    5.
    Granted invention patent
    Using historical information to improve search across heterogeneous indices (In force)

    Publication No.: US08996561B2

    Publication date: 2015-03-31

    Application No.: US12535330

    Filing date: 2009-08-04

    IPC classification: G06F17/30

    Abstract: A method, system and computer program product are disclosed for searching for data. In one embodiment, the invention provides a method comprising identifying a query and a search scope including a set of specified entities; and for each of these entities, estimating a number of documents that would be identified in a search through the entity to answer the query. On the basis of this estimating, a subset of the entities is formed. The query and this subset of entities are sent to a search engine to search the subset of entities to answer the query. In one embodiment, the estimating includes collecting statistical information from queries to build up a historical cache using heuristics or machine learning techniques, wherein the query includes a key word and a scope, and the historical cache contains a maximum number of returned results for an entity given the queries executed.
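The pruning idea in this abstract (estimate per-entity document counts from a historical cache, then search only a subset of the scope) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented method: the `HistoricalCache` class, keying the cache by (keyword, entity), the optimistic default for unseen pairs, and the `min_estimate` threshold are all hypothetical choices.

```python
class HistoricalCache:
    """Hypothetical cache: remembers the maximum result count seen per (keyword, entity)."""

    def __init__(self):
        self._max_hits = {}

    def record(self, keyword, entity, hit_count):
        """Update the cache after a query has actually been executed."""
        key = (keyword, entity)
        self._max_hits[key] = max(self._max_hits.get(key, 0), hit_count)

    def estimate(self, keyword, entity, default=1):
        """Estimated document count; unseen pairs get an optimistic default (assumption)."""
        return self._max_hits.get((keyword, entity), default)


def select_entities(cache, keyword, scope, min_estimate=1):
    """Keep only the entities in scope that are expected to return at least min_estimate hits."""
    return [entity for entity in scope if cache.estimate(keyword, entity) >= min_estimate]


cache = HistoricalCache()
cache.record("invoice", "finance", 40)
cache.record("invoice", "finance", 25)  # lower count does not overwrite the maximum
cache.record("invoice", "hr", 0)        # past searches of "hr" found nothing

subset = select_entities(cache, "invoice", ["finance", "hr", "legal"])
print(subset)
```

Here `hr` is dropped because its recorded maximum is zero, while the never-searched `legal` entity is kept so that it can still be explored. The resulting subset, together with the keyword, is what would be passed on to the search engine.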

    Using historical information to improve search across heterogeneous indices
    6.
    Granted invention patent
    Using historical information to improve search across heterogeneous indices (In force)

    Publication No.: US08909663B2

    Publication date: 2014-12-09

    Application No.: US13435978

    Filing date: 2012-03-30

    IPC classification: G06F17/30

    Abstract: A method, system and computer program product are disclosed for searching for data. In one embodiment, the invention provides a method comprising identifying a query and a search scope including a set of specified entities; and for each of these entities, estimating a number of documents that would be identified in a search through the entity to answer the query. On the basis of this estimating, a subset of the entities is formed. The query and this subset of entities are sent to a search engine to search the subset of entities to answer the query. In one embodiment, the estimating includes collecting statistical information from queries to build up a historical cache using heuristics or machine learning techniques, wherein the query includes a key word and a scope, and the historical cache contains a maximum number of returned results for an entity given the queries executed.

    USING HISTORICAL INFORMATION TO IMPROVE SEARCH ACROSS HETEROGENEOUS INDICES
    7.
    Invention patent application
    USING HISTORICAL INFORMATION TO IMPROVE SEARCH ACROSS HETEROGENEOUS INDICES (In force)

    Publication No.: US20120191687A1

    Publication date: 2012-07-26

    Application No.: US13435978

    Filing date: 2012-03-30

    IPC classification: G06F17/30

    Abstract: A method, system and computer program product are disclosed for searching for data. In one embodiment, the invention provides a method comprising identifying a query and a search scope including a set of specified entities; and for each of these entities, estimating a number of documents that would be identified in a search through the entity to answer the query. On the basis of this estimating, a subset of the entities is formed. The query and this subset of entities are sent to a search engine to search the subset of entities to answer the query. In one embodiment, the estimating includes collecting statistical information from queries to build up a historical cache using heuristics or machine learning techniques, wherein the query includes a key word and a scope, and the historical cache contains a maximum number of returned results for an entity given the queries executed.

    USING HISTORICAL INFORMATION TO IMPROVE SEARCH ACROSS HETEROGENEOUS INDICES
    8.
    Invention patent application
    USING HISTORICAL INFORMATION TO IMPROVE SEARCH ACROSS HETEROGENEOUS INDICES (In force)

    Publication No.: US20110035399A1

    Publication date: 2011-02-10

    Application No.: US12535330

    Filing date: 2009-08-04

    IPC classification: G06F17/30

    Abstract: A method, system and computer program product are disclosed for searching for data. In one embodiment, the invention provides a method comprising identifying a query and a search scope including a set of specified entities; and for each of these entities, estimating a number of documents that would be identified in a search through the entity to answer the query. On the basis of this estimating, a subset of the entities is formed. The query and this subset of entities are sent to a search engine to search the subset of entities to answer the query. In one embodiment, the estimating includes collecting statistical information from queries to build up a historical cache using heuristics or machine learning techniques, wherein the query includes a key word and a scope, and the historical cache contains a maximum number of returned results for an entity given the queries executed.
