Query routing based on feature learning of data sources
    1.
    Granted invention patent
    Status: In force

    Publication number: US06886009B2

    Publication date: 2005-04-26

    Application number: US10209112

    Filing date: 2002-07-31

    IPC classification: G06F17/30

    Abstract: Query routing is based on identifying the preeminent search systems and data sources for each of a number of information domains. This involves assigning a weight to each search system or data source for each of the information domains. The greater the weight, the more preeminent a search system or data source is in a particular information domain. These weights Wi (i = 0, 1, 2, ..., N) are computed through a recursive learning process employing meta processing. The meta learning process involves simultaneous interrogation of multiple search systems to take advantage of the cross correlation between the search systems and data sources. In this way, assigning a weight to a search system takes into consideration results obtained about other search systems, so that the assigned weights reflect the relative strengths of each of the systems or sources in a particular information domain. In the present process, a domain dataset is used as an input to a query generator. The query generator extracts keywords randomly from the domain dataset. Sets of the extracted keywords constitute a domain-specific search query. The query is submitted to the multiple search systems or sources to be evaluated. Initially, a random average weight is assigned to each search system or source. Then, the meta learning process recursively evaluates the search results and feeds back a weight correction dWi to be applied to each system or source server by using a weight difference calculator. After a certain number of iterations, the weights Wi reach stable values. These stable values are the values assigned to the search systems under evaluation. When searches are performed, the weights are used to determine which search systems or sources are interrogated.
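
The feedback loop described in this abstract can be illustrated with a minimal Python sketch. The abstract does not say how search results are evaluated or how the correction dWi is computed, so everything concrete here is an assumption made for illustration: the toy `make_system` search function, equal starting weights, scoring a system by its share of the hits returned across all systems, and a step size that shrinks with each iteration so the weights settle.

```python
import random

def make_system(corpus):
    """Hypothetical search system: returns the documents containing any query keyword."""
    def search(query):
        return [doc for doc in corpus if any(k in doc.split() for k in query)]
    return search

def generate_query(domain_dataset, rng, num_keywords=3):
    """Query generator: draw keywords at random from the domain dataset."""
    vocabulary = sorted({word for doc in domain_dataset for word in doc.split()})
    return rng.sample(vocabulary, min(num_keywords, len(vocabulary)))

def learn_weights(systems, domain_dataset, iterations=200, seed=0):
    """Iteratively correct one weight per system for a single information domain."""
    rng = random.Random(seed)
    # Assumption: every system starts from the same average weight.
    weights = {name: 1.0 / len(systems) for name in systems}
    for step in range(1, iterations + 1):
        query = generate_query(domain_dataset, rng)
        # Interrogate all systems with the same query.
        hits = {name: len(search(query)) for name, search in systems.items()}
        total = sum(hits.values())
        if total == 0:
            continue  # no system returned anything; nothing to learn from
        for name in systems:
            # Assumed weight difference calculator: move each weight toward the
            # system's share of all hits, so the score is relative to the others.
            d_w = (hits[name] / total - weights[name]) / (step + 1)
            weights[name] += d_w
    return weights

# Toy domain dataset and two hypothetical systems of different strength.
DOMAIN = [
    "aspirin reduces fever and pain",
    "insulin regulates blood glucose",
    "antibiotics treat bacterial infection",
    "vaccine prevents viral infection",
]
SYSTEMS = {
    "medical_index": make_system(DOMAIN),
    "general_index": make_system(
        [DOMAIN[0], "stock markets rallied today", "football season opens sunday"]
    ),
}

weights = learn_weights(SYSTEMS, DOMAIN)
print(weights)
```

Because each correction depends on the hits of every system for the same query, the learned weights are relative rather than absolute, and in this toy setup the system holding more domain documents ends with the larger weight. A router could then interrogate only the systems whose weight for the domain exceeds some threshold.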

    Optimization of server selection using euclidean analysis of search terms
    2.
    Granted invention patent
    Status: Lapsed

    Publication number: US07143085B2

    Publication date: 2006-11-28

    Application number: US10209619

    Filing date: 2002-07-31

    IPC classification: G06F7/00

    Abstract: Euclidean analysis is used to define queries in terms of a multi-axis query space where each of the keywords T1, T2, ..., Ti, ..., Tn is assigned an axis in that space. Sets of test queries St, each from one of a plurality of server sources, are plotted in the query space. Clusters of the search terms are identified based on the proximity of the plotted query vectors to one another. Predominant servers are identified for each of the clusters. When a search query Ss is received, the location of its vector is determined, and the servers accessed by the search query Ss are those that are predominant in the cluster in which its vector falls or to which it is in closest proximity.
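
A small Python sketch can make the cluster-based routing concrete. The abstract does not name a clustering algorithm or a vector encoding, so this sketch assumes binary keyword vectors, a simple "leader" clustering with a hypothetical `radius` threshold, and "predominant" meaning the server that contributed the most test queries to a cluster. The server names and queries are invented.

```python
import math
from collections import Counter

def to_vector(query, axes):
    """Plot a query in keyword space: 1.0 on the axis of each keyword it contains."""
    terms = set(query.lower().split())
    return [1.0 if axis in terms else 0.0 for axis in axes]

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def cluster_queries(test_queries, axes, radius=1.5):
    """Assumed leader clustering: join the nearest cluster within `radius`, else start one."""
    clusters = []
    for server, query in test_queries:
        vector = to_vector(query, axes)
        best = None
        for cluster in clusters:
            d = math.dist(vector, centroid(cluster["vectors"]))
            if d <= radius and (best is None or d < best[0]):
                best = (d, cluster)
        if best is None:
            clusters.append({"vectors": [vector], "servers": Counter({server: 1})})
        else:
            best[1]["vectors"].append(vector)
            best[1]["servers"][server] += 1
    return clusters

def route(query, clusters, axes):
    """Return the predominant server(s) of the cluster nearest to the query vector."""
    vector = to_vector(query, axes)
    nearest = min(clusters, key=lambda c: math.dist(vector, centroid(c["vectors"])))
    top = max(nearest["servers"].values())
    return sorted(s for s, n in nearest["servers"].items() if n == top)

# Test queries St, each tagged with the (hypothetical) server it came from.
TEST_QUERIES = [
    ("sports_server", "football score"),
    ("sports_server", "football league"),
    ("sports_server", "league score"),
    ("finance_server", "stock price"),
    ("finance_server", "stock bond"),
    ("finance_server", "bond price"),
]
AXES = sorted({term for _, q in TEST_QUERIES for term in q.split()})

clusters = cluster_queries(TEST_QUERIES, AXES)
print(route("football results", clusters, AXES))
print(route("bond price today", clusters, AXES))
```

Terms that have no axis, such as "results" or "today" here, are simply ignored when the incoming query is plotted. The query is then routed by distance to the cluster centroids, which covers both cases in the abstract: a vector that falls inside a cluster and one that is merely closest to it.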

    Retrieving matching documents by queries in any national language
    3.
    Granted invention patent
    Status: In force

    Publication number: US07260570B2

    Publication date: 2007-08-21

    Application number: US10180195

    Filing date: 2002-06-26

    IPC classification: G06F17/00, G06F7/00

    Abstract: Search time is reduced with a search engine that includes a bi-directional inverted index facility which can be accessed with a keyword search in one of a number of languages and provides a listing of documents contained in all of those languages. The keywords in all supported languages are preferably stored in an inverted index lookup table cross-referenced to documents in those languages containing the keywords. Keywords with the same meaning in different languages are accessible together when that keyword in one of the languages is queried. The search engine containing the table can identify pertinent documents either in a selected language, a second language or in all supported languages, as determined by the user. Information about each document can include not only the identity of the document but also information used in ranking the documents, such as the number of times that a keyword appears in that document and the keyword's proximity to other keywords. The use of the inverted index table therefore reduces search time by eliminating the need for translation of keywords, their identification in documents and the accumulation of ranking information at search runtime, and avoids inaccuracies which may result from full text translations of documents.
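
The lookup table described in this abstract can be sketched in Python as follows. The patent text does not give a table layout, so the `CONCEPTS` mapping, the posting fields and the ranking by occurrence count are all assumptions for illustration. The point shown is that same-meaning keywords share one index entry, so no translation is needed at query time.

```python
from collections import defaultdict

# Hypothetical table linking same-meaning keywords in each language to one concept ID.
CONCEPTS = {
    ("en", "house"): "C_HOUSE",
    ("es", "casa"): "C_HOUSE",
    ("de", "haus"): "C_HOUSE",
    ("en", "dog"): "C_DOG",
    ("es", "perro"): "C_DOG",
    ("de", "hund"): "C_DOG",
}

def build_index(documents):
    """Build concept -> postings ahead of time; documents is {doc_id: (language, text)}."""
    index = defaultdict(list)
    for doc_id, (lang, text) in documents.items():
        positions = defaultdict(list)
        for pos, word in enumerate(text.lower().split()):
            concept = CONCEPTS.get((lang, word))
            if concept is not None:
                positions[concept].append(pos)
        for concept, where in positions.items():
            # Ranking information is stored with the posting at indexing time:
            # the occurrence count, plus word positions for proximity checks.
            index[concept].append(
                {"doc": doc_id, "lang": lang, "count": len(where), "positions": where}
            )
    return index

def search(index, lang, keyword, target_langs=None):
    """Look up a keyword given in `lang`; optionally keep only documents in `target_langs`."""
    concept = CONCEPTS.get((lang, keyword.lower()))
    postings = index.get(concept, [])
    if target_langs is not None:
        postings = [p for p in postings if p["lang"] in target_langs]
    # Assumed ranking: most occurrences first, ties broken by document ID.
    return sorted(postings, key=lambda p: (-p["count"], p["doc"]))

DOCS = {
    "d1": ("en", "the house on the hill is a big house"),
    "d2": ("es", "la casa es grande"),
    "d3": ("de", "der hund ist im haus"),
    "d4": ("en", "the dog barks"),
}
index = build_index(DOCS)

print([p["doc"] for p in search(index, "es", "casa")])
print([p["doc"] for p in search(index, "es", "casa", target_langs={"de"})])
```

A Spanish query for "casa" reaches the English and German documents through the shared concept entry, and the `target_langs` filter corresponds to the user's choice of a selected language, a second language or all supported languages.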
