Method and system of routing messages in a distributed search network
    1.
    Granted invention patent (status: in force)

    Publication number: US06934702B2

    Publication date: 2005-08-23

    Application number: US10106604

    Filing date: 2002-03-26

    IPC classification: G06F17/30

    Abstract: A system and method for distributing search requests in a network. The system and method may also route search responses. Network nodes operating as consumer or requesting nodes generate the search requests. Nodes operating as hubs are configured to route the search requests in the network. Individual nodes operating as provider nodes receive the search request and in response may generate search results according to their own procedures and return them. Communication between nodes in the network may use a common query protocol. Hub nodes may resolve the search requests to a subset of the provider nodes in the network, for example by matching search requests with registration information from nodes. Search results may be customized at various stages in the network.
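
    The consumer, hub, and provider roles described in this abstract can be illustrated with a minimal sketch. It is not the patented protocol: the `Hub` class, its method names, and the keyword-overlap matching rule are all assumptions made for illustration. Each provider is modelled as a plain function that produces results by its own procedure.

```python
from typing import Callable, Dict, List, Set

# Hypothetical provider interface: takes a query string, returns result strings.
Provider = Callable[[str], List[str]]


class Hub:
    """Toy hub: routes a query to the providers whose registration matches it."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Provider] = {}
        self._keywords: Dict[str, Set[str]] = {}

    def register(self, name: str, keywords: Set[str], handler: Provider) -> None:
        # Registration information here is just a set of keywords (an assumption).
        self._handlers[name] = handler
        self._keywords[name] = {k.lower() for k in keywords}

    def resolve(self, query: str) -> List[str]:
        # Resolve the query to the subset of providers sharing at least one term.
        terms = set(query.lower().split())
        return sorted(n for n, kws in self._keywords.items() if terms & kws)

    def route(self, query: str) -> Dict[str, List[str]]:
        # Forward the query to each matching provider and collect the responses.
        return {name: self._handlers[name](query) for name in self.resolve(query)}


hub = Hub()
hub.register("books", {"novel", "author"}, lambda q: [f"book hit for '{q}'"])
hub.register("music", {"album", "artist"}, lambda q: [f"music hit for '{q}'"])
responses = hub.route("novel by author")
```

    Only the "books" provider receives this query, because none of its terms appear in the "music" registration. A query that matches no registration is routed nowhere and yields an empty response set.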

System and method for improving the ranking of information retrieval results for short queries
    2.
    Granted invention patent (status: expired)

    Publication number: US5870740A

    Publication date: 1999-02-09

    Application number: US719816

    Filing date: 1996-09-30

    IPC classification: G06F17/30

    Abstract: A method and system for retrieving information in response to a query by a user. The method includes the steps of receiving a signal s having a value corresponding to a relevance-ranking algorithm score of a retrieved document, receiving a signal q having a value corresponding to the number of words in the query and a signal v having a value corresponding to the coordination level of the retrieved document and query (i.e., the degree of overlap between the document terms and the query terms), and generating an adjusted score s1 dependent on the signal s, the signal q, and the signal v. The adjusted score s1 takes the coordination level into account for small values of q and gradually decreases the importance of the coordination level as q increases. The system of this invention includes a computer-based system for carrying out the method of this invention.
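
    The abstract does not give the formula for s1, so the following is only a sketch of one function with the stated behaviour. The blending rule, the weight `k / (k + q)`, the constant `k`, and the reading of v as the number of query words found in the document are all assumptions, not the patent's method.

```python
def adjusted_score(s: float, q: int, v: int, k: float = 2.0) -> float:
    """Blend the raw score s with the coordination level v/q.

    s: relevance-ranking score of the retrieved document
    q: number of words in the query
    v: number of query words found in the document (assumed meaning of v)
    k: hypothetical constant controlling how fast coordination loses weight
    """
    if q <= 0:
        raise ValueError("q must be positive")
    if not 0 <= v <= q:
        raise ValueError("v must lie between 0 and q")
    weight = k / (k + q)   # near 1 for short queries, tends to 0 as q grows
    coordination = v / q   # fraction of query words matched
    return s * ((1.0 - weight) + weight * coordination)


short_half_match = adjusted_score(1.0, q=2, v=1)    # 0.75
long_half_match = adjusted_score(1.0, q=20, v=10)   # closer to the raw score
```

    With this choice, a document matching half of a two-word query loses a quarter of its score, while a document matching half of a twenty-word query loses far less, which mirrors the gradual decrease in importance that the abstract describes.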

Method of constant interaction-time clustering applied to document browsing
    3.
    Granted invention patent (status: expired)

    Publication number: US5483650A

    Publication date: 1996-01-09

    Application number: US79292

    Filing date: 1993-06-21

    IPC classification: G06F17/30

    Abstract: Arbitrarily large document collections are processed by expanding a focus set having at least one initial metadocument into a plurality of subsequent metadocuments. The number of subsequent metadocuments is approximately equal to a predetermined maximum number. The subsequent metadocuments are then clustered into a predetermined number of new metadocuments, which are summarized and presented to a user. The focus set is redefined to include only user-selected new metadocuments.
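
    The expansion step can be sketched as follows, assuming that metadocuments form a tree and that the largest expandable metadocument is always replaced by its children first. Both the `MetaDoc` structure and the largest-first rule are assumptions. The clustering and summarizing steps are omitted, since the abstract does not specify them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)  # identity-based equality, so list.remove targets one node
class MetaDoc:
    """Hypothetical metadocument: a leaf document or a group of children."""
    label: str
    children: List["MetaDoc"] = field(default_factory=list)

    def size(self) -> int:
        # Number of leaf documents underneath this metadocument.
        return 1 if not self.children else sum(c.size() for c in self.children)


def expand(focus: List[MetaDoc], max_count: int) -> List[MetaDoc]:
    """Replace metadocuments by their children until max_count would be exceeded."""
    current = list(focus)
    while True:
        expandable = [m for m in current if m.children]
        if not expandable:
            return current
        biggest = max(expandable, key=lambda m: m.size())
        if len(current) - 1 + len(biggest.children) > max_count:
            return current
        current.remove(biggest)
        current.extend(biggest.children)


a = MetaDoc("a", [MetaDoc(f"a{i}") for i in range(1, 5)])
b = MetaDoc("b", [MetaDoc("b1"), MetaDoc("b2")])
root = MetaDoc("root", [a, b, MetaDoc("c")])
expanded = [m.label for m in expand([root], max_count=6)]
```

    Because the expansion stops near a fixed maximum, the later clustering step always works on a bounded number of items regardless of corpus size, which is what keeps the interaction time roughly constant.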

Scatter-gather: a cluster-based method and apparatus for browsing large document collections
    4.
    Granted invention patent (status: expired)

    Publication number: US5442778A

    Publication date: 1995-08-15

    Application number: US790316

    Filing date: 1991-11-12

    IPC classification: G06F17/30

    Abstract: Scatter-Gather is a computer-based document browsing method that operates in time proportional to the number of documents in a target corpus. The Scatter-Gather method includes: preparing an initial ordering of the corpus using, for example, an off-line computational method; determining a summary of the initial ordering of the corpus for interactive utility; and providing a further ordering of the corpus using, for example, an on-line non-deterministic method. The off-line preparation of the initial ordering is not time-dependent, so an accurate initial ordering can be prepared. The step of determining a summary includes determining a summary that can be presented to a user without scrolling on a CRT. The step of providing a further ordering includes truncated group average agglomerative clustering, merging disjoint document sets, center finding, assign-to-nearest, and other refinement methods.
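
    Of the refinement methods listed, assign-to-nearest is the simplest to illustrate: every document joins the cluster whose center it most resembles. The sketch below assumes bag-of-words vectors and cosine similarity, neither of which is stated in the abstract, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(x * x for x in a.values())) * math.sqrt(
        sum(x * x for x in b.values())
    )
    return dot / norm if norm else 0.0


def assign_to_nearest(docs: List[str], centers: List[str]) -> Dict[int, List[int]]:
    """Map each center index to the indices of the documents nearest to it."""
    center_vecs = [Counter(c.lower().split()) for c in centers]
    groups: Dict[int, List[int]] = {i: [] for i in range(len(centers))}
    for doc_id, text in enumerate(docs):
        vec = Counter(text.lower().split())
        # Ties (including zero similarity everywhere) go to the lowest index.
        best = max(range(len(centers)), key=lambda i: cosine(vec, center_vecs[i]))
        groups[best].append(doc_id)
    return groups


docs = [
    "stock market prices fall",
    "team wins football match",
    "market rally lifts stock",
]
groups = assign_to_nearest(docs, centers=["stock market", "football team"])
```

    Each document is compared with every center exactly once, so the cost of this step grows linearly with the number of documents when the number of centers is fixed.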

Iterative technique for phrase query formation and an information retrieval system employing same
    5.
    Granted invention patent (status: expired)

    Publication number: US5278980A

    Publication date: 1994-01-11

    Application number: US745794

    Filing date: 1991-08-16

    Abstract: An information retrieval system and method are provided in which an operator inputs one or more query words, which are used to determine a search key for searching through a corpus of documents. The system returns each match between the search key and the corpus as a phrase containing the word data matching the query word(s), the next adjacent non-stop (content) word, and all intervening stop-words between the matching word data and that next adjacent non-stop word. The operator, after reviewing one or more of the returned phrases, can then use one or more of the next adjacent non-stop-words as new query words to reformulate the search key and perform a subsequent search through the document corpus. This process can be conducted iteratively until the appropriate documents of interest are located. The additional non-stop-words from each phrase are preferably aligned with each other (e.g., by columnation) to ease viewing of the "new" content words.
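
    The phrase returned for each match can be sketched as below. The stop-word list, the whitespace tokenization, and the function name are assumptions made for illustration. A real system would also handle punctuation and multi-word search keys.

```python
from typing import List

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = frozenset({"a", "an", "and", "in", "of", "on", "the", "to"})


def phrases(text: str, query_word: str) -> List[str]:
    """Return, for each match, the match plus stop-words up to the next content word."""
    words = text.lower().split()
    target = query_word.lower()
    found: List[str] = []
    for i, word in enumerate(words):
        if word != target:
            continue
        j = i + 1
        while j < len(words) and words[j] in STOP_WORDS:
            j += 1  # skip the intervening stop-words
        if j < len(words):  # a following content word exists
            found.append(" ".join(words[i : j + 1]))
    return found


hits = phrases("retrieval of the documents and retrieval systems", "retrieval")
next_words = [hit.split()[-1] for hit in hits]  # candidates for the next query
```

    The last word of each returned phrase is the content word an operator could add to the search key for the next iteration.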

Distributed information discovery through searching selected registered information providers
    6.
    Granted invention patent (status: in force)

    Publication number: US07171415B2

    Publication date: 2007-01-30

    Application number: US09872360

    Filing date: 2001-05-31

    IPC classification: G06F17/30

    CPC classification: G06F17/30867

    Abstract: A distributed network search mechanism may be provided for consumers coupled to a network to search information providers coupled to the network. Consumers may make search requests according to a query routing protocol. A network hub may be configured to receive search requests from consumers. The hub may also receive registration requests from information providers according to the query routing protocol. Information providers register with the hub to indicate the search queries that they are interested in receiving. When a query request is received, the hub resolves the query request with a provider registration index. The hub matches search query information from the query request with provider registrations to determine which providers have registered to receive search queries like the current search query. The hub then routes the search query to matching providers according to the query routing protocol.
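
    One plausible shape for a provider registration index is an inverted index from query terms to provider names. The sketch below assumes that registrations are plain term lists and that a single shared term counts as a match. The abstract fixes neither detail, and the class name is hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set


class RegistrationIndex:
    """Hypothetical inverted index: query term -> providers registered for it."""

    def __init__(self) -> None:
        self._index: DefaultDict[str, Set[str]] = defaultdict(set)

    def register(self, provider: str, terms: List[str]) -> None:
        # A registration request lists the terms the provider wants to receive.
        for term in terms:
            self._index[term.lower()].add(provider)

    def resolve(self, query: str) -> List[str]:
        # Union of the providers registered for any term in the query.
        matched: Set[str] = set()
        for term in query.lower().split():
            matched |= self._index.get(term, set())
        return sorted(matched)


index = RegistrationIndex()
index.register("jazz-archive", ["jazz", "blues"])
index.register("record-shop", ["jazz", "vinyl"])
index.register("news-wire", ["politics"])
targets = index.resolve("rare jazz vinyl")
```

    Looking up each query term directly means the hub never scans registrations that share no term with the query.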

Methods and apparatus for selecting semantically significant images in a document image without decoding image content
    7.
    Granted invention patent (status: expired)

    Publication number: US5390259A

    Publication date: 1995-02-14

    Application number: US794191

    Filing date: 1991-11-19

    Abstract: A method and apparatus for processing a document image, using a programmed general or special purpose computer, includes forming the image into image units and determining at least one image unit classifier of at least one of the image units, without decoding the content of that image unit. The classifier of the at least one image unit is then compared with a classifier of another image unit. The classifier may be image unit length, width, location in the document, font, typeface, cross-section, the number of ascenders, the number of descenders, the average pixel density, the length of the top line contour, the length of the base contour, the location of image units with respect to neighboring image units, vertical position, horizontal inter-image unit spacing, and so forth. The classifier comparison can be a comparison with classifiers of image units of words in a reference table, or with classifiers of other image units in the document. Equivalence classes of image units can be generated, from which word frequency and significance can be determined. The image units can be determined by creating bounding boxes about identifiable segments or extractable units of the image, and can contain a word, a phrase, a letter, a number, a character, a glyph or the like.
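
    Grouping image units into equivalence classes by shape alone can be sketched as below. The `ImageUnit` fields, the choice of three classifiers, and the width bucketing are assumptions for illustration. The abstract lists many more classifiers and does not say how they are compared.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class ImageUnit:
    """Hypothetical shape measurements of one word-sized bounding box."""
    width: int       # box width in pixels
    ascenders: int   # strokes rising above the x-height
    descenders: int  # strokes falling below the baseline


def signature(unit: ImageUnit, width_bucket: int = 4) -> Tuple[int, int, int]:
    # Bucket the width so small measurement noise does not split a class.
    return (unit.width // width_bucket, unit.ascenders, unit.descenders)


def class_frequencies(units: Iterable[ImageUnit]) -> Counter:
    """Count how many image units fall into each shape class."""
    return Counter(signature(u) for u in units)


units = [
    ImageUnit(width=40, ascenders=2, descenders=0),
    ImageUnit(width=41, ascenders=2, descenders=0),  # same word shape, 1 px wider
    ImageUnit(width=22, ascenders=0, descenders=1),
]
frequencies = class_frequencies(units)
```

    The class counts stand in for word frequencies without any character ever being recognized. Fixed buckets can still separate two near-identical units that straddle a bucket boundary, so a tolerance-based comparison may be preferable in practice.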

System and method for resolving distributed network search queries to information providers
    9.
    Granted invention patent (status: in force)

    Publication number: US06950821B2

    Publication date: 2005-09-27

    Application number: US10106601

    Filing date: 2002-03-26

    IPC classification: G06F17/30

    CPC classification: G06F17/30867 Y10S707/959

    Abstract: Systems and methods for resolving search queries to information providers in a distributed search network. In a network including nodes generating search requests and nodes providing information, a node may operate as a hub to route search requests from requesting nodes to provider nodes. Providers may register with a network hub. Registration information may include address information and data indicating the queries or types of queries for which that provider may have relevant data. A hub may resolve search queries against provider registrations to determine a set of providers to which to route each search query. Several systems and methods of selecting some of the providers are described, including the use of bidding, ranking, and statistical data.
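
    Selecting a subset of matching providers by bid and statistical data could look like the sketch below. The linear scoring rule, the `bid` and `hit_rate` fields, and the `bid_weight` parameter are assumptions. The abstract names bidding, ranking, and statistics but does not say how they are combined.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    """Hypothetical provider that already matched the query's registration."""
    name: str
    bid: float       # normalized amount offered per routed query, 0..1
    hit_rate: float  # fraction of past queries answered usefully, 0..1


def select_providers(
    candidates: List[Candidate], limit: int, bid_weight: float = 0.5
) -> List[str]:
    """Rank candidates by a weighted mix of bid and hit rate, keep the top few."""

    def score(c: Candidate) -> float:
        return bid_weight * c.bid + (1.0 - bid_weight) * c.hit_rate

    # Sort by descending score; break ties by name for a stable order.
    ranked = sorted(candidates, key=lambda c: (-score(c), c.name))
    return [c.name for c in ranked[:limit]]


candidates = [
    Candidate("alpha", bid=0.2, hit_rate=0.9),
    Candidate("beta", bid=0.8, hit_rate=0.1),
    Candidate("gamma", bid=0.5, hit_rate=0.5),
]
balanced = select_providers(candidates, limit=2)
bid_only = select_providers(candidates, limit=2, bid_weight=1.0)
```

    With equal weights the historically reliable provider ranks first, while a bid-only policy favours the highest bidder, so the weight is the knob a hub operator would tune.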
