Method for performing a search of a plurality of documents for similarity to a plurality of query words
    1.
    Granted invention patent
    Method for performing a search of a plurality of documents for similarity to a plurality of query words (status: expired)

    Publication No.: US5544049A

    Publication date: 1996-08-06

    Application No.: US447317

    Filing date: 1995-05-22

    IPC classification: G06F17/21 G06F17/30

    Abstract: A method for searching a plurality of documents for similarity to at least one query word includes retrieving a first document and determining the number of occurrences of the at least one query word in that document. A next document is then retrieved, and the number of occurrences of the at least one query word in it is determined. These steps are repeated until every document has been retrieved and the number of occurrences of the at least one query word has been determined in each. The query can comprise a plurality of query words, all of which are searched for in each document in turn, rather than each word being searched for separately across the whole collection of documents. The documents are then ranked according to the number of occurrences of the query words found in each document, and a list of documents is produced according to that ranking.
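
The document-at-a-time loop described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `rank_documents`, the regular-expression tokenizer, the use of the summed count over all query words as the ranking score, and the tie-breaking by document identifier are all assumptions made for illustration.

```python
import re
from collections import Counter


def rank_documents(documents, query_words):
    """Rank documents by how often the query words occur in each one.

    `documents` maps a document id to its text. Each document is visited
    once, and all query words are counted in it before moving on to the
    next document (rather than scanning the whole collection per word).
    """
    query = {word.lower() for word in query_words}
    scores = []
    for doc_id, text in documents.items():
        # Assumed tokenizer: lowercase runs of letters and digits.
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        counts = Counter(token for token in tokens if token in query)
        # Assumed score: total occurrences of all query words.
        scores.append((doc_id, sum(counts.values())))
    # Highest count first; ties broken by document id (an assumption).
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores
```

For example, calling `rank_documents({"a": "cat dog cat", "b": "dog", "c": "bird"}, ["cat", "dog"])` places document `a` first because it contains three occurrences of the query words.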

    Text retrieval method and system using signature of nearby words
    2.
    Granted invention patent
    Text retrieval method and system using signature of nearby words (status: expired)

    Publication No.: US5542090A

    Publication date: 1996-07-30

    Application No.: US280963

    Filing date: 1994-07-27

    IPC classification: G06F17/30

    Abstract: A method for searching a document corpus for query terms includes generating a list of document terms that includes, for each term, a term signature based on characteristics of a number of adjacent terms. The term signatures can be generated by producing a bit vector, for example by applying a hash function, for each term within a predetermined number of adjacent terms of each document term. The bit vectors can then be combined to form the term signature. Alternatively, the term signature can be generated using one or more morphological properties of the terms. The predetermined number of adjacent terms can be the number of search terms minus one, and the adjacent terms may precede, follow, or both precede and follow the document term for which the term signature is generated. A search signature is generated from the query terms, excluding a reference term, based on the same predetermined characteristics. The term signature of the reference term is compared with the search signature, and an indication is provided when the two match.
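
The signature scheme in this abstract can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented method: the 64-bit signature width, the SHA-256-based single-bit hash, the combination of bit vectors by bitwise OR, the symmetric window, and the reading of "match" as "every bit of the search signature is set in the term signature" are all choices made here. The names `term_bits`, `term_signature`, `build_term_list`, and `find_candidates` are hypothetical.

```python
import hashlib

SIGNATURE_BITS = 64  # assumed signature width


def term_bits(term):
    """Hash a term to a bit vector with a single bit set (assumed scheme)."""
    digest = hashlib.sha256(term.lower().encode("utf-8")).digest()
    position = int.from_bytes(digest[:4], "big") % SIGNATURE_BITS
    return 1 << position


def term_signature(terms, index, window):
    """OR together the bit vectors of terms within `window` positions
    before and after terms[index], excluding the term itself."""
    signature = 0
    start = max(0, index - window)
    stop = min(len(terms), index + window + 1)
    for j in range(start, stop):
        if j != index:
            signature |= term_bits(terms[j])
    return signature


def build_term_list(terms, window):
    """List of (term, term signature) pairs for one document."""
    return [(t, term_signature(terms, i, window)) for i, t in enumerate(terms)]


def find_candidates(term_list, query_terms, reference_term):
    """Positions where the reference term occurs and its term signature
    covers the search signature built from the other query terms."""
    ref = reference_term.lower()
    search_signature = 0
    for q in query_terms:
        if q.lower() != ref:
            search_signature |= term_bits(q)
    return [
        i
        for i, (term, signature) in enumerate(term_list)
        if term.lower() == ref
        and (signature & search_signature) == search_signature
    ]
```

With the window set to the number of query terms minus one, a position is reported whenever all other query terms lie within the window around the reference term. Because different terms can hash to the same bit, a reported position is only a candidate and would still need to be verified against the text.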

    摘要翻译: 用于搜索文档语料库以获取查询词语的方法包括基于多个相邻词语的特征来生成包括每个词语的术语签名的文档术语列表。 术语签名可以通过生成来自每个文档术语的预定相邻数目的术语中的每个项的位向量,例如通过应用散列函数来生成。 然后可以将位向量组合以形成术语签名。 可以使用术语的一个或多个形态属性来生成词签名。 预定相邻数量的项可以是搜索项的数量减1,并且可以在产生术语签名的文档术语之前,之后或之前。 基于预定特征,为排除参考项的查询项生成搜索签名。 将参考项的术语签名与搜索签名进行比较,并且当参考项和搜索签名的术语签名匹配时提供指示。