Sequence based indexing and retrieval method for text documents
    Type: Invention application
    Status: Under examination (published)

    Publication number: US20050210003A1

    Publication date: 2005-09-22

    Application number: US10803478

    Filing date: 2004-03-17

    CPC classification number: G06F16/334

    Abstract: A sequence-based indexing and retrieval method for a collection of text documents includes the steps of generating a query token sequence from a query; generating at least one representative token sequence from each document that contains at least one token of the query token sequence; measuring a similarity between each representative token sequence and the query token sequence; and retrieving text documents in response to the similarity of their representative token sequences with respect to the query token sequence. The similarity measurement is performed by determining a token appearance score, a token order score, and a token consecutiveness score of the representative token sequence with respect to the query token sequence. Together these scores express how similar the representative token sequence is to the query token sequence, so that text documents can be retrieved precisely and effectively.
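
The abstract names three scores but gives no formulas, so the following is only a minimal sketch of how such a measurement might look. Everything in it is an assumption made for illustration: the whitespace tokenizer, the definition of each score (shared-token fraction, longest common subsequence, shared adjacent pairs), the equal-weight average in `similarity`, and the use of a document's full token sequence as its representative token sequence. None of these names or formulas come from the patent itself.

```python
from typing import List, Sequence, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokenizer (assumed; the abstract does not specify one)."""
    return text.lower().split()


def appearance_score(query: Sequence[str], rep: Sequence[str]) -> float:
    """Assumed definition: fraction of distinct query tokens present in rep."""
    q = set(query)
    if not q:
        return 0.0
    return len(q & set(rep)) / len(q)


def order_score(query: Sequence[str], rep: Sequence[str]) -> float:
    """Assumed definition: longest-common-subsequence length / query length."""
    if not query:
        return 0.0
    prev = [0] * (len(rep) + 1)  # LCS row for the previous query token
    for q_tok in query:
        curr = [0]
        for j, r_tok in enumerate(rep, start=1):
            if q_tok == r_tok:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1] / len(query)


def consecutiveness_score(query: Sequence[str], rep: Sequence[str]) -> float:
    """Assumed definition: fraction of adjacent query pairs also adjacent in rep."""
    q_pairs = set(zip(query, query[1:]))
    if not q_pairs:
        return 0.0
    r_pairs = set(zip(rep, rep[1:]))
    return len(q_pairs & r_pairs) / len(q_pairs)


def similarity(query: Sequence[str], rep: Sequence[str]) -> float:
    """Equal-weight average of the three scores (the weighting is an assumption)."""
    return (
        appearance_score(query, rep)
        + order_score(query, rep)
        + consecutiveness_score(query, rep)
    ) / 3.0


def rank_documents(query_text: str, docs: Sequence[str]) -> List[Tuple[int, float]]:
    """Score only documents sharing at least one token with the query.

    Each document's full token sequence stands in for its representative
    token sequence, which is a simplification made for this sketch.
    """
    query = tokenize(query_text)
    query_set = set(query)
    results = []
    for idx, doc in enumerate(docs):
        rep = tokenize(doc)
        if query_set & set(rep):
            results.append((idx, similarity(query, rep)))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    docs = [
        "text document retrieval system",
        "retrieval of a document as text",
        "unrelated words",
    ]
    for idx, score in rank_documents("text document retrieval", docs):
        print(idx, round(score, 3))
```

In this sketch, a document that contains the query tokens in the same order and adjacent to one another scores higher than one that contains the same tokens scattered or reversed, which is the distinction the three scores in the abstract are meant to capture.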

