Invention Application
US20050210003A1 Sequence based indexing and retrieval method for text documents
Status: Pending (published)
- Patent Title: Sequence based indexing and retrieval method for text documents
- Patent Title (Chinese): 文本文档的基于序列的索引和检索方法
- Application No.: US10803478
- Application Date: 2004-03-17
- Publication No.: US20050210003A1
- Publication Date: 2005-09-22
- Inventors: Yih-Kuen Tsay, Ching-Lin Yu, Yu-Fang Chen
- Applicants: Yih-Kuen Tsay, Ching-Lin Yu, Yu-Fang Chen
- Main IPC: G06F7/00
- IPC: G06F7/00; G06F17/30

Abstract:
A sequence based indexing and retrieval method for a collection of text documents includes the steps of generating a query token sequence from a query; generating at least one representative token sequence from each document that contains at least one token of the query token sequence; measuring the similarity between each representative token sequence and the query token sequence; and retrieving text documents in response to the similarity of their representative token sequences to the query token sequence. The similarity measurement is performed by determining a token appearance score, a token order score, and a token consecutiveness score of the representative token sequence with respect to the query token sequence. Together, these scores quantify how closely the representative token sequence matches the query token sequence, so that text documents can be retrieved precisely and effectively.
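The abstract names the retrieval steps and the three component scores, but it does not define the scores, describe how a representative token sequence is built, or say how the scores are combined. As a rough illustration only, the Python sketch below fills those gaps with assumptions made here, not taken from the patent: a lowercase word tokenizer; a representative sequence made of the document tokens that occur in the query, kept in document order with their positions; an appearance score equal to the fraction of distinct query tokens present; an order score equal to the longest-common-subsequence length divided by the query length; a consecutiveness score equal to the fraction of adjacent query token pairs that are also adjacent in the document; and an equal-weight sum of the three. All function names are hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase word tokens; the abstract names no tokenizer.
    return re.findall(r"\w+", text.lower())


def representative_sequence(doc_tokens: list[str], query: list[str]) -> list[tuple[int, str]]:
    # Hypothetical construction: keep document tokens that occur in the query,
    # in document order, together with their original positions.
    wanted = set(query)
    return [(pos, tok) for pos, tok in enumerate(doc_tokens) if tok in wanted]


def appearance_score(rep: list[tuple[int, str]], query: list[str]) -> float:
    # Assumption: fraction of distinct query tokens present in the representative sequence.
    distinct = set(query)
    return len(distinct & {tok for _, tok in rep}) / len(distinct)


def lcs_length(a: list[str], b: list[str]) -> int:
    # Standard longest-common-subsequence dynamic programme, one row at a time.
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def order_score(rep: list[tuple[int, str]], query: list[str]) -> float:
    # Assumption: share of the query preserved, in order, by the representative sequence.
    return lcs_length(query, [tok for _, tok in rep]) / len(query)


def consecutiveness_score(rep: list[tuple[int, str]], query: list[str]) -> float:
    # Assumption: share of adjacent query pairs (q[i], q[i+1]) that also occur
    # at adjacent positions in the document.
    pairs = list(zip(query, query[1:]))
    if not pairs:
        return 1.0  # a one-token query is trivially consecutive
    adjacent = {
        (tok_a, tok_b)
        for (pos_a, tok_a), (pos_b, tok_b) in zip(rep, rep[1:])
        if pos_b == pos_a + 1
    }
    return sum(pair in adjacent for pair in pairs) / len(pairs)


def similarity(rep, query, weights=(1 / 3, 1 / 3, 1 / 3)) -> float:
    # Assumption: a weighted sum of the three scores, equal weights by default.
    w_app, w_ord, w_con = weights
    return (w_app * appearance_score(rep, query)
            + w_ord * order_score(rep, query)
            + w_con * consecutiveness_score(rep, query))


def retrieve(docs: dict[str, str], query_text: str) -> list[tuple[str, float]]:
    query = tokenize(query_text)
    if not query:
        return []
    results = []
    for doc_id, text in docs.items():
        rep = representative_sequence(tokenize(text), query)
        if rep:  # only documents containing at least one query token are scored
            results.append((doc_id, similarity(rep, query)))
    # Highest similarity first.
    return sorted(results, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    docs = {
        "d1": "A sequence based indexing method",
        "d2": "Indexing based on sequence",
        "d3": "Retrieval of text documents",
    }
    for doc_id, score in retrieve(docs, "sequence based indexing"):
        print(doc_id, round(score, 3))
```

In this example, "d3" is never scored because it shares no token with the query, and "d1" (the query words in order and adjacent) ranks above "d2" (the same words, reversed and not fully adjacent), which is the kind of distinction the order and consecutiveness scores are meant to capture.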