Granted invention patent
US06473754B1 METHOD AND SYSTEM FOR EXTRACTING CHARACTERISTIC STRING, METHOD AND SYSTEM FOR SEARCHING FOR RELEVANT DOCUMENT USING THE SAME, STORAGE MEDIUM FOR STORING CHARACTERISTIC STRING EXTRACTION PROGRAM, AND STORAGE MEDIUM FOR STORING RELEVANT DOCUMENT SEARCHING PROGRAM
Legal status: In force
- Patent title: METHOD AND SYSTEM FOR EXTRACTING CHARACTERISTIC STRING, METHOD AND SYSTEM FOR SEARCHING FOR RELEVANT DOCUMENT USING THE SAME, STORAGE MEDIUM FOR STORING CHARACTERISTIC STRING EXTRACTION PROGRAM, AND STORAGE MEDIUM FOR STORING RELEVANT DOCUMENT SEARCHING PROGRAM
- Patent title (Chinese): 用于提取特征字符串的方法和系统,用于搜索相关文档的方法和系统,用于存储特征字符串提取程序的存储媒体和存储相关文档搜索程序的存储介质
- Application number: US09320558
- Filing date: 1999-05-27
- Publication number: US06473754B1
- Publication date: 2002-10-29
- Inventors: Tadataka Matsubayashi, Katsumi Tada, Takuya Okamoto, Natsuko Sugaya, Yasushi Kawashimo
- Applicants: Tadataka Matsubayashi, Katsumi Tada, Takuya Okamoto, Natsuko Sugaya, Yasushi Kawashimo
- Priority: JP10-148721, 1998-05-29
- Main classification: G06F17/30
- IPC classification: G06F17/30
Abstract:
A method for extracting features from the contents of a document without using a word dictionary, and a system that uses the method to search for a relevant document or documents accurately and at high speed. The method includes the steps of: storing, in the form of an occurrence probability file, the character strings present in the texts of a text database together with the probabilities that they appear at word boundaries in those texts; storing the occurrence frequencies of the character strings in the texts as an occurrence frequency file; extracting characteristic strings from a text specified by a user with use of the occurrence probability file; and counting their occurrence frequencies in the user-specified text. The method then calculates the similarity of each text to the user-specified text with use of the occurrence frequency file and the occurrence frequencies in the user-specified text.
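The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. Everything specific in it is an assumption made for illustration: fixed-length character 2-grams as the "character strings", whitespace in the stored texts as the source of word-boundary evidence, a threshold of 0.5 on the boundary probability, and cosine similarity as the similarity measure. The function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(text, n=2):
    """All character n-grams of an unsegmented string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_probability_file(texts, n=2):
    """Estimate, per n-gram, the probability that it starts at a word boundary.

    Assumption: whitespace in the stored texts marks word boundaries.
    """
    total, at_start = Counter(), Counter()
    for text in texts:
        for word in text.split():
            for i, gram in enumerate(ngrams(word, n)):
                total[gram] += 1
                if i == 0:
                    at_start[gram] += 1
    return {gram: at_start[gram] / total[gram] for gram in total}


def build_frequency_file(texts, n=2):
    """Per-document n-gram occurrence counts, ignoring whitespace."""
    return [Counter(ngrams("".join(t.split()), n)) for t in texts]


def extract_characteristic_strings(query, probabilities, threshold=0.5, n=2):
    """Keep the query n-grams likely to start a word, and count them."""
    grams = ngrams("".join(query.split()), n)
    return Counter(g for g in grams if probabilities.get(g, 0.0) >= threshold)


def similarity(query_counts, doc_counts):
    """Cosine similarity between query and document count vectors."""
    dot = sum(c * doc_counts.get(g, 0) for g, c in query_counts.items())
    q_norm = math.sqrt(sum(c * c for c in query_counts.values()))
    d_norm = math.sqrt(sum(c * c for c in doc_counts.values()))
    return dot / (q_norm * d_norm) if q_norm and d_norm else 0.0


def rank_documents(query, texts, threshold=0.5, n=2):
    """Return document indices ordered from most to least similar."""
    probabilities = build_probability_file(texts, n)
    frequency_file = build_frequency_file(texts, n)
    query_counts = extract_characteristic_strings(query, probabilities, threshold, n)
    scores = [similarity(query_counts, doc) for doc in frequency_file]
    return sorted(range(len(texts)), key=lambda i: scores[i], reverse=True)
```

Calling `rank_documents("data search", ["data base search", "base ball game", "search engine data"])` scores each stored text against the strings extracted from the query. No word dictionary is consulted at query time: only the precomputed probability and frequency tables are used.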