-
Publication No.: US20050021508A1
Publication date: 2005-01-27
Application No.: US10838231
Filing date: 2004-05-05
Applicant: Tadataka Matsubayashi, Natsuko Sugaya, Michio Iijima, Yuichi Ogawa, Yuuki Watanabe, Shinya Yamamoto, Tsuyoshi Sudou
Inventor: Tadataka Matsubayashi, Natsuko Sugaya, Michio Iijima, Yuichi Ogawa, Yuuki Watanabe, Shinya Yamamoto, Tsuyoshi Sudou
IPC classification: G06F17/30
CPC classification: G06F17/3069, Y10S707/99933, Y10S707/99935
Abstract: Information that individual elements (characteristic character strings) indicative of characteristics of a registered document appear in the registered document is stored in advance. When calculating similarity of the registered document, a query designated by a searcher is analyzed. The query is represented by a characteristic vector having the individual elements which take the relation between a plurality of words into consideration. Pieces of appearance information of the individual words contained in the query are counted. The counted appearance information is compared with a searching index to calculate similarity between documents.
-
Publication No.: US07440938B2
Publication date: 2008-10-21
Application No.: US10838231
Filing date: 2004-05-05
Applicant: Tadataka Matsubayashi, Natsuko Sugaya, Michio Iijima, Yuichi Ogawa, Yuuki Watanabe, Shinya Yamamoto, Tsuyoshi Sudou
Inventor: Tadataka Matsubayashi, Natsuko Sugaya, Michio Iijima, Yuichi Ogawa, Yuuki Watanabe, Shinya Yamamoto, Tsuyoshi Sudou
IPC classification: G06F7/00
CPC classification: G06F17/3069, Y10S707/99933, Y10S707/99935
Abstract: Information that individual elements (characteristic character strings) indicative of characteristics of a registered document appear in the registered document is stored in advance. When calculating similarity of the registered document, a query designated by a searcher is analyzed. The query is represented by a characteristic vector having the individual elements which take the relation between a plurality of words into consideration. Pieces of appearance information of the individual words contained in the query are counted. The counted appearance information is compared with a searching index to calculate similarity between documents.
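Illustrative note: the flow described in the abstract (store appearance information for characteristic strings in advance, analyze the query into a characteristic vector, count appearances, compare against the search index) can be sketched roughly as follows. This is a minimal sketch and not the patented method. The abstract does not specify the tokenizer, the feature set, or the similarity measure, so whitespace tokenization, adjacent word pairs as the elements that "take the relation between a plurality of words into consideration", and cosine similarity are all assumptions, and the names `extract_features` and `SearchIndex` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def extract_features(text):
    """Count single words and adjacent word pairs (an assumed feature set)."""
    words = text.lower().split()
    pairs = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return Counter(words + pairs)


class SearchIndex:
    """Hypothetical inverted index: feature -> {doc_id: occurrence count}."""

    def __init__(self):
        self.postings = defaultdict(dict)
        self.norms = {}

    def register(self, doc_id, text):
        # Store appearance information for a registered document in advance.
        counts = extract_features(text)
        for feature, count in counts.items():
            self.postings[feature][doc_id] = count
        self.norms[doc_id] = math.sqrt(sum(c * c for c in counts.values()))

    def similarity(self, query):
        # Analyze the query into a count vector, then compare it with the index.
        q_counts = extract_features(query)
        q_norm = math.sqrt(sum(c * c for c in q_counts.values()))
        dots = defaultdict(float)
        for feature, q_count in q_counts.items():
            for doc_id, d_count in self.postings.get(feature, {}).items():
                dots[doc_id] += q_count * d_count
        # Cosine similarity (assumed measure); only documents that share
        # at least one feature with the query appear in the result.
        return {
            doc_id: dot / (q_norm * self.norms[doc_id])
            for doc_id, dot in dots.items()
        }


index = SearchIndex()
index.register("d1", "document similarity search")
index.register("d2", "image search engine")
scores = index.similarity("document similarity search")
```

In this sketch a registered document that shares both words and word pairs with the query scores higher than one that shares only a single word, which is one simple way to let relations between words influence the ranking.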
-