Granted invention patent
US07945525B2 Methods for obtaining improved text similarity measures which replace similar characters with a string pattern representation by using a semantic data tree (status: lapsed)
Abstract:
The embodiments of the invention provide methods for obtaining improved text similarity measures. More specifically, a method of measuring similarity between at least two electronic documents begins by identifying similar terms between the electronic documents. Similarity between terms is based on patterns, which can include word patterns, letter patterns, numeric patterns, and/or alphanumeric patterns. Identifying the similar terms also includes identifying multiple pattern types between the electronic documents. Moreover, basing the similarity on patterns identifies terms within the electronic documents that fall within a category of a hierarchy. Specifically, the method reviews a hierarchical data tree whose nodes represent terms within the electronic documents: lower nodes of the tree hold specific terms, and higher nodes hold general terms.
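The idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the hierarchy contents (`PARENT`), the pattern labels (`<NUM>`, `<ALNUM>`), the helper names, and the choice of Jaccard overlap as the similarity measure are all assumptions made for illustration. The sketch replaces numeric and alphanumeric tokens with a pattern label, lifts specific words to a more general node in a small term tree, and then compares the resulting term sets.

```python
import re

# Hypothetical term hierarchy (child -> parent). Lower nodes are specific
# terms, higher nodes are general terms. Contents are illustrative only.
PARENT = {
    "poodle": "dog",
    "beagle": "dog",
    "dog": "animal",
    "cat": "animal",
}


def tokenize(text):
    """Split text into lowercase runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def pattern_of(token):
    """Return a pattern label for numeric or mixed tokens, else None."""
    if token.isdigit():
        return "<NUM>"      # numeric pattern, e.g. "12345"
    if token.isalpha():
        return None         # plain word: handled by the hierarchy instead
    return "<ALNUM>"        # alphanumeric pattern, e.g. "a7x"


def generalize(token, levels=1):
    """Walk up the hierarchy by at most `levels` steps."""
    for _ in range(levels):
        if token not in PARENT:
            break
        token = PARENT[token]
    return token


def normalize(text, levels=1):
    """Map each token to its pattern label or its generalized term."""
    terms = set()
    for token in tokenize(text):
        label = pattern_of(token)
        terms.add(label if label else generalize(token, levels))
    return terms


def jaccard(a, b):
    """Jaccard overlap of two sets (assumed measure, not from the patent)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity(doc_a, doc_b, levels=1):
    """Similarity of two documents after pattern and hierarchy normalization."""
    return jaccard(normalize(doc_a, levels), normalize(doc_b, levels))


doc_a = "Poodle order 12345 part A7X"
doc_b = "Beagle order 67890 part B9Z"

# Compare exact-token overlap with the normalized overlap.
print(jaccard(set(tokenize(doc_a)), set(tokenize(doc_b))))
print(similarity(doc_a, doc_b))
```

In this sketch the two sample documents share only two exact tokens, yet they match completely once order numbers, part codes, and dog breeds are each mapped to a shared representation. The `levels` parameter controls how far up the tree terms are generalized; setting it too high would make unrelated terms look alike.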