Publication number: US20040088157A1
Publication date: 2004-05-06
Application number: US10283652
Filing date: 2002-10-30
Applicant: Motorola, Inc.
Inventor: Lawrence E. Lach, Thomas Michael Tirpak, Maria B. Thompson
IPC: G06F017/27
CPC classification number: G06F17/2785
Abstract: Textual documents are readily classified and/or characterized with respect to other documents by determining a corresponding level of semantic distance between such documents. For example, particular parts of speech are identified, and those words in the documents that correspond to such parts of speech are identified and extracted. Matches of such wording between the documents permit identification of a given corresponding semantic distance value. When no matches occur (or when otherwise desired), synonyms for such words can be used to ascertain more distant semantic relationships. The process can be repeated in an iterative fashion using ever-deepening tiers of synonyms.
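The iterative matching described in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation. It assumes the words for the chosen parts of speech have already been extracted from each document, and it uses a small hypothetical synonym table (`TOY_SYNONYMS`) in place of a real thesaurus. The function names, the choice to expand both word sets at each tier, and the `max_tiers` cutoff are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Optional, Set

# Hypothetical toy thesaurus, for illustration only; a real system would
# draw synonyms from a proper lexical resource.
TOY_SYNONYMS: Dict[str, Set[str]] = {
    "car": {"automobile"},
    "automobile": {"car", "vehicle"},
    "vehicle": {"automobile", "truck"},
    "truck": {"vehicle"},
}


def expand(words: Set[str], synonyms: Dict[str, Set[str]]) -> Set[str]:
    """Return the words plus one further tier of their synonyms."""
    expanded = set(words)
    for word in words:
        expanded |= synonyms.get(word, set())
    return expanded


def semantic_distance(
    words_a: Iterable[str],
    words_b: Iterable[str],
    synonyms: Dict[str, Set[str]],
    max_tiers: int = 3,
) -> Optional[int]:
    """Return the number of synonym tiers needed before the two word sets
    share a word: 0 means a direct match, None means no match was found
    within max_tiers."""
    a, b = set(words_a), set(words_b)
    for tier in range(max_tiers + 1):
        if a & b:  # at least one word in common at this tier
            return tier
        # Assumption: both sides are widened by one synonym tier per pass.
        a, b = expand(a, synonyms), expand(b, synonyms)
    return None


if __name__ == "__main__":
    print(semantic_distance({"car"}, {"car"}, TOY_SYNONYMS))
    print(semantic_distance({"car"}, {"truck"}, TOY_SYNONYMS))
    print(semantic_distance({"car"}, {"banana"}, TOY_SYNONYMS))
```

In this sketch a smaller return value indicates a closer semantic relationship, and the cutoff keeps the synonym expansion from running indefinitely when the documents share no related wording.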