Patent application
US20070083509A1 Streaming text data mining method & apparatus using multidimensional subspaces
In force
- Title: Streaming text data mining method & apparatus using multidimensional subspaces
- Application number: US11246195
- Filing date: 2005-10-11
- Publication number: US20070083509A1
- Publication date: 2007-04-12
- Inventors: Yuan-Jye Wu, Anne Kao, Stephen Poteet, William Ferng, Robert Cranfill
- Applicants: Yuan-Jye Wu, Anne Kao, Stephen Poteet, William Ferng, Robert Cranfill
- Assignee: The Boeing Company
- Current assignee: The Boeing Company
- Main classification: G06F17/30
- IPC classification: G06F17/30
Abstract:
A streaming text data comparator performs real-time text data mining on streaming text data. The comparator receives a streaming text data document and generates a vector representation of the term frequencies relating to an existing document collection. The comparator then transforms the term frequency vector into a projection in a precomputed multidimensional subspace that represents the original document collection. The comparator further calculates a relationship value representing the similarities or differences between the vector representation and the subspace, and compares the relationship value to a predetermined threshold to determine whether the streaming text data document is related to the original document collection. If the streaming text data document is related, the streaming text data comparator intercalates the new document into the document collection. If the new document is not related, the comparator may store or delete the unrelated document.
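The steps in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented method: the abstract does not say how the subspace is computed or which relationship value is used. The sketch assumes whitespace tokenization, builds the subspace basis with Gram-Schmidt over the collection's term frequency vectors as a stand-in for the precomputed subspace, takes the relationship value to be the cosine between a vector and its projection, and models "intercalating" as appending to a list. All function names and the 0.8 threshold are hypothetical.

```python
import math
from collections import Counter


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def term_frequency_vector(text, vocabulary):
    """Count each vocabulary term in the text; out-of-vocabulary terms are ignored."""
    counts = Counter(text.lower().split())  # assumption: whitespace tokenization
    return [float(counts[term]) for term in vocabulary]


def orthonormal_basis(vectors, tol=1e-12):
    """Gram-Schmidt basis for the span of the given vectors.

    Assumption: stands in for the 'precomputed multidimensional subspace'.
    """
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:  # skip vectors already inside the span
            basis.append([wi / norm for wi in w])
    return basis


def project(vector, basis):
    """Orthogonal projection of the vector onto the subspace."""
    projection = [0.0] * len(vector)
    for b in basis:
        c = dot(vector, b)
        projection = [p + c * bi for p, bi in zip(projection, b)]
    return projection


def relationship_value(vector, basis):
    """Assumed measure: cosine between the vector and its projection (0.0 to 1.0)."""
    norm_v = math.sqrt(dot(vector, vector))
    if norm_v == 0.0:
        return 0.0  # no vocabulary terms at all: treat as unrelated
    projection = project(vector, basis)
    return math.sqrt(dot(projection, projection)) / norm_v


def compare_streaming_document(text, vocabulary, basis, collection, threshold=0.8):
    """Return (related, value); related documents are added to the collection."""
    vector = term_frequency_vector(text, vocabulary)
    value = relationship_value(vector, basis)
    related = value >= threshold
    if related:
        collection.append(text)  # assumption: 'intercalate' modeled as append
    return related, value
```

To use the sketch, build `basis` once from the existing collection's term frequency vectors, then call `compare_streaming_document` on each incoming document. A document whose terms lie mostly inside the subspace scores near 1, and one that shares no vocabulary terms scores 0. The subspace is not recomputed after a document is added, which matches the abstract's description of it as precomputed.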