- Title: Method and system for clustering using generalized sentence patterns
- Application number: US10880662
- Filing date: 2004-06-30
- Publication number: US20060004561A1
- Publication date: 2006-01-05
- Inventors: Benyu Zhang, Wei-Ying Ma, Zheng Chen, Hua-Jun Zeng
- Applicants: Benyu Zhang, Wei-Ying Ma, Zheng Chen, Hua-Jun Zeng
- Applicant address: US WA Redmond
- Assignee: Microsoft Corporation
- Current assignee: Microsoft Corporation
- Current assignee address: US WA Redmond
- Main classification: G06F17/28
- IPC classification: G06F17/28
Abstract:
A method and system for clustering documents based on generalized sentence patterns of the topics of the documents is provided. A generalized sentence patterns (“GSP”) system identifies a “sentence” that describes the topic of a document. To cluster documents, the GSP system generates a “generalized sentence” form of the sentence that describes the topic of each document. The generalized sentence is an abstraction of the words of the sentence. The GSP system identifies clusters of documents based on the patterns of their generalized sentences. The GSP system clusters documents when the generalized sentence representations of their topics have a similar pattern.
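
The flow described in the abstract (pick a topic sentence, generalize it, group documents by pattern) can be illustrated with a minimal Python sketch. This is not the patented method: the abstract does not say how the topic sentence is chosen, how words are abstracted, or how pattern similarity is measured. The `CONCEPTS` table, the use of the first line as the topic sentence, and grouping by identical concept sets are all assumptions made purely for illustration.

```python
from collections import defaultdict

# Hypothetical word-to-concept table (an assumption for this sketch);
# the abstract does not specify how words are abstracted.
CONCEPTS = {
    "cat": "ANIMAL", "dog": "ANIMAL",
    "eats": "CONSUME", "drinks": "CONSUME",
    "fish": "FOOD", "milk": "FOOD",
}


def topic_sentence(document: str) -> str:
    """Assumption: treat the first line (e.g. a title) as the topic sentence."""
    lines = document.strip().splitlines()
    return lines[0] if lines else ""


def generalize(sentence: str) -> frozenset:
    """Replace each known word by its concept; drop words with no concept."""
    words = (w.strip(".,!?;:") for w in sentence.lower().split())
    return frozenset(CONCEPTS[w] for w in words if w in CONCEPTS)


def cluster(documents):
    """Group document indices whose generalized sentences are identical.

    Exact equality is a simplification of the 'similar pattern'
    criterion mentioned in the abstract.
    """
    clusters = defaultdict(list)
    for index, doc in enumerate(documents):
        clusters[generalize(topic_sentence(doc))].append(index)
    return dict(clusters)


docs = [
    "Cat eats fish\nBody text of the first document.",
    "Dog drinks milk.",
    "Stock prices fall",
]
result = cluster(docs)
```

In this toy run the first two documents share the same generalized sentence and fall into one cluster, while the third, which contains no known concept words, is grouped separately. A realistic system would need a real abstraction resource and a tolerance-based similarity measure in place of exact matching.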