Granted Invention Patent
- Patent title: Document characterization using a tensor space model
- Patent title (Chinese): 文档表征使用张量空间模型
- Application number: US11378095
- Filing date: 2006-03-17
- Publication (announcement) number: US07529719B2
- Publication (announcement) date: 2009-05-05
- Inventors: Ning Liu, Benyu Zhang, Jun Yan, Zheng Chen, Hua-Jun Zeng, Jian Wang
- Applicants: Ning Liu, Benyu Zhang, Jun Yan, Zheng Chen, Hua-Jun Zeng, Jian Wang
- Applicant address: Redmond, WA, US
- Assignee: Microsoft Corporation
- Current assignee: Microsoft Corporation
- Current assignee address: Redmond, WA, US
- Agent: Perkins Coie LLP
- Main classification: G06N5/00
- IPC classification: G06N5/00
Abstract:
Computer-readable media having computer-executable instructions, and apparatuses, categorize a document or a corpus of documents. A Tensor Space Model (TSM), which models text as a higher-order tensor, represents the document or corpus of documents. Supported by techniques from multilinear algebra, TSM provides a framework for analyzing multifactor structures. TSM is further supported by operations and tools such as the High-Order Singular Value Decomposition (HOSVD), which reduces the dimensions of the higher-order tensor. The dimensionally reduced tensor is compared with tensors that represent possible categories, and a category is then selected for the document or corpus of documents. Experimental results on the 20 Newsgroups dataset suggest that TSM has advantages over a Vector Space Model (VSM) for text classification.
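
The pipeline named in the abstract (build a higher-order tensor, reduce it with HOSVD, compare it with category tensors) can be illustrated with a minimal Python sketch. The abstract does not say how the tensor is built or how tensors are compared, so everything concrete here is an assumption made for illustration: a third-order tensor of character-trigram counts over a 27-symbol alphabet, mode projections taken from the SVD of each mode unfolding of the stacked corpus, and cosine similarity against per-category tensors. All function names (`trigram_tensor`, `unfold`, `hosvd_factors`, `reduce_tensor`, `cosine`, `classify`) are hypothetical and are not taken from the patent.

```python
import numpy as np

# Assumption: 26 lowercase letters plus space; any other character maps to space.
ALPHABET = "abcdefghijklmnopqrstuvwxyz "
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def trigram_tensor(text):
    """Count character trigrams into a 27 x 27 x 27 third-order tensor."""
    cleaned = "".join(ch if ch in INDEX else " " for ch in text.lower())
    tensor = np.zeros((len(ALPHABET),) * 3)
    for a, b, c in zip(cleaned, cleaned[1:], cleaned[2:]):
        tensor[INDEX[a], INDEX[b], INDEX[c]] += 1.0
    return tensor


def unfold(tensor, mode):
    """Mode-n unfolding: the chosen mode becomes the rows of a matrix."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def hosvd_factors(tensors, rank):
    """Leading left singular vectors of each mode unfolding of the stacked corpus."""
    stacked = np.stack(tensors)  # shape: (n_docs, 27, 27, 27)
    factors = []
    for mode in (1, 2, 3):  # mode 0 is the document axis and is left alone
        u, _, _ = np.linalg.svd(unfold(stacked, mode), full_matrices=False)
        factors.append(u[:, :rank])
    return factors


def reduce_tensor(tensor, factors):
    """Project a document tensor onto the truncated subspace of each mode."""
    u1, u2, u3 = factors
    return np.einsum("ijk,ia,jb,kc->abc", tensor, u1, u2, u3)


def cosine(a, b):
    """Cosine similarity between two tensors of the same shape."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.sum(a * b) / denom) if denom else 0.0


def classify(text, factors, category_tensors):
    """Return the category whose reduced tensor is most similar to the text's."""
    core = reduce_tensor(trigram_tensor(text), factors)
    return max(category_tensors, key=lambda name: cosine(core, category_tensors[name]))
```

In this sketch, `category_tensors` is a dictionary from category name to a reduced tensor, for example the mean of `reduce_tensor(...)` over that category's training documents. That averaging step is also an assumption, since the abstract says only that the reduced tensor is compared with tensors representing possible categories.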
Published/Granted Documents
- US20070239643A1 Document characterization using a tensor space model (publication/grant date: 2007-10-11)
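Information Lookup
IPC classification:
G | Physics |
G06 | Computing; calculating or counting |
G06N | Computer systems based on specific computational models |
G06N5/00 | Computer systems using knowledge-based models |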