Granted invention patent
- Patent title: Text classification by weighted proximal support vector machine based on positive and negative sample sizes and weights
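- Patent title (Chinese version, translated): Text classification using a weighted proximal support vector machine based on positive and negative sample sizes and weights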
- Application No.: US11384889
- Filing date: 2006-03-20
- Publication (announcement) No.: US07707129B2
- Publication (announcement) date: 2010-04-27
- Inventors: Dong Zhuang, Benyu Zhang, Zheng Chen, Hua-Jun Zeng, Jian Wang
- Applicants: Dong Zhuang, Benyu Zhang, Zheng Chen, Hua-Jun Zeng, Jian Wang
- Applicant address: Redmond, WA, US
- Patentee: Microsoft Corporation
- Current patentee: Microsoft Corporation
- Current patentee address: Redmond, WA, US
- Agency: Perkins Coie LLP
- Main classification: G06F15/18
- IPC classification: G06F15/18; G06E1/00; G06E3/00
Abstract:
Embodiments of the invention relate to improvements to the support vector machine (SVM) classification model. When text data is significantly unbalanced (i.e., the positive and negative labeled data are disproportionate), the classification quality of a standard SVM deteriorates. Embodiments of the invention are directed to a weighted proximal SVM (WPSVM) model that achieves substantially the same accuracy as the traditional SVM model while requiring significantly less computation time. A WPSVM model in accordance with embodiments of the invention may include a weight for each training error and a method for estimating the weights, which automatically solves the unbalanced-data problem. In addition, instead of solving the optimization problem via the KKT (Karush-Kuhn-Tucker) conditions and the Sherman-Morrison-Woodbury formula, embodiments of the invention use an iterative algorithm to solve an unconstrained optimization problem, which makes WPSVM suitable for classifying relatively high-dimensional data.
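The abstract's two ingredients, a weight on each training error and an iterative solver for an unconstrained problem, can be illustrated with a small sketch. This is not the patented method. It assumes a proximal-SVM-style objective 1/2(||w||^2 + b^2) + (nu/2) Σ c_i (1 − y_i(w·x_i − b))^2 with labels y_i in {+1, −1}. It also assumes weights c_i set by inverse class frequency (the same heuristic scikit-learn calls `class_weight="balanced"`), used here as a hypothetical stand-in for the patent's weight-estimation method. Conjugate gradient is used as one plausible choice of iterative solver. The names `class_weights`, `train_wpsvm`, `predict`, and the parameter `nu` are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def class_weights(y: List[int]) -> List[float]:
    """Hypothetical weighting: inverse class frequency, so the positive and
    negative classes contribute equal total weight to the error term."""
    n = len(y)
    n_pos = sum(1 for t in y if t > 0)
    n_neg = n - n_pos
    w_pos = n / (2.0 * n_pos)
    w_neg = n / (2.0 * n_neg)
    return [w_pos if t > 0 else w_neg for t in y]


def train_wpsvm(X: List[Vector], y: List[int], nu: float = 1.0,
                tol: float = 1e-12, max_iter: int = 1000) -> Tuple[Vector, float]:
    """Minimize 0.5*(|w|^2 + b^2) + 0.5*nu*sum_i c_i*(y_i - (w.x_i - b))^2.

    Because y_i is +1 or -1, (1 - y_i*(w.x_i - b))^2 equals (y_i - (w.x_i - b))^2,
    so this is the assumed weighted objective with the slack substituted out.
    Setting the gradient to zero gives (I + nu*A'CA) z = nu*A'Cy with rows
    a_i = [x_i, -1], z = [w; b], C = diag(c). Conjugate gradient solves it
    using only matrix-vector products with the data rows.
    """
    c = class_weights(y)
    d = len(X[0])
    rows = [list(x) + [-1.0] for x in X]  # augmented rows a_i = [x_i, -1]

    def matvec(z: Vector) -> Vector:
        # (I + nu*A'CA) z, computed row by row without forming the matrix.
        out = list(z)
        for a, ci in zip(rows, c):
            s = nu * ci * dot(a, z)
            for j, aj in enumerate(a):
                out[j] += s * aj
        return out

    # Right-hand side nu * A'Cy.
    rhs = [0.0] * (d + 1)
    for a, ci, yi in zip(rows, c, y):
        for j, aj in enumerate(a):
            rhs[j] += nu * ci * yi * aj

    # Standard conjugate gradient for a symmetric positive definite system.
    z = [0.0] * (d + 1)
    r = list(rhs)  # residual rhs - M z at z = 0
    p = list(r)
    rr = dot(r, r)
    for _ in range(max_iter):
        if rr <= tol:
            break
        Ap = matvec(p)
        alpha = rr / dot(p, Ap)
        z = [zi + alpha * pi for zi, pi in zip(z, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return z[:d], z[d]


def predict(w: Vector, b: float, x: Vector) -> int:
    # Decision rule sign(w.x - b), matching the objective above.
    return 1 if dot(w, x) - b >= 0 else -1
```

For example, `w, b = train_wpsvm(X, y, nu=1.0)` followed by `predict(w, b, x)` classifies a new vector `x`. The solver touches the data only through products with the augmented rows, so it never builds a (d+1)×(d+1) matrix; that is the property that matters for high-dimensional text features, where a practical version would store documents as sparse vectors.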