Granted invention patent
US08046317B2 System and method of feature selection for text classification using subspace sampling
Status: in force
- Patent title: System and method of feature selection for text classification using subspace sampling
- Patent title (Chinese): 使用子空间采样的文本分类的特征选择的系统和方法
- Application number: US12006178
- Filing date: 2007-12-31
- Publication (grant) number: US08046317B2
- Publication (grant) date: 2011-10-25
- Inventors: Anirban Dasgupta, Petros Drineas, Boulos Harb, Vanja Josifovski, Michael William Mahoney
- Applicants: Anirban Dasgupta, Petros Drineas, Boulos Harb, Vanja Josifovski, Michael William Mahoney
- Applicant address: Sunnyvale, CA, US
- Patentee: Yahoo! Inc.
- Current patentee: Yahoo! Inc.
- Current patentee address: Sunnyvale, CA, US
- Attorney/agent: Buchenhorner Patent Law
- Main classification: G06N5/00
- IPC classification: G06N5/00
Abstract:
An improved system and method is provided for feature selection for text classification using subspace sampling. A text classifier generator may be provided that uses subspace sampling to select a small set of features from a corpus of training data. It then trains a text classifier that classifies texts using only that small set of features. To select the small set of features, a subspace of features from the training corpus may be randomly sampled according to a probability distribution over the set of features. Each feature is assigned a probability proportional to the squared Euclidean norm of its corresponding row of the left singular vectors of the feature matrix representing the corpus of training texts. The resulting classifier can therefore classify texts using only the relevant features out of a very large number of training features.
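The sampling step in the abstract can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the patented implementation.

The sketch assumes the following:
- The corpus is a dense features-by-documents matrix `A`, so row i of `A`, and row i of its left singular vectors, belongs to feature i.
- Only the top `k` left singular vectors are kept.
- `r` features are drawn independently with replacement.

The function name `sample_features` and the parameters `k`, `r`, and `seed` are hypothetical.

```python
import numpy as np


def sample_features(A, k, r, seed=None):
    """Hypothetical helper: select features by subspace (leverage-score) sampling.

    Assumes A is a features x documents matrix, so row i of A and row i of its
    left singular vectors U both correspond to feature i.
    Returns (distinct selected feature indices, per-feature probabilities).
    """
    A = np.asarray(A, dtype=float)
    if not 1 <= k <= min(A.shape):
        raise ValueError("k must be between 1 and min(A.shape)")
    # Thin SVD: A = U @ diag(s) @ Vt; columns of U are orthonormal left singular vectors.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k]  # assumption: keep only the top-k left singular vectors
    # Squared Euclidean norm of each row of U_k gives one score per feature.
    scores = np.sum(U_k ** 2, axis=1)
    # Because the columns of U_k are orthonormal, the scores sum to k;
    # dividing by their sum turns them into a probability distribution.
    probs = scores / scores.sum()
    rng = np.random.default_rng(seed)
    # Draw r feature indices i.i.d. according to probs, then keep the distinct ones.
    picked = rng.choice(A.shape[0], size=r, replace=True, p=probs)
    return np.unique(picked), probs


if __name__ == "__main__":
    # Toy corpus: 4 features (rows) x 3 documents (columns); feature 2 never occurs.
    A = np.array([[3.0, 0.0, 1.0],
                  [0.0, 2.0, 0.0],
                  [0.0, 0.0, 0.0],
                  [1.0, 0.0, 2.0]])
    features, probs = sample_features(A, k=2, r=3, seed=0)
    A_small = A[features]  # reduced training matrix: one row per selected feature
    print(features, probs.round(3))
```

In this toy matrix, feature 1 alone spans the second singular direction, so it receives probability 0.5. The all-zero feature 2 receives probability (numerically) zero and is essentially never picked. The abstract does not say how `k` and `r` should be chosen. A classifier would then be trained on `A_small` only.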
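IPC classification:
- G | Physics
- G06 | Computing; calculating or counting
- G06N | Computing arrangements based on specific computational models
- G06N5/00 | Computing arrangements using knowledge-based models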