System and method of feature selection for text classification using subspace sampling
    1.
    Granted patent
    Legal status: In force

    Publication number: US08046317B2

    Publication date: 2011-10-25

    Application number: US12006178

    Filing date: 2007-12-31

    IPC classification: G06N5/00

    Abstract: An improved system and method is provided for feature selection for text classification using subspace sampling. A text classifier generator may be provided that selects a small set of features from the corpus of training data by subspace sampling and trains a text classifier that uses this small set of features to classify texts. To select the small set of features, a subspace of features from the corpus of training data may be randomly sampled according to a probability distribution over the set of features, in which each feature is assigned a probability proportional to the squared Euclidean norm of that feature's row in the matrix of left singular vectors of the feature matrix representing the corpus of training texts. Texts may thus be classified using only the relevant features selected from among a very large number of training features.
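    The sampling step described in the abstract can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the patented implementation. It assumes the training corpus is a features × documents matrix A with one row per feature. It uses only the top-k left singular vectors U_k, because the abstract does not say whether the decomposition is truncated. It draws r features independently with replacement. The function names and the parameters k, r and seed are hypothetical. Feature i receives probability p_i = ||(U_k)_i||^2 / k, which is the squared Euclidean norm of its row of U_k divided by k.

```python
import numpy as np


def subspace_sampling_probabilities(A: np.ndarray, k: int) -> np.ndarray:
    """Return one sampling probability per feature.

    Assumes A is an (n_features x n_documents) matrix, one row per feature.
    Feature i gets p_i = ||row i of U_k||^2 / k, where U_k holds the top-k
    left singular vectors of A (truncation to k is an assumption).
    """
    if not 1 <= k <= min(A.shape):
        raise ValueError("k must be between 1 and min(A.shape)")
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k]  # n_features x k, orthonormal columns
    row_norms_sq = np.sum(U_k ** 2, axis=1)
    # In exact arithmetic the squared row norms sum to k; normalizing by the
    # computed sum absorbs floating-point rounding.
    return row_norms_sq / row_norms_sq.sum()


def select_features(A: np.ndarray, k: int, r: int, seed: int = 0) -> np.ndarray:
    """Draw r feature indices i.i.d. with replacement from the subspace
    sampling distribution; return the distinct indices, sorted."""
    p = subspace_sampling_probabilities(A, k)
    rng = np.random.default_rng(seed)
    draws = rng.choice(A.shape[0], size=r, replace=True, p=p)
    return np.unique(draws)


# Toy feature x document matrix: 4 features (rows), 3 training texts (columns).
A = np.array([
    [2.0, 0.0, 1.0],
    [0.0, 3.0, 0.0],
    [1.0, 0.0, 2.0],
    [0.0, 0.1, 0.0],
])
selected = select_features(A, k=2, r=3)
reduced = A[selected, :]  # training matrix restricted to the sampled features
print(subspace_sampling_probabilities(A, k=2))
print(selected)
```

    The columns of U_k are orthonormal, so their squared row norms sum to k. Dividing by k therefore gives a valid probability distribution. Features that carry more weight in the dominant singular subspace are more likely to be kept. In this sketch, a classifier would then be trained on the reduced matrix A[selected, :] rather than on all rows of A.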

    System and method of feature selection for text classification using subspace sampling
    2.
    Patent application
    Legal status: In force

    Publication number: US20090171870A1

    Publication date: 2009-07-02

    Application number: US12006178

    Filing date: 2007-12-31

    IPC classification: G06F15/18

    Abstract: An improved system and method is provided for feature selection for text classification using subspace sampling. A text classifier generator may be provided that selects a small set of features from the corpus of training data by subspace sampling and trains a text classifier that uses this small set of features to classify texts. To select the small set of features, a subspace of features from the corpus of training data may be randomly sampled according to a probability distribution over the set of features, in which each feature is assigned a probability proportional to the squared Euclidean norm of that feature's row in the matrix of left singular vectors of the feature matrix representing the corpus of training texts. Texts may thus be classified using only the relevant features selected from among a very large number of training features.