Publication number: US07475048B2
Publication date: 2009-01-06
Application number: US10494876
Filing date: 2002-11-07
IPC classification: G06F15/18
CPC classification: G06K9/623, G06K9/6269, G06N99/005
Abstract: A computer-implemented method is provided for ranking features within a large dataset containing a large number of features according to each feature's ability to separate data into classes. For each feature, a support vector machine separates the dataset into two classes and determines the margins between extremal points in the two classes. The margins for all of the features are compared and the features are ranked based upon the size of the margin, with the highest ranked features corresponding to the largest margins. A subset of features for classifying the dataset is selected from a group of the highest ranked features. In one embodiment, the method is used to identify the best genes for disease prediction and diagnosis using gene expression data from micro-arrays.
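Illustrative sketch (not taken from the patent text): the per-feature ranking described in the abstract can be approximated as below. This record does not give the patent's actual SVM formulation, so the sketch rests on a simplifying assumption: each feature is evaluated on its own, and its margin is taken to be the gap between the extremal (closest opposite-class) points, which is what a hard-margin linear SVM yields on a single feature that separates the classes perfectly. A negative value, meaning the classes overlap on that feature, is a convention of this sketch. The names `feature_margin`, `rank_features`, and `select_top` and the toy data are hypothetical.

```python
def feature_margin(values, labels):
    """Signed gap between the extremal points of the two classes on one feature.

    Assumption of this sketch: labels are +1 / -1. A positive result means the
    feature separates the classes; a negative result measures their overlap.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == -1]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    # Either class may lie above the other, so take the better orientation.
    gap_pos_above = min(pos) - max(neg)
    gap_neg_above = min(neg) - max(pos)
    return max(gap_pos_above, gap_neg_above)


def rank_features(X, labels):
    """Return (feature_index, margin) pairs sorted by margin, largest first."""
    n_features = len(X[0])
    margins = [
        feature_margin([row[j] for row in X], labels) for j in range(n_features)
    ]
    order = sorted(range(n_features), key=lambda j: margins[j], reverse=True)
    return [(j, margins[j]) for j in order]


def select_top(X, labels, k):
    """Pick the indices of the k highest-ranked features."""
    return [j for j, _ in rank_features(X, labels)[:k]]


# Toy data (hypothetical): 4 samples, 3 features, two classes.
X = [
    [1.0, 5.0, 2.0],
    [1.5, 4.0, 6.0],
    [3.0, 4.5, 3.0],
    [3.5, 5.5, 5.0],
]
y = [-1, -1, 1, 1]

ranking = rank_features(X, y)
top_two = select_top(X, y, 2)
```

In this toy example only feature 0 separates the two classes, so it ranks first; the other two features overlap and receive negative values. For gene expression data, each column of `X` would hold one gene's expression level across samples.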