Methods for feature selection in a learning machine
    1.
    Granted patent
    Methods for feature selection in a learning machine (Status: in force)

    Publication No.: US07624074B2

    Publication date: 2009-11-24

    Application No.: US11929213

    Filing date: 2007-10-30

    IPC classification: G06F15/18

    CPC classification: G06K9/6231 G06N99/005

    Abstract: In a pre-processing step prior to training a learning machine, the quantity of features to be processed is reduced using a feature selection method selected from the group consisting of recursive feature elimination (RFE), minimizing the number of non-zero parameters of the system (l0-norm minimization), evaluation of a cost function to identify a subset of features that are compatible with constraints imposed by the learning set, unbalanced correlation score, and transductive feature selection. The features remaining after feature selection are then used to train a learning machine for purposes of pattern classification, regression, clustering and/or novelty detection.

    METHODS FOR FEATURE SELECTION IN A LEARNING MACHINE
    2.
    Patent application
    METHODS FOR FEATURE SELECTION IN A LEARNING MACHINE (Status: in force)

    Publication No.: US20080215513A1

    Publication date: 2008-09-04

    Application No.: US11929213

    Filing date: 2007-10-30

    IPC classification: G06F15/18

    CPC classification: G06K9/6231 G06N99/005

    Abstract: In a pre-processing step prior to training a learning machine, the quantity of features to be processed is reduced using a feature selection method selected from the group consisting of recursive feature elimination (RFE), minimizing the number of non-zero parameters of the system (l0-norm minimization), evaluation of a cost function to identify a subset of features that are compatible with constraints imposed by the learning set, unbalanced correlation score, and transductive feature selection. The features remaining after feature selection are then used to train a learning machine for purposes of pattern classification, regression, clustering and/or novelty detection.

    Methods for feature selection in a learning machine
    3.
    Granted patent
    Methods for feature selection in a learning machine (Status: in force)

    Publication No.: US07318051B2

    Publication date: 2008-01-08

    Application No.: US10478192

    Filing date: 2002-05-20

    IPC classification: G06F15/18 G06E1/00 G06E3/00

    Abstract: In a pre-processing step prior to training a learning machine, the quantity of features to be processed is reduced using a feature selection method selected from the group consisting of recursive feature elimination (RFE), minimizing the number of non-zero parameters of the system (l0-norm minimization), evaluation of a cost function to identify a subset of features that are compatible with constraints imposed by the learning set, unbalanced correlation score, and transductive feature selection. The features remaining after feature selection are then used to train a learning machine for purposes of pattern classification, regression, clustering and/or novelty detection. (FIG. 3, 300, 301, 302, 304, 306, 308, 309, 310, 311, 312, 314)

    Method for feature selection and for evaluating features identified as significant for classifying data
    4.
    Granted patent
    Method for feature selection and for evaluating features identified as significant for classifying data (Status: in force)

    Publication No.: US07970718B2

    Publication date: 2011-06-28

    Application No.: US12890705

    Filing date: 2010-09-26

    IPC classification: G06F15/18

    Abstract: A group of features that has been identified as “significant” in being able to separate data into classes is evaluated using a support vector machine which separates the dataset into classes one feature at a time. After separation, an extremal margin value is assigned to each feature based on the distance between the lowest feature value in the first class and the highest feature value in the second class. Separately, extremal margin values are calculated for a normal distribution within a large number of randomly drawn example sets for the two classes to determine the number of examples within the normal distribution that would have a specified extremal margin value. Using p-values calculated for the normal distribution, a desired p-value is selected. The specified extremal margin value corresponding to the selected p-value is compared to the calculated extremal margin values for the group of features. The features in the group that have a calculated extremal margin value less than the specified margin value are labeled as falsely significant.
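
The extremal-margin test described in this abstract can be illustrated with a minimal sketch. It is not the patented procedure: the function names, the standard-normal null model, the number of random draws, and the way the p-value is estimated are all assumptions made for illustration, and feature values are assumed to be standardized so that they are comparable with the null draws.

```python
import random

def extremal_margin(first_class, second_class):
    """Distance between the lowest value in the first class and the highest in the second."""
    return min(first_class) - max(second_class)

def null_margins(n_first, n_second, n_draws=2000, seed=0):
    """Extremal margins of randomly drawn standard-normal example sets (assumed null model)."""
    rng = random.Random(seed)
    margins = []
    for _ in range(n_draws):
        a = [rng.gauss(0.0, 1.0) for _ in range(n_first)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_second)]
        margins.append(extremal_margin(a, b))
    return margins

def p_value(observed, margins):
    """Fraction of null margins at least as large as the observed margin."""
    return sum(m >= observed for m in margins) / len(margins)

def margin_threshold(margins, desired_p):
    """Margin value that roughly a fraction `desired_p` of null draws reach or exceed."""
    ordered = sorted(margins, reverse=True)
    k = max(1, int(desired_p * len(ordered)))
    return ordered[k - 1]

def falsely_significant(features, desired_p=0.05):
    """Names of features whose extremal margin falls below the threshold margin."""
    flagged = []
    for name, (first, second) in features.items():
        threshold = margin_threshold(null_margins(len(first), len(second)), desired_p)
        if extremal_margin(first, second) < threshold:
            flagged.append(name)
    return flagged

# Hypothetical, already standardized feature values for two classes.
features = {
    "a": ([5.0, 6.0, 7.0], [-1.0, 0.0, 1.0]),   # classes well separated
    "b": ([-2.0, 0.0, 1.0], [0.0, 0.5, 1.0]),   # classes overlap
}
flagged = falsely_significant(features)
```

In this toy data, feature "a" keeps a wide gap between the classes while feature "b" overlaps, so only "b" ends up in `flagged`.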

    SUPPORT VECTOR MACHINE-BASED METHOD FOR ANALYSIS OF SPECTRAL DATA
    5.
    Patent application
    SUPPORT VECTOR MACHINE-BASED METHOD FOR ANALYSIS OF SPECTRAL DATA (Status: lapsed)

    Publication No.: US20100205124A1

    Publication date: 2010-08-12

    Application No.: US12700575

    Filing date: 2010-02-04

    IPC classification: G06F15/18

    Abstract: Support vector machines are used to classify data contained within a structured dataset such as a plurality of signals generated by a spectral analyzer. The signals are pre-processed to ensure alignment of peaks across the spectra. Similarity measures are constructed to provide a basis for comparison of pairs of samples of the signal. A support vector machine is trained to discriminate between different classes of the samples and to identify the most predictive features within the spectra. In a preferred embodiment, feature selection is performed to reduce the number of features that must be considered.
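
As a rough illustration of the two pre-processing ideas named in this abstract (peak alignment and a pairwise similarity measure), the sketch below aligns a spectrum to a reference by an integer shift and then compares the two with cosine similarity. The abstract does not say which alignment method or similarity measure is used, so both choices, along with all function names and the toy spectra, are assumptions.

```python
import math

def shift_signal(signal, shift):
    """Return the signal moved left by `shift` samples, padding with zeros."""
    n = len(signal)
    return [signal[i + shift] if 0 <= i + shift < n else 0.0 for i in range(n)]

def align_to_reference(signal, reference, max_shift=5):
    """Pick the integer shift that maximizes the overlap (dot product) with the reference."""
    def overlap(shift):
        return sum(r * s for r, s in zip(reference, shift_signal(signal, shift)))
    best = max(range(-max_shift, max_shift + 1), key=overlap)
    return best, shift_signal(signal, best)

def cosine_similarity(a, b):
    """Similarity of two spectra in [-1, 1]; returns 0.0 if either spectrum is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Toy spectra: the sample has the same peak as the reference, two bins later.
reference = [0, 0, 1, 3, 1, 0, 0, 0]
sample = [0, 0, 0, 0, 1, 3, 1, 0]

shift, aligned = align_to_reference(sample, reference)
before = cosine_similarity(reference, sample)
after = cosine_similarity(reference, aligned)
```

A matrix of such pairwise similarities could then serve as the basis for comparing samples; whether a given similarity measure is a valid SVM kernel has to be checked separately.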

    Support vector machine-based method for analysis of spectral data
    6.
    Granted patent
    Support vector machine-based method for analysis of spectral data (Status: lapsed)

    Publication No.: US08463718B2

    Publication date: 2013-06-11

    Application No.: US12700575

    Filing date: 2010-02-04

    IPC classification: G06F15/18

    Abstract: Support vector machines are used to classify data contained within a structured dataset such as a plurality of signals generated by a spectral analyzer. The signals are pre-processed to ensure alignment of peaks across the spectra. Similarity measures are constructed to provide a basis for comparison of pairs of samples of the signal. A support vector machine is trained to discriminate between different classes of the samples and to identify the most predictive features within the spectra. In a preferred embodiment, feature selection is performed to reduce the number of features that must be considered.

    METHOD FOR FEATURE SELECTION AND FOR EVALUATING FEATURES IDENTIFIED AS SIGNIFICANT FOR CLASSIFYING DATA
    7.
    Patent application
    METHOD FOR FEATURE SELECTION AND FOR EVALUATING FEATURES IDENTIFIED AS SIGNIFICANT FOR CLASSIFYING DATA (Status: in force)

    Publication No.: US20110078099A1

    Publication date: 2011-03-31

    Application No.: US12890705

    Filing date: 2010-09-26

    IPC classification: G06F15/18

    Abstract: A group of features that has been identified as “significant” in being able to separate data into classes is evaluated using a support vector machine which separates the dataset into classes one feature at a time. After separation, an extremal margin value is assigned to each feature based on the distance between the lowest feature value in the first class and the highest feature value in the second class. Separately, extremal margin values are calculated for a normal distribution within a large number of randomly drawn example sets for the two classes to determine the number of examples within the normal distribution that would have a specified extremal margin value. Using p-values calculated for the normal distribution, a desired p-value is selected. The specified extremal margin value corresponding to the selected p-value is compared to the calculated extremal margin values for the group of features. The features in the group that have a calculated extremal margin value less than the specified margin value are labeled as falsely significant.

    Recursive feature elimination method using support vector machines

    Publication No.: US10402685B2

    Publication date: 2019-09-03

    Application No.: US12944197

    Filing date: 2010-11-11

    Abstract: Identification of a determinative subset of features from within a group of features is performed by training a support vector machine using training samples with class labels to determine a value of each feature, where features are removed based on their value. One or more features having the smallest values are removed and an updated kernel matrix is generated using the remaining features. The process is repeated until a predetermined number of features remain which are capable of accurately separating the data into different classes. In some embodiments, features are eliminated by a ranking criterion based on a Lagrange multiplier corresponding to each training sample.
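
The train, rank, remove, repeat loop described in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented method: it uses a linear SVM trained in the primal by batch subgradient descent on the hinge loss instead of a kernel matrix and Lagrange multipliers, ranks features by squared weight, and removes one feature per iteration. All function names, hyperparameters, and the toy data are hypothetical.

```python
def train_linear_svm(X, y, active, lam=0.01, lr=0.1, epochs=200):
    """Batch subgradient descent on the L2-regularized hinge loss (stand-in for an SVM solver)."""
    w = {j: 0.0 for j in active}
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = {j: lam * w[j] for j in active}
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(w[j] * xi[j] for j in active) + b
            if yi * score < 1.0:  # sample violates the margin
                for j in active:
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        for j in active:
            w[j] -= lr * grad_w[j]
        b -= lr * grad_b
    return w, b

def recursive_feature_elimination(X, y, n_keep):
    """Retrain on the surviving features and drop the one with the smallest squared weight."""
    active = list(range(len(X[0])))
    eliminated = []
    while len(active) > n_keep:
        w, _ = train_linear_svm(X, y, active)
        worst = min(active, key=lambda j: w[j] ** 2)
        active.remove(worst)
        eliminated.append(worst)
    return active, eliminated

# Toy data: feature 0 separates the classes strongly, feature 1 weakly,
# and feature 2 is constant and carries no information.
X = [[2.0, 0.5, 0.0], [2.0, 0.3, 0.0], [-2.0, -0.2, 0.0], [-2.0, -0.4, 0.0]]
y = [1, 1, -1, -1]

kept, eliminated = recursive_feature_elimination(X, y, n_keep=1)
```

On this toy data the constant feature is dropped first and the weakly informative one second, leaving feature 0. Removing several features per pass, as the abstract allows, would only change how many indices are taken from the bottom of the ranking at each step.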

    Selection of features predictive of biological conditions using protein mass spectrographic data
    9.
    Granted patent
    Selection of features predictive of biological conditions using protein mass spectrographic data (Status: lapsed)

    Publication No.: US07676442B2

    Publication date: 2010-03-09

    Application No.: US11929169

    Filing date: 2007-10-30

    IPC classification: G06N5/00

    Abstract: Support vector machines are used to classify data contained within a structured dataset such as a plurality of signals generated by a spectral analyzer. The signals are pre-processed to ensure alignment of peaks across the spectra. Similarity measures are constructed to provide a basis for comparison of pairs of samples of the signal. A support vector machine is trained to discriminate between different classes of the samples and to identify the most predictive features within the spectra. In a preferred embodiment, feature selection is performed to reduce the number of features that must be considered.

    Data mining platform for bioinformatics and other knowledge discovery
    10.
    Granted patent
    Data mining platform for bioinformatics and other knowledge discovery (Status: lapsed)

    Publication No.: US07542947B2

    Publication date: 2009-06-02

    Application No.: US11928641

    Filing date: 2007-10-30

    IPC classification: G06F7/00 G06F17/30

    Abstract: The data mining platform comprises a plurality of system modules, each formed from a plurality of components. Each module has an input data component, a data analysis engine for processing the input data, an output data component for outputting the results of the data analysis, and a web server to access and monitor the other modules within the unit and to provide communication to other units. Each module processes a different type of data; for example, a first module processes microarray (gene expression) data while a second module processes biomedical literature on the Internet for information supporting relationships between genes and diseases and gene functionality. In the preferred embodiment, the data analysis engine is a kernel-based learning machine, in particular one or more support vector machines (SVMs). The data analysis engine includes a pre-processing function for feature selection, which reduces the amount of data to be processed by selecting the optimum number of attributes, or “features”, relevant to the information to be discovered.
