Method for the manipulation, storage, modeling, visualization and quantification of datasets
    31.
    Granted invention patent
    Legal status: Lapsed

    Publication No.: US06920451B2

    Publication date: 2005-07-19

    Application No.: US09766247

    Filing date: 2001-01-19

    Applicant: Sandy C. Shaw

    Inventor: Sandy C. Shaw

    IPC classification: G06F17/30

    Abstract: There is described a method for manipulation, storage, modeling, visualization, and quantification of datasets, which correspond to target strings. A number of target strings are provided. An iterative algorithm is used to generate comparison strings corresponding to some set of points that can serve as the domain of an iterative function. Preferably these points are located in the complex plane, such as in and/or near the Mandelbrot Set or a Julia Set. These comparison strings are also datasets. The comparison string is scored by evaluating a function having the comparison string and one of the plurality of target strings as inputs. The score measures a relationship between a comparison string and a target string. The evaluation may be repeated for a number of the other target strings. The score or some other property corresponding to the comparison string is used to determine the target string's placement on a map. The target string may also be marked by a point on a visual display. The coordinates of the point corresponding to the target string or properties of the comparison string may be stored in memory, a database or a table. Mapped or marked points in a region of interest can be explored by examining a subregion with higher resolution. The points are analyzed and/or compared by examining, either visually or mathematically, their relative locations, their absolute locations within the region, and/or metrics other than location.
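The abstract above does not say how a comparison string is derived from an iterated point or which scoring function is used, so the following is only a minimal sketch under stated assumptions: one bit per iterate of z -> z*z + c (the sign of the real part), a positional match count as the score, and a coarse grid search for placement. The names `comparison_string`, `score`, and `place_target` are hypothetical and do not come from the patent.

```python
def comparison_string(c, length):
    """Iterate z -> z*z + c from z = 0 and emit one bit per step.

    Assumed encoding: "1" if the real part of the iterate is >= 0, else "0".
    Once the orbit leaves the disk |z| <= 2, the rest is padded with "0".
    """
    z = 0j
    bits = []
    for _ in range(length):
        if abs(z) > 2.0:
            bits.append("0")  # orbit escaped: stop iterating, pad the string
            continue
        z = z * z + c
        bits.append("1" if z.real >= 0 else "0")
    return "".join(bits)


def score(comparison, target):
    """Assumed score: number of positions where the two strings agree."""
    return sum(a == b for a, b in zip(comparison, target))


def place_target(target, xs, ys):
    """Return (best_score, c) for the grid point whose string matches best."""
    best = None
    for x in xs:
        for y in ys:
            c = complex(x, y)
            s = score(comparison_string(c, len(target)), target)
            if best is None or s > best[0]:
                best = (s, c)
    return best
```

Under these assumptions, the returned point `c` plays the role of the target string's coordinates on the map; re-running `place_target` on a finer grid around `c` would correspond to examining a subregion at higher resolution.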


    Method and system for analysis of flow cytometry data using support vector machines
    33.
    Granted invention patent
    Legal status: In force

    Publication No.: US08682810B2

    Publication date: 2014-03-25

    Application No.: US12367541

    Filing date: 2009-02-08

    Applicant: Hong Zhang

    Inventor: Hong Zhang

    IPC classification: G06F15/18

    CPC classification: G06K9/00147

    Abstract: An automated method and system are provided for receiving an input of flow cytometry data and analyzing the data using one or more support vector machines to generate an output in which the flow cytometry data is classified into two or more categories. The one or more support vector machines utilize a kernel that captures distributional data within the input data. Such a distributional kernel is constructed by using a distance function (divergence) between two distributions. In the preferred embodiment, a kernel based upon the Bhattacharyya affinity is used. The distributional kernel is applied to classification of flow cytometry data obtained from patients suspected of having myelodysplastic syndrome.
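As an illustration of a distributional kernel of the kind the abstract names, the sketch below computes the Bhattacharyya affinity (the sum of sqrt(p_i * q_i) over bins) between two normalized histograms and assembles a Gram matrix from it. Representing each sample as a 1-D histogram is an assumption made here for brevity, and `histogram`, `bhattacharyya_kernel`, and `gram_matrix` are hypothetical names, not the patent's implementation.

```python
import math


def histogram(samples, edges):
    """Normalized histogram over the half-open bins [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for x in samples:
        for i in range(len(counts)):
            if edges[i] <= x < edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)
    if total == 0:
        raise ValueError("no samples fall inside the bin range")
    return [c / total for c in counts]


def bhattacharyya_kernel(p, q):
    """Bhattacharyya affinity between two discrete distributions."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def gram_matrix(distributions):
    """Pairwise kernel values; one row and one column per distribution."""
    return [[bhattacharyya_kernel(p, q) for q in distributions]
            for p in distributions]
```

The affinity equals 1 for identical distributions and 0 for distributions with disjoint support. Because it is an inner product of the element-wise square roots, the resulting Gram matrix is positive semidefinite, so it could in principle be handed to any SVM trainer that accepts a precomputed kernel.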


    BIOMARKERS DOWNREGULATED IN PROSTATE CANCER
    34.
    Invention patent application
    Legal status: In force

    Publication No.: US20110312509A1

    Publication date: 2011-12-22

    Application No.: US13220082

    Filing date: 2011-08-29

    Applicant: Isabelle Guyon

    Inventor: Isabelle Guyon

    IPC classification: C12Q1/68 C40B30/00

    Abstract: Biomarkers are identified by analyzing gene expression data using support vector machines (SVM), recursive feature elimination (RFE) and/or linear ridge regression classifiers to rank genes according to their ability to separate prostate cancer from normal tissue. Proteins expressed by identified genes are detected in patient samples to screen, predict and monitor prostate cancer.
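To make the ranking idea concrete, here is a minimal sketch of recursive feature elimination driven by a linear ridge regression classifier: fit weights on the remaining features, drop the feature with the smallest absolute weight, and repeat. Using |w| as the elimination criterion, the regularization value `lam=1.0`, and the function names are assumptions for illustration; the abstract does not specify these details.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge_weights(rows, labels, feats, lam=1.0):
    """Ridge solution w = (X^T X + lam I)^-1 X^T y on the columns in feats."""
    d = len(feats)
    xtx = [[sum(r[feats[i]] * r[feats[j]] for r in rows)
            + (lam if i == j else 0.0) for j in range(d)] for i in range(d)]
    xty = [sum(r[feats[i]] * y for r, y in zip(rows, labels)) for i in range(d)]
    return solve(xtx, xty)


def rfe_ranking(rows, labels, lam=1.0):
    """Return feature indices ordered from most to least useful."""
    remaining = list(range(len(rows[0])))
    eliminated = []
    while len(remaining) > 1:
        w = ridge_weights(rows, labels, remaining, lam)
        worst = min(range(len(remaining)), key=lambda i: abs(w[i]))
        eliminated.append(remaining.pop(worst))
    return remaining + eliminated[::-1]
```

In a gene expression setting, each row would be one tissue sample, each column one gene, and the labels +1 and -1 would mark cancer and normal tissue; the weight-based criterion assumes the columns have been put on comparable scales first.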


    METHOD FOR FEATURE SELECTION AND FOR EVALUATING FEATURES IDENTIFIED AS SIGNIFICANT FOR CLASSIFYING DATA
    35.
    Invention patent application
    Legal status: In force

    Publication No.: US20110078099A1

    Publication date: 2011-03-31

    Application No.: US12890705

    Filing date: 2010-09-26

    IPC classification: G06F15/18

    Abstract: A group of features that has been identified as “significant” in being able to separate data into classes is evaluated using a support vector machine which separates the dataset into classes one feature at a time. After separation, an extremal margin value is assigned to each feature based on the distance between the lowest feature value in the first class and the highest feature value in the second class. Separately, extremal margin values are calculated for a normal distribution within a large number of randomly drawn example sets for the two classes to determine the number of examples within the normal distribution that would have a specified extremal margin value. Using p-values calculated for the normal distribution, a desired p-value is selected. The specified extremal margin value corresponding to the selected p-value is compared to the calculated extremal margin values for the group of features. The features in the group that have a calculated extremal margin value less than the specified margin value are labeled as falsely significant.
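The procedure in this abstract can be sketched as a small Monte Carlo test. The code below is an illustration under assumptions, not the patented method: both class orientations are tried when computing the margin, the null draws come from a standard normal distribution (so real feature values are assumed to be standardized), and the names and the default seed are hypothetical.

```python
import random


def extremal_margin(class_a, class_b):
    """Gap between the lowest value of one class and the highest of the other.

    Both orientations are tried (an assumption); a negative result means the
    two classes overlap on this feature.
    """
    return max(min(class_a) - max(class_b), min(class_b) - max(class_a))


def null_margins(n_a, n_b, draws, seed=0):
    """Extremal margins of randomly drawn two-class sets with no real signal."""
    rng = random.Random(seed)
    margins = []
    for _ in range(draws):
        a = [rng.gauss(0.0, 1.0) for _ in range(n_a)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_b)]
        margins.append(extremal_margin(a, b))
    return margins


def margin_threshold(null, p_value):
    """Margin reached by roughly a fraction p_value of the null draws."""
    ordered = sorted(null, reverse=True)
    k = max(1, int(p_value * len(ordered)))
    return ordered[k - 1]


def falsely_significant(features, threshold):
    """Names of features whose extremal margin falls below the threshold."""
    return [name for name, (a, b) in features.items()
            if extremal_margin(a, b) < threshold]
```

A typical use would compute `margin_threshold(null_margins(n_a, n_b, 10000), 0.05)` with the real class sizes and then pass the candidate features, as a dict mapping each name to its pair of per-class value lists, to `falsely_significant`.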


    Method for feature selection in a support vector machine using feature ranking
    36.
    Granted invention patent
    Legal status: Lapsed

    Publication No.: US07805388B2

    Publication date: 2010-09-28

    Application No.: US11928784

    Filing date: 2007-10-30

    IPC classification: G06N7/00

    Abstract: In a pre-processing step prior to training a learning machine, pre-processing includes reducing the quantity of features to be processed using feature selection methods selected from the group consisting of recursive feature elimination (RFE), minimizing the number of non-zero parameters of the system (l0-norm minimization), evaluation of cost function to identify a subset of features that are compatible with constraints imposed by the learning set, unbalanced correlation score, transductive feature selection and single feature using margin-based ranking. The features remaining after feature selection are then used to train a learning machine for purposes of pattern classification, regression, clustering and/or novelty detection.


    BIOMARKERS OVEREXPRESSED IN PROSTATE CANCER
    37.
    Invention patent application
    Legal status: Pending (published)

    Publication No.: US20090286240A1

    Publication date: 2009-11-19

    Application No.: US12242264

    Filing date: 2008-09-30

    Applicant: Isabelle Guyon

    Inventor: Isabelle Guyon

    IPC classification: C12Q1/68 C07H21/04

    Abstract: Biomarkers are identified by analyzing gene expression data using support vector machines (SVM) to rank genes according to their ability to separate prostate cancer from normal tissue. Proteins expressed by identified genes are detected in patient samples to screen, predict and monitor prostate cancer.


    Pre-processed feature ranking for a support vector machine
    38.
    Granted invention patent
    Legal status: Lapsed

    Publication No.: US07475048B2

    Publication date: 2009-01-06

    Application No.: US10494876

    Filing date: 2002-11-07

    IPC classification: G06F15/18

    Abstract: A computer-implemented method is provided for ranking features within a large dataset containing a large number of features according to each feature's ability to separate data into classes. For each feature, a support vector machine separates the dataset into two classes and determines the margins between extremal points in the two classes. The margins for all of the features are compared and the features are ranked based upon the size of the margin, with the highest ranked features corresponding to the largest margins. A subset of features for classifying the dataset is selected from a group of the highest ranked features. In one embodiment, the method is used to identify the best genes for disease prediction and diagnosis using gene expression data from micro-arrays.
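For a single feature whose values separate the two classes, the maximum-margin boundary lies midway between the closest opposite-class values, so the margin can be read directly off the extremal points without running a full SVM solver. The sketch below uses that shortcut to rank features; the +1/-1 label convention, the handling of either class orientation, and the function names are assumptions for illustration.

```python
def feature_margin(rows, labels, j):
    """Gap between the extremal values of the two classes on feature j.

    Labels are assumed to be +1 or -1. A negative value means the classes
    overlap on this feature and cannot be separated by a single threshold.
    """
    pos = [r[j] for r, y in zip(rows, labels) if y == 1]
    neg = [r[j] for r, y in zip(rows, labels) if y == -1]
    return max(min(pos) - max(neg), min(neg) - max(pos))


def rank_features(rows, labels):
    """Feature indices sorted from the largest margin to the smallest."""
    d = len(rows[0])
    margins = [feature_margin(rows, labels, j) for j in range(d)]
    return sorted(range(d), key=lambda j: margins[j], reverse=True)


def select_top(rows, labels, k):
    """Keep the k highest-ranked features as the classification subset."""
    return rank_features(rows, labels)[:k]
```

With micro-array data, each row would be one sample and each column one gene, and `select_top` would return the indices of the candidate genes.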


    Method of using kernel alignment to extract significant features from a large dataset
    39.
    Granted invention patent
    Legal status: In force

    Publication No.: US07299213B2

    Publication date: 2007-11-20

    Application No.: US11225251

    Filing date: 2005-09-12

    Applicant: Nello Cristianini

    Inventor: Nello Cristianini

    IPC classification: G06F15/18

    Abstract: The spectral kernel machine combines kernel functions and spectral graph theory for solving problems of machine learning. The data points in the dataset are placed in the form of a matrix known as a kernel matrix, or Gram matrix, containing all pairwise kernels between the data points. The dataset is regarded as nodes of a fully connected graph. A weight equal to the kernel between the two nodes is assigned to each edge of the graph. The adjacency matrix of the graph is equivalent to the kernel matrix, also known as the Gram matrix. The eigenvectors and their corresponding eigenvalues provide information about the properties of the graph, and thus, the dataset. The second eigenvector can be thresholded to approximate the class assignment of graph nodes. Eigenvectors of the kernel matrix may be used to assign unlabeled data to clusters, merge information from labeled and unlabeled data by transduction, provide model selection information for other kernels, detect novelties or anomalies and/or clean data, and perform supervised learning tasks such as classification.
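The thresholding step in this abstract can be illustrated with a small pure-Python sketch: build a Gram matrix, find its leading eigenvector by power iteration, deflate, find the second eigenvector, and split the points by sign. The Gaussian (RBF) kernel, the `gamma` value, the threshold at zero, and the use of power iteration are all assumptions chosen to keep the example self-contained; the abstract does not prescribe them.

```python
import math


def rbf_gram(points, gamma=1.0):
    """Gram matrix of pairwise Gaussian kernels exp(-gamma * ||p - q||^2)."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(p, q)))
             for q in points] for p in points]


def power_iteration(m, iters=500):
    """Dominant eigenvalue and unit eigenvector of a symmetric matrix."""
    n = len(m)
    v = [1.0 + 0.01 * i for i in range(n)]  # fixed, slightly uneven start
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam, v


def second_eigenvector(m):
    """Remove the leading eigenpair by deflation, then iterate again."""
    lam1, v1 = power_iteration(m)
    n = len(m)
    deflated = [[m[i][j] - lam1 * v1[i] * v1[j] for j in range(n)]
                for i in range(n)]
    return power_iteration(deflated)[1]


def threshold_labels(points, gamma=1.0):
    """Assign +1 or -1 to each point from the sign of the second eigenvector."""
    v2 = second_eigenvector(rbf_gram(points, gamma))
    return [1 if x >= 0 else -1 for x in v2]
```

Which cluster receives +1 is arbitrary, because an eigenvector is defined only up to sign. Power iteration also converges slowly when the two leading eigenvalues are nearly equal, for example when the clusters are so far apart that the cross-cluster kernel values are close to zero, so a library eigensolver would be the safer choice in practice.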


    Computer-aided image analysis
    40.
    Granted invention patent

    Publication No.: US06996549B2

    Publication date: 2006-02-07

    Application No.: US10056438

    Filing date: 2002-01-23

    IPC classification: G06F15/18

    Abstract: Digitized image data are input into a processor where a detection component identifies the areas (objects) of particular interest in the image and, by segmentation, separates those objects from the background. A feature extraction component formulates numerical values relevant to the classification task from the segmented objects. Results of the preceding analysis steps are input into a trained learning machine classifier which produces an output which may consist of an index discriminating between two possible diagnoses, or some other output in the desired output format. In one embodiment, digitized image data are input into a plurality of subsystems, each subsystem having one or more support vector machines. Pre-processing may include the use of known transformations which facilitate extraction of the useful data. Each subsystem analyzes the data relevant to a different feature or characteristic found within the image. Once each subsystem completes its analysis and classification, the output for all subsystems is input into an overall support vector machine analyzer which combines the data to make a diagnosis, decision or other action which utilizes the knowledge obtained from the image.