Identification of Co-Regulation Patterns By Unsupervised Cluster Analysis of Gene Expression Data
    1.
    Type: invention application
    Status: lapsed

    Publication number: US20110125683A1

    Publication date: 2011-05-26

    Application number: US13019585

    Filing date: 2011-02-02

    IPC classification: G06N3/12

    Abstract: A method is provided for unsupervised clustering of gene expression data to identify co-regulation patterns. A clustering algorithm randomly divides the data into k different subsets and measures the similarity between pairs of datapoints within the subsets, assigning a score to the pairs based on similarity, with the greatest similarity giving the highest correlation score. A distribution of the scores is plotted for each k. The highest value of k that has a distribution that remains concentrated near the highest correlation score corresponds to the number of co-regulation patterns.


    Model selection for cluster data analysis
    2.
    Type: granted invention patent
    Status: lapsed

    Publication number: US07890445B2

    Publication date: 2011-02-15

    Application number: US11929522

    Filing date: 2007-10-30

    IPC classification: G06F17/00 G06N5/00

    Abstract: A model selection method is provided for choosing the number of clusters, or more generally the parameters of a clustering algorithm. The algorithm is based on comparing the similarity between pairs of clustering runs on sub-samples or other perturbations of the data. High pairwise similarities show that the clustering represents a stable pattern in the data. The method is applicable to any clustering algorithm, and can also detect lack of structure. We show results on artificial and real data using a hierarchical clustering algorithm.
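
The comparison of two clustering runs described in this abstract can be illustrated with a minimal sketch. It assumes each run is given as a mapping from point id to cluster label, and it scores agreement with a Jaccard index over co-clustered pairs on the points the two sub-samples share. The function names and the choice of Jaccard are assumptions made for illustration; the abstract does not specify a similarity measure.

```python
from itertools import combinations


def comembership_pairs(labels):
    """Return the set of point-id pairs that share a cluster.

    `labels` maps point id -> cluster label (hypothetical input format).
    """
    return {
        (a, b)
        for a, b in combinations(sorted(labels), 2)
        if labels[a] == labels[b]
    }


def clustering_similarity(labels_a, labels_b):
    """Jaccard similarity of co-clustered pairs for two clustering runs.

    Only points present in both sub-samples are compared, so cluster
    label names do not need to match between runs.
    """
    shared = set(labels_a) & set(labels_b)
    pairs_a = comembership_pairs({p: labels_a[p] for p in shared})
    pairs_b = comembership_pairs({p: labels_b[p] for p in shared})
    union = pairs_a | pairs_b
    if not union:
        # Neither run groups any two shared points together:
        # the two runs agree trivially.
        return 1.0
    return len(pairs_a & pairs_b) / len(union)


# Two runs on overlapping sub-samples; points 1, 2 and 3 are shared.
run_a = {0: "x", 1: "x", 2: "y", 3: "y"}
run_b = {1: 0, 2: 1, 3: 1, 4: 0}
run_c = {1: 0, 2: 0, 3: 1}
print(clustering_similarity(run_a, run_b))
print(clustering_similarity(run_a, run_c))
```

Scores near 1 across many pairs of runs would suggest a stable clustering in the sense of the abstract; scores spread toward 0 would suggest the chosen parameters do not reflect structure in the data.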


    Identification of co-regulation patterns by unsupervised cluster analysis of gene expression data
    3.
    Type: granted invention patent
    Status: lapsed

    Publication number: US08489531B2

    Publication date: 2013-07-16

    Application number: US13019585

    Filing date: 2011-02-02

    IPC classification: G06F17/00 G06N5/00

    Abstract: A method is provided for unsupervised clustering of gene expression data to identify co-regulation patterns. A clustering algorithm randomly divides the data into k different subsets and measures the similarity between pairs of datapoints within the subsets, assigning a score to the pairs based on similarity, with the greatest similarity giving the highest correlation score. A distribution of the scores is plotted for each k. The highest value of k that has a distribution that remains concentrated near the highest correlation score corresponds to the number of co-regulation patterns.
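
The selection rule in the last sentence of this abstract can be sketched as follows. The sketch assumes the similarity scores for each k have already been computed and lie in [0, 1], and it reads "concentrated near the highest correlation score" as "at least a given fraction of scores is at or above a cutoff". The function name `select_k` and both thresholds are hypothetical; the abstract does not state numeric criteria.

```python
def select_k(scores_by_k, near_max=0.9, min_fraction=0.9):
    """Pick the largest k whose scores stay concentrated near the top.

    `scores_by_k` maps k -> list of similarity scores in [0, 1].
    `near_max` and `min_fraction` are illustrative thresholds, not
    values taken from the patent.
    Returns None when no k meets the criterion.
    """
    stable = [
        k
        for k, scores in scores_by_k.items()
        if scores
        and sum(s >= near_max for s in scores) / len(scores) >= min_fraction
    ]
    return max(stable) if stable else None


# Made-up scores: k = 2 and k = 3 stay near 1.0, k = 4 spreads out.
example_scores = {
    2: [1.0, 0.98, 0.95, 1.0],
    3: [0.97, 0.99, 0.93, 0.96],
    4: [0.5, 0.9, 0.6, 0.7],
}
print(select_k(example_scores))
```

Returning `None` when no k passes corresponds to the case where the data shows no stable cluster structure at any tested k.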
