Methods of Clustering Gene and Protein Sequences
    Patent application
    Status: under examination (published)

    Publication No.: US20090327170A1

    Publication date: 2009-12-31

    Application No.: US12086717

    Filing date: 2006-12-19

    Abstract: The invention relates to methods for clustering gene and protein sequences. In particular, it involves generation of networks of sequences where the interconnections are based upon a measure of similarity. The invention also provides methods of optimizing and improving the networks by re-wiring of the network based upon overlap of the nearest neighbors of given pairs of nodes. The invention further provides methods of identifying clusters of sequences within the networks and the optimized networks based upon the topology of the network. The clusters identified represent groups of sequences that are related by function and/or evolution. The invention has particular applicability in annotation of sequences in databases and identification of functional homologs which can be very useful for novel therapeutic and diagnostic targets based upon such targets belonging to a cluster or family that contains a known sequence such as a diagnostic sequence, antigen or other therapeutic target.
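
The three steps named in the abstract (a similarity network, re-wiring by nearest-neighbor overlap, and topology-based cluster identification) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration, not the method claimed in the application: the abstract does not name a similarity measure, so k-mer Jaccard similarity stands in for it; the thresholds `min_sim` and `min_overlap` are hypothetical; the overlap is computed on closed neighborhoods (each node counted among its own neighbors); and connected components stand in for whatever topological criterion the application actually uses.

```python
from itertools import combinations


def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard index of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_network(seqs, k=3, min_sim=0.3):
    """Assumed similarity step: link sequences whose k-mer Jaccard >= min_sim."""
    profiles = {name: kmers(s, k) for name, s in seqs.items()}
    graph = {name: set() for name in seqs}
    for u, v in combinations(seqs, 2):
        if jaccard(profiles[u], profiles[v]) >= min_sim:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def rewire(graph, min_overlap=0.5):
    """Assumed re-wiring step: link a pair only if their closed
    neighborhoods overlap by at least min_overlap (Jaccard)."""
    closed = {n: nbrs | {n} for n, nbrs in graph.items()}
    new = {n: set() for n in graph}
    for u, v in combinations(graph, 2):
        if jaccard(closed[u], closed[v]) >= min_overlap:
            new[u].add(v)
            new[v].add(u)
    return new


def clusters(graph):
    """Connected components, a simple stand-in for topology-based clusters."""
    seen, out = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(graph[n] - comp)
        seen |= comp
        out.append(comp)
    return out


# Toy network: two triangles joined by a single bridge edge c-d.
toy = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
print([sorted(c) for c in clusters(rewire(toy))])
```

In the toy network the bridge between `c` and `d` joins two nodes whose neighborhoods barely overlap, so the re-wiring step drops it and the single connected network separates into two clusters. In practice the similarity step would more plausibly come from an alignment score than from k-mer sets; that choice is left open here because the abstract does not specify it.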
