Publication No.: US20110124514A1
Publication date: 2011-05-26
Application No.: US12051765
Filing date: 2008-03-19
Applicants: Carol E. Zhou, Adam T. Zemla, Marisa W. Lam, Jason R. Smith, Elizabeth A. Vitalis, Shea N. Gardner, Thomas A. Kuczmarski, Thomas R. Slezak, Diane C. Roe, Joseph P. Schoeniger, Clinton L. Torres
Inventors: Carol E. Zhou, Adam T. Zemla, Marisa W. Lam, Jason R. Smith, Elizabeth A. Vitalis, Shea N. Gardner, Thomas A. Kuczmarski, Thomas R. Slezak, Diane C. Roe, Joseph P. Schoeniger, Clinton L. Torres
CPC classification: G16B20/00
Abstract: A set of known protein sequences associated with an organism is identified, wherein each known protein sequence comprises a plurality of ordered residues. A set of scores associated with a set of residues of the plurality of ordered residues is identified, wherein each score indicates a frequency of a residue in sequence context. A set of unique sub-sequences of the set of known protein sequences is identified. A plurality of protein signature residues is determined based on the set of scores associated with the set of residues and the set of unique sub-sequences.
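The abstract names four steps (known sequences, context-frequency scores, unique sub-sequences, signature residues) but does not say how the scores are computed or what "unique" is measured against. The Python below is a minimal sketch under stated assumptions, not the patented method: (1) a residue's "sequence context" is taken to be the k-mer window starting at that residue, scored by the fraction of known sequences that contain it; (2) a sub-sequence counts as "unique" when it never occurs in a hypothetical set of background proteins from other organisms; (3) signature residues are the positions in the first known sequence covered by a unique k-mer whose score reaches a threshold. The function names, `k`, `min_score` and the background set are all illustrative assumptions.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def kmers(seq: str, k: int) -> List[str]:
    """All overlapping length-k windows of seq, in order (empty if len(seq) < k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def context_scores(seqs: List[str], k: int) -> Dict[str, float]:
    """Assumed scoring: each k-mer (a residue plus its k-1 right neighbours)
    gets the fraction of known sequences in which it occurs at least once."""
    if not seqs:
        raise ValueError("need at least one known protein sequence")
    counts: Counter = Counter()
    for s in seqs:
        counts.update(set(kmers(s, k)))  # count each k-mer once per sequence
    return {km: c / len(seqs) for km, c in counts.items()}


def unique_subsequences(target: List[str], background: List[str], k: int) -> Set[str]:
    """Assumed uniqueness: k-mers in the target proteins but in no background protein."""
    target_kmers = {km for s in target for km in kmers(s, k)}
    background_kmers = {km for s in background for km in kmers(s, k)}
    return target_kmers - background_kmers


def signature_residues(
    target: List[str], background: List[str], k: int = 3, min_score: float = 0.5
) -> List[Tuple[int, str]]:
    """Hypothetical selection rule: residues of target[0] covered by a unique
    k-mer whose score is at least min_score.

    Returns sorted (0-based position, residue) pairs in the reference sequence.
    """
    scores = context_scores(target, k)
    unique = unique_subsequences(target, background, k)
    reference = target[0]
    hits: Set[Tuple[int, str]] = set()
    for i, km in enumerate(kmers(reference, k)):
        if km in unique and scores[km] >= min_score:
            hits.update((j, reference[j]) for j in range(i, i + k))
    return sorted(hits)


if __name__ == "__main__":
    known = ["MKTAYIAK", "MKTAYLAK"]  # toy proteins for the target organism
    other = ["MKTGGG"]                # toy background proteins
    print(signature_residues(known, other, k=3, min_score=1.0))
```

With these toy inputs the script prints `[(1, 'K'), (2, 'T'), (3, 'A'), (4, 'Y')]`: the windows `KTA` and `TAY` occur in every known sequence and never in the background, whereas `MKT` is shared with the background and the later windows (`AYI`, `YIA`, `IAK`) occur in only one of the two known sequences. Requiring both a high context score and uniqueness is one plausible way to combine the two inputs the abstract mentions; a real pipeline would likely use alignments and far larger background databases.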