Methods and systems for assembly of protein sequences

    Publication No.: US10309968B2

    Publication date: 2019-06-04

    Application No.: US15599431

    Filing date: 2017-05-18

    Abstract: Methods and systems are provided for determining the amino acid sequence of a polypeptide or protein from mass spectrometry data using a weighted de Bruijn graph. Extracted and purified protein is cleaved into a mixture of peptides, which is then analyzed using mass spectrometry. A list of peptide sequences is derived from the mass spectrometry fragment data by de novo sequencing, and amino acid confidence scores are determined from peak fragment ion intensity. A weighted de Bruijn graph is constructed for the list of peptide sequences, with node weights defined by (k−1)-mer confidence scores. At least one contig is assembled from the de Bruijn graph by identifying the node weights having the highest (k−1)-mer confidence scores.
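
    The graph construction and contig assembly described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented method: the function names (build_weighted_graph, assemble_contig), the node weight formula (mean residue confidence of each (k−1)-mer, summed over every occurrence), the greedy highest-weight traversal, and the toy peptides and scores are all assumptions made for the example.

```python
from collections import defaultdict


def build_weighted_graph(peptides, k):
    """Build a weighted de Bruijn graph from de novo peptide calls.

    peptides: list of (sequence, per-residue confidence list) pairs.
    Nodes are (k-1)-mers. ASSUMPTION: a node's weight is the mean residue
    confidence of the (k-1)-mer, summed over every occurrence.
    """
    n = k - 1
    weights = defaultdict(float)
    succ = defaultdict(set)  # node -> nodes that follow it in some k-mer
    pred = defaultdict(set)  # node -> nodes that precede it in some k-mer
    for seq, conf in peptides:
        nodes = []
        for i in range(len(seq) - n + 1):
            node = seq[i:i + n]
            weights[node] += sum(conf[i:i + n]) / n
            nodes.append(node)
        # Consecutive (k-1)-mers overlap by k-2 residues and form an edge.
        for a, b in zip(nodes, nodes[1:]):
            succ[a].add(b)
            pred[b].add(a)
    return weights, succ, pred


def assemble_contig(weights, succ, pred):
    """Greedy sketch (ASSUMPTION): seed at the heaviest node, then extend
    right and left, always stepping to the heaviest unvisited neighbour."""
    if not weights:
        return ""

    def rank(node):
        # Break weight ties alphabetically so the result is deterministic.
        return (weights[node], node)

    seed = max(weights, key=rank)
    visited = {seed}
    contig = seed

    node = seed
    while True:  # extend to the right
        candidates = [m for m in succ.get(node, ()) if m not in visited]
        if not candidates:
            break
        node = max(candidates, key=rank)
        visited.add(node)
        contig += node[-1]

    node = seed
    while True:  # extend to the left
        candidates = [m for m in pred.get(node, ()) if m not in visited]
        if not candidates:
            break
        node = max(candidates, key=rank)
        visited.add(node)
        contig = node[0] + contig

    return contig


# Hypothetical toy input: two confident overlapping peptides and one
# low-confidence peptide that creates a competing branch after "FGH".
example_peptides = [
    ("ACDEFG", [0.9] * 6),
    ("EFGHIK", [0.9] * 6),
    ("FGHXYZ", [0.2] * 6),
]
weights, succ, pred = build_weighted_graph(example_peptides, k=4)
print(assemble_contig(weights, succ, pred))  # prints ACDEFGHIK
```

    In this toy input the low-confidence peptide adds a second successor to node FGH, and the traversal follows the higher-weight node GHI rather than GHX. A practical assembler would also need to handle cycles, repeated (k−1)-mers, and multiple contigs, which this sketch ignores.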