Publication number: US20170336419A1
Publication date: 2017-11-23
Application number: US15599431
Filing date: 2017-05-18
Applicant: BIOINFORMATICS SOLUTIONS INC.
Inventor: Ngoc Hieu TRAN, Mohammad Ziaur RAHMAN, Lin HE, Lei XIN, Baozhen SHAN, Ming LI
CPC classification number: G01N33/6818, G01N33/6848, G01N33/6854, G01N2560/00, G06F19/22, G06F19/24, G06F19/26
Abstract: Methods and systems are provided for determining the amino acid sequence of a polypeptide or protein from mass spectrometry data using a weighted de Bruijn graph. Extracted and purified protein is cleaved into a mixture of peptides, which is then analyzed by mass spectrometry. A list of peptide sequences is derived from the mass spectrometry fragment data by de novo sequencing, and amino acid confidence scores are determined from peak fragment ion intensity. A weighted de Bruijn graph is constructed for the list of peptide sequences, with node weights defined by (k-1)-mer confidence scores. At least one contig is assembled from the de Bruijn graph by identifying the nodes whose weights have the highest (k-1)-mer confidence scores.
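
The graph construction and contig assembly summarized in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation, and several details are assumptions made only for illustration: the function names `build_weighted_de_bruijn` and `assemble_contig` are hypothetical, the confidence score of a (k-1)-mer is taken as the mean of its per-residue confidence scores, a node weight is the sum of those scores over all occurrences of the (k-1)-mer, and the contig is grown greedily from the heaviest node. The abstract specifies only that node weights are defined by (k-1)-mer confidence scores and that contigs follow the highest-scoring nodes.

```python
from collections import defaultdict


def build_weighted_de_bruijn(peptides, k):
    """Build a weighted de Bruijn graph from de novo peptides.

    peptides: iterable of (sequence, per-residue confidence scores).
    Returns (node_weight, succ, pred), where nodes are (k-1)-mers.
    """
    node_weight = defaultdict(float)
    succ = defaultdict(set)  # node -> nodes reachable by one k-mer edge
    pred = defaultdict(set)  # node -> nodes with an edge into it
    for seq, conf in peptides:
        if len(seq) != len(conf):
            raise ValueError("one confidence score per residue is required")
        # Assumption: a (k-1)-mer score is the mean residue confidence,
        # and a node weight sums the scores of all its occurrences.
        for i in range(len(seq) - k + 2):
            window = conf[i:i + k - 1]
            node_weight[seq[i:i + k - 1]] += sum(window) / len(window)
        # Each k-mer links its (k-1)-mer prefix to its (k-1)-mer suffix.
        for i in range(len(seq) - k + 1):
            left, right = seq[i:i + k - 1], seq[i + 1:i + k]
            succ[left].add(right)
            pred[right].add(left)
    return node_weight, succ, pred


def assemble_contig(node_weight, succ, pred):
    """Greedy assembly (an assumption): seed at the heaviest node, then
    repeatedly extend with the heaviest unvisited neighbour."""
    if not node_weight:
        return ""

    def rank(node):
        # Ties are broken by node name so the result is deterministic.
        return (node_weight[node], node)

    seed = max(node_weight, key=rank)
    visited = {seed}
    contig = seed

    current = seed
    while True:  # extend to the right
        candidates = [n for n in succ.get(current, ()) if n not in visited]
        if not candidates:
            break
        current = max(candidates, key=rank)
        visited.add(current)
        contig += current[-1]

    current = seed
    while True:  # extend to the left
        candidates = [n for n in pred.get(current, ()) if n not in visited]
        if not candidates:
            break
        current = max(candidates, key=rank)
        visited.add(current)
        contig = current[0] + contig

    return contig
```

As a usage example with made-up data, two overlapping peptides `("PEPTIDE", [0.9] * 7)` and `("TIDEK", [0.8] * 5)` with `k = 3` share the nodes `TI`, `ID`, and `DE`, which therefore carry the highest weights; the greedy walk seeds there and extends in both directions to yield the contig `PEPTIDEK`.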