Invention Patent Application
- Patent title: METHOD AND DEVICE FOR ASSEMBLING GENOME SEQUENCE
- Patent title (Chinese): 用于组装基因序列的方法和装置
- Application number: US14002374
- Filing date: 2012-03-02
- Publication number: US20130345095A1
- Publication date: 2013-12-26
- Inventors: Changlei Han, Wenbin Chen, Xiuqing Zhang, Huanming Yang
- Applicants: Changlei Han, Wenbin Chen, Xiuqing Zhang, Huanming Yang
- Applicant address: CN Shenzhen
- Patentee: BGI TECH SOLUTIONS CO., LTD.
- Current patentee: BGI TECH SOLUTIONS CO., LTD.
- Current patentee address: CN Shenzhen
- Priority: CN201110019885.0 20110302
- International application: PCT/CN2012/071876 WO 20120302
- Main classification: G06F19/18
- IPC classification: G06F19/18
Abstract:
A method and an apparatus for genome assembly are provided. The method comprises: filtering short-fragment sequences output from end sequencing of a large insert-size library to remove unqualified sequences; aligning the filtered short-fragment sequences onto a reference genome sequence, wherein the filtered short-fragment sequences comprise paired short-fragment sequences; sorting the paired short-fragment sequences after alignment into soap reads, single reads and unmap reads based on the alignment result, and counting the number of sequences in each category; calculating, for each pair of soap reads, the distance between the two reads on a fragment of the reference genome sequence, wherein both reads of a soap-read pair can be aligned onto the same fragment of the reference genome sequence; compiling the distance distribution of the soap-read pairs on the reference genome sequence; and assembling the genome sequence by using the paired single reads when the distance distribution meets a threshold requirement, wherein the two reads of a single-read pair can be aligned onto two different fragments of the reference genome sequence.
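
The sorting and distance steps described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The `Hit` record, the function names, the start-to-start distance measure, the handling of pairs with one unaligned read, and the form of the threshold test (`tolerance`, `min_fraction`) are all assumptions made for illustration; the abstract does not specify them.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Hit:
    """Hypothetical alignment record: reference fragment name and start position."""
    fragment: str
    position: int


Pair = Tuple[Optional[Hit], Optional[Hit]]  # None means the read did not align


def classify_pair(hit1: Optional[Hit], hit2: Optional[Hit]) -> str:
    """Sort one read pair into the three categories named in the abstract."""
    if hit1 is None or hit2 is None:
        return "unmap"   # assumption: any unaligned read makes the pair "unmap"
    if hit1.fragment == hit2.fragment:
        return "soap"    # both reads align onto the same reference fragment
    return "single"      # the reads align onto two different fragments


def soap_distances(pairs: List[Pair]) -> List[int]:
    """Start-to-start distance of each soap pair (an assumed distance measure)."""
    return [abs(h1.position - h2.position)
            for h1, h2 in pairs if classify_pair(h1, h2) == "soap"]


def distribution_ok(distances: List[int], expected_insert: int,
                    tolerance: float = 0.2, min_fraction: float = 0.8) -> bool:
    """Hypothetical threshold test: enough distances lie near the expected insert size."""
    if not distances:
        return False
    low = expected_insert * (1 - tolerance)
    high = expected_insert * (1 + tolerance)
    within = sum(low <= d <= high for d in distances)
    return within / len(distances) >= min_fraction


# Toy data: two soap pairs, one single pair, one pair with an unaligned read.
pairs: List[Pair] = [
    (Hit("scaffold1", 100), Hit("scaffold1", 5100)),
    (Hit("scaffold1", 2000), Hit("scaffold1", 6900)),
    (Hit("scaffold1", 9000), Hit("scaffold2", 300)),
    (Hit("scaffold2", 50), None),
]

counts = Counter(classify_pair(h1, h2) for h1, h2 in pairs)
distances = soap_distances(pairs)

# Only when the soap-pair distribution passes the check are the single pairs
# used as links between different fragments for assembly.
links = []
if distribution_ok(distances, expected_insert=5000):
    links = [(h1.fragment, h2.fragment)
             for h1, h2 in pairs if classify_pair(h1, h2) == "single"]
```

In this toy input both soap pairs fall within 20% of the assumed 5000 bp insert size, so the single pair is accepted as a link between `scaffold1` and `scaffold2`. A real pipeline would read the classifications from the aligner output rather than from hand-built records.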