Publication No.: US20130345095A1
Publication date: 2013-12-26
Application No.: US14002374
Filing date: 2012-03-02
Applicants: Changlei Han, Wenbin Chen, Xiuqing Zhang, Huanming Yang
Inventors: Changlei Han, Wenbin Chen, Xiuqing Zhang, Huanming Yang
IPC classification: G06F19/18
Abstract: A method and an apparatus for genome assembly are provided. The method comprises: filtering short-fragment sequences output from end sequencing of a large insert-size library to remove unqualified sequences; aligning the filtered short-fragment sequences onto a reference genome sequence, wherein the filtered short-fragment sequences comprise paired short-fragment sequences; sorting the paired short-fragment sequences after alignment into soap reads, single reads and unmap reads based on the alignment result, and counting the number of sequences of each sort; calculating the distance between paired soap reads on a fragment of the reference genome sequence, wherein both reads of a soap-read pair can be aligned onto the same fragment of the reference genome sequence; determining the distance distribution of the soap-read pairs on the reference genome sequence; and, when the distance distribution meets a threshold requirement, assembling the genome sequence by using the paired single reads, wherein the two reads of a single-read pair can be aligned onto two different fragments of the reference genome sequence.
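The sorting, distance-distribution and threshold steps named in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `Hit`, `classify`, `distance_distribution`, `distribution_ok` and `fragment_links` are hypothetical, as are the bin size and the threshold parameters. The abstract does not say how the threshold is tested, so the sketch assumes a simple rule: a minimum fraction of soap pairs must fall near an expected insert size. It also assumes that a pair with any unaligned read counts as unmap.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Hit:
    """Alignment of one read: reference fragment name and start position."""
    fragment: str
    position: int


# A read pair; None marks a read that did not align (assumed representation).
Pair = Tuple[Optional[Hit], Optional[Hit]]


def classify(pair: Pair) -> str:
    """Sort a pair into 'soap', 'single' or 'unmap' as described in the abstract."""
    a, b = pair
    if a is None or b is None:
        return "unmap"   # assumption: any unaligned read makes the pair unmap
    if a.fragment == b.fragment:
        return "soap"    # both reads on the same reference fragment
    return "single"      # reads on two different reference fragments


def distance_distribution(pairs: Iterable[Pair], bin_size: int = 1000) -> Counter:
    """Histogram of distances between the two reads of each soap pair."""
    dist = Counter()
    for a, b in pairs:
        if classify((a, b)) == "soap":
            d = abs(a.position - b.position)
            dist[(d // bin_size) * bin_size] += 1   # key is the bin start
    return dist


def distribution_ok(dist: Counter, expected: int, tolerance: int,
                    min_fraction: float) -> bool:
    """Hypothetical threshold test: enough soap pairs near the expected insert size."""
    total = sum(dist.values())
    if total == 0:
        return False
    near = sum(n for start, n in dist.items() if abs(start - expected) <= tolerance)
    return near / total >= min_fraction


def fragment_links(pairs: Iterable[Pair]) -> Counter:
    """Count single pairs that connect each unordered pair of fragments."""
    links = Counter()
    for a, b in pairs:
        if classify((a, b)) == "single":
            links[tuple(sorted((a.fragment, b.fragment)))] += 1
    return links
```

In this sketch, `fragment_links` would be consulted only after `distribution_ok` returns true, which mirrors the abstract's condition that assembly with single reads proceeds once the distance distribution meets the threshold.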