METHOD AND DEVICE FOR ASSEMBLING GENOME SEQUENCE
    Invention application (status: under examination, published)

    Publication No.: US20130345095A1

    Publication date: 2013-12-26

    Application No.: US14002374

    Filing date: 2012-03-02

    IPC classification: G06F19/18

    CPC classification: G16B20/00 G16B30/00

    Abstract: A method and an apparatus for genome assembly are provided. The method comprises: filtering short-fragment sequences output from end sequencing of a large insert-size library to remove unqualified sequences; aligning the filtered short-fragment sequences onto a reference genome sequence, wherein the filtered short-fragment sequences comprise paired short-fragment sequences; sorting the aligned paired short-fragment sequences into soap reads, single reads and unmap reads based on the alignment result, and counting the number of sequences of each sort; calculating the distance between the paired soap reads on a fragment of the reference genome sequence, wherein both reads of a soap read pair can be aligned onto the same fragment of the reference genome sequence; determining the distance distribution of the soap read pairs on the reference genome sequence; and, when the distance distribution meets a threshold requirement, assembling the genome sequence using the paired single reads, wherein the two reads of a single read pair can be aligned onto two different fragments of the reference genome sequence.
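
The sorting, counting and distance steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The record format (a `(fragment, position)` tuple per aligned read, `None` for an unaligned read), the function names, and the rule that a pair with any unaligned mate counts as "unmap" are all assumptions made for illustration. The abstract does not say what the threshold requirement is, so the check used here (a minimum fraction of soap-pair distances falling within a tolerance of the expected insert size) is likewise hypothetical.

```python
from collections import Counter


def classify_pair(hit1, hit2):
    """Sort one read pair into "soap", "single" or "unmap".

    Each hit is a (fragment, position) tuple, or None if the read did not
    align. Treating a pair with any unaligned mate as "unmap" is an
    assumption; the abstract does not define that case.
    """
    if hit1 is None or hit2 is None:
        return "unmap"
    # Same reference fragment -> soap pair; different fragments -> single pair.
    return "soap" if hit1[0] == hit2[0] else "single"


def summarize(pairs, expected_insert, tolerance, min_fraction):
    """Count each sort of pair and collect soap-pair distances.

    The final boolean is a hypothetical threshold test: the fraction of
    soap-pair distances within `tolerance` of `expected_insert` must be at
    least `min_fraction`.
    """
    counts = Counter()
    distances = []
    for hit1, hit2 in pairs:
        kind = classify_pair(hit1, hit2)
        counts[kind] += 1
        if kind == "soap":
            distances.append(abs(hit2[1] - hit1[1]))
    within = sum(1 for d in distances if abs(d - expected_insert) <= tolerance)
    fraction = within / len(distances) if distances else 0.0
    return counts, distances, fraction >= min_fraction


# Toy alignment results; fragment names and positions are made up.
pairs = [
    (("scaf1", 100), ("scaf1", 2100)),  # both mates on scaf1
    (("scaf1", 500), ("scaf1", 2450)),  # both mates on scaf1
    (("scaf1", 900), ("scaf2", 300)),   # mates on different fragments
    (("scaf2", 50), None),              # one mate unaligned
]
counts, distances, ok = summarize(
    pairs, expected_insert=2000, tolerance=100, min_fraction=0.9
)
```

In this sketch only the pairs classified as "single" would be passed on to the assembly step, and only when `ok` is true, which mirrors the order of operations in the abstract.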
