Abstract:
Nucleic acid sequence mapping/assembly methods are disclosed. The methods initially map only a contiguous portion of each read to a reference sequence and then extend the mapping of the read at both ends of the mapped contiguous portion until the entire read is mapped (aligned). In various embodiments, a mapping score can be calculated for the read alignment using a scoring function, score(i, j) = M + mx, where M can be the number of matches in the extended alignment, x can be the number of mismatches in the alignment, and m can be a negative penalty for each mismatch. The mapping score can be utilized to rank or choose the best alignment for each read.
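The seed-then-extend idea and the scoring function can be illustrated with a minimal Python sketch. It rests on several assumptions that are not stated in the abstract: the contiguous portion (seed) is taken from the middle of the read and must match exactly, the extension is ungapped, M is counted over the whole read including the seed, and the penalty defaults to m = -2. The function names `mapping_score` and `best_alignment` are hypothetical.

```python
def mapping_score(read, ref, read_off, ref_off, seed_len, m=-2):
    """Score an ungapped extension of an exactly matching seed.

    read_off / ref_off are the seed's start positions in the read and
    in the reference. Returns M + m*x, or None if the seed does not
    match or the extended read would run off the reference.
    """
    if read[read_off:read_off + seed_len] != ref[ref_off:ref_off + seed_len]:
        return None
    # Reference position where read[0] lands once the seed is extended
    # to the left; the right end follows from the read length.
    start = ref_off - read_off
    if start < 0 or start + len(read) > len(ref):
        return None
    window = ref[start:start + len(read)]
    matches = sum(a == b for a, b in zip(read, window))   # M
    mismatches = len(read) - matches                      # x
    return matches + m * mismatches


def best_alignment(read, ref, seed_len=4, m=-2):
    """Try every exact seed hit and return (score, start) of the best one."""
    read_off = (len(read) - seed_len) // 2   # assumed: seed from the read's middle
    seed = read[read_off:read_off + seed_len]
    candidates = []
    pos = ref.find(seed)
    while pos != -1:
        score = mapping_score(read, ref, read_off, pos, seed_len, m)
        if score is not None:
            candidates.append((score, pos - read_off))
        pos = ref.find(seed, pos + 1)
    # Rank by score only; the first hit wins on ties.
    return max(candidates, key=lambda c: c[0]) if candidates else None


if __name__ == "__main__":
    reference = "TTACGTACGATT"
    print(best_alignment("ACGTACGC", reference))
```

With m = -2, a read that extends to 7 matches and 1 mismatch scores 7 + (-2)(1) = 5, so it ranks below a perfect 8-base alignment, which scores 8.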
Abstract:
Disclosed are systems and methods for resequencing using color calls. A DNA sample is encoded and sequenced according to a multi-base code, producing a string of read color calls for a fragment of the sample. A reference sequence is obtained. The string of read color calls is mapped to the reference sequence. A base sequence is extracted from the reference sequence. The base sequence is encoded as a string of reference color codes according to the multi-base code. The string of read color calls is aligned with the string of reference color codes, and mismatches in the alignment are detected. One or more mismatches of the string of read color calls are annotated as inconsistent. The one or more inconsistent mismatches of the string of read color calls are corrected. The string of corrected read color calls is decoded to bases, producing a read sequence.
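The encode, compare, correct, and decode steps can be sketched in Python under explicit assumptions. The abstract says only "multi-base code"; the sketch assumes a two-base color code in which each color is the XOR of the 2-bit codes of adjacent bases (A=0, C=1, G=2, T=3). It also assumes one particular rule for "inconsistent": a color mismatch with no mismatching neighbor is treated as inconsistent and replaced by the reference color, on the reasoning that a single-base change alters two adjacent colors. The function names are hypothetical.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"


def encode(bases):
    """Encode a base string as colors, one per adjacent base pair (assumed 2-base code)."""
    return [CODE[a] ^ CODE[b] for a, b in zip(bases, bases[1:])]


def decode(first_base, colors):
    """Decode colors back to bases, given the known first base."""
    current = CODE[first_base]
    out = [first_base]
    for color in colors:
        current ^= color
        out.append(BASE[current])
    return "".join(out)


def correct_isolated(read_colors, ref_colors):
    """Replace isolated color mismatches with the reference color.

    Assumed rule: a mismatch whose neighbors both agree with the
    reference is inconsistent with a base change and is corrected.
    Adjacent mismatches are left untouched.
    """
    n = len(read_colors)
    mismatch = [r != f for r, f in zip(read_colors, ref_colors)]
    fixed = list(read_colors)
    for i in range(n):
        left = i > 0 and mismatch[i - 1]
        right = i + 1 < n and mismatch[i + 1]
        if mismatch[i] and not left and not right:
            fixed[i] = ref_colors[i]
    return fixed


if __name__ == "__main__":
    ref_colors = encode("ACGTAC")
    noisy_read = [1, 3, 1, 0, 1]          # one isolated color mismatch
    corrected = correct_isolated(noisy_read, ref_colors)
    print(decode("A", corrected))
```

Under this rule, a read whose colors differ from the reference at two adjacent positions, as a G-to-A substitution in "ACGTAC" would produce, is left unchanged and decodes to the variant sequence rather than to the reference.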