Abstract:
Systems and methods for determining variants can receive mapped reads and call variants. In embodiments, flow space information for the reads can be aligned to a flow space representation of a corresponding portion of the reference. Reads spanning a position with a potential variant can be grouped and a score can be calculated for the variant. Based on the scores, a list of probable variants can be provided. In various embodiments, low frequency variants can be identified where multiple potential variants are present at a position.
Abstract:
A targeted panel with low sample input requirements, run on a tumor-only sample, may be processed to estimate the mutation load in the tumor sample. The method may include detecting variants in nucleic acid sequence reads corresponding to targeted locations in the tumor sample genome; annotating the detected variants with annotation information from a population database; filtering the detected variants with a rule set that retains the somatic variants and removes germ-line variants; counting the identified somatic variants to give a number of somatic variants; determining a number of bases in covered regions of the targeted locations in the tumor sample genome; and calculating a number of somatic variants per megabase, which provides an estimate of the mutation load per megabase in the tumor sample genome.
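The counting and per-megabase steps above can be sketched as follows. This is a minimal illustration only: the abstract does not specify the filtering rule set, so the population-frequency cutoff, the `population_af` key, and both function names are assumptions made for the example.

```python
from typing import Iterable, Mapping, Optional


def count_somatic_variants(
    variants: Iterable[Mapping[str, Optional[float]]],
    max_population_af: float = 0.0,
) -> int:
    """Count variants retained by a hypothetical germ-line filter.

    Assumed rule: a variant is kept as somatic when it has no population
    database entry (population_af is None) or its population allele
    frequency does not exceed max_population_af.
    """
    kept = 0
    for variant in variants:
        af = variant.get("population_af")
        if af is None or af <= max_population_af:
            kept += 1
    return kept


def mutation_load_per_megabase(somatic_variant_count: int, covered_bases: int) -> float:
    """Return somatic variants per megabase of covered target sequence."""
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    return somatic_variant_count * 1_000_000 / covered_bases
```

For example, 12 retained somatic variants over 1,200,000 covered bases correspond to 10 variants per megabase.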
Abstract:
Systems and methods for analyzing overlapping sequence information can obtain first and second overlapping sequence information for a polynucleotide, align the first and second sequence information, determine a degree of agreement between the first and second sequence information for a location along the polynucleotide, and determine a base call and a quality value for the location.
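One way to picture the agreement-based base call and quality value is the sketch below. It assumes the two reads are already aligned position by position and carry Phred-scaled qualities; the specific combination rule (add qualities on agreement, subtract on disagreement, cap at `max_qual`) is a common heuristic chosen for illustration, not a rule stated in the abstract.

```python
from typing import List, Sequence, Tuple


def merge_overlap(
    bases1: str,
    quals1: Sequence[int],
    bases2: str,
    quals2: Sequence[int],
    max_qual: int = 60,
) -> Tuple[str, List[int]]:
    """Merge two pre-aligned overlapping reads into one call per position.

    Agreement: keep the base, add the qualities (capped at max_qual).
    Disagreement: keep the higher-quality base, quality is the difference.
    """
    if not (len(bases1) == len(quals1) == len(bases2) == len(quals2)):
        raise ValueError("all inputs must have the same length")

    merged_bases: List[str] = []
    merged_quals: List[int] = []
    for b1, q1, b2, q2 in zip(bases1, quals1, bases2, quals2):
        if b1 == b2:
            merged_bases.append(b1)
            merged_quals.append(min(q1 + q2, max_qual))
        elif q1 >= q2:
            merged_bases.append(b1)
            merged_quals.append(q1 - q2)
        else:
            merged_bases.append(b2)
            merged_quals.append(q2 - q1)
    return "".join(merged_bases), merged_quals
```

Under this rule a position where both reads agree gains confidence, while a conflicting position keeps the better-supported base at reduced quality.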
Abstract:
Systems, methods, and computer program products for aligning a fragment sequence to a target sequence are disclosed. The alignment is allowed at most one gap, such as an insertion or a deletion. In some embodiments, both a gapped alignment and an ungapped alignment can be produced. A selection can be made between the gapped alignment and the ungapped alignment based on a quality value for each alignment.
Abstract:
Systems and methods for determining variants can receive mapped reads and align flow space information to a flow space representation of a corresponding portion of the reference. Reads spanning a position with a potential variant can be evaluated in a context specific manner. A list of probable variants can be provided.
Abstract:
A computer-implemented method for classifying alignments of paired nucleic acid sequence reads is disclosed. A plurality of paired nucleic acid sequence reads is received, wherein each read comprises a first tag and a second tag separated by an insert region. Potential alignments for the first and second tags of each read to a reference sequence are determined, wherein the potential alignments satisfy a minimum threshold mismatch constraint. Potential paired alignments of the first and second tags of each read are identified, wherein a distance between the first and second tags of each potential paired alignment is within an estimated insert size range. An alignment score is calculated for each potential paired alignment based on a distance between the first and second tags and a total number of mismatches for each tag.
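The pairing and scoring steps can be illustrated with a small sketch. The abstract does not give the scoring formula, so the penalty used here (a fixed cost per mismatch plus the deviation of the tag distance from the middle of the insert size range) is an assumption, as are the `TagHit` type, the parameter names, and the reading of the mismatch constraint as a per-tag maximum.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class TagHit:
    """A candidate alignment of one tag: reference position and mismatch count."""
    position: int
    mismatches: int


def score_pairs(
    tag1_hits: Sequence[TagHit],
    tag2_hits: Sequence[TagHit],
    insert_min: int,
    insert_max: int,
    max_mismatches: int = 2,
    mismatch_penalty: float = 10.0,
) -> List[Tuple[float, TagHit, TagHit]]:
    """Return candidate paired alignments sorted by penalty (lower is better)."""
    expected = (insert_min + insert_max) / 2
    scored = []
    for h1, h2 in product(tag1_hits, tag2_hits):
        # Per-tag mismatch constraint (assumed interpretation).
        if h1.mismatches > max_mismatches or h2.mismatches > max_mismatches:
            continue
        # Keep only pairs whose distance falls in the estimated insert range.
        distance = h2.position - h1.position
        if not insert_min <= distance <= insert_max:
            continue
        penalty = mismatch_penalty * (h1.mismatches + h2.mismatches)
        penalty += abs(distance - expected)
        scored.append((penalty, h1, h2))
    scored.sort(key=lambda item: item[0])
    return scored
```

Sorting by penalty makes the first entry the preferred pairing, and an empty result signals that no pair satisfied both constraints.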
Abstract:
Systems and methods for identifying somatic mutations can receive first and second sequence information, determine if a variant present in the first sequence information is also present in the second sequence information, and identify variants present in the first sequence information as somatic mutations when the variant is either not present in the second sequence information or the presence of the variant in the second sequence information is likely due to a sequencing error.
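The decision rule can be sketched with a simple binomial error model. The abstract does not say how "likely due to a sequencing error" is assessed, so the binomial tail test, the default `error_rate` and `alpha` values, and the function names are all assumptions for illustration.

```python
from math import comb


def prob_at_least(k: int, n: int, error_rate: float) -> float:
    """P(X >= k) for X ~ Binomial(n, error_rate)."""
    return sum(
        comb(n, i) * error_rate**i * (1 - error_rate) ** (n - i)
        for i in range(k, n + 1)
    )


def is_somatic(
    second_alt_reads: int,
    second_depth: int,
    error_rate: float = 0.01,
    alpha: float = 0.05,
) -> bool:
    """Classify a variant seen in the first sample using the second sample.

    The variant is called somatic when the second sample shows no supporting
    reads, or when the supporting reads it does show are plausibly explained
    by sequencing error alone (tail probability of at least alpha).
    """
    if second_alt_reads == 0:
        return True
    return prob_at_least(second_alt_reads, second_depth, error_rate) >= alpha
```

With these defaults, a single supporting read at depth 100 in the second sample is treated as error, while 40 supporting reads at the same depth is not.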
Abstract:
A method for detecting a gene fusion includes amplifying a nucleic acid sample in the presence of a primer pool to produce a plurality of amplicons. The primer pool includes primers targeting a plurality of exon-exon junctions of a driver gene. The amplicons correspond to the exon-exon junctions. The amplicons are sequenced and aligned to a reference sequence. The number of reads corresponding to each amplicon is normalized to give a normalized read count. A baseline correction is applied to the normalized read counts for the amplicons to form corrected read counts. A binary segmentation score is calculated for each corrected read count. A predicted breakpoint for the gene fusion is determined based on the amplicon index corresponding to the maximum absolute binary segmentation score. Gene fusion events may be detected in a partner agnostic manner, i.e., without prior knowledge of the specific fusion partner genes or specific breakpoint information.
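The normalization, baseline correction, and breakpoint steps can be sketched as below. The abstract does not give formulas, so this example assumes normalization by total read count, a log2 ratio against a baseline profile, and the standard single change-point statistic from the binary segmentation literature; all function names and the `pseudocount` parameter are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def normalize(counts: Sequence[float]) -> List[float]:
    """Scale per-amplicon read counts so that they sum to 1."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("total read count must be positive")
    return [c / total for c in counts]


def baseline_correct(
    normalized: Sequence[float],
    baseline: Sequence[float],
    pseudocount: float = 1e-6,
) -> List[float]:
    """Log2 ratio of each normalized count to its baseline value (assumed form)."""
    return [
        math.log2((x + pseudocount) / (b + pseudocount))
        for x, b in zip(normalized, baseline)
    ]


def binary_segmentation_scores(values: Sequence[float]) -> List[float]:
    """Score each split i (1..n-1) between the first i values and the rest."""
    n = len(values)
    total = sum(values)
    scores = []
    running = 0.0
    for i in range(1, n):
        running += values[i - 1]
        left_mean = running / i
        right_mean = (total - running) / (n - i)
        scores.append((left_mean - right_mean) / math.sqrt(1 / i + 1 / (n - i)))
    return scores


def predict_breakpoint(values: Sequence[float]) -> Tuple[int, float]:
    """Return (number of amplicons left of the split, score) at the max |score|."""
    scores = binary_segmentation_scores(values)
    best = max(range(len(scores)), key=lambda i: abs(scores[i]))
    return best + 1, scores[best]
```

In this sketch a returned index of 3 means the predicted breakpoint lies between the third and fourth amplicons, counted along the driver gene.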
Abstract:
Systems and methods for identifying variants associated with a genetic disease can include obtaining calls for a plurality of individuals for a list of variant positions. The calls can be compared to identify variants that are found in affected individuals and absent in non-affected individuals. Such variants can include loss of heterozygosity, trans-phased compound heterozygotes, increased frequency mitochondrial variants, homozygous recessive variants, de novo variants, sex-linked variants, and combinations thereof.
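The comparison of calls across individuals can be sketched as follows. The data layout (a mapping from a variant identifier to per-individual alternate allele counts), the function names, and the choice to treat a missing call as "variant absent" are assumptions made for the example; only two of the listed categories are shown.

```python
from typing import Dict, List, Mapping, Sequence

# variant identifier -> {individual: alternate allele count (0, 1, or 2)}
Calls = Mapping[str, Mapping[str, int]]


def candidate_variants(
    calls: Calls, affected: Sequence[str], unaffected: Sequence[str]
) -> List[str]:
    """Variants carried by every affected individual and by no unaffected one."""
    hits = []
    for variant, genotypes in calls.items():
        in_all_affected = all(genotypes.get(p, 0) > 0 for p in affected)
        in_no_unaffected = all(genotypes.get(p, 0) == 0 for p in unaffected)
        if in_all_affected and in_no_unaffected:
            hits.append(variant)
    return hits


def de_novo_variants(calls: Calls, child: str, parents: Sequence[str]) -> List[str]:
    """Variants present in the child and absent from both parents."""
    return [
        variant
        for variant, genotypes in calls.items()
        if genotypes.get(child, 0) > 0
        and all(genotypes.get(p, 0) == 0 for p in parents)
    ]


example_calls: Dict[str, Dict[str, int]] = {
    "chr1:1000A>G": {"child": 1, "mother": 0, "father": 0},
    "chr2:500C>T": {"child": 1, "mother": 1, "father": 0},
    "chr3:42G>A": {"child": 0, "mother": 1, "father": 1},
}
```

Treating missing calls as absent is a simplification; a real pipeline would likely separate "no call" from "reference call" before comparing individuals.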