Abstract:
Systems and methods for analyzing first and second strings against a ground truth string are provided. A construct representing a plurality of components is obtained, each component representing a different portion of the truth string. The construct comprises a plurality of measurement string sampling pools, each having an identifier and a corresponding plurality of measurement samplings that correspond to one or two of the components. Each sampling has the identifier of its pool and a portion of the first or second string. Each sampling is assigned to a first, second, or third class according to whether it codes a portion of the first string, the second string, or both the first and second strings. First and second positions are tested for sequence events by calculating a plurality of sequence event models, using assumptions on the components having samplings that encompass the first and second positions, together with the class assignments. These assumptions are updated using the calculated models, and the models are then recalculated.
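The three-way class assignment described above can be illustrated with a minimal sketch. It assumes that a sampling "codes" a string when its fragment matches that string exactly at a known offset; the function name `classify_sampling`, the `offset` parameter, and the exact-match rule are all hypothetical and are not taken from the disclosure.

```python
def classify_sampling(fragment, offset, first, second):
    """Assign a sampling to class 1, 2, or 3 (hypothetical rule).

    Class 1: the fragment matches only the first string at `offset`.
    Class 2: the fragment matches only the second string at `offset`.
    Class 3: the fragment matches both strings at `offset`.
    Returns None when the fragment matches neither string.
    """
    end = offset + len(fragment)
    in_first = first[offset:end] == fragment
    in_second = second[offset:end] == fragment
    if in_first and in_second:
        return 3
    if in_first:
        return 1
    if in_second:
        return 2
    return None


# Two toy strings that differ only at index 2.
first_string = "ACGTAC"
second_string = "ACCTAC"
classes = [
    classify_sampling("CG", 1, first_string, second_string),
    classify_sampling("CC", 1, first_string, second_string),
    classify_sampling("AC", 0, first_string, second_string),
]
```

A real system would presumably tolerate measurement error rather than require exact matches; the exact-match rule is used here only to keep the illustration short.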
Abstract:
The present disclosure relates to methods, compositions, and systems for haplotype phasing and copy number variation assays. Included within this disclosure are methods and systems for combining barcode-comprising beads with samples in multiple separate partitions, as well as methods of processing, sequencing, and analyzing barcoded samples.
Abstract:
Systems and methods for determining structural variation and phasing using variant call data obtained from nucleic acid of a biological sample are provided. Sequence reads are obtained, each comprising a portion corresponding to a subset of the nucleic acid and a portion encoding a barcode that is independent of the sequencing data. Bin information is obtained. Each bin represents a different portion of the nucleic acid. Each bin corresponds to a set of sequence reads in a plurality of sets of sequence reads formed from the sequence reads, such that each sequence read in a respective set corresponds to a subset of the nucleic acid represented by the bin corresponding to that set. Binomial tests identify bin pairs that have more sequence reads with the same barcode in common than expected by chance. Probabilistic models determine structural variation likelihood from the sequence reads of these bin pairs.
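The binomial test on bin pairs can be sketched as follows. This is a minimal illustration, not the method of the disclosure: it assumes that, by chance alone, each barcode seen in one bin appears in the other bin with probability equal to the other bin's barcode count divided by the total number of barcodes. The names `binomial_sf`, `shared_barcode_pvalue`, and `total_barcodes` are hypothetical.

```python
from math import comb


def binomial_sf(k, n, p):
    """Upper tail P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def shared_barcode_pvalue(barcodes_a, barcodes_b, total_barcodes):
    """P-value for the number of barcodes shared by two bins.

    Assumed null model: each of the len(barcodes_a) barcodes in bin A
    independently occurs in bin B with probability
    len(barcodes_b) / total_barcodes.
    """
    shared = len(barcodes_a & barcodes_b)
    n = len(barcodes_a)
    p = len(barcodes_b) / total_barcodes
    return binomial_sf(shared, n, p)


# Toy example: two bins with 10 barcodes each, 5 of them shared,
# drawn from a hypothetical pool of 1000 barcodes.
bin_a = set(range(0, 10))
bin_b = set(range(5, 15))
p_value = shared_barcode_pvalue(bin_a, bin_b, total_barcodes=1000)
is_candidate_pair = p_value < 1e-6
```

Bin pairs whose p-value falls below a chosen threshold would then be passed to the probabilistic models. The threshold of `1e-6` is arbitrary, and a real analysis over many bin pairs would need a multiple-testing correction.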