Abstract:
A neural network processes sequencing images on a patch-by-patch basis for base calling. The sequencing images depict intensity emissions of a set of analytes. The patches depict the intensity emissions for a subset of the analytes and have undiverse intensity patterns due to limited base diversity. The neural network has convolution filters that have receptive fields confined to the patches. The convolution filters detect intensity patterns in the patches with losses in detection due to the undiverse intensity patterns and confined receptive fields. An intensity contextualization unit determines intensity context data based on intensity values in the images. Data flow logic appends the intensity context data to the sequencing images to generate intensity contextualized images. The neural network applies the convolution filters on the intensity contextualized images and generates base call classifications. The intensity context data in the intensity contextualized images compensates for the losses in detection.
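The contextualization step described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the intensity context data consists of the minimum, maximum, and mean intensity of the full image, and that it is appended to a patch as constant-valued extra channels; the abstract does not specify either choice. The function names `intensity_context`, `extract_patch`, and `contextualize` are hypothetical.

```python
from statistics import mean


def intensity_context(image):
    """Summarize intensity values over the whole image.

    The choice of min/max/mean is an assumption for this sketch.
    """
    values = [v for row in image for v in row]
    return {"min": min(values), "max": max(values), "mean": mean(values)}


def extract_patch(image, top, left, size):
    """Cut a square patch; filters confined to it see only these values."""
    return [row[left:left + size] for row in image[top:top + size]]


def contextualize(patch, context):
    """Append one constant-valued channel per context statistic."""
    height, width = len(patch), len(patch[0])
    channels = [patch]
    for key in ("min", "max", "mean"):
        channels.append([[context[key]] * width for _ in range(height)])
    return channels


# Toy 4x4 image: the bright column lies outside the top-left patch.
image = [
    [10, 12, 11, 200],
    [9, 11, 10, 180],
    [10, 10, 12, 190],
    [11, 9, 10, 210],
]
ctx = intensity_context(image)
patch = extract_patch(image, 0, 0, 2)
stacked = contextualize(patch, ctx)
```

In this toy case the patch contains only dim values, while the appended channels carry the image-wide range, which is the kind of information a filter with a confined receptive field could not otherwise observe.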
Abstract:
The technology disclosed processes input data through a neural network and produces an alternative representation of the input data. The input data includes per-cycle image data for each of one or more sequencing cycles of a sequencing run. The per-cycle image data depicts intensity emissions of one or more analytes and their surrounding background captured at a respective sequencing cycle. The technology disclosed processes the alternative representation through an output layer, produces an output, and base calls one or more of the analytes at one or more of the sequencing cycles based on the output.
Abstract:
Presented herein are transposase enzymes and reaction conditions for improved fragmentation and tagging of nucleic acid samples, in particular altered transposases and reaction conditions which exhibit improved insertion sequence bias, as well as methods and kits using the same.
Abstract:
The technology disclosed uses neural networks to determine analyte metadata by (i) processing input image data derived from a sequence of image sets through a neural network and generating an alternative representation of the input image data, where the input image data has an array of units that depicts analytes and their surrounding background, (ii) processing the alternative representation through an output layer and generating an output value for each unit in the array, (iii) thresholding output values of the units and classifying a first subset of the units as background units depicting the surrounding background, and (iv) locating peaks in the output values of the units and classifying a second subset of the units as center units containing centers of the analytes.
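Steps (iii) and (iv) above can be sketched as follows. The sketch assumes the per-unit output values are already available as a 2D grid, treats a unit as background when its value falls below a fixed threshold, and treats a unit as a center when it is strictly greater than all of its 8-connected neighbors. The threshold value, the neighborhood definition, and the name `classify_units` are illustrative assumptions, not details taken from the abstract.

```python
def classify_units(output, background_threshold):
    """Split units into background units and center (peak) units."""
    rows, cols = len(output), len(output[0])
    background, centers = set(), set()
    for r in range(rows):
        for c in range(cols):
            value = output[r][c]
            # (iii) thresholding: low output values are background.
            if value < background_threshold:
                background.add((r, c))
                continue
            # (iv) peak location: strictly above every 8-connected neighbor.
            neighbors = [
                output[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(value > n for n in neighbors):
                centers.add((r, c))
    return background, centers


# Toy output map with two analytes.
output = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.9, 0.4, 0.0],
    [0.0, 0.4, 0.3, 0.1],
    [0.0, 0.0, 0.1, 0.8],
]
background, centers = classify_units(output, background_threshold=0.2)
```

Units that are neither background nor centers, such as the 0.4 and 0.3 values here, would correspond to the non-center interior of an analyte.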
Abstract:
The technology disclosed processes a first input through a first neural network and produces a first output. The first input comprises first image data derived from images of analytes and their surrounding background captured by a sequencing system for a sequencing run. The technology disclosed processes the first output through a post-processor and produces metadata about the analytes and their surrounding background. The technology disclosed processes a second input through a second neural network and produces a second output. The second input comprises third image data derived by modifying second image data based on the metadata. The second image data is derived from the images of the analytes and their surrounding background. The second output identifies base calls for one or more of the analytes at one or more sequencing cycles of the sequencing run.
Abstract:
The technology disclosed presents a deep learning-based framework that identifies sequence patterns that cause sequence-specific errors (SSEs). Systems and methods train a variant filter on large-scale variant data to learn causal dependencies between sequence patterns and false variant calls. The variant filter has a hierarchical structure built on deep neural networks such as convolutional neural networks and fully-connected neural networks. Systems and methods implement a simulation that uses the variant filter to test known sequence patterns for their effect on variant filtering. The premise of the simulation is as follows: when a pair of a repeat pattern under test and a called variant is fed to the variant filter as part of a simulated input sequence and the variant filter classifies the called variant as a false variant call, then the repeat pattern is considered to have caused the false variant call and is identified as SSE-causing.
Abstract:
The technology disclosed compresses a larger, teacher base caller into a smaller, student base caller. The student base caller has fewer processing modules and parameters than the teacher base caller. The teacher base caller is trained using hard labels (e.g., one-hot encodings). The trained teacher base caller is used to generate soft labels as output probabilities during the inference phase. The soft labels are used to train the student base caller.
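The difference between the two kinds of labels can be sketched with plain Python. The teacher is scored against a one-hot hard label, while the student is scored against the teacher's output probabilities. The logit values are made up, and the use of softmax and cross-entropy is an assumption of this sketch rather than something the abstract states.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities over the four bases."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, predicted_probs):
    """Cross-entropy between a target and a predicted distribution."""
    return -sum(
        t * math.log(p)
        for t, p in zip(target_probs, predicted_probs)
        if t > 0
    )


BASES = "ACGT"

# Teacher training: hard label (one-hot encoding) says the base is C.
hard_label = [0.0, 1.0, 0.0, 0.0]
teacher_logits = [0.5, 3.0, 0.2, -1.0]  # hypothetical teacher output
teacher_loss = cross_entropy(hard_label, softmax(teacher_logits))

# Inference with the trained teacher: its probabilities are the soft label.
soft_label = softmax(teacher_logits)

# Student training: the student is scored against the soft label.
student_logits = [0.1, 1.5, 0.3, -0.2]  # hypothetical student output
student_loss = cross_entropy(soft_label, softmax(student_logits))
```

Unlike the one-hot label, the soft label also tells the student how plausible the teacher considers each of the other bases.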
Abstract:
Devices, systems, and methods for non-volatile storage include a well activation device operable to modify one or more wells from a plurality of wells of a flow cell to provide a set of readable wells. Readable wells are configured to allow exposure of a well to substances from nucleotide sequencing fluids and to prevent exposure to other substances and fluids, such as nucleotide synthesizing fluids. The well activation device may also modify wells to provide a set of writeable wells. This set of wells is configured to allow exposure to the nucleotide synthesizing fluids and substances and to prevent exposure to the nucleotide sequencing fluids and substances. Provisions may also be made to mitigate the risk of data errors, such as generating commands to write specified data to a nucleotide sequence associated with a particular location in a storage device, reading the nucleotide sequence, and performing a comparison.
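The write, read, and compare check mentioned at the end of this abstract can be sketched in software. The two-bits-per-base encoding (A=00, C=01, G=10, T=11) and the in-memory `SimulatedWellStore` class are hypothetical stand-ins for the storage device; the abstract does not describe an encoding or an interface.

```python
BASES = "ACGT"  # assumed mapping: A=00, C=01, G=10, T=11


def encode(data: bytes) -> str:
    """Map each byte to four nucleotides, most significant bits first."""
    return "".join(
        BASES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def decode(sequence: str) -> bytes:
    """Inverse of encode: every four nucleotides become one byte."""
    out = bytearray()
    for i in range(0, len(sequence), 4):
        byte = 0
        for base in sequence[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


class SimulatedWellStore:
    """Hypothetical in-memory stand-in for wells addressed by location."""

    def __init__(self):
        self.wells = {}

    def write(self, location, sequence):
        self.wells[location] = sequence

    def read(self, location):
        return self.wells[location]


def write_and_verify(store, location, data: bytes) -> bool:
    """Write data, read the sequence back, and compare with the original."""
    store.write(location, encode(data))
    return decode(store.read(location)) == data


store = SimulatedWellStore()
ok = write_and_verify(store, location=(0, 3), data=b"A")
```

A real device would replace the dictionary with synthesis and sequencing operations; the comparison step at the end is the part the abstract refers to.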
Abstract:
A method includes grafting oligonucleotides to a flow cell and preparing a library of polynucleotides. Each polynucleotide has been written to contain retrievable information and includes a region complementary to one of the sequencing initiation primers grafted to the flow cell. Each polynucleotide is indexed to permit discrete identification of that polynucleotide and the information it contains over other polynucleotides in the library. Another method includes writing two polynucleotides including two sequences with reverse complementary joining sequences onto a flow cell. One of the polynucleotides is extended to generate a third polynucleotide comprising a sequence that is the combination of the first and second sequences. A fourth polynucleotide is written with a third joining sequence of a fourth sequence. The third joining sequence is a reverse complement of a portion of the third polynucleotide comprising the third sequence and forms a second joining bridge between the third and fourth polynucleotides.
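The reverse-complement relationship between joining sequences can be checked with a short sketch. The example sequences and the function names `reverse_complement` and `can_form_joining_bridge` are hypothetical; the sketch only tests the complementarity condition and does not model extension on the flow cell.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence))


def can_form_joining_bridge(joining_a: str, joining_b: str) -> bool:
    """True when one joining sequence is the reverse complement of the other."""
    return joining_a == reverse_complement(joining_b)


# Hypothetical joining sequences for two written polynucleotides.
first_joining = "ATGC"
second_joining = reverse_complement(first_joining)
bridge_ok = can_form_joining_bridge(first_joining, second_joining)
```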
Abstract:
Presented herein are methods and compositions for tagmentation of nucleic acids. The methods are useful for generating tagged DNA fragments that are qualitatively and quantitatively representative of the target nucleic acids in the sample from which they are generated.