Abstract:
An ultra-fast solution to the problem of comparing genomes across sequencing technologies and genome freezes, while preserving privacy, is presented. A method is described for transforming a standard genome representation (i.e., a list of variants relative to a reference) into a "fingerprint" of the genome. The method does not require knowledge of the technology, reference or encoding used, and it yields fingerprints that can be readily compared to ascertain the relatedness of two genome representations. Because fingerprints are small, computation on them is fast and requires little memory. This enables scaling up a variety of important genome analyses, including determining degree of relatedness and recognizing duplicate sequenced genomes in a set, among many others. Because the original genome representation cannot be reconstructed from its fingerprint, the method also has significant implications for privacy-preserving genome analytics.
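To illustrate the general idea of a fixed-size, non-invertible genome summary, the sketch below folds a variant list into a short count vector and compares two vectors by cosine similarity. It is a hypothetical simplification, not the fingerprinting method described above: the variant string format, the SHA-256 bin assignment, the vector length of 64 and the choice of cosine similarity are all assumptions. Unlike the described method, it also assumes both genomes use the same variant encoding.

```python
import hashlib
import math
from typing import Iterable, List

def fingerprint(variants: Iterable[str], length: int = 64) -> List[int]:
    """Fold variant strings into a fixed-length count vector (hypothetical scheme).

    Each variant is hashed with SHA-256 and the digest picks one of `length`
    bins. Many variants share a bin, so the list cannot be read back out.
    """
    vec = [0] * length
    for v in variants:
        digest = hashlib.sha256(v.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:8], "big") % length] += 1
    return vec

def similarity(a: List[int], b: List[int]) -> float:
    """Cosine similarity of two fingerprints; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

For example, `similarity(fingerprint(genome_a), fingerprint(genome_b))` is close to 1.0 for two representations of the same variant set and lower for unrelated sets; comparing 64 integers instead of millions of variants is what keeps such comparisons cheap.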
Abstract:
A system for variant call processing review is disclosed. The system can render a call review interface on a display, the call review interface including a plurality of rows, wherein each row represents a call variant read. The system can receive a user input associated with a user-provided command for interacting with a displayed call variant read. The system can store information associated with the user-provided command, the stored information including at least one of a call identification, a sample identification, a displayed call variant read, and a call override value. The system can render a confirmation interface on the display, wherein rendering the confirmation interface includes displaying the stored information.
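The stored-information and confirmation steps above can be pictured as a small record store. This is a minimal sketch under assumptions: the class names `ReviewRecord` and `ReviewSession`, the field names, and the text-row format of the confirmation output are hypothetical and not taken from the disclosed system. It enforces the "at least one of" requirement by rejecting a command with no fields.

```python
from dataclasses import astuple, dataclass, field
from typing import List, Optional

@dataclass
class ReviewRecord:
    """Information stored for one user-provided command (field names are hypothetical)."""
    call_id: Optional[str] = None
    sample_id: Optional[str] = None
    displayed_read: Optional[str] = None
    override_value: Optional[str] = None

@dataclass
class ReviewSession:
    """Collects records until the user reviews them on a confirmation interface."""
    pending: List[ReviewRecord] = field(default_factory=list)

    def record_command(self, **fields: Optional[str]) -> ReviewRecord:
        """Store the information tied to one command; at least one field is required."""
        record = ReviewRecord(**fields)
        if all(value is None for value in astuple(record)):
            raise ValueError("at least one field must be provided")
        self.pending.append(record)
        return record

    def confirmation_rows(self) -> List[str]:
        """One text row per stored record, with '-' marking fields not provided."""
        return [
            " | ".join("-" if value is None else value for value in astuple(record))
            for record in self.pending
        ]
```

Keeping the records in a pending list, rather than applying overrides immediately, is one way to let the confirmation interface display everything before it is committed.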
Abstract:
Embodiments relate to methods and systems for analyzing genomic data, such as genetic variants. Some embodiments relate to the efficient analysis and presentation of certain genetic variants of an individual.
Abstract:
The present invention is directed to a system and methods for predicting protein function by encoding protein structural information into a computer-readable format and using a convolutional neural network designed to recognize that encoded format.
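One common way to make structure readable by a convolutional network is to voxelize atom coordinates into a 3D grid. The pure-Python sketch below assumes that encoding (the abstract does not specify one), along with a cubic grid, a 2.0 unit voxel spacing and a single hand-set kernel; in a real network the kernel weights would be learned and many layers stacked.

```python
from typing import List, Sequence, Tuple

Grid = List[List[List[float]]]

def voxelize(coords: Sequence[Tuple[float, float, float]],
             size: int = 8, spacing: float = 2.0) -> Grid:
    """Encode atom coordinates as a size^3 occupancy grid (assumed encoding).

    Positions are measured from the structure's minimum corner; atoms that
    fall outside the box are dropped.
    """
    grid = [[[0.0] * size for _ in range(size)] for _ in range(size)]
    if not coords:
        return grid
    mins = [min(c[d] for c in coords) for d in range(3)]
    for c in coords:
        i, j, k = (int((c[d] - mins[d]) // spacing) for d in range(3))
        if 0 <= i < size and 0 <= j < size and 0 <= k < size:
            grid[i][j][k] += 1.0
    return grid

def conv3d_relu(grid: Grid, kernel: Grid) -> Grid:
    """Valid (unpadded) 3D cross-correlation followed by ReLU, as in one CNN layer."""
    n, m = len(grid), len(kernel)
    out_n = n - m + 1
    out = [[[0.0] * out_n for _ in range(out_n)] for _ in range(out_n)]
    for x in range(out_n):
        for y in range(out_n):
            for z in range(out_n):
                s = 0.0
                for a in range(m):
                    for b in range(m):
                        for c in range(m):
                            s += grid[x + a][y + b][z + c] * kernel[a][b][c]
                out[x][y][z] = max(0.0, s)
    return out

def global_max_pool(grid: Grid) -> float:
    """Collapse a feature map to one value, e.g. as input to a classifier layer."""
    return max(v for plane in grid for row in plane for v in row)
```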
Abstract:
A method (200) for evaluating nucleic acid sequencing data using a quality control analysis system (300), comprising: receiving (210) a plurality of reads of a nucleic acid sequence; extracting (220) a plurality of k-mers from the plurality of reads; identifying (230), using the plurality of extracted k-mers, one or more of a plurality of annotated k-mers found in the plurality of reads, wherein the plurality of annotated k-mers are stored in an annotation database (350), and further wherein the annotated k-mers are annotated with annotation information about one or more nucleic acid sequences from which the annotated k-mers are generated; gathering (240), based on the identified annotated k-mers found in the plurality of reads, annotation information about the plurality of reads; and determining (250), based on the gathered annotation information, a quality control metric for at least some of the plurality of reads.
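Steps 220 through 250 can be sketched with a dictionary standing in for the annotation database (350). This is an illustration under assumptions: the in-memory `dict` of k-mer to label sets, the exact-match lookup, the label names and the choice of "fraction of reads carrying a contaminant-annotated k-mer" as the quality control metric are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple

def extract_kmers(read: str, k: int) -> List[str]:
    """All overlapping k-mers of a read; empty if the read is shorter than k."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]

def contamination_qc(
    reads: Sequence[str],
    annotation_db: Dict[str, Set[str]],
    k: int,
    flag: str = "contaminant",
) -> Tuple[float, Counter]:
    """Return (fraction of reads with a `flag`-annotated k-mer, per-label read counts)."""
    flagged = 0
    label_counts: Counter = Counter()
    for read in reads:
        labels: Set[str] = set()
        for kmer in extract_kmers(read, k):
            # Identify annotated k-mers by exact lookup and gather their annotations.
            labels |= annotation_db.get(kmer, set())
        label_counts.update(labels)  # each label counted at most once per read
        if flag in labels:
            flagged += 1
    fraction = flagged / len(reads) if reads else 0.0
    return fraction, label_counts
```

A production system would typically use a compact index (for example a hash table on disk or a succinct k-mer structure) rather than a Python dictionary, but the lookup-then-aggregate flow is the same.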
Abstract:
A method for providing a computer-implemented medical diagnosis, the method comprising: receiving an input from a user comprising at least one symptom of the user; providing the at least one symptom as an input to a medical model comprising: a probabilistic graphical model comprising probability distributions and relationships between symptoms and diseases; an inference engine configured to perform Bayesian inference on said probabilistic graphical model; and a discriminative model pre-trained to approximate the probabilistic graphical model, the discriminative model being trained using samples from said probabilistic graphical model, wherein some of the data of the samples has been masked to allow the discriminative model to produce data which is robust to the user providing incomplete information about their symptoms; deriving estimates of the probability of the user having a disease from the discriminative model; inputting the estimates to the inference engine; performing approximate inference on the probabilistic graphical model to obtain a prediction of the probability that the user has the disease; and outputting the probability of the user having the disease for display by a display device.
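The interplay between a graphical model, masked training samples and approximate inference can be illustrated with a toy model. Everything here is an assumption for illustration: a single disease with conditionally independent symptoms, made-up probabilities, and importance sampling as the approximate inference, with a fixed number `proposal` standing in for the discriminative model's estimate. The abstract does not state how the estimates are fed to the inference engine; using them as a proposal distribution is one plausible reading.

```python
import random
from typing import Dict, Optional, Tuple

# Hypothetical model: prior P(D=1) and, per symptom, (P(s=1 | D=1), P(s=1 | D=0)).
PRIOR = 0.05
SYMPTOMS = {"fever": (0.8, 0.1), "cough": (0.7, 0.2), "rash": (0.3, 0.05)}

def likelihood(evidence: Dict[str, Optional[bool]], disease: int) -> float:
    """P(evidence | D=disease); unobserved symptoms (None or absent) are marginalized out."""
    p = 1.0
    for name, (p1, p0) in SYMPTOMS.items():
        obs = evidence.get(name)
        if obs is None:
            continue
        q = p1 if disease else p0
        p *= q if obs else 1.0 - q
    return p

def exact_posterior(evidence: Dict[str, Optional[bool]]) -> float:
    """Exact P(D=1 | evidence), feasible only because this toy model is tiny."""
    joint1 = PRIOR * likelihood(evidence, 1)
    joint0 = (1 - PRIOR) * likelihood(evidence, 0)
    return joint1 / (joint1 + joint0)

def sample_masked(rng: random.Random, mask_prob: float = 0.3) -> Tuple[int, Dict[str, Optional[bool]]]:
    """Draw (disease, symptoms) from the model, hiding each symptom with prob mask_prob.

    Samples like these are what a discriminative model could be trained on.
    """
    d = 1 if rng.random() < PRIOR else 0
    symptoms: Dict[str, Optional[bool]] = {}
    for name, (p1, p0) in SYMPTOMS.items():
        value = rng.random() < (p1 if d else p0)
        symptoms[name] = None if rng.random() < mask_prob else value
    return d, symptoms

def importance_posterior(evidence, proposal: float, n: int, rng: random.Random) -> float:
    """Estimate P(D=1 | evidence) by importance sampling with q(D=1) = proposal."""
    assert 0.0 < proposal < 1.0, "proposal must put mass on both outcomes"
    num = den = 0.0
    for _ in range(n):
        d = 1 if rng.random() < proposal else 0
        q = proposal if d else 1.0 - proposal
        w = (PRIOR if d else 1.0 - PRIOR) * likelihood(evidence, d) / q
        num += w * d
        den += w
    return num / den
```

The closer the proposal is to the true posterior, the lower the variance of the estimate, which is the motivation for seeding inference with a well-trained discriminative model.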
Abstract:
This disclosure concerns methods for evaluating inflammatory cells and modulators of the inflammatory response in tumor tissue and other relevant tissue types. The methods entail: obtaining a tissue sample and processing said tissue sample to produce histologic slides of tissue sections; staining the tissue sections to identify inflammatory cells and modulators of the inflammatory response; digitizing the slides to produce an image of the stained tissue sections; digitally stratifying the tissue sample into tumor and other relevant tissue compartments; and using digital image analysis to quantify cell-based and cell population-based features. The quantification of cell-based and cell population-based features within a tissue compartment of interest is used to develop a summary score of the interaction between the immune system and that tissue compartment. Patient stratification and selection of candidates for a therapeutic approach are ultimately based on the summary score value.
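The final steps (quantifying features per compartment, combining them into a summary score, and stratifying patients) can be sketched once image analysis has labeled each cell. This is a hypothetical illustration: the cell-type and compartment labels, the use of cell density as the feature, the linear weighting and the threshold are all assumptions, since the disclosure does not specify a scoring formula.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

@dataclass(frozen=True)
class Cell:
    cell_type: str    # label from stain-based classification, e.g. "CD8+" (hypothetical)
    compartment: str  # label from digital stratification, e.g. "tumor" (hypothetical)

def densities(cells: Iterable[Cell], areas_mm2: Dict[str, float]) -> Dict[Tuple[str, str], float]:
    """Cells per mm^2 for each (compartment, cell_type) pair."""
    counts = Counter((c.compartment, c.cell_type) for c in cells)
    return {key: n / areas_mm2[key[0]] for key, n in counts.items()}

def summary_score(dens: Dict[Tuple[str, str], float], compartment: str,
                  weights: Dict[str, float]) -> float:
    """Weighted sum of cell-type densities within one compartment (weights hypothetical)."""
    return sum(w * dens.get((compartment, t), 0.0) for t, w in weights.items())

def stratify(score: float, threshold: float) -> str:
    """Classify a patient as a therapy candidate when the score reaches the threshold."""
    return "candidate" if score >= threshold else "non-candidate"
```

Normalizing counts by compartment area keeps scores comparable across slides with different amounts of tumor tissue; a negative weight (for example on a suppressive cell type) lets the score capture opposing effects.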