Systems and methods for adaptive local alignment for graph genomes

    Publication No.: US11810648B2

    Publication date: 2023-11-07

    Application No.: US16663243

    Filing date: 2019-10-24

    CPC classification: G16B30/10 G16B30/00

    Abstract: Systems and methods for analyzing genomic information can include obtaining a sequence read including genetic information; identifying, within a graph representing a reference genome, a plurality of candidate mapping positions that relate to the genetic information, the graph comprising nodes representing genetic sequences and edges connecting pairs of nodes; determining, by means of a computer system, whether an alignment with the graph surrounding each of the plurality of candidate mapping positions is advanced or basic; and performing for each candidate mapping position, by means of the computer system, a local alignment based on whether the local alignment is advanced or basic. The advanced local alignment can include a first-local-alignment algorithm, and the basic local alignment includes a second-local-alignment algorithm. Based on the local alignments, the mapped position of the sequence read can be identified within the genome.
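
    The per-candidate dispatch described in the abstract above can be sketched roughly as follows. The abstract does not say how a region is judged advanced or basic, nor which two algorithms are used, so everything here is an assumption for illustration: the toy graph (`SEQS`, `EDGES`), the branching test based on `paths_from`, Smith-Waterman against a single path as the "basic" aligner, and best score over all enumerated paths as the "advanced" aligner.

```python
from typing import Dict, List, Tuple

# Hypothetical toy graph: node id -> sequence, node id -> successor node ids.
SEQS: Dict[str, str] = {"a": "ACGT", "b": "G", "c": "T", "d": "CCGA", "e": "TTAGC"}
EDGES: Dict[str, List[str]] = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}


def smith_waterman(read: str, ref: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of `read` against a linear `ref` (linear gap cost)."""
    prev = [0] * (len(ref) + 1)
    best = 0
    for i in range(1, len(read) + 1):
        cur = [0] * (len(ref) + 1)
        for j in range(1, len(ref) + 1):
            diag = prev[j - 1] + (match if read[i - 1] == ref[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def paths_from(node: str, depth: int) -> List[List[str]]:
    """Enumerate node paths of at most `depth` nodes starting at `node`."""
    if depth == 1 or not EDGES[node]:
        return [[node]]
    return [[node] + rest for nxt in EDGES[node] for rest in paths_from(nxt, depth - 1)]


def align_candidate(read: str, node: str, depth: int = 3) -> Tuple[str, int]:
    """Pick the alignment mode for one candidate position, then score the read."""
    paths = paths_from(node, depth)
    if len(paths) > 1:
        # Assumed "advanced" case: the neighbourhood branches, so score every path.
        score = max(smith_waterman(read, "".join(SEQS[n] for n in p)) for p in paths)
        return ("advanced", score)
    # Assumed "basic" case: the neighbourhood is linear, one alignment suffices.
    return ("basic", smith_waterman(read, "".join(SEQS[n] for n in paths[0])))


def map_read(read: str, candidates: List[str]) -> Tuple[str, Dict[str, Tuple[str, int]]]:
    """Align at every candidate and return the best-scoring one plus all results."""
    results = {c: align_candidate(read, c) for c in candidates}
    best = max(results, key=lambda c: results[c][1])
    return best, results
```

    Calling `map_read("GTTCCG", ["a", "d"])` would run the path-enumerating aligner at the branching node `a` and the single-path aligner at the linear node `d`, then keep whichever candidate scores higher.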

    Systems and methods for mitochondrial analysis

    Publication No.: US11649495B2

    Publication date: 2023-05-16

    Application No.: US16798759

    Filing date: 2020-02-24

    Abstract: The invention provides methods of analyzing an individual's mtDNA by transforming available reference sequences into a directed graph that compactly represents all the information without duplication and comparing sequence reads from the mtDNA to the graph to identify the individual or describe their mtDNA. A directed graph can represent all of the genetic variation found among the mitochondrial genomes across all of a number of reference organisms while providing a single article to which sequence reads can be aligned or compared. Thus any sequence read or other sequence fragment can be compared, in a single operation, to the article that represents all of the reference mitochondrial sequences.

    Methods and systems for detecting sequence variants

    Publication No.: US11447828B2

    Publication date: 2022-09-20

    Application No.: US16106996

    Filing date: 2018-08-21

    Inventor: Deniz Kural

    Abstract: The invention includes methods and systems for identifying disease-induced mutations by producing multi-dimensional reference sequence constructs that account for variations between individuals, different diseases, and different stages of those diseases. Once constructed, these reference sequence constructs can be used to align sequence reads corresponding to genetic samples from patients suspected of having a disease, or who have had the disease and are in suspected remission. The reference sequence constructs also provide insight into the genetic progression of the disease.

    BIOLOGICAL GRAPH OR SEQUENCE SERIALIZATION

    Publication No.: US20220261384A1

    Publication date: 2022-08-18

    Application No.: US17729896

    Filing date: 2022-04-26

    Inventor: Vladimir Semenyuk

    Abstract: Methods of the invention include representing biological data in a memory subsystem within a computer system with a data structure that is particular to a location in the memory subsystem and serializing the data structure into a stream of bytes that can be deserialized into a clone of the data structure. In a preferred genomic embodiment, the biological data comprises genomic sequences and the data structure comprises a genomic directed acyclic graph (DAG) in which objects have adjacency lists of pointers that indicate the location of any object adjacent to that object. After serialization and deserialization, the clone genomic DAG has the same structure as the original to represent the same sequences and relationships among them as the original.
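
    The round trip described in the abstract above can be illustrated with a small sketch. It is not the patented byte format: the `Node` class, the JSON encoding, and the idea of replacing in-memory references with list indices are assumptions chosen only to show how location-specific pointers can be turned into a location-independent byte stream and rebuilt as a clone.

```python
import json
from typing import List


class Node:
    """Hypothetical DAG object: a sequence plus an adjacency list of references."""

    def __init__(self, seq: str) -> None:
        self.seq = seq
        self.adjacent: List["Node"] = []  # object references stand in for pointers


def serialize(nodes: List[Node]) -> bytes:
    """Replace each reference with the target's list index, then encode as bytes."""
    index = {id(n): i for i, n in enumerate(nodes)}
    payload = [{"seq": n.seq, "adj": [index[id(m)] for m in n.adjacent]} for n in nodes]
    return json.dumps(payload).encode("utf-8")


def deserialize(data: bytes) -> List[Node]:
    """Rebuild fresh objects first, then turn indices back into references."""
    payload = json.loads(data.decode("utf-8"))
    clones = [Node(item["seq"]) for item in payload]
    for clone, item in zip(clones, payload):
        clone.adjacent = [clones[i] for i in item["adj"]]
    return clones
```

    The clone holds new objects at new memory locations, but each clone's adjacency list points at the clones of the same neighbours, so the sequences and their relationships are preserved.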

    METHODS AND SYSTEMS FOR ALIGNING SEQUENCES IN THE PRESENCE OF REPEATING ELEMENTS

    Publication No.: US20210398616A1

    Publication date: 2021-12-23

    Application No.: US17359338

    Filing date: 2021-06-25

    Inventor: Deniz Kural

    Abstract: The invention includes methods for aligning reads (e.g., nucleic acid reads) comprising repeating sequences, methods for building reference sequence constructs comprising repeating sequences, and systems that can be used to align reads comprising repeating sequences. The method is scalable, and can be used to align millions of reads to a construct thousands of bases long. The methods and systems can additionally account for variability within a repeating sequence, or near to a repeating sequence, due to genetic mutation.

    Methods and Systems for Stream-Processing of Biomedical Data

    Publication No.: US20210258399A1

    Publication date: 2021-08-19

    Application No.: US17191187

    Filing date: 2021-03-03

    Inventor: Nemanja Zbiljic

    Abstract: A method for stream-processing biomedical data includes receiving, by a file system on a computing device, a first request for access to at least a first portion of a file stored on a remotely located storage device. The method includes receiving, by the file system, a second request for access to at least a second portion of the file. The method includes determining, by a pre-fetching component executing on the computing device, whether the first request and the second request are associated with a sequential read operation. The method includes automatically retrieving, by the pre-fetching component, a third portion of the requested file, before receiving a third request for access to at least the third portion of the file, based on a determination that the first request and the second request are associated with the sequential read operation.
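
    The pre-fetching step in the abstract above can be sketched as follows. The abstract does not define how a sequential read is detected or how large the third portion is, so this sketch assumes that two requests are sequential when the second starts exactly where the first ended, and that the prefetched portion has the same length as the current request. The `PrefetchingReader` class and the `fetch` callback (standing in for the remote storage device) are hypothetical names.

```python
from typing import Callable, Dict, Optional, Tuple


class PrefetchingReader:
    """Wraps a remote fetch function and prefetches ahead on sequential reads."""

    def __init__(self, fetch: Callable[[int, int], bytes]) -> None:
        self._fetch = fetch  # fetch(offset, length) -> bytes from remote storage
        self._last: Optional[Tuple[int, int]] = None  # previous (offset, length)
        self._cache: Dict[Tuple[int, int], bytes] = {}

    def read(self, offset: int, length: int) -> bytes:
        # Serve from the prefetch cache when possible, otherwise go remote.
        data = self._cache.pop((offset, length), None)
        if data is None:
            data = self._fetch(offset, length)
        # Assumed sequential test: this request starts where the last one ended.
        if self._last is not None and self._last[0] + self._last[1] == offset:
            nxt = (offset + length, length)
            if nxt not in self._cache:
                # Retrieve the next portion before anyone asks for it.
                self._cache[nxt] = self._fetch(*nxt)
        self._last = (offset, length)
        return data
```

    With this design, a third request for the next contiguous portion is answered from the cache without a further round trip for that portion, while non-contiguous requests fall back to an ordinary remote fetch.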

    Method and system for quantifying sequence alignment

    Publication No.: US10832797B2

    Publication date: 2020-11-10

    Application No.: US14517419

    Filing date: 2014-10-17

    Inventor: Deniz Kural

    Abstract: The invention includes methods for aligning reads (e.g., nucleic acid reads, amino acid reads) to a reference sequence construct, methods for building the reference sequence construct, and systems that use the alignment methods and constructs to produce sequences. The invention also includes methods and systems for evaluating the quality of the alignment between the reads and the reference sequence construct. The method is scalable, and can be used to align millions of reads to a construct thousands of bases or amino acids long. The invention additionally includes methods for identifying a disease or a genotype based upon alignment of nucleic acid reads to a location in the construct.

    SYSTEMS AND METHODS FOR MITOCHONDRIAL ANALYSIS

    Publication No.: US20200232029A1

    Publication date: 2020-07-23

    Application No.: US16798759

    Filing date: 2020-02-24

    IPC classification: C12Q1/6874 C12Q1/6888

    Abstract: The invention provides methods of analyzing an individual's mtDNA by transforming available reference sequences into a directed graph that compactly represents all the information without duplication and comparing sequence reads from the mtDNA to the graph to identify the individual or describe their mtDNA. A directed graph can represent all of the genetic variation found among the mitochondrial genomes across all of a number of reference organisms while providing a single article to which sequence reads can be aligned or compared. Thus any sequence read or other sequence fragment can be compared, in a single operation, to the article that represents all of the reference mitochondrial sequences.

    System and method for dynamic control of workflow execution

    Publication No.: US10678613B2

    Publication date: 2020-06-09

    Application No.: US16176833

    Filing date: 2018-10-31

    Abstract: Some embodiments relate to systems for processing one or more computational workflows. In one embodiment, a description of a computational workflow comprises a plurality of applications, in which applications are represented as nodes and edges connecting the nodes indicate the flow of data elements between applications. A task execution module is configured to create and execute tasks. An application programming interface (API) is in communication with the task execution module and comprises a plurality of function calls for controlling at least one function of the task execution module. An API script includes instructions to the API to create and execute a plurality of tasks corresponding to the execution of the computational workflow for a plurality of samples. A graphical user interface (GUI) is in communication with the task execution module and configured to receive input from an end user to initiate execution of the API script.
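
    As a rough illustration of the workflow description in the abstract above, the sketch below models applications as nodes and data flow as edges, then creates one task per application for each sample. The application names, the `execution_order` and `create_tasks` helpers, and the use of a topological sort to order tasks are assumptions; the abstract does not describe the task execution module or its API at this level of detail.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set, Tuple

# Hypothetical workflow: nodes are applications, edges are data flow between them.
APPS: List[str] = ["align", "sort", "call_variants"]
EDGES: List[Tuple[str, str]] = [("align", "sort"), ("sort", "call_variants")]


def execution_order(apps: List[str], edges: List[Tuple[str, str]]) -> List[str]:
    """Order applications so each runs after the applications that feed it."""
    preds: Dict[str, Set[str]] = {app: set() for app in apps}
    for src, dst in edges:
        preds[dst].add(src)
    return list(TopologicalSorter(preds).static_order())


def create_tasks(samples: List[str], apps: List[str],
                 edges: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Create (sample, application) tasks for every sample, in a runnable order."""
    order = execution_order(apps, edges)
    return [(sample, app) for sample in samples for app in order]
```

    A script driving such an API would call something like `create_tasks(["s1", "s2"], APPS, EDGES)` once and hand the resulting tasks to the executor, rather than configuring each sample by hand.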