-
Publication No.: US20230313271A1
Publication Date: 2023-10-05
Application No.: US18172821
Filing Date: 2023-02-22
Inventors: Steven Norberg, Luis Fernando Camarillo Guerrero, Colin Brown, Andrea Manzo, Sarah E. Shultzaberger, Michael Eberle, Sepideh Almasi, Suzanne Rohrback, Pascale Mathonet, Egor Dolzhenko
IPC Classification: C12Q1/6809, G16C20/70
CPC Classification: C12Q1/6809, G16C20/70
Abstract: This disclosure describes methods, non-transitory computer readable media, and systems that can use a machine-learning model to determine factors or scores indicating an error level with which a given methylation assay detects methylation of cytosine bases. For instance, the disclosed systems use a machine-learning model to generate a bias score indicating a degree to which a given methylation assay errs in detecting cytosine methylation when specific sequence contexts surround such cytosines compared to other sequence contexts. The machine-learning model may take various forms, including a decision-tree model, a neural network, or a combination of a decision-tree model and a neural network. In some cases, the disclosed system combines or uses bias scores from multiple machine-learning models to generate a consensus bias score.
-
Publication No.: US20230019053A1
Publication Date: 2023-01-19
Application No.: US17839075
Filing Date: 2022-06-13
Applicant: Illumina, Inc.
Inventors: Sai Chen, Egor Dolzhenko, Michael A. Eberle
Abstract: Disclosed herein are systems, devices, and methods for determining a variable number tandem repeat (VNTR) status. Haplotypes of a VNTR can be determined using long sequence reads of reference samples aligned to the VNTR in a reference. Short reads of a test sample of a test subject can be aligned to the haplotypes determined using the long sequence reads to determine a VNTR status (e.g., one or more haplotypes or a genotype of the test subject) of the test subject based on the probability indications of the haplotypes.
-
Publication No.: US20220254442A1
Publication Date: 2022-08-11
Application No.: US17547297
Filing Date: 2021-12-10
Applicant: Illumina, Inc.
Inventors: Egor Dolzhenko, Michael A. Eberle
Abstract: The disclosed embodiments concern methods, apparatus, systems and computer program products for genotyping and visualizing repeat sequences such as medically significant short tandem repeats (STRs). Some implementations can be used to genotype and visualize repeat sequences each including two or more repeat sub-sequences. Some implementations provide a computer tool to generate sequence read pileups for visualizing repeat sequences for samples that have different genotypes of the repeat sequence, each sequence pileup including reads aligned to two or more different haplotypes.
-
Publication No.: US20200286586A1
Publication Date: 2020-09-10
Application No.: US16811919
Filing Date: 2020-03-06
Applicant: Illumina, Inc.
Inventors: Egor Dolzhenko, Michael A. Eberle
Abstract: The disclosed embodiments concern methods, apparatus, systems and computer program products for genotyping repeat sequences such as medically significant short tandem repeats (STRs). The methods involve aligning reads to a repeat sequence represented by a sequence graph, and using the aligned reads to genotype the repeat sequence. The sequence graph is a directed graph including at least one self-loop representing a repeat sub-sequence. In some implementations, the reads are paired end reads, and both mates of each read pair may be used to genotype the repeat sequences. Some implementations can be used to determine degenerate codon repeats. Some implementations can be used to genotype repeat sequences each including two or more repeat sub-sequences. Some implementations can be used to genotype nucleic acid sequences each including at least one repeat sub-sequence and another genetic variant such as an insertion, deletion, or substitution.