IMAGE-BASED VARIANT PATHOGENICITY DETERMINATION

    Publication No.: US20230245305A1

    Publication date: 2023-08-03

    Application No.: US18160855

    Filing date: 2023-01-27

    IPC classification: G06T7/00 G06T7/70 G06T17/00

    Abstract: Described herein are technologies for classifying a protein structure (such as technologies for classifying the pathogenicity of a protein structure related to a nucleotide variant). Such a classification is based on two-dimensional images taken from a three-dimensional image of the protein structure. With respect to some implementations, described herein are multi-view convolutional neural networks (CNNs) for classifying a protein structure based on inputs of two-dimensional images taken from a three-dimensional image of the protein structure. In some implementations, a computer-implemented method of determining pathogenicity of variants includes accessing a structural rendition of amino acids, capturing images of those parts of the structural rendition that contain a target amino acid from the amino acids, and, based on the images, determining pathogenicity of a nucleotide variant that mutates the target amino acid into an alternate amino acid.
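
The image-capture step in this abstract can be illustrated with a minimal sketch. It assumes each amino acid is reduced to a single 3D point, that views are taken by rotating about the y-axis, and that each image is a small binary occupancy grid centered on the target amino acid. The names `rotate_y`, `capture_view`, and `capture_views`, and the parameters `size` and `scale`, are hypothetical and do not come from the patent; the multi-view CNN that would consume the images is not shown.

```python
import math


def rotate_y(point, angle):
    """Rotate a 3D point about the y-axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)


def capture_view(coords, target_index, angle, size=9, scale=1.0):
    """Orthographic 2D occupancy image centered on the target residue.

    Assumption: one 3D point per amino acid; a pixel is 1 if any point
    projects onto it, else 0.
    """
    tx, ty, tz = coords[target_index]
    image = [[0] * size for _ in range(size)]
    half = size // 2
    for x, y, z in coords:
        # Translate so the target sits at the origin, then rotate the scene.
        px, py, _ = rotate_y((x - tx, y - ty, z - tz), angle)
        col = half + int(round(px / scale))
        row = half + int(round(py / scale))
        if 0 <= row < size and 0 <= col < size:
            image[row][col] = 1
    return image


def capture_views(coords, target_index, n_views=4, size=9, scale=1.0):
    """Capture `n_views` images at evenly spaced rotations about the y-axis."""
    angles = [2 * math.pi * k / n_views for k in range(n_views)]
    return [capture_view(coords, target_index, a, size, scale) for a in angles]
```

Because every view is centered on the target amino acid, the center pixel is always occupied; what changes between views is which neighboring residues fall inside the frame.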

    Protein structure-based protein language models

    Publication No.: US11538555B1

    Publication date: 2022-12-27

    Application No.: US17533091

    Filing date: 2021-11-22

    Abstract: The technology disclosed relates to determining pathogenicity of nucleotide variants. In particular, the technology disclosed relates to specifying a particular amino acid at a particular position in a protein as a gap amino acid, and specifying remaining amino acids at remaining positions in the protein as non-gap amino acids. The technology disclosed further relates to generating a gapped spatial representation of the protein that includes spatial configurations of the non-gap amino acids, and excludes a spatial configuration of the gap amino acid, and determining a pathogenicity of a nucleotide variant based at least in part on the gapped spatial representation, and a representation of an alternate amino acid created by the nucleotide variant at the particular position.
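
As a rough illustration of the gapping idea, the sketch below drops the spatial entry of the gap amino acid and pairs the remaining entries with a one-hot encoding of the alternate amino acid. It assumes one coordinate per residue and a plain dictionary as the model input; `gapped_input` and `one_hot` are hypothetical names, and the actual spatial representation in the patent may differ.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids, one-letter codes


def one_hot(amino_acid):
    """One-hot vector over the 20 standard amino acids."""
    return [1 if amino_acid == a else 0 for a in AMINO_ACIDS]


def gapped_input(residues, coords, gap_position, alternate):
    """Build a gapped representation plus the alternate amino acid.

    Assumption: `coords[i]` is a single (x, y, z) point for `residues[i]`.
    The entry at `gap_position` is excluded from the spatial part.
    """
    non_gap = [
        (pos, aa, xyz)
        for pos, (aa, xyz) in enumerate(zip(residues, coords))
        if pos != gap_position
    ]
    return {
        "gap_position": gap_position,
        "non_gap": non_gap,
        "alternate": one_hot(alternate),
    }
```

For example, `gapped_input("MKV", coords, 1, "E")` keeps the spatial entries for M and V only and encodes E as the alternate amino acid at position 1.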

    Deep convolutional neural networks to predict variant pathogenicity using three-dimensional (3D) protein structures

    Publication No.: US11515010B2

    Publication date: 2022-11-29

    Application No.: US17468411

    Filing date: 2021-09-07

    Abstract: The technology disclosed relates to determining pathogenicity of variants. In particular, the technology disclosed relates to generating amino acid-wise distance channels for a plurality of amino acids in a protein. Each of the amino acid-wise distance channels has voxel-wise distance values for voxels in a plurality of voxels. A tensor includes the amino acid-wise distance channels and at least an alternative allele of the protein expressed by a variant. A deep convolutional neural network determines a pathogenicity of the variant based at least in part on processing the tensor. The technology disclosed further augments the tensor with supplemental information like a reference allele of the protein, evolutionary conservation data about the protein, annotation data about the protein, and structure confidence data about the protein.
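
One way to picture the amino acid-wise distance channels is sketched below: for each of the 20 amino acid types, every voxel stores the distance from its center to the nearest residue of that type. The sketch assumes one coordinate per residue, a cubic grid centered on the origin, a distance cap `max_dist` (also used when a type is absent), and an alternative allele that is one-hot encoded and repeated across all voxels. These choices, and the names `voxel_centers`, `distance_channels`, and `build_tensor`, are assumptions for illustration rather than details confirmed by the abstract.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids, one-letter codes


def voxel_centers(n, spacing):
    """Centers of an n x n x n voxel grid centered on the origin."""
    offset = (n - 1) / 2
    return [
        ((i - offset) * spacing, (j - offset) * spacing, (k - offset) * spacing)
        for i in range(n)
        for j in range(n)
        for k in range(n)
    ]


def distance_channels(residues, coords, n=3, spacing=2.0, max_dist=10.0):
    """One channel per amino acid type; each voxel holds the distance to
    the nearest residue of that type, capped at `max_dist` (assumption)."""
    centers = voxel_centers(n, spacing)
    channels = {}
    for aa in AMINO_ACIDS:
        points = [xyz for r, xyz in zip(residues, coords) if r == aa]
        channels[aa] = [
            min([math.dist(v, p) for p in points] + [max_dist]) for v in centers
        ]
    return channels


def build_tensor(residues, coords, alt_allele, n=3, spacing=2.0):
    """Stack 20 distance channels with 20 constant alt-allele channels."""
    channels = distance_channels(residues, coords, n, spacing)
    voxels = n ** 3
    alt = [[1.0 if aa == alt_allele else 0.0] * voxels for aa in AMINO_ACIDS]
    return [channels[aa] for aa in AMINO_ACIDS] + alt
```

The supplemental inputs named in the abstract (reference allele, conservation, annotation, and structure confidence data) could be appended as further channels in the same way; they are left out here to keep the sketch short.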

    Deep learning-based techniques for training deep convolutional neural networks

    Publication No.: US10558915B2

    Publication date: 2020-02-11

    Application No.: US16413476

    Filing date: 2019-05-15

    Applicant: Illumina, Inc.

    Abstract: The technology disclosed relates to constructing a convolutional neural network-based classifier for variant classification. In particular, it relates to training a convolutional neural network-based classifier on training data using a backpropagation-based gradient update technique that progressively matches outputs of the convolutional neural network-based classifier with corresponding ground truth labels. The convolutional neural network-based classifier comprises groups of residual blocks. Each group of residual blocks is parameterized by a number of convolution filters in the residual blocks, a convolution window size of the residual blocks, and an atrous convolution rate of the residual blocks. Both the convolution window size and the atrous convolution rate vary between groups of residual blocks. The training data includes benign training examples and pathogenic training examples of translated sequence pairs generated from benign variants and pathogenic variants.
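
The grouped residual-block structure can be sketched in plain Python on a single-channel 1D signal. This is a simplified illustration under stated assumptions: the number of convolution filters is fixed at one, the kernel weights are placeholder averaging weights rather than trained values, and the `GROUPS` settings are hypothetical. Training by backpropagation is not shown.

```python
def atrous_conv1d(x, kernel, rate):
    """1D atrous (dilated) convolution with zero 'same' padding.

    Assumes an odd kernel length; taps are spaced `rate` positions apart.
    """
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - half) * rate
            if 0 <= j < len(x):
                acc += w * x[j]
        out.append(acc)
    return out


def relu(x):
    return [max(0.0, v) for v in x]


def residual_block(x, kernel, rate):
    """Skip connection around ReLU followed by an atrous convolution."""
    y = atrous_conv1d(relu(x), kernel, rate)
    return [a + b for a, b in zip(x, y)]


# Hypothetical group settings: window size and atrous rate differ per group.
GROUPS = [
    {"blocks": 2, "window": 3, "rate": 1},
    {"blocks": 2, "window": 5, "rate": 2},
]


def run_groups(x, groups):
    """Apply each group's residual blocks in sequence."""
    for g in groups:
        kernel = [1.0 / g["window"]] * g["window"]  # placeholder weights
        for _ in range(g["blocks"]):
            x = residual_block(x, kernel, g["rate"])
    return x
```

Raising the atrous rate in later groups widens the receptive field without adding weights, which is the usual motivation for varying it between groups.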