SINGLE SAMPLE GENETIC CLASSIFICATION VIA TENSOR MOTIFS

    Publication No.: US20190258776A1

    Publication date: 2019-08-22

    Application No.: US15900048

    Filing date: 2018-02-20

    Abstract: A computer-implemented method includes generating, by a processor, a set of training data for each phenotype in a database including a set of subjects. The set of training data is generated by dividing the genomic information of N subjects, selected with or without repetition, into windows, computing a distribution of genomic events in the windows for each of the N subjects, and extracting, for each window, a tensor that represents the distribution of genomic events for each of the N subjects. A set of test data is generated for each phenotype in the database in the same way: a distribution of genomic events in the windows is computed for each phenotype, and a tensor representing that distribution is extracted for each window. The method includes classifying each phenotype of the test data with a classifier, and assigning a phenotype to a patient.
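The windowing step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the event encoding (a position plus an event-type index), the fixed window size, and the function name `window_distributions` are all assumptions made for illustration. The result is a nested list indexed as `[window][subject][event_type]`, with each per-window, per-subject slice normalized to a distribution.

```python
from typing import List, Sequence, Tuple

# Hypothetical encoding: (genomic position, event-type index).
Event = Tuple[int, int]


def window_distributions(
    subjects: Sequence[Sequence[Event]],
    genome_length: int,
    window_size: int,
    n_event_types: int,
) -> List[List[List[float]]]:
    """Return a tensor[window][subject][event_type] of event frequencies."""
    # Ceiling division so a trailing partial window is still counted.
    n_windows = (genome_length + window_size - 1) // window_size
    tensor = [
        [[0.0] * n_event_types for _ in subjects] for _ in range(n_windows)
    ]

    # Count events of each type per window and per subject.
    for s, events in enumerate(subjects):
        for position, event_type in events:
            if not 0 <= position < genome_length:
                raise ValueError(f"position {position} is outside the genome")
            if not 0 <= event_type < n_event_types:
                raise ValueError(f"unknown event type {event_type}")
            tensor[position // window_size][s][event_type] += 1.0

    # Normalize each (window, subject) slice; empty slices stay all zero.
    for w in range(n_windows):
        for s in range(len(subjects)):
            total = sum(tensor[w][s])
            if total > 0:
                tensor[w][s] = [count / total for count in tensor[w][s]]
    return tensor


if __name__ == "__main__":
    # Two toy subjects on a 20-base "genome" with 10-base windows.
    toy_subjects = [[(0, 0), (5, 1), (12, 0)], [(3, 1)]]
    for w, window in enumerate(window_distributions(toy_subjects, 20, 10, 2)):
        print(w, window)
```

Each window's slice could then be passed to a classifier as one feature block; how the tensors are actually compared or classified is not specified in the abstract and is left out here.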

COGNITIVE IDENTIFICATION OF PATHOGENIC PATHWAYS

    Publication No.: US20200251182A1

    Publication date: 2020-08-06

    Application No.: US16266733

    Filing date: 2019-02-04

    Abstract: Embodiments of the present invention are directed to methods for adapting machine learning, redescription, and computational homology techniques to the identification of pathogenic pathways. A non-limiting example of the computer-implemented method includes receiving genetic and biological data and generating a data matrix based on the data. The data matrix can include one or more features, and each feature can be associated with a known feature value. A collection of sets of features representing pathways, genes, or a genetic combination of genotype values can be determined. The method also includes determining a first prediction for the feature value of a selected feature to be predicted in the collection, permuting one or more rows of the data matrix, and calculating a second prediction for the feature value based on the permutation. A prediction score can be determined based on the first prediction, the second prediction, and the known feature value.
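The permutation step in the abstract can be sketched as follows. The abstract does not say which predictor or scoring formula is used, so both are assumptions here: the predictor is a leave-one-out nearest-neighbour lookup, and the score is the increase in absolute error after the rows are permuted. The names `predict_loo_nn` and `prediction_score` are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def predict_loo_nn(
    X: Sequence[Sequence[float]], y: Sequence[float], i: int
) -> float:
    """Predict y[i] from the row of X closest to X[i], excluding row i."""
    nearest = min(
        (j for j in range(len(X)) if j != i),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(X[i], X[j])),
    )
    return y[nearest]


def prediction_score(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    i: int,
    perm: Optional[Sequence[int]] = None,
    seed: int = 0,
) -> float:
    """Assumed score: error after permuting rows minus error before."""
    first = predict_loo_nn(X, y, i)

    # Permute the rows of the feature matrix, which breaks the link
    # between the features and the known values in y.
    if perm is None:
        order: List[int] = list(range(len(X)))
        random.Random(seed).shuffle(order)
    else:
        order = list(perm)
    permuted = [X[k] for k in order]

    second = predict_loo_nn(permuted, y, i)
    known = y[i]
    # A positive score suggests the features carried real signal for y[i].
    return abs(second - known) - abs(first - known)


if __name__ == "__main__":
    X = [[0.0], [0.1], [1.0], [1.1]]
    y = [0.0, 0.0, 1.0, 1.0]
    print(prediction_score(X, y, 0, perm=[2, 0, 3, 1]))
```

Under this assumed formula, a score near zero means the permuted data predicts the known value about as well as the original data, which would suggest the feature set carries little information about the selected feature.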