Rapid genomic sequence classification using probabilistic data structures

    Publication No.: US11037654B2

    Publication date: 2021-06-15

    Application No.: US15977667

    Application date: 2018-05-11

    Applicant: NOBLIS, INC.

    Abstract: Techniques for identifying and/or classifying genomic information are provided. In some embodiments, computing systems may identify genomic information without access to a database of reference genomic information. Instead, they rely on locally stored probabilistic data structures that represent the reference genomic information. Query genomic data, such as data taken from a read-set, may be divided into sub-strings. Each locally stored probabilistic data structure may then be queried with each extracted sub-string, producing probabilistic outputs indicating either that (a) the sub-string is probably included in the set of data represented by the probabilistic data structure or (b) the sub-string is definitely not included in that set. Based on the number and/or proportion of sub-strings from a read-set that are indicated as likely represented by a probabilistic data structure, a likely identity or classification for the genomic information in the read-set may be determined.
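    The abstract does not name a specific data structure, but its "probably included / definitely not included" semantics match a Bloom filter. The following is a minimal sketch under several assumptions, not the patented implementation. It assumes one Bloom filter per reference sequence, fixed-length k-mers as the "sub-strings", SHA-256 double hashing for bit positions, and a simple "highest fraction of k-mer hits wins" classification rule. The names `BloomFilter`, `kmers`, `build_filter`, and `classify`, as well as k = 8 and the filter sizes, are hypothetical choices made for illustration.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Assumption: derive all bit positions from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(item.encode("ascii")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step so positions differ
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False: definitely not in the set. True: probably in the set.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def kmers(sequence: str, k: int):
    """Yield every overlapping substring of length k."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]


def build_filter(reference: str, k: int, num_bits: int = 1 << 16, num_hashes: int = 4) -> BloomFilter:
    """Insert every k-mer of a reference sequence into a fresh Bloom filter."""
    bf = BloomFilter(num_bits, num_hashes)
    for kmer in kmers(reference, k):
        bf.add(kmer)
    return bf


def classify(reads, filters, k: int):
    """Score each reference by the fraction of read k-mers its filter reports as present."""
    query_kmers = [km for read in reads for km in kmers(read, k)]
    scores = {}
    for label, bf in filters.items():
        hits = sum(bf.might_contain(km) for km in query_kmers)
        scores[label] = hits / len(query_kmers) if query_kmers else 0.0
    best = max(scores, key=scores.get)
    return best, scores


# Usage: two toy references and a read copied from reference "A".
filters = {
    "A": build_filter("ACGTACGGTCATTGCAGGCTAACGTTAGC", k=8),
    "B": build_filter("GGGGCCCCGGGGCCCCGGGGCCCC", k=8),
}
best, scores = classify(["GGTCATTGCAGGCTAA"], filters, k=8)
print(best, scores)
```

    Because a Bloom filter never reports a false negative, every k-mer of a read taken from reference "A" is reported as present in A's filter, so A's score is 1.0. Any hits against other references are either shared k-mers or false positives. For a filter of m bits with h hash functions holding n k-mers, the false-positive rate is approximately (1 - e^(-hn/m))^h, which is why the sketch sizes each filter well above the number of k-mers it stores.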
