SYSTEM AND METHOD FOR MOLECULAR RECONSTRUCTION FROM MOLECULAR PROBABILITY DISTRIBUTIONS

    Publication No.: US20220198286A1

    Publication Date: 2022-06-23

    Application No.: US17540153

    Filing Date: 2021-12-01

    Applicant: Ro5 Inc.

    Abstract: A system and method comprising a transmoler that identifies common substructures of a given 3D conformer and predicts its structural information. First, substructure embeddings are learned in an unsupervised manner using contrastive learning. Second, a novel oriented 3D object regressor predicts the dimensions and direction of each substructure in a conformer, along with its fingerprint embedding; these predictions are used to create differentiable junction tree molecular graphs. Lastly, molecular representations such as DeepSMILES, which represent novel molecules, are generated from the junction tree graphs. The system may also generate conformers directly from a pocket: a pocket is input to the model, and the model learns to generate structures that fit that pocket by conditioning the generative system. Furthermore, the structure-based contrastive embeddings generated for the transmoler can be reused in structure-based generative modelling.
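
The abstract does not say which contrastive objective is used to learn the substructure embeddings. As a minimal sketch only, assuming an InfoNCE-style loss over cosine similarities (a common choice, not confirmed by the source), the idea of pulling two views of the same substructure together and pushing other substructures apart could look like this. The names `cosine`, `info_nce`, and `temperature` are hypothetical, and the vectors are toy values.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss (assumed objective, not taken from the patent).

    The loss is small when the anchor is more similar to the positive
    than to any of the negatives, and large otherwise.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))

# Toy embeddings: the anchor and its positive point in nearly the same direction.
anchor = [1.0, 0.0]
positive = [0.9, 0.1]
negatives = [[0.0, 1.0], [-1.0, 0.0]]

loss_aligned = info_nce(anchor, positive, negatives)
# Swap roles so the "positive" is orthogonal to the anchor.
loss_misaligned = info_nce(anchor, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
```

In this toy setup `loss_aligned` is close to zero and `loss_misaligned` is much larger, which is the signal a training loop would use to shape the embedding space.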

    Biological graph or sequence serialization

    Publication No.: US11347704B2

    Publication Date: 2022-05-31

    Application No.: US14885192

    Filing Date: 2015-10-16

    Inventor: Vladimir Semenyuk

    Abstract: Methods of the invention include representing biological data in a memory subsystem within a computer system with a data structure that is particular to a location in the memory subsystem, and serializing the data structure into a stream of bytes that can be deserialized into a clone of the data structure. In a preferred genomic embodiment, the biological data comprises genomic sequences and the data structure comprises a genomic directed acyclic graph (DAG) in which objects have adjacency lists of pointers that indicate the location of any object adjacent to that object. After serialization and deserialization, the cloned genomic DAG has the same structure as the original and therefore represents the same sequences and the same relationships among them.
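
The serialize/deserialize round trip described above can be illustrated with a minimal sketch. It assumes a simple length-prefixed byte layout and replaces each in-memory reference with a node index during serialization; the `Node` class, the layout, and the function names are hypothetical and are not taken from the patent.

```python
import struct

class Node:
    """A DAG node holding a sequence fragment and references to adjacent nodes."""
    def __init__(self, sequence):
        self.sequence = sequence
        self.adjacent = []  # direct object references (location-specific)

def serialize(nodes):
    """Write nodes to bytes, replacing object references with list indices."""
    index = {id(n): i for i, n in enumerate(nodes)}
    out = bytearray(struct.pack("<I", len(nodes)))
    for n in nodes:
        seq = n.sequence.encode("ascii")
        out += struct.pack("<I", len(seq)) + seq
        out += struct.pack("<I", len(n.adjacent))
        for m in n.adjacent:
            out += struct.pack("<I", index[id(m)])
    return bytes(out)

def deserialize(data):
    """Rebuild a clone of the graph; indices become references to new objects."""
    offset = 0

    def read_u32():
        nonlocal offset
        (value,) = struct.unpack_from("<I", data, offset)
        offset += 4
        return value

    count = read_u32()
    nodes, links = [], []
    for _ in range(count):
        seq_len = read_u32()
        nodes.append(Node(data[offset:offset + seq_len].decode("ascii")))
        offset += seq_len
        degree = read_u32()
        links.append([read_u32() for _ in range(degree)])
    # Second pass: every node now exists, so indices can be resolved.
    for node, targets in zip(nodes, links):
        node.adjacent = [nodes[i] for i in targets]
    return nodes

# A small diamond-shaped DAG: ACG -> (T | G) -> TTA
a, b, c, d = Node("ACG"), Node("T"), Node("G"), Node("TTA")
a.adjacent = [b, c]
b.adjacent = [d]
c.adjacent = [d]

clone = deserialize(serialize([a, b, c, d]))
```

The clone shares no objects with the original, yet both branches still converge on a single `TTA` node, so the same sequences and relationships are represented.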

    SYSTEM AND METHOD FOR VARIANT CALLING

    Publication No.: US20220108768A1

    Publication Date: 2022-04-07

    Application No.: US17429477

    Filing Date: 2020-02-27

    Applicant: NANTOMICS, LLC

    Abstract: A locus tester or locus database has stored therein DNA or RNA sequence information for one or more loci of interest. The sequence information may include a list of k-mers in a given DNA or RNA sequence, an identification of whether each k-mer in the list appears in a reference sequence or in a variation of the reference sequence, and a count of how many times each k-mer in the list has been identified in sequence information for the locus of interest. Sequence data for the locus received from a data source may be broken into fragments, with each fragment containing one or more k-mers. These k-mers may be quickly compared to the list of k-mers in the locus database to determine whether the sequence data corresponds to the reference sequence or to a variation of the reference sequence.
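
The k-mer lookup described above can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the table layout, the labels `"reference"`, `"variant"`, and `"both"`, the majority-vote rule, and all function names are hypothetical, and the sequences are toy data with a single substitution.

```python
from collections import Counter

def kmers(seq, k):
    """All overlapping substrings of length k, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def build_locus_table(reference, variant, k):
    """Map each k-mer to its origin label and a running observation count."""
    ref = set(kmers(reference, k))
    var = set(kmers(variant, k))
    table = {}
    for km in ref | var:
        if km in ref and km in var:
            label = "both"
        elif km in ref:
            label = "reference"
        else:
            label = "variant"
        table[km] = {"label": label, "count": 0}
    return table

def classify_read(read, table, k):
    """Count each known k-mer in the read and vote on reference vs. variant."""
    votes = Counter()
    for km in kmers(read, k):
        entry = table.get(km)  # constant-time dictionary lookup
        if entry is None:
            continue  # k-mer not listed for this locus
        entry["count"] += 1
        if entry["label"] != "both":
            votes[entry["label"]] += 1
    if votes["reference"] == votes["variant"]:
        return "undetermined"
    return "reference" if votes["reference"] > votes["variant"] else "variant"

# Toy locus: the variant differs from the reference by one base (A -> T).
reference = "ACGTACGGA"
variant = "ACGTTCGGA"
table = build_locus_table(reference, variant, k=4)

call_variant = classify_read("GTTCGG", table, k=4)
call_reference = classify_read("GTACGG", table, k=4)
call_unknown = classify_read("TTTTTT", table, k=4)
```

Because k-mers shared by both sequences carry no information about which allele a fragment came from, the sketch counts them but excludes them from the vote.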