System and method for retrieval-based controllable molecule generation

    Publication number: US12159694B2

    Publication date: 2024-12-03

    Application number: US18353773

    Filing date: 2023-07-17

    Abstract: A machine learning framework is described for generating candidate molecules for, e.g., drug discovery or other applications. The framework uses a pre-trained encoder-decoder model to interface between representations of molecules and embeddings for those molecules in a latent space. A fusion module is located between the encoder and the decoder and is used to fuse an embedding for an input molecule with embeddings for one or more exemplary molecules selected from a database that is constructed according to design criteria. The fused embedding is decoded using the decoder to generate a candidate molecule. The fusion module is trained to reconstruct a nearest neighbor to the input molecule from the database based on the sample of exemplary molecules. An iterative approach may be used during inference to dynamically update the database to include newly generated candidate molecules.

    SYSTEM AND METHOD FOR RETRIEVAL-BASED CONTROLLABLE MOLECULE GENERATION

    Publication number: US20240029836A1

    Publication date: 2024-01-25

    Application number: US18353773

    Filing date: 2023-07-17

    CPC classification number: G16C20/90 G16C20/70

    Abstract: A machine learning framework is described for generating candidate molecules for, e.g., drug discovery or other applications. The framework uses a pre-trained encoder-decoder model to interface between representations of molecules and embeddings for those molecules in a latent space. A fusion module is located between the encoder and the decoder and is used to fuse an embedding for an input molecule with embeddings for one or more exemplary molecules selected from a database that is constructed according to design criteria. The fused embedding is decoded using the decoder to generate a candidate molecule. The fusion module is trained to reconstruct a nearest neighbor to the input molecule from the database based on the sample of exemplary molecules. An iterative approach may be used during inference to dynamically update the database to include newly generated candidate molecules.
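The retrieve, fuse, and iterate loop described in the abstract can be illustrated with a minimal sketch that works directly on toy embedding vectors. Everything here is an assumption made for illustration and is not taken from the patent: the function names (`retrieve`, `fuse`, `generate`), cosine-similarity retrieval, the softmax-weighted averaging used as the fusion step, the `mix` blending factor, the `score` function standing in for the design criteria, and the rule that a candidate joins the database only when it scores higher. The pre-trained encoder and decoder are omitted, so "decoding" is the identity on the fused embedding.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0

def retrieve(query, database, k):
    """Return the k database embeddings most similar to the query (cosine similarity)."""
    return sorted(database, key=lambda e: cosine(query, e), reverse=True)[:k]

def fuse(query, exemplars, mix=0.5):
    """Hypothetical fusion: softmax-weighted exemplar average, blended with the query."""
    scores = [dot(query, e) for e in exemplars]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    context = [
        sum(w * e[i] for w, e in zip(weights, exemplars)) / total
        for i in range(len(query))
    ]
    return [(1 - mix) * q + mix * c for q, c in zip(query, context)]

def generate(query, database, score, k=2, rounds=3):
    """Iterative inference: fuse with retrieved exemplars and grow the database.

    Assumption: a candidate is kept, and added to the database, only if it
    improves the hypothetical design-criterion score.
    """
    database = list(database)  # work on a copy; the caller's list is untouched
    best = query
    for _ in range(rounds):
        candidate = fuse(best, retrieve(best, database, k))
        if score(candidate) > score(best):
            database.append(candidate)
            best = candidate
    return best, database

# Toy usage: the second coordinate plays the role of the desired property.
exemplar_db = [[0.0, 1.0], [0.5, 0.5]]
input_embedding = [1.0, 0.0]
best, updated_db = generate(input_embedding, exemplar_db, score=lambda v: v[1])
```

In this sketch the exemplars pull the input embedding toward the region of latent space that satisfies the scoring function, and each accepted candidate becomes retrievable in later rounds, which mirrors the dynamic database update mentioned in the abstract. A real system would replace the identity "decode" with the pre-trained decoder and would train the fusion module rather than use a fixed weighting.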
