PREDICTING COMPLETE PROTEIN REPRESENTATIONS FROM MASKED PROTEIN REPRESENTATIONS

    公开(公告)号:US20240087686A1

    公开(公告)日:2024-03-14

    申请号:US18273594

    申请日:2022-01-27

    CPC classification number: G16B40/20 G16B15/20 G16B15/30 G16B20/50 G16B30/00

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for unmasking a masked representation of a protein using a protein reconstruction neural network. In one aspect, a method comprises: receiving the masked representation of the protein; and processing the masked representation of the protein using the protein reconstruction neural network to generate a respective predicted embedding corresponding to one or more masked embeddings that are included in the masked representation of the protein, wherein a predicted embedding corresponding to a masked embedding in a representation of the amino acid sequence of the protein defines a prediction for an identity of an amino acid at a corresponding position in the amino acid sequence, wherein a predicted embedding corresponding to a masked embedding in a representation of the structure of the protein defines a prediction for a corresponding structural feature of the protein.

Patent Agency Ranking