Locked-Model Multimodal Contrastive Tuning
    Published Patent Application

    Publication No.: US20240153256A1

    Publication Date: 2024-05-09

    Application No.: US18051106

    Filing Date: 2022-10-31

    Applicant: Google LLC

    CPC classification number: G06V10/778

    Abstract: A method may include obtaining a pretrained image encoder and a training sample comprising a training image and a training text string corresponding to the training image. The method may also include initializing a text encoder in an untrained state, determining, using the pretrained image encoder and based on the training image, a first latent representation of the training image, and determining, using the text encoder and based on the training text string, a second latent representation of the training text string. The method may further include determining a loss value based on the first latent representation and the second latent representation, updating, based on the loss value, one or more parameters of the text encoder while holding fixed parameters of the pretrained image encoder, and outputting the text encoder in a trained state.
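    The training loop in the abstract can be illustrated with the minimal sketch below. It is a hypothetical toy, not the patented implementation, and it rests on several assumptions. A fixed random linear map (`FrozenImageEncoder`) stands in for the pretrained image encoder. The text encoder (`TextEncoder`) is a randomly initialized bag-of-words embedding table with mean pooling. The loss is a symmetric InfoNCE-style contrastive loss with a fixed temperature: the abstract only says the loss is determined from the two latent representations, and the contrastive form is inferred from the title. All class and function names are illustrative. Gradients are derived by hand and applied only to the text encoder, while the image encoder's parameters are held fixed.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class FrozenImageEncoder:
    """Hypothetical stand-in for a pretrained image encoder: a fixed linear map."""

    def __init__(self, in_dim, out_dim, rng):
        self.weights = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]

    def encode(self, image):
        return [dot(row, image) for row in self.weights]


class TextEncoder:
    """Hypothetical bag-of-words encoder: mean of token embeddings, randomly initialized."""

    def __init__(self, vocab, dim, rng):
        self.dim = dim
        self.emb = {tok: [rng.gauss(0.0, 0.1) for _ in range(dim)] for tok in vocab}

    def encode(self, text):
        toks = text.split()
        out = [0.0] * self.dim
        for t in toks:
            for k, val in enumerate(self.emb[t]):
                out[k] += val / len(toks)
        return out


def contrastive_loss_and_text_grad(img_z, txt_z, temp):
    """Symmetric InfoNCE loss and its gradient w.r.t. the text latents only."""
    n = len(img_z)
    logits = [[dot(img_z[i], txt_z[j]) / temp for j in range(n)] for i in range(n)]
    p = [softmax(logits[i]) for i in range(n)]                        # image -> text: p[i][j]
    q = [softmax([logits[i][j] for i in range(n)]) for j in range(n)]  # text -> image: q[j][i]
    loss = -sum(math.log(p[i][i]) + math.log(q[i][i]) for i in range(n)) / (2 * n)
    grad = [[0.0] * len(txt_z[0]) for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # dL/dlogit[i][j] = (p_ij + q_ij - 2*[i == j]) / (2n); dlogit[i][j]/dtxt_z[j] = img_z[i] / temp
            g = (p[i][j] + q[j][i] - 2.0 * (i == j)) / (2 * n * temp)
            for k in range(len(grad[j])):
                grad[j][k] += g * img_z[i][k]
    return loss, grad


def train_step(img_enc, txt_enc, batch, temp=1.0, lr=0.1):
    img_z = [img_enc.encode(img) for img, _ in batch]  # locked: weights are only read
    txt_z = [txt_enc.encode(txt) for _, txt in batch]
    loss, grad = contrastive_loss_and_text_grad(img_z, txt_z, temp)
    for (_, txt), g in zip(batch, grad):  # backprop through mean pooling into token embeddings
        toks = txt.split()
        for t in toks:
            row = txt_enc.emb[t]
            for k in range(len(row)):
                row[k] -= lr * g[k] / len(toks)
    return loss  # loss measured before this step's update


# Usage on a toy batch of three (image, caption) pairs.
rng = random.Random(0)
images = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
captions = ["a red cat", "a blue dog", "a green bird"]
img_enc = FrozenImageEncoder(in_dim=4, out_dim=8, rng=rng)
txt_enc = TextEncoder(sorted({t for c in captions for t in c.split()}), dim=8, rng=rng)
frozen_weights = [row[:] for row in img_enc.weights]
losses = [train_step(img_enc, txt_enc, list(zip(images, captions))) for _ in range(300)]
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

    Only the text encoder's token embeddings are updated. The image encoder's weights are read but never written, which is the "locked" part of the method. A practical system would typically use neural encoders, L2-normalized embeddings, and an autodiff framework in place of the hand-derived gradient above. This sketch omits normalization for brevity.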
