Systems and methods for partially supervised learning with momentum prototypes

    Publication number: US12056610B2

    Publication date: 2024-08-06

    Application number: US17005763

    Filing date: 2020-08-28

    CPC classification number: G06N3/08 G06F18/2148 G06F18/217 G06F18/2431

    Abstract: A learning mechanism is provided that trains on partially labeled web images while correcting noisy labels during training. Specifically, the mechanism employs a momentum prototype that represents the common characteristics of a specific class. One training objective is to minimize the difference between the normalized embedding of a training image sample and the momentum prototype of the corresponding class. Meanwhile, during training, the momentum prototype is used to generate a pseudo label for the training image sample, which can then be used to identify and remove out-of-distribution (OOD) samples and thereby correct the noisy labels in the original partially labeled training images. The momentum prototype for each class is in turn continually updated based on the embeddings of new training samples and their pseudo labels.
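The prototype update and pseudo-labeling described in this abstract can be illustrated with a minimal sketch. It is not the patented method: the function names, the momentum value of 0.9, the cosine-similarity scoring, and the fixed OOD threshold of 0.5 are all assumptions chosen for illustration, and the abstract does not specify how the pseudo label is actually computed.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def update_prototype(prototype, embedding, momentum=0.9):
    """Moving-average update of one class prototype (momentum value is an assumption)."""
    z = l2_normalize(embedding)
    mixed = [momentum * p + (1.0 - momentum) * e for p, e in zip(prototype, z)]
    return l2_normalize(mixed)

def pseudo_label(embedding, prototypes, ood_threshold=0.5):
    """Return the index of the most similar prototype, or None for a likely OOD sample.

    Similarity is the cosine between the normalized embedding and each
    (already normalized) prototype; the threshold is a hypothetical choice.
    """
    z = l2_normalize(embedding)
    sims = [sum(a * b for a, b in zip(z, p)) for p in prototypes]
    best = max(range(len(sims)), key=sims.__getitem__)
    return best if sims[best] >= ood_threshold else None

# Two toy class prototypes in a 2-D embedding space.
prototypes = [[1.0, 0.0], [0.0, 1.0]]
label = pseudo_label([0.9, 0.1], prototypes)   # close to class 0
if label is not None:
    prototypes[label] = update_prototype(prototypes[label], [0.9, 0.1])
```

In this sketch a sample whose best similarity falls below the threshold gets no pseudo label and would be dropped from the batch, which mirrors the OOD-removal step only in spirit.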

    SYSTEMS AND METHODS FOR VISION-LANGUAGE MODEL INSTRUCTION TUNING

    Publication number: US20240160858A1

    Publication date: 2024-05-16

    Application number: US18505982

    Filing date: 2023-11-09

    CPC classification number: G06F40/40 G06V10/774 G06V10/82 G06V20/70

    Abstract: Embodiments described herein provide a method of generating a vision-language task output in response to a text instruction relating to an input image. The method comprises receiving, via a data interface, the input image and the text instruction, the text instruction comprising an instruction relating to the image. The method further includes encoding, via an image encoder, the image into a first image representation. The method further includes generating, by a multimodal encoder, a second image representation based on cross-attending the first image representation to the text instruction. The method further includes generating, by a neural network based language model, a vision-language task output in response to the text instruction based on an input combining the second image representation and the text instruction.
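The cross-attention step in this abstract can be sketched with plain scaled dot-product attention. This is a toy illustration under stated assumptions, not the claimed architecture: the function names are hypothetical, the instruction token vectors are assumed to act as queries over the image features, and learned projections, multiple heads, and the actual encoders and language model are all omitted.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(queries, keys_values):
    """Single-head scaled dot-product cross-attention without learned weights.

    queries:     instruction token vectors (assumed role)
    keys_values: image patch features, i.e. the first image representation
    Returns one attended image vector per query.
    """
    dim = len(keys_values[0])
    attended = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                  for k in keys_values]
        weights = softmax(scores)
        attended.append([sum(w * kv[d] for w, kv in zip(weights, keys_values))
                         for d in range(dim)])
    return attended

# Toy inputs: one instruction token and two image patch features.
instruction_tokens = [[1.0, 0.0]]
first_image_repr = [[1.0, 0.0], [0.0, 1.0]]

# Second image representation: image features weighted by the instruction.
second_image_repr = cross_attend(instruction_tokens, first_image_repr)

# Language-model input combining both, here by simple sequence concatenation.
lm_input = second_image_repr + instruction_tokens
```

Concatenating the attended image vectors with the instruction tokens is only one way to read "an input combining the second image representation and the text instruction"; the abstract does not say how the combination is performed.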
