Multimodal Image Classifier using Textual and Visual Embeddings

    Publication No.: US20210264203A1

    Publication date: 2021-08-26

    Application No.: US17046313

    Filing date: 2019-11-18

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for realizing a multimodal image classifier. In an aspect, a method includes, for each image of a plurality of images: processing the image by a textual generator model to obtain a set of phrases that are descriptive of the content of the image, wherein each phrase is one or more terms, processing the set of phrases by a textual embedding model to obtain an embedding of predicted text for the image, and processing the image using an image embedding model to obtain an embedding of image pixels of the image. Then a multimodal image classifier is trained on the embeddings of predicted text for the images and the embeddings of image pixels for the images to produce, as output, labels of an output taxonomy to classify an image based on the image as input.
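The pipeline in this abstract can be illustrated with a minimal sketch. It is not Google's implementation. The phrase generator, text embedding, and image embedding below (`generate_phrases`, `embed_text`, `embed_image`) are hypothetical toy stand-ins for trained models, images are assumed to be flat lists of pixel intensities in [0, 1], and a nearest-centroid classifier is used purely for illustration in place of whatever trained multimodal classifier the patent covers.

```python
"""Toy sketch of a multimodal image classifier (hypothetical stand-in models)."""
from statistics import fmean, pvariance

# Hypothetical fixed vocabulary for the toy text embedding.
VOCAB = ["bright", "dark", "striped", "plain"]


def generate_phrases(image):
    """Stand-in for the textual generator model: image -> descriptive phrases."""
    phrases = ["bright" if fmean(image) > 0.5 else "dark"]
    # Large pixel-to-pixel variation is described as "striped".
    phrases.append("striped" if pvariance(image) > 0.1 else "plain")
    return phrases


def embed_text(phrases):
    """Stand-in for the textual embedding model: normalized bag of words over VOCAB."""
    counts = [float(phrases.count(term)) for term in VOCAB]
    total = sum(counts) or 1.0
    return [c / total for c in counts]


def embed_image(image):
    """Stand-in for the image embedding model: simple pixel statistics."""
    return [fmean(image), pvariance(image)]


def multimodal_features(image):
    """Concatenate the predicted-text embedding with the pixel embedding."""
    return embed_text(generate_phrases(image)) + embed_image(image)


def train(images, labels):
    """Fit a nearest-centroid classifier on the multimodal features."""
    grouped = {}
    for image, label in zip(images, labels):
        grouped.setdefault(label, []).append(multimodal_features(image))
    # One centroid per taxonomy label: the column-wise mean of its features.
    return {
        label: [fmean(column) for column in zip(*feats)]
        for label, feats in grouped.items()
    }


def classify(centroids, image):
    """Return the taxonomy label whose centroid is closest to the image."""
    x = multimodal_features(image)

    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(x, centroid))

    return min(centroids, key=lambda label: sq_dist(centroids[label]))


# Usage with made-up data and a hypothetical three-label taxonomy.
images = [[0.9, 0.8, 0.9, 0.85], [0.1, 0.2, 0.1, 0.15], [0.0, 1.0, 0.0, 1.0]]
labels = ["sky", "night", "zebra"]
model = train(images, labels)
print(classify(model, [0.95, 0.9, 0.9, 0.9]))  # sky
```

The step that makes the classifier multimodal is the concatenation in `multimodal_features`: each image is represented both by what a text model says about it and by its pixels, so either signal can separate labels the other cannot.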

    Preventing The Distribution Of Forbidden Network Content With Robustified Detection

    Publication No.: US20250053865A1

    Publication date: 2025-02-13

    Application No.: US18561104

    Filing date: 2022-12-14

    Applicant: Google LLC

    Abstract: The technology is generally directed to the training and execution of a model to identify policy violating content that has been obfuscated. The model may be trained using obfuscated training images. The obfuscated training images may be associated with one or more labels, such as a policy, obfuscation label, etc. The obfuscated training images and associated labels may be input into the model. During training, the output of the model may be a policy prediction as to whether the obfuscated input images violate the content policy of a host or are approved content for publishing. During implementation, the model may receive content as input and provide as output a policy prediction for the content. The host may use the policy prediction provided by the model to determine whether or not to publish the content.
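A toy sketch of the train-then-publish loop this abstract describes follows. It is an illustration under stated assumptions, not the patented method: the obfuscations (`flip`, `invert`) are hypothetical examples, images are flat lists of intensities in [0, 1], and 1-nearest-neighbour lookup stands in for the trained model. As in the abstract, each training example carries both a policy label and an obfuscation label.

```python
"""Toy sketch: policy detection trained on obfuscated copies of images."""
from dataclasses import dataclass


def flip(image):
    """Hypothetical obfuscation: mirror a 1-D row of pixel intensities."""
    return list(reversed(image))


def invert(image):
    """Hypothetical obfuscation: invert intensities in [0, 1]."""
    return [1.0 - p for p in image]


# "none" keeps an unmodified copy of the original image.
OBFUSCATIONS = {"none": list, "flip": flip, "invert": invert}


@dataclass
class Example:
    pixels: list
    policy_label: str       # "violating" or "approved"
    obfuscation_label: str  # which transform produced this example


def build_training_set(images, policy_labels):
    """Expand each labelled image into one example per obfuscation."""
    return [
        Example(fn(image), policy, name)
        for image, policy in zip(images, policy_labels)
        for name, fn in OBFUSCATIONS.items()
    ]


def predict_policy(training_set, pixels):
    """1-nearest-neighbour policy prediction (stand-in for a trained model)."""
    def sq_dist(example):
        return sum((a - b) ** 2 for a, b in zip(example.pixels, pixels))
    return min(training_set, key=sq_dist).policy_label


def should_publish(training_set, pixels):
    """Host-side decision: publish only content predicted as approved."""
    return predict_policy(training_set, pixels) == "approved"


# Made-up data: one violating and one approved image.
banned = [0.9, 0.7, 0.3, 0.0]
ok = [0.5, 0.6, 0.5, 0.4]
train_set = build_training_set([banned, ok], ["violating", "approved"])
# Baseline trained only on the unobfuscated originals, for comparison.
baseline = [Example(list(img), pol, "none")
            for img, pol in zip([banned, ok], ["violating", "approved"])]

print(should_publish(train_set, flip(banned)))   # False
print(predict_policy(baseline, flip(banned)))    # approved
```

On this toy data the baseline maps the flipped violating image to "approved", while the model trained on obfuscated copies catches it, which is the gap the robustified training is meant to close.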

    Multimodal Image Classifier using Textual and Visual Embeddings

    Publication No.: US20240143700A1

    Publication date: 2024-05-02

    Application No.: US18409411

    Filing date: 2024-01-10

    Applicant: Google LLC

    CPC classification number: G06F18/24 G06F18/214 G06F18/24147

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for realizing a multimodal image classifier. In an aspect, a method includes, for each image of a plurality of images: processing the image by a textual generator model to obtain a set of phrases that are descriptive of the content of the image, wherein each phrase is one or more terms, processing the set of phrases by a textual embedding model to obtain an embedding of predicted text for the image, and processing the image using an image embedding model to obtain an embedding of image pixels of the image. Then a multimodal image classifier is trained on the embeddings of predicted text for the images and the embeddings of image pixels for the images to produce, as output, labels of an output taxonomy to classify an image based on the image as input.
