-
Publication No.: US20210264203A1
Publication date: 2021-08-26
Application No.: US17046313
Filing date: 2019-11-18
Applicant: Google LLC
Inventor: Ariel Fuxman, Aleksei Timofeev, Zhen Li, Chun-Ta Lu, Manan Shah, Chen Sun, Krishnamurthy Viswanathan, Chao Jia
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for realizing a multimodal image classifier. In an aspect, a method includes, for each image of a plurality of images: processing the image by a textual generator model to obtain a set of phrases that are descriptive of the content of the image, wherein each phrase is one or more terms, processing the set of phrases by a textual embedding model to obtain an embedding of predicted text for the image, and processing the image using an image embedding model to obtain an embedding of image pixels of the image. Then a multimodal image classifier is trained on the embeddings of predicted text for the images and the embeddings of image pixels for the images to produce, as output, labels of an output taxonomy to classify an image based on the image as input.
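The training flow in this abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the patented implementation: `generate_phrases`, `embed_text`, `embed_image`, and the nearest-centroid classifier are hypothetical toy stand-ins for the textual generator model, the textual embedding model, the image embedding model, and the multimodal classifier. Combining the two embeddings by concatenation is also an assumption; the abstract does not say how they are combined.

```python
from collections import defaultdict

VOCAB = ["bright", "dark", "scene"]  # hypothetical toy vocabulary

def generate_phrases(image):
    """Toy stand-in for the textual generator model: pixels -> phrases."""
    mean = sum(image) / len(image)
    return ["bright scene"] if mean > 0.5 else ["dark scene"]

def embed_text(phrases):
    """Toy stand-in for the textual embedding model: word counts over VOCAB."""
    words = [w for p in phrases for w in p.split()]
    return [float(words.count(v)) for v in VOCAB]

def embed_image(image):
    """Toy stand-in for the image embedding model: (mean, max) of the pixels."""
    return [sum(image) / len(image), max(image)]

def featurize(image):
    # Assumption: predicted-text and pixel embeddings are concatenated.
    return embed_text(generate_phrases(image)) + embed_image(image)

def train(images, labels):
    """Fit one centroid per label over the combined embeddings."""
    groups = defaultdict(list)
    for img, lab in zip(images, labels):
        groups[lab].append(featurize(img))
    return {lab: [sum(col) / len(col) for col in zip(*vecs)]
            for lab, vecs in groups.items()}

def classify(centroids, image):
    """Return the label whose centroid is nearest to the image's features."""
    x = featurize(image)
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda lab: sq_dist(centroids[lab]))

# Images are flat lists of pixel intensities in [0, 1]; labels form the output taxonomy.
train_images = [[0.9, 0.8, 0.7], [0.8, 0.9, 1.0], [0.1, 0.2, 0.0], [0.2, 0.1, 0.3]]
train_labels = ["day", "day", "night", "night"]
model = train(train_images, train_labels)
```

At inference time only the image is supplied, for example `classify(model, [0.95, 0.85, 0.9])`; the predicted text is regenerated from the image inside `featurize`, which matches the abstract's statement that the classifier labels an image "based on the image as input".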
-
Publication No.: US20240428573A1
Publication date: 2024-12-26
Application No.: US18341218
Filing date: 2023-06-26
Applicant: Google LLC
Inventor: Ariel Fuxman, Alexander Kenji Hata, Edward Benjamin Vendrow, Otilia Stretcu, Wenlei Zhou, Krishnamurthy Viswanathan, Aditya Avinash, Gabriel Berger, Andrew Ames Bunner, Javier Alejandro Rey, Wei Qiao, Yintao Liu, Guanzhong Wang, Thomas Nathan Denby, Mehmet Nejat Tek, Neil Gordon Alldrin, Enming Luo, Chun-Ta Lu
IPC: G06V10/778, G06V10/764, G06V10/774, G06V10/82, G06V10/94
Abstract: A computer-implemented method includes receiving an input from a user relating to a concept, automatically obtaining a first set of images from an unlabeled dataset of images based on the input, and obtaining a first rating via the user for each image from the first set of images. The method further includes training a classifier model relating to the concept based on the first set of images rated by the user, automatically obtaining a second set of images from the unlabeled dataset of images based on the classifier model trained based on the first set of images, and obtaining a second rating via the user for each image from the second set of images. The classifier model relating to the concept is retrained based on the first set of images rated by the user and the second set of images rated by the user to obtain an updated classifier model.
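The two-round rate-and-retrain loop in this abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed method: images are represented by hypothetical precomputed feature vectors, the user's concept input is a hypothetical `query_vec`, the first set is chosen by dot-product similarity, the second set is the highest-scoring unrated images under the trained classifier, and `ask_user` simulates the human rater. The abstract itself does not specify any of these choices.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def retrieve(dataset, score_fn, k, exclude=()):
    """Return ids of the k highest-scoring images not in `exclude`."""
    candidates = [i for i in dataset if i not in exclude]
    return sorted(candidates, key=lambda i: score_fn(dataset[i]), reverse=True)[:k]

def mean_vec(vecs, dim):
    """Column-wise mean; a zero vector when there are no examples."""
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def train_classifier(dataset, ratings):
    """Toy linear scorer: mean of positive vectors minus mean of negative vectors."""
    dim = len(next(iter(dataset.values())))
    pos = [dataset[i] for i, r in ratings.items() if r]
    neg = [dataset[i] for i, r in ratings.items() if not r]
    w = [p - n for p, n in zip(mean_vec(pos, dim), mean_vec(neg, dim))]
    return lambda vec: dot(w, vec)

def run_rounds(dataset, query_vec, ask_user, k=2):
    # Round 1: pick images from the unlabeled set using the user's input.
    first = retrieve(dataset, lambda v: dot(query_vec, v), k)
    ratings = {i: ask_user(i) for i in first}
    model = train_classifier(dataset, ratings)
    # Round 2: pick new images using the classifier trained in round 1.
    second = retrieve(dataset, model, k, exclude=ratings)
    ratings.update({i: ask_user(i) for i in second})
    # Retrain on both rated sets to obtain the updated classifier.
    updated = train_classifier(dataset, ratings)
    return first, second, updated

# Hypothetical unlabeled dataset: image id -> feature vector.
dataset = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.5, 0.5],
           "d": [0.1, 0.9], "e": [0.0, 1.0]}
relevant = {"a", "b", "c"}  # simulated user judgment of the concept
first, second, updated = run_rounds(dataset, [1.0, 0.0], lambda i: i in relevant)
```

In this toy run the second round surfaces an image the user rejects, so the retrained scorer gains a negative example that the first-round model lacked.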
-
Publication No.: US20240143700A1
Publication date: 2024-05-02
Application No.: US18409411
Filing date: 2024-01-10
Applicant: Google LLC
Inventor: Ariel Fuxman, Aleksei Timofeev, Zhen Li, Chun-Ta Lu, Manan Shah, Chen Sun, Krishnamurthy Viswanathan, Chao Jia
IPC: G06F18/24, G06F18/214, G06F18/2413
CPC classification number: G06F18/24, G06F18/214, G06F18/24147
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for realizing a multimodal image classifier. In an aspect, a method includes, for each image of a plurality of images: processing the image by a textual generator model to obtain a set of phrases that are descriptive of the content of the image, wherein each phrase is one or more terms, processing the set of phrases by a textual embedding model to obtain an embedding of predicted text for the image, and processing the image using an image embedding model to obtain an embedding of image pixels of the image. Then a multimodal image classifier is trained on the embeddings of predicted text for the images and the embeddings of image pixels for the images to produce, as output, labels of an output taxonomy to classify an image based on the image as input.
-
Publication No.: US11907337B2
Publication date: 2024-02-20
Application No.: US17046313
Filing date: 2019-11-18
Applicant: Google LLC
Inventor: Ariel Fuxman, Aleksei Timofeev, Zhen Li, Chun-Ta Lu, Manan Shah, Chen Sun, Krishnamurthy Viswanathan, Chao Jia
IPC: G06K9/62, G06K9/46, G06F18/24, G06F18/214, G06F18/2413
CPC classification number: G06F18/24, G06F18/214, G06F18/24147
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for realizing a multimodal image classifier. In an aspect, a method includes, for each image of a plurality of images: processing the image by a textual generator model to obtain a set of phrases that are descriptive of the content of the image, wherein each phrase is one or more terms, processing the set of phrases by a textual embedding model to obtain an embedding of predicted text for the image, and processing the image using an image embedding model to obtain an embedding of image pixels of the image. Then a multimodal image classifier is trained on the embeddings of predicted text for the images and the embeddings of image pixels for the images to produce, as output, labels of an output taxonomy to classify an image based on the image as input.