Patent search ap:("Google LLC") AND inv:"Manan Shah"

1.

Invention publication
Multimodal Image Classifier using Textual and Visual Embeddings (under examination, published)

Publication (announcement) number: US20240143700A1

Publication (announcement) date: 2024-05-02

Application number: US18409411

Application date: 2024-01-10

Applicant: Google LLC

Inventor: Ariel Fuxman , Aleksei Timofeev , Zhen Li , Chun-Ta Lu , Manan Shah , Chen Sun , Krishnamurthy Viswanathan , Chao Jia

IPC: G06F18/24 , G06F18/214 , G06F18/2413

CPC classification number: G06F18/24 , G06F18/214 , G06F18/24147

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for realizing a multimodal image classifier. In an aspect, a method includes, for each image of a plurality of images: processing the image by a textual generator model to obtain a set of phrases that are descriptive of the content of the image, wherein each phrase is one or more terms, processing the set of phrases by a textual embedding model to obtain an embedding of predicted text for the image, and processing the image using an image embedding model to obtain an embedding of image pixels of the image. Then a multimodal image classifier is trained on the embeddings of predicted text for the images and the embeddings of image pixels for the images to produce, as output, labels of an output taxonomy to classify an image based on the image as input.
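The flow described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the function names (`generate_phrases`, `embed_text`, `embed_image`, `featurize`, `train`, `classify`) are hypothetical, the three models are toy stand-ins, the two embeddings are fused by simple concatenation, and a nearest-centroid rule is swapped in for the trained multimodal classifier. Images are represented as flat lists of pixel intensities in [0, 1].

```python
import hashlib
from collections import defaultdict

# All names and models below are hypothetical stand-ins, not the patent's method.

def generate_phrases(image):
    """Toy textual generator: returns phrases describing the image content."""
    mean = sum(image) / len(image)
    return ["bright scene", "daylight"] if mean > 0.5 else ["dark scene", "night"]

def embed_text(phrases, dim=4):
    """Toy textual embedding: hashed bag of terms, normalized by term count."""
    terms = [term for phrase in phrases for term in phrase.split()]
    vec = [0.0] * dim
    for term in terms:
        idx = int(hashlib.md5(term.encode()).hexdigest(), 16) % dim
        vec[idx] += 1.0 / len(terms)
    return vec

def embed_image(image):
    """Toy image embedding: mean and range of the pixel values."""
    return [sum(image) / len(image), max(image) - min(image)]

def featurize(image):
    # Assumption: the two modalities are fused by concatenation.
    return embed_text(generate_phrases(image)) + embed_image(image)

def train(images, labels):
    """Nearest-centroid 'training': one mean feature vector per label."""
    groups = defaultdict(list)
    for image, label in zip(images, labels):
        groups[label].append(featurize(image))
    return {
        label: [sum(col) / len(col) for col in zip(*feats)]
        for label, feats in groups.items()
    }

def classify(model, image):
    """Return the label whose centroid is closest to the image's features."""
    feats = featurize(image)
    return min(
        model,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(feats, model[label])),
    )
```

As in the abstract, the classifier takes only the image as input at prediction time; the predicted text is derived from the image inside `featurize`. For example, `train([[0.9, 0.8, 1.0], [0.1, 0.0, 0.2]], ["day", "night"])` builds a two-label model that `classify` can then apply to a new pixel list.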

2.

Invention grant
Multimodal image classifier using textual and visual embeddings (in force)

Publication (announcement) number: US11907337B2

Publication (announcement) date: 2024-02-20

Application number: US17046313

Application date: 2019-11-18

Applicant: Google LLC

Inventor: Ariel Fuxman , Aleksei Timofeev , Zhen Li , Chun-Ta Lu , Manan Shah , Chen Sun , Krishnamurthy Viswanathan , Chao Jia

IPC: G06K9/62 , G06K9/46 , G06F18/24 , G06F18/214 , G06F18/2413

CPC classification number: G06F18/24 , G06F18/214 , G06F18/24147

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for realizing a multimodal image classifier. In an aspect, a method includes, for each image of a plurality of images: processing the image by a textual generator model to obtain a set of phrases that are descriptive of the content of the image, wherein each phrase is one or more terms, processing the set of phrases by a textual embedding model to obtain an embedding of predicted text for the image, and processing the image using an image embedding model to obtain an embedding of image pixels of the image. Then a multimodal image classifier is trained on the embeddings of predicted text for the images and the embeddings of image pixels for the images to produce, as output, labels of an output taxonomy to classify an image based on the image as input.

3.

Invention application
Multimodal Image Classifier using Textual and Visual Embeddings (in force)

Publication (announcement) number: US20210264203A1

Publication (announcement) date: 2021-08-26

Application number: US17046313

Application date: 2019-11-18

Applicant: Google LLC

Inventor: Ariel Fuxman , Aleksei Timofeev , Zhen Li , Chun-Ta Lu , Manan Shah , Chen Sun , Krishnamurthy Viswanathan , Chao Jia

IPC: G06K9/62 , G06K9/46

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for realizing a multimodal image classifier. In an aspect, a method includes, for each image of a plurality of images: processing the image by a textual generator model to obtain a set of phrases that are descriptive of the content of the image, wherein each phrase is one or more terms, processing the set of phrases by a textual embedding model to obtain an embedding of predicted text for the image, and processing the image using an image embedding model to obtain an embedding of image pixels of the image. Then a multimodal image classifier is trained on the embeddings of predicted text for the images and the embeddings of image pixels for the images to produce, as output, labels of an output taxonomy to classify an image based on the image as input.
