Invention Publication
- Patent Title: Visual and Audio Multimodal Searching System
- Application No.: US18306638
- Application Date: 2023-04-25
- Publication No.: US20240362279A1
- Publication Date: 2024-10-31
- Inventor: Harshit Kharbanda, Belinda Luna Zeng, Viviana Caso Corella, Christopher James Kelley, Jessica Lee, Pendar Yousefi, Dounia Berrada, Sundeep Vaddadi, Kai Yu, Balint Miklos, Severin Heiniger, Louis Wang
- Applicant: Google LLC
- Applicant Address: US CA Mountain View
- Assignee: Google LLC
- Current Assignee: Google LLC
- Current Assignee Address: US CA Mountain View
- Main IPC: G06F16/9532
- IPC: G06F16/9532; G06F16/538; G06F40/40

Abstract:
A multimodal search system is described. The system can receive image data captured by a camera of a user device. Additionally, the system can receive audio data associated with the image data. The audio data can be captured by a microphone of the user device. Moreover, the system can process the image data to generate visual features. Furthermore, the system can process the audio data to generate a plurality of words. The system can generate a plurality of search terms based on the plurality of words and the visual features. Subsequently, the system can determine one or more search results associated with the plurality of search terms and provide the one or more search results as an output.
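The flow described in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation: the image and speech models are passed in as plain callables, and every name here (`multimodal_search`, `generate_search_terms`, `search`, `STOPWORDS`, the toy `index`) is a hypothetical stand-in chosen for illustration. Term generation is reduced to merging visual labels with non-stopword transcript words, and retrieval to counting term overlap against a small in-memory index.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical stopword list; a real system would use a far richer query model.
STOPWORDS = frozenset(
    {"a", "an", "the", "is", "this", "what", "where", "can", "i", "in", "of", "to"}
)


def generate_search_terms(words: Iterable[str], visual_features: Iterable[str]) -> List[str]:
    """Merge visual labels and transcribed words into ordered, de-duplicated terms."""
    terms: List[str] = []
    for token in list(visual_features) + list(words):
        token = token.lower().strip(".,?!")
        if token and token not in STOPWORDS and token not in terms:
            terms.append(token)
    return terms


def search(terms: List[str], index: Dict[str, str]) -> List[str]:
    """Rank documents by the number of search terms they contain (ties by id)."""
    scored: List[Tuple[int, str]] = []
    for doc_id, text in index.items():
        doc_words = set(text.lower().split())
        score = sum(1 for term in terms if term in doc_words)
        if score:
            scored.append((score, doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


def multimodal_search(
    image_data: bytes,
    audio_data: bytes,
    image_model: Callable[[bytes], List[str]],   # assumed: returns visual labels
    speech_model: Callable[[bytes], List[str]],  # assumed: returns transcribed words
    index: Dict[str, str],
) -> Tuple[List[str], List[str]]:
    """Run the abstract's steps: features, words, search terms, then results."""
    visual_features = image_model(image_data)
    words = speech_model(audio_data)
    terms = generate_search_terms(words, visual_features)
    return terms, search(terms, index)


if __name__ == "__main__":
    # Stub models standing in for real image and speech recognition.
    stub_image_model = lambda _img: ["sneaker", "red"]
    stub_speech_model = lambda _aud: "where can I buy this in blue".split()
    toy_index = {
        "doc1": "buy red sneaker online",
        "doc2": "blue sneaker sale",
        "doc3": "garden hose",
    }
    print(multimodal_search(b"", b"", stub_image_model, stub_speech_model, toy_index))
```

Passing the models as callables keeps the sketch independent of any particular vision or speech library; only the combination and lookup steps are shown concretely.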