Invention Grant
- Patent Title: Spoken query processing for image search
- Application No.: US17887959
- Application Date: 2022-08-15
- Publication No.: US12288549B2
- Publication Date: 2025-04-29
- Inventor: Ajay Jain, Sanjeev Tagra, Sachin Soni, Ryan Rozich, Nikaash Puri, Jonathan Roeder
- Applicant: Adobe Inc.
- Applicant Address: US CA San Jose
- Assignee: Adobe Inc.
- Current Assignee: Adobe Inc.
- Current Assignee Address: US CA San Jose
- Agency: Shook Hardy & Bacon LLC
- Main IPC: G10L15/06
- IPC: G10L15/06; G06F3/16; G06F16/532; G06F40/284; G06F40/30; G06V10/774; G10L15/183; G10L15/22

Abstract:
An image search system uses a multi-modal model to determine the relevance of images to a spoken query. The multi-modal model includes a spoken language model that extracts features from a spoken query and an image processing model that extracts features from an image. The multi-modal model determines a relevance score for the image and the spoken query based on the extracted features. The multi-modal model is trained using a curriculum approach that includes training the spoken language model using audio data. Subsequently, a training dataset comprising a plurality of spoken queries and one or more images associated with each spoken query is used to jointly train the spoken language model and the image processing model to provide a trained multi-modal model.
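
The scoring step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the relevance score is computed from the extracted features, so the cosine similarity used here is an assumption, and the names `cosine_similarity` and `rank_images` are hypothetical. The feature vectors stand in for the outputs of the spoken language model and the image processing model, which are not implemented here.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (assumed scoring function)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction, so treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_images(
    query_features: Vector, image_features: Dict[str, Vector]
) -> List[Tuple[str, float]]:
    """Score each image against the spoken-query features, highest score first."""
    scored = [
        (image_id, cosine_similarity(query_features, features))
        for image_id, features in image_features.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Placeholder features; in the described system these would come from
    # the trained spoken language model and image processing model.
    query = [1.0, 0.0]
    images = {"beach": [0.9, 0.1], "city": [0.0, 1.0]}
    for image_id, score in rank_images(query, images):
        print(image_id, round(score, 3))
```

In this sketch the two models only need to map their inputs into a shared feature space. A joint training stage like the one the abstract describes would be one way to bring matching query and image features close together in such a space.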
Public/Granted literature
- US20240054991A1 SPOKEN QUERY PROCESSING FOR IMAGE SEARCH, Public/Granted day: 2024-02-15