Spoken query processing for image search
Abstract:
An image search system uses a multi-modal model to determine the relevance of images to a spoken query. The multi-modal model includes a spoken language model that extracts features from the spoken query and an image processing model that extracts features from an image. The multi-modal model determines a relevance score for the image and the spoken query based on the extracted features. The multi-modal model is trained using a curriculum approach that includes first training the spoken language model using audio data. Subsequently, a training dataset comprising a plurality of spoken queries and one or more images associated with each spoken query is used to jointly train the spoken language model and the image processing model, providing a trained multi-modal model.
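The abstract describes a two-branch relevance model and a two-stage (curriculum) training procedure. Below is a minimal sketch of that structure, assuming PyTorch; the module names (SpokenLanguageEncoder, ImageEncoder, MultiModalRelevanceModel), the placeholder encoder internals, the cosine-similarity relevance score, and the binary cross-entropy loss for the joint training stage are illustrative assumptions and not details specified in the abstract.

```python
# Sketch of the two-branch relevance model and the joint (stage-2) training
# step suggested by the abstract. All architectural and loss choices here are
# placeholders, not the patented implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpokenLanguageEncoder(nn.Module):
    """Extracts a fixed-size feature vector from a spoken query waveform."""
    def __init__(self, feat_dim: int = 512):
        super().__init__()
        # Placeholder acoustic front end; a real system would likely use a
        # pretrained speech model here (pretraining on audio data is stage 1
        # of the curriculum described in the abstract).
        self.conv = nn.Conv1d(1, 64, kernel_size=10, stride=5)
        self.rnn = nn.GRU(64, feat_dim, batch_first=True)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, samples)
        x = self.conv(waveform.unsqueeze(1))       # (batch, 64, frames)
        _, h = self.rnn(x.transpose(1, 2))         # h: (1, batch, feat_dim)
        return h.squeeze(0)                        # (batch, feat_dim)


class ImageEncoder(nn.Module):
    """Extracts a fixed-size feature vector from an image."""
    def __init__(self, feat_dim: int = 512):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # image: (batch, 3, H, W) -> (batch, feat_dim)
        return self.backbone(image)


class MultiModalRelevanceModel(nn.Module):
    """Scores the relevance of an image to a spoken query from both features."""
    def __init__(self):
        super().__init__()
        self.speech_encoder = SpokenLanguageEncoder()
        self.image_encoder = ImageEncoder()

    def forward(self, waveform: torch.Tensor, image: torch.Tensor) -> torch.Tensor:
        q = F.normalize(self.speech_encoder(waveform), dim=-1)
        v = F.normalize(self.image_encoder(image), dim=-1)
        return (q * v).sum(dim=-1)                 # cosine-similarity relevance score


def joint_training_step(model, optimizer, waveform, image, label):
    """Stage 2 of the curriculum: after the spoken language encoder has been
    pretrained on audio data (stage 1, objective not specified in the
    abstract), both branches are updated jointly on (spoken query, image,
    relevance) examples. The BCE loss on the score is an assumption."""
    optimizer.zero_grad()
    score = model(waveform, image)                 # (batch,)
    loss = F.binary_cross_entropy_with_logits(score, label.float())
    loss.backward()
    optimizer.step()
    return loss.item()


# Example forward pass with dummy inputs.
model = MultiModalRelevanceModel()
wave = torch.randn(2, 16000)                       # two one-second queries at 16 kHz
imgs = torch.randn(2, 3, 224, 224)                 # two candidate images
scores = model(wave, imgs)                         # one relevance score per pair
```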