Generating responses to queries about videos utilizing a multi-modal neural network with attention

Invention Grant

US11615308B2 Generating responses to queries about videos utilizing a multi-modal neural network with attention (Status: Active)

Patent Title: Generating responses to queries about videos utilizing a multi-modal neural network with attention
Application No.: US17563901

Application Date: 2021-12-28
Publication No.: US11615308B2

Publication Date: 2023-03-28
Inventor: Wentian Zhao, Seokhwan Kim, Ning Xu, Hailin Jin
Applicant: Adobe Inc.
Applicant Address: US CA San Jose
Assignee: Adobe Inc.
Current Assignee: Adobe Inc.
Current Assignee Address: US CA San Jose
Agency: Keller Preece PLLC
Main IPC: G06K9/00
IPC: G06K9/00; G06N3/02; G06F17/16; G06N3/08; G06V20/40; G06V30/18; G06V30/19; G06V10/82; G06V20/62; G06V30/10

Abstract:

The present disclosure relates to systems, methods, and non-transitory computer-readable media for generating a response to a question received from a user during display or playback of a video segment by utilizing a query-response-neural network. The disclosed systems can extract a query vector from a question corresponding to the video segment using the query-response-neural network. The disclosed systems further generate context vectors representing both visual cues and transcript cues corresponding to the video segment using context encoders or other layers from the query-response-neural network. By utilizing additional layers from the query-response-neural network, the disclosed systems generate (i) a query-context vector based on the query vector and the context vectors, and (ii) candidate-response vectors representing candidate responses to the question from a domain-knowledge base or other source. To respond to a user's question, the disclosed systems further select a response from the candidate responses based on a comparison of the query-context vector and the candidate-response vectors.
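
The data flow described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not specify the form of attention, how the query vector and context vectors are combined, or how vectors are compared. This sketch uses dot-product attention, element-wise summation, and cosine similarity, with tiny hand-written vectors standing in for the outputs of the encoders. The function names are hypothetical and do not come from the patent.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, contexts):
    """Dot-product attention (an assumption): weight each context vector
    by its similarity to the query and return the weighted sum."""
    weights = softmax([dot(query, c) for c in contexts])
    return [sum(w * c[i] for w, c in zip(weights, contexts))
            for i in range(len(query))]

def query_context_vector(query, visual_contexts, transcript_contexts):
    """Combine the query with attended visual and transcript cues.
    Element-wise summation is an illustrative choice, not the patent's."""
    visual = attend(query, visual_contexts)
    transcript = attend(query, transcript_contexts)
    return [q + v + t for q, v, t in zip(query, visual, transcript)]

def cosine(a, b):
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm else 0.0

def select_response(qc_vector, candidates):
    """candidates maps response text to its candidate-response vector.
    Returns the text whose vector is most similar to qc_vector."""
    return max(candidates, key=lambda text: cosine(qc_vector, candidates[text]))

# Toy 2-dimensional vectors standing in for encoder outputs.
query = [1.0, 0.0]
visual_contexts = [[1.0, 0.0], [0.0, 1.0]]
transcript_contexts = [[0.9, 0.1]]
candidates = {
    "Use the crop tool.": [1.0, 0.1],
    "Open the layers panel.": [0.0, 1.0],
}

qc = query_context_vector(query, visual_contexts, transcript_contexts)
best = select_response(qc, candidates)
print(best)  # prints "Use the crop tool."
```

In a trained system the vectors would come from learned encoders over the question text, video frames, transcript, and knowledge-base entries; only the final comparison step resembles the selection shown here.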

Published/Granted literature

US20220122357A1 GENERATING RESPONSES TO QUERIES ABOUT VIDEOS UTILIZING A MULTI-MODAL NEURAL NETWORK WITH ATTENTION Publication/grant date: 2022-04-21

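IPC classification:

G	Physics
G06	Computing; calculating or counting
G06K	Graphical data reading (image or video recognition or understanding G06V); presentation of data; record carriers; handling record carriers
G06K9/00	Methods or arrangements for recognising patterns (methods or arrangements for graph-reading or for converting the pattern of mechanical parameters, e.g. force or presence, into electrical signals G06K11/00) (image or video recognition or understanding G06V) (speech recognition G10L15/00)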