Invention Application
- Patent Title: GENERATING RESPONSES TO QUERIES ABOUT VIDEOS UTILIZING A MULTI-MODAL NEURAL NETWORK WITH ATTENTION
- Application No.: US17563901
- Application Date: 2021-12-28
- Publication No.: US20220122357A1
- Publication Date: 2022-04-21
- Inventor: Wentian Zhao, Seokhwan Kim, Ning Xu, Hailin Jin
- Applicant: Adobe Inc.
- Applicant Address: US CA San Jose
- Assignee: Adobe Inc.
- Current Assignee: Adobe Inc.
- Current Assignee Address: US CA San Jose
- Main IPC: G06V20/40
- IPC: G06V20/40; G06F17/16; G06N3/02

Abstract:
The present disclosure relates to systems, methods, and non-transitory computer-readable media for generating a response to a question received from a user during display or playback of a video segment by utilizing a query-response-neural network. The disclosed systems can extract a query vector from a question corresponding to the video segment using the query-response-neural network. The disclosed systems further generate context vectors representing both visual cues and transcript cues corresponding to the video segment using context encoders or other layers from the query-response-neural network. By utilizing additional layers from the query-response-neural network, the disclosed systems generate (i) a query-context vector based on the query vector and the context vectors, and (ii) candidate-response vectors representing candidate responses to the question from a domain-knowledge base or other source. To respond to a user's question, the disclosed systems further select a response from the candidate responses based on a comparison of the query-context vector and the candidate-response vectors.
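The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the attention form, the way the query and context vectors are fused, or the comparison metric, so the dot-product attention, additive fusion, and cosine similarity below are assumptions chosen for brevity. All function names and the toy vectors are hypothetical, and hand-written vectors stand in for the outputs of the encoders in the query-response-neural network.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, contexts):
    # Assumption: dot-product attention. Each context vector is weighted
    # by its similarity to the query, then the weighted vectors are summed.
    weights = softmax([dot(query, c) for c in contexts])
    return [sum(w * c[i] for w, c in zip(weights, contexts))
            for i in range(len(query))]

def cosine(a, b):
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm else 0.0

def select_response(query, visual_ctx, transcript_ctx, candidates):
    # Attend over both visual and transcript context vectors.
    attended = attend(query, visual_ctx + transcript_ctx)
    # Assumption: the query-context vector is the element-wise sum of the
    # query vector and the attended context.
    query_context = [q + a for q, a in zip(query, attended)]
    # Assumption: candidates are compared by cosine similarity.
    scores = {text: cosine(query_context, vec)
              for text, vec in candidates.items()}
    return max(scores, key=scores.get)

# Toy two-dimensional vectors standing in for encoder outputs.
query = [1.0, 0.0]
visual_ctx = [[0.9, 0.1]]
transcript_ctx = [[0.8, 0.2]]
candidates = {
    "Use the crop tool.": [1.0, 0.1],
    "Open the layers panel.": [0.0, 1.0],
}
best = select_response(query, visual_ctx, transcript_ctx, candidates)
```

With these toy vectors, the query-context vector points almost entirely along the first axis, so the first candidate scores highest. In a trained system the vectors would come from learned encoders rather than being written by hand.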
Public/Granted literature:
- US11615308B2: Generating responses to queries about videos utilizing a multi-modal neural network with attention. Public/Granted day: 2023-03-28