Multi-scale text filter conditioned generative adversarial networks
Abstract:
Systems and methods for processing video are provided. The method includes receiving a text-based description of active scenes and representing the text-based description as a word embedding matrix. The method includes using a text encoder, implemented by a neural network, to output a frame-level textual representation and a video-level representation of the word embedding matrix. The method also includes generating, by a shared generator, video frame by frame based on the frame-level textual representation, the video-level representation, and noise vectors. A frame-level and a video-level convolutional filter of a video discriminator are generated to classify the frames and the video of the frame-by-frame video as real or fake. The method also includes training a conditional video generator, which includes the text encoder, the video discriminator, and the shared generator, in a generative adversarial network to convergence.
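The pipeline in the abstract can be sketched as a minimal NumPy toy: a text encoder producing frame-level and video-level representations, a shared generator that synthesizes frames from text features plus noise, and a discriminator whose convolutional filters are generated from the text features. All dimensions, weight matrices, and function names below are illustrative assumptions, not details disclosed in the patent; real convolutions are reduced to dot products for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (illustrative only; not specified in the abstract)
embed_dim, n_frames, noise_dim, frame_dim = 8, 4, 6, 16

# Word-embedding matrix for the text description; in this toy setup
# each word conditions one generated frame.
word_embeddings = rng.normal(size=(n_frames, embed_dim))

# --- Text encoder: frame-level and video-level textual representations ---
def text_encoder(E):
    frame_repr = np.tanh(E)               # one text vector per frame
    video_repr = frame_repr.mean(axis=0)  # pooled clip-level vector
    return frame_repr, video_repr

# --- Shared generator: frame-by-frame synthesis from text + noise ---
W_g = rng.normal(size=(2 * embed_dim + noise_dim, frame_dim)) * 0.1

def shared_generator(frame_repr, video_repr, noise):
    frames = []
    for t in range(len(frame_repr)):
        z = np.concatenate([frame_repr[t], video_repr, noise[t]])
        frames.append(np.tanh(z @ W_g))
    return np.stack(frames)               # shape (n_frames, frame_dim)

# --- Discriminator with text-conditioned (generated) filters ---
W_f = rng.normal(size=(embed_dim, frame_dim)) * 0.1  # text -> frame-level filter
W_v = rng.normal(size=(embed_dim, frame_dim)) * 0.1  # text -> video-level filter

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def discriminator(frames, frame_repr, video_repr):
    # Frame-level filters are generated from frame-level text features,
    # then correlated with each frame (a 1x1 "convolution" here).
    frame_filters = frame_repr @ W_f                 # (n_frames, frame_dim)
    frame_scores = sigmoid((frames * frame_filters).sum(axis=1))
    # Video-level filter from the pooled text vector, applied to the clip mean.
    video_filter = video_repr @ W_v
    video_score = sigmoid(frames.mean(axis=0) @ video_filter)
    return frame_scores, video_score

frame_repr, video_repr = text_encoder(word_embeddings)
noise = rng.normal(size=(n_frames, noise_dim))
fake_video = shared_generator(frame_repr, video_repr, noise)
frame_scores, video_score = discriminator(fake_video, frame_repr, video_repr)
print(fake_video.shape, frame_scores.shape)  # (4, 16) (4,)
```

In the full method, both weight sets would be trained adversarially to convergence, with the generator fooling the text-conditioned frame-level and video-level classifiers.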