-
Publication No.: US11170256B2
Publication Date: 2021-11-09
Application No.: US16577337
Filing Date: 2019-09-20
Applicant: NEC Laboratories America, Inc.
Inventor: Renqiang Min, Bing Bai, Yogesh Balaji
IPC: G06K9/00, G06K9/62, G06N3/08, G06N3/04, G06F40/279
Abstract: Systems and methods for processing video are provided. The method includes receiving a text-based description of active scenes and representing the text-based description as a word embedding matrix. The method includes using a text encoder implemented by a neural network to output a frame-level textual representation and a video-level representation of the word embedding matrix. The method also includes generating, by a shared generator, a frame-by-frame video based on the frame-level textual representation, the video-level representation, and noise vectors. A frame-level convolutional filter and a video-level convolutional filter of a video discriminator are generated to classify the frames and the video of the frame-by-frame video as true or false. The method also includes training a conditional video generator that includes the text encoder, the video discriminator, and the shared generator in a generative adversarial network to convergence.
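The data flow described in the abstract can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the function names, the array sizes, the random untrained weights, the use of a mean and a softmax attention as the text encoder, and the replacement of the convolutional filters by a single dot product per frame. It shows only a forward pass; the adversarial training to convergence is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
NUM_WORDS, EMBED_DIM = 6, 8     # words in the description, embedding size
NUM_FRAMES, NOISE_DIM = 4, 5    # frames to generate, size of each noise vector
FRAME_PIXELS = 16               # one flattened 4x4 grayscale frame


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def text_encoder(word_embeddings, w_attn):
    """Map a word embedding matrix to (frame_level, video_level) representations."""
    # Video-level representation: mean over all words (an assumed choice).
    video_level = word_embeddings.mean(axis=0)
    # Frame-level representations: one softmax attention over words per frame.
    scores = w_attn @ word_embeddings.T              # (num_frames, num_words)
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    frame_level = weights @ word_embeddings          # (num_frames, embed_dim)
    return frame_level, video_level


def shared_generator(frame_level, video_level, noise, w_gen):
    """Generate the video frame by frame, reusing the same weights for every frame."""
    frames = []
    for frame_text, z in zip(frame_level, noise):
        x = np.concatenate([frame_text, video_level, z])
        frames.append(np.tanh(w_gen @ x))
    return np.stack(frames)                          # (num_frames, FRAME_PIXELS)


def video_discriminator(frames, frame_level, video_level, w_ff, w_vf):
    """Score each frame and the whole video using filters generated from the text."""
    # Simplification: each generated filter covers a whole frame, so applying it
    # is a dot product rather than a sliding convolution.
    frame_filters = frame_level @ w_ff               # (num_frames, FRAME_PIXELS)
    frame_scores = sigmoid((frame_filters * frames).sum(axis=1))
    video_filter = video_level @ w_vf                # (FRAME_PIXELS,)
    video_score = float(sigmoid(video_filter @ frames.mean(axis=0)))
    return frame_scores, video_score


# Random stand-ins for the word embedding matrix, the weights, and the noise.
words = rng.normal(size=(NUM_WORDS, EMBED_DIM))
w_attn = rng.normal(size=(NUM_FRAMES, EMBED_DIM))
w_gen = rng.normal(size=(FRAME_PIXELS, 2 * EMBED_DIM + NOISE_DIM))
w_ff = rng.normal(size=(EMBED_DIM, FRAME_PIXELS))
w_vf = rng.normal(size=(EMBED_DIM, FRAME_PIXELS))
noise = rng.normal(size=(NUM_FRAMES, NOISE_DIM))

frame_level, video_level = text_encoder(words, w_attn)
frames = shared_generator(frame_level, video_level, noise, w_gen)
frame_scores, video_score = video_discriminator(
    frames, frame_level, video_level, w_ff, w_vf
)
```

In a trained system, scores near 1 would mean "true" (real) and scores near 0 "false" (generated); with the random weights used here the scores carry no meaning and only the shapes are informative.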
-
Publication No.: US20200097766A1
Publication Date: 2020-03-26
Application No.: US16577337
Filing Date: 2019-09-20
Applicant: NEC Laboratories America, Inc.
Inventor: Renqiang Min, Bing Bai, Yogesh Balaji
Abstract: Systems and methods for processing video are provided. The method includes receiving a text-based description of active scenes and representing the text-based description as a word embedding matrix. The method includes using a text encoder implemented by a neural network to output a frame-level textual representation and a video-level representation of the word embedding matrix. The method also includes generating, by a shared generator, a frame-by-frame video based on the frame-level textual representation, the video-level representation, and noise vectors. A frame-level convolutional filter and a video-level convolutional filter of a video discriminator are generated to classify the frames and the video of the frame-by-frame video as true or false. The method also includes training a conditional video generator that includes the text encoder, the video discriminator, and the shared generator in a generative adversarial network to convergence.
-