Patent search: ap:("NEC Laboratories America, Inc.") AND inv:"Yogesh Balaji"

1.

Granted invention patent
Multi-scale text filter conditioned generative adversarial networks (Status: Active)

Publication No.: US11170256B2

Publication date: 2021-11-09

Application No.: US16577337

Filing date: 2019-09-20

Applicant: NEC Laboratories America, Inc.

Inventor: Renqiang Min, Bing Bai, Yogesh Balaji

IPC: G06K9/00, G06K9/62, G06N3/08, G06N3/04, G06F40/279

Abstract: Systems and methods for processing video are provided. The method includes receiving a text-based description of active scenes and representing the text-based description as a word embedding matrix. The method includes using a text encoder implemented by a neural network to output a frame-level textual representation and a video-level representation of the word embedding matrix. The method also includes generating, by a shared generator, frame-by-frame video based on the frame-level textual representation, the video-level representation, and noise vectors. A frame-level and a video-level convolutional filter of a video discriminator are generated to classify frames and video of the frame-by-frame video as true or false. The method also includes training, to convergence, a conditional video generator that includes the text encoder, the video discriminator, and the shared generator in a generative adversarial network.
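
Illustrative note (not part of the patent record): the abstract describes discriminator filters that are generated from the text at two scales, frame level and video level. The sketch below is only a toy illustration of that idea in plain Python, assuming one linear projection per scale and 1-D feature sequences. The names `make_filter`, `conv1d_valid`, `score`, and all dimensions and weights are hypothetical and are not taken from the patent.

```python
import math
import random

def make_filter(text_vec, weights):
    """Map a text representation to convolution filter taps (one linear layer)."""
    return [sum(w * t for w, t in zip(row, text_vec)) for row in weights]

def conv1d_valid(signal, taps):
    """Valid-mode 1-D cross-correlation of a feature sequence with the taps."""
    k = len(taps)
    return [sum(signal[i + j] * taps[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def score(signal, taps):
    """Mean filter response squashed to (0, 1), read as a real-vs-fake score."""
    responses = conv1d_valid(signal, taps)
    mean = sum(responses) / len(responses)
    return 1.0 / (1.0 + math.exp(-mean))

random.seed(0)
text_dim, kernel = 4, 3

# Hypothetical frame-level and video-level text representations.
frame_text = [random.gauss(0, 1) for _ in range(text_dim)]
video_text = [random.gauss(0, 1) for _ in range(text_dim)]

# Hypothetical projections; random here, learned in a real model.
w_frame = [[random.gauss(0, 0.5) for _ in range(text_dim)] for _ in range(kernel)]
w_video = [[random.gauss(0, 0.5) for _ in range(text_dim)] for _ in range(kernel)]

# Stand-ins for discriminator features of one frame and of a whole clip.
frame_features = [random.gauss(0, 1) for _ in range(8)]
video_features = [random.gauss(0, 1) for _ in range(16)]

# Each scale gets its own text-generated filter and its own score.
frame_score = score(frame_features, make_filter(frame_text, w_frame))
video_score = score(video_features, make_filter(video_text, w_video))
```

In this sketch the text decides the filter taps, so the same features can score differently under different descriptions. A real system would presumably use learned 2-D or 3-D convolutions and train the components adversarially, which this toy omits.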

2.

Invention patent application
MULTI-SCALE TEXT FILTER CONDITIONED GENERATIVE ADVERSARIAL NETWORKS (Status: Under examination, published)

Publication No.: US20200097766A1

Publication date: 2020-03-26

Application No.: US16577337

Filing date: 2019-09-20

Applicant: NEC Laboratories America, Inc.

Inventor: Renqiang Min, Bing Bai, Yogesh Balaji

IPC: G06K9/62, G06N3/08, G06N3/04, G06K9/00, G06F17/27

Abstract: Systems and methods for processing video are provided. The method includes receiving a text-based description of active scenes and representing the text-based description as a word embedding matrix. The method includes using a text encoder implemented by a neural network to output a frame-level textual representation and a video-level representation of the word embedding matrix. The method also includes generating, by a shared generator, frame-by-frame video based on the frame-level textual representation, the video-level representation, and noise vectors. A frame-level and a video-level convolutional filter of a video discriminator are generated to classify frames and video of the frame-by-frame video as true or false. The method also includes training, to convergence, a conditional video generator that includes the text encoder, the video discriminator, and the shared generator in a generative adversarial network.
