Multi-scale text filter conditioned generative adversarial networks

    Publication No.: US11170256B2

    Publication date: 2021-11-09

    Application No.: US16577337

    Filing date: 2019-09-20

    Abstract: Systems and methods for processing video are provided. The method includes receiving a text-based description of active scenes and representing the text-based description as a word embedding matrix. The method includes using a text encoder implemented by a neural network to output a frame-level textual representation and a video-level representation of the word embedding matrix. The method also includes generating, by a shared generator, a frame-by-frame video based on the frame-level textual representation, the video-level representation, and noise vectors. A frame-level convolutional filter and a video-level convolutional filter of a video discriminator are generated to classify the frames and the video of the frame-by-frame video as true or false. The method also includes training, to convergence in a generative adversarial network, a conditional video generator that includes the text encoder, the video discriminator, and the shared generator.
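
The discriminator step in the abstract, where filters are generated from the text and then used to judge frames and the whole video, can be illustrated with a minimal pure-Python sketch. The abstract does not disclose the actual network layers, so everything here is an assumption made for illustration: the helper names (`linear`, `make_weights`, `frame_scores`, `video_score`), the use of a single linear map to produce each filter, and the averaging of temporal responses are all hypothetical, and frames are stand-in feature vectors rather than images.

```python
import math
import random


def linear(vec, weights):
    """Multiply a vector by a weight matrix given as a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def sigmoid(x):
    """Squash a real-valued response into a (0, 1) true/false score."""
    return 1.0 / (1.0 + math.exp(-x))


def make_weights(rows, cols, rng):
    """Random stand-in for learned parameters (hypothetical, untrained)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def frame_scores(frames, text_vec, w_frame):
    """Frame level: generate one filter from the text, apply it to each frame."""
    filt = linear(text_vec, w_frame)  # one weight per frame feature channel
    return [sigmoid(sum(f * x for f, x in zip(filt, frame))) for frame in frames]


def video_score(frames, text_vec, w_video, window):
    """Video level: generate a temporal filter from the text, slide it over time."""
    dim = len(frames[0])
    flat = linear(text_vec, w_video)  # window * dim weights
    filt = [flat[t * dim:(t + 1) * dim] for t in range(window)]
    responses = []
    for start in range(len(frames) - window + 1):
        responses.append(sum(
            filt[t][c] * frames[start + t][c]
            for t in range(window)
            for c in range(dim)
        ))
    # Average the temporal responses into a single clip-level score.
    return sigmoid(sum(responses) / len(responses))


# Example call with random stand-in data (assumed sizes, not from the patent).
rng = random.Random(0)
text_vec = [rng.uniform(-1, 1) for _ in range(6)]
frames = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(5)]
per_frame = frame_scores(frames, text_vec, make_weights(4, 6, rng))
whole_clip = video_score(frames, text_vec, make_weights(3 * 4, 6, rng), window=3)
```

In a trained system the weight matrices would be learned jointly with the text encoder and generator during adversarial training; here they are random, so the scores only demonstrate how the text conditions the filters, not a meaningful true/false decision.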

    MULTI-SCALE TEXT FILTER CONDITIONED GENERATIVE ADVERSARIAL NETWORKS

    Publication No.: US20200097766A1

    Publication date: 2020-03-26

    Application No.: US16577337

    Filing date: 2019-09-20

    Abstract: Systems and methods for processing video are provided. The method includes receiving a text-based description of active scenes and representing the text-based description as a word embedding matrix. The method includes using a text encoder implemented by a neural network to output a frame-level textual representation and a video-level representation of the word embedding matrix. The method also includes generating, by a shared generator, a frame-by-frame video based on the frame-level textual representation, the video-level representation, and noise vectors. A frame-level convolutional filter and a video-level convolutional filter of a video discriminator are generated to classify the frames and the video of the frame-by-frame video as true or false. The method also includes training, to convergence in a generative adversarial network, a conditional video generator that includes the text encoder, the video discriminator, and the shared generator.
