Publication No.: US20240169479A1
Publication Date: 2024-05-23
Application No.: US18056444
Filing Date: 2022-11-17
Applicant: Lemon Inc.
Inventor: Wei Min Wang, Daquan Zhou, Jiashi Feng
IPC: G06T3/40
CPC classification number: G06T3/4007, G06T3/4053
Abstract: The present disclosure provides systems and methods for video generation using latent diffusion machine learning models. Given a text input, video data relevant to the text input can be generated using a latent diffusion model. The process includes generating a predetermined number of key frames using text-to-image generation tasks performed within a latent space via a variational auto-encoder, enabling faster training and sampling times compared to pixel space-based diffusion models. The process further includes utilizing two-dimensional convolutions and associated adaptors to learn features for a given frame. Temporal information for the frames can be learned via a directed temporal attention module used to capture the relation among frames and to generate a temporally meaningful sequence of frames. Additional frames can be generated via a frame interpolation process for inserting one or more transition frames between two generated frames. The process can also include a super-resolution process for upsampling the frames.
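
The frame interpolation step described in the abstract (inserting one or more transition frames between two generated frames) can be illustrated with a minimal sketch. The abstract does not say how the transition frames are computed, so this example substitutes plain linear blending for whatever model the disclosure uses. The names `Frame`, `interpolate_pair`, and `insert_transitions`, and the representation of a frame as a flat list of floats, are all hypothetical and chosen only for illustration.

```python
from typing import List

# Assumption: a frame (or its latent) is represented as a flat list of floats.
Frame = List[float]


def interpolate_pair(a: Frame, b: Frame, n_transition: int) -> List[Frame]:
    """Return n_transition frames linearly blended between a and b (endpoints excluded).

    Linear blending is a stand-in technique, not the method of the disclosure.
    """
    if len(a) != len(b):
        raise ValueError("frames must have the same size")
    frames = []
    for k in range(1, n_transition + 1):
        t = k / (n_transition + 1)  # blend weight, strictly between 0 and 1
        frames.append([(1 - t) * x + t * y for x, y in zip(a, b)])
    return frames


def insert_transitions(key_frames: List[Frame], n_transition: int) -> List[Frame]:
    """Insert n_transition blended frames between each consecutive pair of key frames."""
    if not key_frames:
        return []
    out = [key_frames[0]]
    for prev, nxt in zip(key_frames, key_frames[1:]):
        out.extend(interpolate_pair(prev, nxt, n_transition))
        out.append(nxt)
    return out
```

With K key frames and n transition frames per gap, the output sequence has K + (K - 1) * n frames, so three key frames with three transitions per gap would yield nine frames.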