VIDEO GENERATION WITH LATENT DIFFUSION MODELS

    Publication number: US20240169479A1

    Publication date: 2024-05-23

    Application number: US18056444

    Application date: 2022-11-17

    Applicant: Lemon Inc.

    CPC classification number: G06T3/4007 G06T3/4053

    Abstract: The present disclosure provides systems and methods for video generation using latent diffusion machine learning models. Given a text input, video data relevant to the text input can be generated using a latent diffusion model. The process includes generating a predetermined number of key frames using text-to-image generation tasks performed within a latent space via a variational auto-encoder, enabling faster training and sampling times compared to pixel space-based diffusion models. The process further includes utilizing two-dimensional convolutions and associated adaptors to learn features for a given frame. Temporal information for the frames can be learned via a directed temporal attention module used to capture the relation among frames and to generate a temporally meaningful sequence of frames. Additional frames can be generated via a frame interpolation process for inserting one or more transition frames between two generated frames. The process can also include a super-resolution process for upsampling the frames.
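    Illustrative sketch (not taken from the patent text): the abstract does not say how the directed temporal attention module is built. The Python below assumes one plausible reading, in which "directed" means that each frame attends only to itself and to earlier frames, as in causally masked attention. The function name directed_temporal_attention is hypothetical, and the learned query, key, and value projections that a real module would have are omitted, so each frame's feature vector serves as all three.

```python
import math


def directed_temporal_attention(frames):
    """Mix per-frame feature vectors along the time axis.

    frames: list of T feature vectors (lists of floats), one per frame.
    Returns T vectors; output t is a softmax-weighted average of
    frames 0..t only (assumed meaning of "directed").
    """
    dim = len(frames[0])
    scale = 1.0 / math.sqrt(dim)  # standard scaled dot-product factor
    outputs = []
    for t, query in enumerate(frames):
        # Directed mask: frame t sees itself and earlier frames, never later ones.
        visible = frames[: t + 1]
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in visible]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted average of the visible frames' features.
        outputs.append(
            [sum(w * v[i] for w, v in zip(weights, visible)) for i in range(dim)]
        )
    return outputs


if __name__ == "__main__":
    key_frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for vec in directed_temporal_attention(key_frames):
        print(vec)
```

    Under this assumption, changing a later frame never alters the output for an earlier one, which is one way a model could produce a temporally ordered sequence of key frames.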
