Publication No.: US20250111552A1
Publication Date: 2025-04-03
Application No.: US18819064
Filing Date: 2024-08-29
Applicant: NVIDIA Corporation
Inventor: Sihyun Yu, Weili Nie, De-An Huang, Boyi Li, Animashree Anandkumar
IPC: G06T11/00, G06N3/0455, G06T9/00
Abstract: Systems and methods are disclosed that train a content frame-motion latent diffusion model (CMD) and use the CMD to generate requested videos. The CMD may be a two-stage framework that first compresses videos into a succinct latent space and then learns the video distribution in that latent space. For instance, the CMD may include an autoencoder and two diffusion models. In the first stage, the autoencoder is used to learn a low-dimensional latent decomposition of a video into a content frame and a latent motion representation. In the second stage, the content frame distribution may be learned by fine-tuning a pretrained image diffusion model without adding any new parameters, which allows the CMD to leverage the rich visual knowledge in pretrained image diffusion models. In addition, a new lightweight diffusion model may be used to generate latent motion representations conditioned on the given content frame.
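The data flow described in the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the application: the names `encode`, `decode`, `sample_content_frame`, `sample_motion_latent`, and `generate_video` are hypothetical, the content frame is taken to be a per-pixel mean over time, the motion latent is reduced to one scalar per frame, and both diffusion models are replaced by random placeholders. It only shows the order of operations: sample a content frame, sample a motion latent conditioned on it, then decode both into a video.

```python
import random
from typing import List, Tuple

Frame = List[float]   # a flattened toy "image"
Video = List[Frame]   # a sequence of frames


def encode(video: Video) -> Tuple[Frame, List[float]]:
    """Toy stand-in for the autoencoder's encoder (assumption, not the patent's).

    Content frame: per-pixel mean over time.
    Motion latent: one scalar per frame, its mean offset from the content frame.
    """
    num_frames = len(video)
    num_pixels = len(video[0])
    content = [sum(f[i] for f in video) / num_frames for i in range(num_pixels)]
    motion = [sum(p - c for p, c in zip(f, content)) / num_pixels for f in video]
    return content, motion


def decode(content: Frame, motion: List[float]) -> Video:
    """Toy stand-in for the decoder: shift the content frame by each motion value."""
    return [[c + m for c in content] for m in motion]


def sample_content_frame(num_pixels: int, rng: random.Random) -> Frame:
    """Placeholder for the fine-tuned pretrained image diffusion model."""
    return [rng.random() for _ in range(num_pixels)]


def sample_motion_latent(content: Frame, num_frames: int,
                         rng: random.Random) -> List[float]:
    """Placeholder for the lightweight motion diffusion model.

    It is conditioned on the content frame only through a toy scale factor.
    """
    scale = 0.1 * (max(content) - min(content))
    return [rng.uniform(-scale, scale) for _ in range(num_frames)]


def generate_video(num_pixels: int, num_frames: int, seed: int = 0) -> Video:
    """Two-step sampling: content frame first, then motion given the content."""
    rng = random.Random(seed)
    content = sample_content_frame(num_pixels, rng)
    motion = sample_motion_latent(content, num_frames, rng)
    return decode(content, motion)
```

For example, `generate_video(16, 8)` returns 8 frames of 16 values each. In this toy setting `decode(*encode(video))` reproduces a video exactly only when its frames differ by a uniform shift; a learned autoencoder would be needed for real video.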