-
Publication Number: US20250061634A1
Publication Date: 2025-02-20
Application Number: US18457251
Filing Date: 2023-08-28
Applicant: Nvidia Corporation
Inventor: Zhengyu Huang, Rui Zhang, Tao Li, Yingying Zhong, Weihua Zhang, Junjie Lai, Yeongho Seol, Dmitry Korobchenko, Simon Yuen
Abstract: Systems and methods of the present disclosure include animating virtual avatars or agents according to input audio and one or more selected or determined emotions and/or styles. For example, a deep neural network can be trained to output motion or deformation information for a character that is representative of the character uttering speech contained in audio input. The character can have different facial components or regions (e.g., head, skin, eyes, tongue) modeled separately, such that the network can output motion or deformation information for each of these different facial components. During training, the network can use a transformer-based audio encoder with locked parameters to train an associated decoder using a weighted feature vector. The network output can be provided to a renderer to generate audio-driven facial animation that is emotion-accurate.
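The training scheme described above, a pretrained transformer-based audio encoder whose parameters stay locked while an associated decoder is trained on a weighted combination of encoder features, can be sketched roughly as follows. This is a minimal illustrative toy, not the patent's actual architecture: the class names, feature shapes, and the simple gradient step are all assumptions.

```python
# Hedged sketch: only the decoder is updated during training; the audio
# encoder's parameters are locked. The decoder learns a weighted sum over
# the encoder's per-layer feature vectors. All shapes/names are illustrative.

class FrozenAudioEncoder:
    """Stand-in for a pretrained transformer audio encoder (locked weights)."""
    def __init__(self, num_layers=3):
        self.num_layers = num_layers

    def forward(self, audio):
        # Toy: each "layer" returns a scaled copy of the input features.
        return [[x * (i + 1) for x in audio] for i in range(self.num_layers)]


class MotionDecoder:
    """Trainable decoder mapping weighted audio features to a motion value."""
    def __init__(self, num_layers=3, feat_dim=4):
        self.layer_weights = [1.0 / num_layers] * num_layers  # trainable
        self.proj = [0.0] * feat_dim                          # trainable

    def weighted_features(self, layer_feats):
        dim = len(layer_feats[0])
        return [sum(w * layer[i]
                    for w, layer in zip(self.layer_weights, layer_feats))
                for i in range(dim)]

    def forward(self, layer_feats):
        feats = self.weighted_features(layer_feats)
        return sum(p * f for p, f in zip(self.proj, feats))


def train_step(encoder, decoder, audio, target, lr=0.01):
    layer_feats = encoder.forward(audio)        # frozen: never updated
    feats = decoder.weighted_features(layer_feats)
    err = decoder.forward(layer_feats) - target
    # Gradient step on decoder parameters only (d(err^2)/d proj_i).
    decoder.proj = [p - lr * 2 * err * f for p, f in zip(decoder.proj, feats)]
    return err * err
```

In this toy setup, repeated `train_step` calls drive the squared error down while the encoder never changes, mirroring the locked-encoder/trainable-decoder split the abstract describes.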
-
Publication Number: US20240412440A1
Publication Date: 2024-12-12
Application Number: US18329831
Filing Date: 2023-06-06
Applicant: NVIDIA Corporation
Inventor: Rui Zhang, Zhengyu Huang, Lance Li, Weihua Zhang, Yingying Zhong, Junjie Lai, Yeongho Seol, Dmitry Korobchenko
Abstract: In various examples, techniques are described for animating characters by decoupling portions of a face from other portions of the face. Systems and methods are disclosed that use one or more neural networks to generate high-fidelity facial animation from inputted audio data. In order to generate the high-fidelity facial animations, the systems and methods may decouple effects of implicit emotional states from effects of audio on the facial animations during training of the neural network(s). For instance, the training may cause the audio to drive the lower-face animations while the implicit emotional states drive the upper-face animations. In some examples, to encourage more expressive facial expressions, adversarial training is further used to learn a discriminator that predicts whether generated emotional states come from the real distribution.
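The decoupling idea in the abstract, audio features driving the lower face while an implicit emotion state drives the upper face, with a discriminator judging whether an emotion state looks realistic, can be illustrated with a deliberately simple sketch. Every function, blendshape name, and the threshold-based critic here are illustrative assumptions, not the patent's actual networks.

```python
# Hedged sketch of region decoupling: audio -> lower-face channels,
# implicit emotion state -> upper-face channels. The "discriminator" is a
# toy range check standing in for a learned adversarial critic.

def animate_face(audio_feats, emotion_state):
    """Combine independently driven face regions into one pose dict."""
    lower = {  # audio drives lips and jaw
        "jaw_open": 0.8 * audio_feats["energy"],
        "lip_width": 0.5 * audio_feats["pitch"],
    }
    upper = {  # emotion state drives brows and eyes
        "brow_raise": emotion_state["arousal"],
        "eye_open": 0.5 + 0.5 * emotion_state["valence"],
    }
    return {**lower, **upper}


def discriminator(emotion_state, threshold=1.0):
    """Toy critic: accepts emotion states within the 'real' value range."""
    return all(abs(v) <= threshold for v in emotion_state.values())
```

Because the two regions read disjoint inputs, changing the emotion state alters only the upper-face channels and changing the audio alters only the lower-face channels, which is the decoupling the training objective is said to enforce.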
-