-
Publication No.: US20240013462A1
Publication Date: 2024-01-11
Application No.: US17859615
Filing Date: 2022-07-07
Applicant: Nvidia Corporation
Inventor: Yeongho Seol , Simon Yuen , Dmitry Aleksandrovich Korobchenko , Mingquan Zhou , Ronan Browne , Wonmin Byeon
CPC classification number: G06T13/205 , G06T13/40 , G06T17/20 , G10L25/63 , G10L15/16
Abstract: A deep neural network can be trained to output motion or deformation information for a character that is representative of the character uttering speech contained in audio input, which is accurate for an emotional state of the character. The character can have different facial components or regions (e.g., head, skin, eyes, tongue) modeled separately, such that the network can output motion or deformation information for each of these different facial components. During training, the network can be provided with emotion and/or style vectors that indicate information to be used in generating realistic animation for input speech, as may relate to one or more emotions to be exhibited by the character, a relative weighting of those emotions, and any style or adjustments to be made to how the character expresses that emotional state. The network output can be provided to a renderer to generate audio-driven facial animation that is emotion-accurate.
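The emotion and style conditioning described in this abstract can be illustrated with a toy NumPy sketch. The emotion set, embedding dimensions, and the `build_condition_vector` helper below are illustrative assumptions, not the patented network:

```python
import numpy as np

# Hypothetical setup: each emotion gets a learned embedding row. Names and
# dimensions here are illustrative only, not taken from the patent.
EMOTIONS = ["neutral", "joy", "anger", "sadness"]
rng = np.random.default_rng(0)
emotion_embeddings = rng.normal(size=(len(EMOTIONS), 8))

def build_condition_vector(emotion_weights, style_vector):
    """Blend emotion embeddings by their relative weights and append a style
    vector, forming the conditioning input for the animation network."""
    w = np.asarray(emotion_weights, dtype=float)
    w = w / w.sum()                       # relative weighting of the emotions
    blended = w @ emotion_embeddings      # weighted mix of emotion embeddings
    return np.concatenate([blended, style_vector])

# 70% joy, 30% anger, plus a 4-dim style-adjustment vector.
cond = build_condition_vector([0.0, 0.7, 0.3, 0.0], np.zeros(4))
print(cond.shape)  # (12,)
```

In a real system the conditioning vector would be consumed alongside audio features by the animation network; here it only demonstrates how multiple weighted emotions can be mixed into a single input.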
-
Publication No.: US20230035306A1
Publication Date: 2023-02-02
Application No.: US17382027
Filing Date: 2021-07-21
Applicant: Nvidia Corporation
Inventor: Ming-Yu Liu , Koki Nagano , Yeongho Seol , Jose Rafael Valle Gomes da Costa , Jaewoo Seo , Ting-Chun Wang , Arun Mallya , Sameh Khamis , Wei Ping , Rohan Badlani , Kevin Jonathan Shih , Bryan Catanzaro , Simon Yuen , Jan Kautz
Abstract: Apparatuses, systems, and techniques are presented to generate media content. In at least one embodiment, a first neural network is used to generate first video information based, at least in part, upon voice information corresponding to one or more users, and a second neural network is used to generate second video information corresponding to the one or more users based, at least in part, upon the first video information and one or more images corresponding to the one or more users.
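The two-stage pipeline in this abstract can be sketched with linear stand-ins for the two networks. The shapes and the `generate_video` helper are assumptions for illustration, not the patented architecture:

```python
import numpy as np

rng = np.random.default_rng(4)
# Toy stand-ins for the two networks: the first maps voice features to coarse
# video information; the second combines that with a reference user image.
W1 = rng.normal(size=(20, 12))        # first network (voice -> video info)
W2 = rng.normal(size=(12 + 30, 30))   # second network (video info + image -> video)

def generate_video(voice_feat, ref_image):
    first = voice_feat @ W1                          # first neural network
    second_in = np.concatenate([first, ref_image])   # condition on user image
    return second_in @ W2                            # second neural network

frame = generate_video(rng.normal(size=20), rng.normal(size=30))
print(frame.shape)  # (30,)
```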
-
Publication No.: US11734890B2
Publication Date: 2023-08-22
Application No.: US17175792
Filing Date: 2021-02-15
Applicant: NVIDIA Corporation
Inventor: Samuli Matias Laine , Janne Johannes Hellsten , Tero Tapani Karras , Yeongho Seol , Jaakko T. Lehtinen , Timo Oskari Aila
CPC classification number: G06T17/205 , G06T7/97 , G06T15/04 , G06T15/50 , G06T15/503 , G06T19/20 , G06N3/04 , G06N3/08 , G06T2207/20081 , G06T2207/20084 , G06T2219/2012
Abstract: A three-dimensional (3D) model of an object is recovered from two-dimensional (2D) images of the object. Each image in the set of 2D images includes the object captured from a different camera position, and deformations of a base mesh that defines the 3D model may be computed corresponding to each image. The 3D model may also include a texture map that represents the lighting and material properties of the 3D model. Recovery of the 3D model relies on analytic antialiasing to provide a link between pixel colors in the 2D images and geometry of the 3D model. A modular differentiable renderer design yields high performance by leveraging existing, highly optimized hardware graphics pipelines to reconstruct the 3D model. The differentiable renderer renders images of the 3D model, and differences between the rendered images and reference images are propagated backwards through the rendering pipeline to iteratively adjust the 3D model.
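The render-compare-backpropagate loop described above can be demonstrated with a toy differentiable "renderer": a well-conditioned linear map from per-vertex offsets to pixel values. This is a minimal sketch of the optimization pattern only, not the patented rasterization pipeline:

```python
import numpy as np

rng = np.random.default_rng(1)
n_verts, n_pix = 6, 10
# Toy stand-in for a differentiable renderer: a fixed, well-conditioned
# linear map from vertex offsets to pixel values (illustrative only).
R = np.vstack([np.eye(n_verts), 0.1 * rng.normal(size=(n_pix - n_verts, n_verts))])
target = rng.normal(size=n_verts)        # ground-truth mesh deformation
reference = R @ target                   # "reference image" of the true geometry

offsets = np.zeros(n_verts)              # base-mesh deformation to be recovered
for _ in range(200):
    rendered = R @ offsets               # forward: render the current model
    residual = rendered - reference      # image-space difference
    grad = R.T @ residual                # backward: propagate through the renderer
    offsets -= 0.5 * grad                # iteratively adjust the 3D model

print(np.allclose(offsets, target, atol=1e-6))  # True
```

The key property mirrored here is that gradients of an image-space loss flow back to the geometry parameters; the patented approach additionally handles visibility discontinuities via analytic antialiasing.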
-
Publication No.: US20250061634A1
Publication Date: 2025-02-20
Application No.: US18457251
Filing Date: 2023-08-28
Applicant: Nvidia Corporation
Inventor: Zhengyu Huang , Rui Zhang , Tao Li , Yingying Zhong , Weihua Zhang , Junjie Lai , Yeongho Seol , Dmitry Korobchenko , Simon Yuen
Abstract: Systems and methods of the present disclosure include animating virtual avatars or agents according to input audio and one or more selected or determined emotions and/or styles. For example, a deep neural network can be trained to output motion or deformation information for a character that is representative of the character uttering speech contained in audio input. The character can have different facial components or regions (e.g., head, skin, eyes, tongue) modeled separately, such that the network can output motion or deformation information for each of these different facial components. During training, the network can use a transformer-based audio encoder with locked parameters to train an associated decoder using a weighted feature vector. The network output can be provided to a renderer to generate audio-driven facial animation that is emotion-accurate.
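The locked-encoder training scheme in this abstract can be sketched with linear stand-ins: the encoder parameters are frozen while only the decoder receives gradient updates from a weighted feature vector. All shapes, weights, and the `loss` helper are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
audio = rng.normal(size=(32, 16))        # toy batch of audio feature frames
targets = rng.normal(size=(32, 4))       # toy deformation targets

W_enc = rng.normal(size=(16, 8))         # locked encoder parameters (frozen)
W_enc_snapshot = W_enc.copy()
W_dec = np.zeros((8, 4))                 # decoder trained from scratch
feat_weights = np.linspace(1.0, 0.5, 8)  # illustrative per-feature weighting

def loss():
    weighted = (audio @ W_enc) * feat_weights   # weighted feature vector
    return np.mean((weighted @ W_dec - targets) ** 2)

initial = loss()
for _ in range(300):
    weighted = (audio @ W_enc) * feat_weights   # frozen encoder forward pass
    err = weighted @ W_dec - targets
    W_dec -= 0.001 * weighted.T @ err / len(audio)  # only the decoder updates

assert np.array_equal(W_enc, W_enc_snapshot)    # encoder stayed locked
print(loss() < initial)  # True
```

In the patented system the frozen component is a pretrained transformer-based audio encoder rather than a linear map; the sketch only shows the training pattern of updating a decoder behind locked parameters.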
-
Publication No.: US20240412440A1
Publication Date: 2024-12-12
Application No.: US18329831
Filing Date: 2023-06-06
Applicant: NVIDIA Corporation
Inventor: Rui Zhang , Zhengyu Huang , Lance Li , Weihua Zhang , Yingying Zhong , Junjie Lai , Yeongho Seol , Dmitry Korobchenko
Abstract: In various examples, techniques are described for animating characters by decoupling portions of a face from other portions of the face. Systems and methods are disclosed that use one or more neural networks to generate high-fidelity facial animation using inputted audio data. In order to generate the high-fidelity facial animations, the systems and methods may decouple effects of implicit emotional states from effects of audio on the facial animations during training of the neural network(s). For instance, the training may cause the audio to drive the lower-face animations while the implicit emotional states drive the upper-face animations. In some examples, to encourage more expressive facial expressions, adversarial training is further used to learn a discriminator that predicts whether generated emotional states come from the real distribution.
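The audio/emotion decoupling described in this abstract can be illustrated with a toy animation head in which audio features affect only lower-face coefficients and the emotional state affects only upper-face coefficients. The split, dimensions, and weights below are illustrative assumptions, not the patented model:

```python
import numpy as np

rng = np.random.default_rng(3)
# Toy decoupled animation head: first 8 outputs are lower-face (mouth/jaw)
# coefficients, last 8 are upper-face (brows/eyes). All values illustrative.
W_audio = rng.normal(size=(16, 8))       # audio features -> lower-face coeffs
W_emotion = rng.normal(size=(6, 8))      # implicit emotion -> upper-face coeffs

def animate(audio_feat, emotion_state):
    lower = audio_feat @ W_audio         # speech drives the lower face
    upper = emotion_state @ W_emotion    # emotional state drives the upper face
    return np.concatenate([lower, upper])

emotion = rng.normal(size=6)
frame_a = animate(rng.normal(size=16), emotion)
frame_b = animate(rng.normal(size=16), emotion)  # new audio, same emotion
print(np.allclose(frame_a[8:], frame_b[8:]))     # True: upper face unchanged
```

Changing the audio input alters only the lower-face half of the output, which is the decoupling property the training procedure in the abstract is designed to enforce; the adversarial discriminator over emotional states is omitted from this sketch.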