-
Publication Number: US20240233229A1
Publication Date: 2024-07-11
Application Number: US18007867
Application Date: 2021-11-08
Applicant: NVIDIA Corporation
Inventor: Evgeny Aleksandrovich Tumanov, Dmitry Aleksandrovich Korobchenko, Simon Yuen, Kevin Margo
CPC classification number: G06T13/205, G06T13/40
Abstract: In various examples, animations may be generated using audio-driven body animation synthesized with voice tempo. For example, full body animation may be driven from an audio input representative of recorded speech, where voice tempo (e.g., a number of phonemes per unit time) may be used to generate a 1D audio signal for comparing to datasets including data samples that each include an animation and a corresponding 1D audio signal. One or more loss functions may be used to compare the 1D audio signal from the input audio to the audio signals of the datasets, as well as to compare joint information of joints of an actor between animations of two or more data samples, in order to identify optimal transition points between the animations. The animations may then be stitched together—e.g., using interpolation and/or a neural network trained to seamlessly stitch sequences together—using the transition points.
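As a rough illustration of the approach this abstract describes, the Python/NumPy sketch below computes a 1D voice-tempo signal as phonemes per second over a sliding window, and scores candidate transition frames by combining a tempo loss with a joint-pose loss. The function names, window parameters, and the simple sum-of-squares losses are illustrative assumptions, not the patented method.

```python
import numpy as np

def voice_tempo_signal(phoneme_times, duration, window=1.0, step=0.1):
    """Hypothetical 1D voice-tempo signal: phonemes per second,
    counted in a sliding window over phoneme onset timestamps."""
    starts = np.arange(0.0, duration, step)
    counts = np.array([
        np.sum((phoneme_times >= t) & (phoneme_times < t + window))
        for t in starts
    ])
    return counts / window  # phonemes per second

def best_transition(tempo_a, tempo_b, joints_a, joints_b, alpha=1.0):
    """Score candidate transition frames between two animation clips by
    combining an audio-tempo loss with a joint-pose loss, and return the
    frame index with the lowest combined loss (illustrative only).
    joints_* are assumed to have shape (frames, num_joints, 3)."""
    n = min(len(tempo_a), len(tempo_b))
    audio_loss = (tempo_a[:n] - tempo_b[:n]) ** 2
    joint_loss = np.mean((joints_a[:n] - joints_b[:n]) ** 2, axis=(1, 2))
    return int(np.argmin(audio_loss + alpha * joint_loss))
```

Once a transition frame is selected, the two clips could be stitched there by interpolating joint poses across a short crossfade window, in the spirit of the interpolation-based stitching the abstract mentions.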
-
Publication Number: US20240013802A1
Publication Date: 2024-01-11
Application Number: US17859660
Application Date: 2022-07-07
Applicant: Nvidia Corporation
Inventor: Ilia Federov, Dmitry Aleksandrovich Korobchenko
Abstract: A deep neural network can be trained to infer emotion data from input audio. The network can be a transformer-based network that can infer probability values for a set of emotions or emotion classes. The emotion probability values can be modified using one or more heuristics, such as to provide for smoothing of emotion determinations over time, or via a user interface, where a user can modify emotion determinations as appropriate. A user may also provide prior emotion values to be blended with these emotion determination values. Determined emotion values can be provided as input to an emotion-based operation, such as to provide audio-driven speech animation.
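The abstract mentions smoothing emotion determinations over time and blending them with user-provided prior values. A minimal Python/NumPy sketch of two such heuristics follows; the exponential-moving-average smoothing, the blend weight, and all names are assumptions for illustration rather than the claimed implementation.

```python
import numpy as np

def smooth_emotions(probs, beta=0.8):
    """One plausible smoothing heuristic: exponential moving average
    over per-frame emotion probabilities (shape: frames x classes)."""
    out = np.empty_like(probs)
    out[0] = probs[0]
    for t in range(1, len(probs)):
        out[t] = beta * out[t - 1] + (1.0 - beta) * probs[t]
    return out

def blend_with_prior(probs, prior, weight=0.5):
    """Blend network-inferred emotion probabilities with user-supplied
    prior emotion values, then renormalize to a valid distribution."""
    mixed = (1.0 - weight) * probs + weight * prior
    return mixed / mixed.sum(axis=-1, keepdims=True)
```

The blended output could then feed an emotion-based operation such as audio-driven speech animation, as the abstract suggests.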
-
Publication Number: US20240013462A1
Publication Date: 2024-01-11
Application Number: US17859615
Application Date: 2022-07-07
Applicant: Nvidia Corporation
Inventor: Yeongho Seol, Simon Yuen, Dmitry Aleksandrovich Korobchenko, Mingquan Zhou, Ronan Browne, Wonmin Byeon
CPC classification number: G06T13/205, G06T13/40, G06T17/20, G10L25/63, G10L15/16
Abstract: A deep neural network can be trained to output motion or deformation information for a character that is representative of the character uttering speech contained in audio input, which is accurate for an emotional state of the character. The character can have different facial components or regions (e.g., head, skin, eyes, tongue) modeled separately, such that the network can output motion or deformation information for each of these different facial components. During training, the network can be provided with emotion and/or style vectors that indicate information to be used in generating realistic animation for input speech, as may relate to one or more emotions to be exhibited by the character, a relative weighting of those emotions, and any style or adjustments to be made to how the character expresses that emotional state. The network output can be provided to a renderer to generate audio-driven facial animation that is emotion-accurate.
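To make the per-component output structure concrete, here is a small PyTorch sketch of a network that concatenates audio features with emotion and style vectors and emits a separate output per facial component. The layer sizes, component names, and output dimensions are hypothetical; the abstract does not specify the actual architecture.

```python
import torch
import torch.nn as nn

class EmotionConditionedFaceNet(nn.Module):
    """Illustrative network: audio features concatenated with emotion
    and style vectors drive separate heads, one per facial component."""
    def __init__(self, audio_dim=128, emotion_dim=8, style_dim=4,
                 hidden=256, component_dims=None):
        super().__init__()
        # Hypothetical per-component output sizes (deformation coefficients).
        component_dims = component_dims or {
            "skin": 272, "tongue": 30, "eyes": 4, "jaw": 3}
        self.backbone = nn.Sequential(
            nn.Linear(audio_dim + emotion_dim + style_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
        )
        self.heads = nn.ModuleDict(
            {name: nn.Linear(hidden, dim)
             for name, dim in component_dims.items()})

    def forward(self, audio_feats, emotion, style):
        h = self.backbone(torch.cat([audio_feats, emotion, style], dim=-1))
        return {name: head(h) for name, head in self.heads.items()}
```

The per-component outputs would then be handed to a renderer to produce the emotion-accurate, audio-driven facial animation described above.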