-
Publication Number: US20240233229A1
Publication Date: 2024-07-11
Application Number: US18007867
Application Date: 2021-11-08
Applicant: NVIDIA Corporation
Inventor: Evgeny Aleksandrovich Tumanov, Dmitry Aleksandrovich Korobchenko, Simon Yuen, Kevin Margo
CPC classification number: G06T13/205, G06T13/40
Abstract: In various examples, animations may be generated using audio-driven body animation synthesized with voice tempo. For example, full body animation may be driven from an audio input representative of recorded speech, where voice tempo (e.g., a number of phonemes per unit time) may be used to generate a 1D audio signal for comparing to datasets including data samples that each include an animation and a corresponding 1D audio signal. One or more loss functions may be used to compare the 1D audio signal from the input audio to the audio signals of the datasets, as well as to compare joint information of joints of an actor between animations of two or more data samples, in order to identify optimal transition points between the animations. The animations may then be stitched together—e.g., using interpolation and/or a neural network trained to seamlessly stitch sequences together—using the transition points.
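As a rough illustration of the approach this abstract describes, the Python/NumPy sketch below computes a 1D voice-tempo signal as phonemes per second over a sliding window, and scores candidate transition frames by combining a tempo loss with a joint-pose loss. The function names, window parameters, and the simple sum-of-squares losses are illustrative assumptions, not the patented method.

```python
import numpy as np

def voice_tempo_signal(phoneme_times, duration, window=1.0, step=0.1):
    """Hypothetical 1D voice-tempo signal: phonemes per second,
    counted in a sliding window over phoneme onset timestamps."""
    starts = np.arange(0.0, duration, step)
    counts = np.array([
        np.sum((phoneme_times >= t) & (phoneme_times < t + window))
        for t in starts
    ])
    return counts / window  # phonemes per second

def best_transition(tempo_a, tempo_b, joints_a, joints_b, alpha=1.0):
    """Score candidate transition frames between two animation clips by
    combining an audio-tempo loss with a joint-pose loss, and return the
    frame index with the lowest combined loss (illustrative only).
    joints_* are assumed to have shape (frames, num_joints, 3)."""
    n = min(len(tempo_a), len(tempo_b))
    audio_loss = (tempo_a[:n] - tempo_b[:n]) ** 2
    joint_loss = np.mean((joints_a[:n] - joints_b[:n]) ** 2, axis=(1, 2))
    return int(np.argmin(audio_loss + alpha * joint_loss))
```

Once a transition frame is selected, the two clips could be stitched there by interpolating joint poses across a short crossfade window, in the spirit of the interpolation-based stitching the abstract mentions.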
-
Publication Number: US20240013802A1
Publication Date: 2024-01-11
Application Number: US17859660
Application Date: 2022-07-07
Applicant: Nvidia Corporation
Inventor: Ilia Federov, Dmitry Aleksandrovich Korobchenko
Abstract: A deep neural network can be trained to infer emotion data from input audio. The network can be a transformer-based network that can infer probability values for a set of emotions or emotion classes. The emotion probability values can be modified using one or more heuristics, such as to provide for smoothing of emotion determinations over time, or via a user interface, where a user can modify emotion determinations as appropriate. A user may also provide prior emotion values to be blended with these emotion determination values. Determined emotion values can be provided as input to an emotion-based operation, such as to provide audio-driven speech animation.
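The abstract mentions smoothing emotion determinations over time and blending them with user-provided prior values. A minimal Python/NumPy sketch of two such heuristics follows; the exponential-moving-average smoothing, the blend weight, and all names are assumptions for illustration rather than the claimed implementation.

```python
import numpy as np

def smooth_emotions(probs, beta=0.8):
    """One plausible smoothing heuristic: exponential moving average
    over per-frame emotion probabilities (shape: frames x classes)."""
    out = np.empty_like(probs)
    out[0] = probs[0]
    for t in range(1, len(probs)):
        out[t] = beta * out[t - 1] + (1.0 - beta) * probs[t]
    return out

def blend_with_prior(probs, prior, weight=0.5):
    """Blend network-inferred emotion probabilities with user-supplied
    prior emotion values, then renormalize to a valid distribution."""
    mixed = (1.0 - weight) * probs + weight * prior
    return mixed / mixed.sum(axis=-1, keepdims=True)
```

The blended output could then feed an emotion-based operation such as audio-driven speech animation, as the abstract suggests.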
-
Publication Number: US20240013462A1
Publication Date: 2024-01-11
Application Number: US17859615
Application Date: 2022-07-07
Applicant: Nvidia Corporation
Inventor: Yeongho Seol, Simon Yuen, Dmitry Aleksandrovich Korobchenko, Mingquan Zhou, Ronan Browne, Wonmin Byeon
CPC classification number: G06T13/205, G06T13/40, G06T17/20, G10L25/63, G10L15/16
Abstract: A deep neural network can be trained to output motion or deformation information for a character that is representative of the character uttering speech contained in audio input, which is accurate for an emotional state of the character. The character can have different facial components or regions (e.g., head, skin, eyes, tongue) modeled separately, such that the network can output motion or deformation information for each of these different facial components. During training, the network can be provided with emotion and/or style vectors that indicate information to be used in generating realistic animation for input speech, as may relate to one or more emotions to be exhibited by the character, a relative weighting of those emotions, and any style or adjustments to be made to how the character expresses that emotional state. The network output can be provided to a renderer to generate audio-driven facial animation that is emotion-accurate.
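To make the per-component output structure concrete, here is a small PyTorch sketch of a network that concatenates audio features with emotion and style vectors and emits a separate output per facial component. The layer sizes, component names, and output dimensions are hypothetical; the abstract does not specify the actual architecture.

```python
import torch
import torch.nn as nn

class EmotionConditionedFaceNet(nn.Module):
    """Illustrative network: audio features concatenated with emotion
    and style vectors drive separate heads, one per facial component."""
    def __init__(self, audio_dim=128, emotion_dim=8, style_dim=4,
                 hidden=256, component_dims=None):
        super().__init__()
        # Hypothetical per-component output sizes (deformation coefficients).
        component_dims = component_dims or {
            "skin": 272, "tongue": 30, "eyes": 4, "jaw": 3}
        self.backbone = nn.Sequential(
            nn.Linear(audio_dim + emotion_dim + style_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
        )
        self.heads = nn.ModuleDict(
            {name: nn.Linear(hidden, dim)
             for name, dim in component_dims.items()})

    def forward(self, audio_feats, emotion, style):
        h = self.backbone(torch.cat([audio_feats, emotion, style], dim=-1))
        return {name: head(h) for name, head in self.heads.items()}
```

The per-component outputs would then be handed to a renderer to produce the emotion-accurate, audio-driven facial animation described above.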