MACHINE LEARNING-BASED APPROACH FOR AUDIO-DRIVEN AVATAR ANIMATION OR OTHER FUNCTIONS

    Publication (Announcement) Number: US20240203014A1

    Publication (Announcement) Date: 2024-06-20

    Application Number: US18299248

    Application Date: 2023-04-12

    CPC classification numbers: G06T13/205; G06T13/40; G10L17/02; G10L17/04; G10L17/18

    Abstract: A method includes obtaining, using at least one processing device of an electronic device, an audio input associated with a speaker. The method also includes extracting, using a feature extractor of a trained machine learning model, audio features from the audio input. The method further includes generating (i) one or more content parameter predictions using content embeddings extracted by a content encoder and decoded by a content decoder of the trained machine learning model and (ii) one or more style parameter predictions using style embeddings extracted by a style encoder and decoded by a style decoder of the trained machine learning model. The content embeddings and the style embeddings are based on the audio features of the audio input. The trained machine learning model is trained to generate the one or more content parameter predictions and the one or more style parameter predictions using disentangled content and style embeddings.
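The abstract above describes a shared feature extractor feeding two disentangled encoder/decoder branches, one for content and one for style. The following is a minimal toy sketch of that inference structure only; the layer shapes, parameter counts, and activation are illustrative placeholders, not details from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, w, b):
    """A single dense layer with tanh activation (stand-in for a learned network)."""
    return np.tanh(x @ w + b)

# Shared audio feature extractor (e.g. operating on mel-spectrogram frames).
W_feat = rng.normal(size=(80, 32)); b_feat = np.zeros(32)

# Separate content and style encoders produce disentangled embeddings.
W_c_enc = rng.normal(size=(32, 16)); b_c_enc = np.zeros(16)
W_s_enc = rng.normal(size=(32, 16)); b_s_enc = np.zeros(16)

# Separate decoders map each embedding to its own parameter predictions.
W_c_dec = rng.normal(size=(16, 8)); b_c_dec = np.zeros(8)
W_s_dec = rng.normal(size=(16, 4)); b_s_dec = np.zeros(4)

def predict(audio_frames):
    """audio_frames: (T, 80) array of per-frame audio features."""
    feats = layer(audio_frames, W_feat, b_feat)        # shared audio features
    content_emb = layer(feats, W_c_enc, b_c_enc)       # content branch
    style_emb = layer(feats, W_s_enc, b_s_enc)         # style branch
    content_params = layer(content_emb, W_c_dec, b_c_dec)
    style_params = layer(style_emb, W_s_dec, b_s_dec)
    return content_params, style_params

audio = rng.normal(size=(10, 80))    # 10 frames of dummy audio features
content, style = predict(audio)
print(content.shape, style.shape)    # (10, 8) (10, 4)
```

In a trained system the disentanglement would come from the training objective; this sketch only shows the parallel branch topology the abstract describes.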

    Light-weight machine learning models for lip sync animation on mobile devices or other devices

    Publication (Announcement) Number: US12154204B2

    Publication (Announcement) Date: 2024-11-26

    Application Number: US17673645

    Application Date: 2022-02-16

    Abstract: A method includes obtaining a speech segment. The method also includes generating, using at least one processing device of an electronic device, context-independent features and context-dependent features of the speech segment. The method further includes decoding, using the at least one processing device of the electronic device, a first viseme based on the context-independent features. The method also includes decoding, using the at least one processing device of the electronic device, a second viseme based on the context-dependent features and the first viseme. In addition, the method includes generating, using the at least one processing device of the electronic device, an output viseme based on the first and second visemes, where the output viseme is associated with a visual animation of the speech segment.
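The two-stage decoding described above can be sketched in miniature: a first viseme is decoded from context-independent features alone, a second viseme is decoded from context-dependent features conditioned on the first, and the two are fused into the output viseme. The viseme labels, scoring inputs, and fusion rule below are hypothetical placeholders, not details from the patent.

```python
VISEMES = ["sil", "PP", "FF", "TH", "AA"]  # hypothetical viseme inventory

def decode_first(ci_scores):
    """Context-independent stage: pick the best-scoring viseme."""
    return max(range(len(ci_scores)), key=ci_scores.__getitem__)

def decode_second(cd_scores, first_viseme):
    """Context-dependent stage: rescore, biased toward the first prediction."""
    biased = list(cd_scores)
    biased[first_viseme] += 0.5
    return max(range(len(biased)), key=biased.__getitem__)

def output_viseme(ci_scores, cd_scores):
    first = decode_first(ci_scores)
    second = decode_second(cd_scores, first)
    # Trivial fusion rule for the sketch: take the refined second stage.
    return VISEMES[second]

ci = [0.1, 0.7, 0.1, 0.05, 0.05]   # context-independent scores per viseme
cd = [0.2, 0.3, 0.4, 0.05, 0.05]   # context-dependent scores per viseme
print(output_viseme(ci, cd))       # "PP": the stage-1 bias outweighs "FF"
```

The point of the structure is that the cheap context-independent pass anchors the prediction, which keeps the context-dependent refinement small and light-weight enough for on-device use.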

    LIGHT-WEIGHT MACHINE LEARNING MODELS FOR LIP SYNC ANIMATION ON MOBILE DEVICES OR OTHER DEVICES

    Publication (Announcement) Number: US20230130287A1

    Publication (Announcement) Date: 2023-04-27

    Application Number: US17673645

    Application Date: 2022-02-16

    Abstract: A method includes obtaining a speech segment. The method also includes generating, using at least one processing device of an electronic device, context-independent features and context-dependent features of the speech segment. The method further includes decoding, using the at least one processing device of the electronic device, a first viseme based on the context-independent features. The method also includes decoding, using the at least one processing device of the electronic device, a second viseme based on the context-dependent features and the first viseme. In addition, the method includes generating, using the at least one processing device of the electronic device, an output viseme based on the first and second visemes, where the output viseme is associated with a visual animation of the speech segment.

    Real-Time Avatar Animation
    Invention Application

    Publication (Announcement) Number: US20250104318A1

    Publication (Announcement) Date: 2025-03-27

    Application Number: US18601097

    Application Date: 2024-03-11

    Abstract: In one embodiment, a method includes accessing an audio input that includes a mixture of vocal sounds and non-vocal sounds and separating, by a trained audio source separation model, the audio input into a first audio output representing the vocal sounds and a second audio output representing the non-vocal sounds. The method further includes determining, by one or more trained avatar animation models and by separately encoding the first audio output representing the vocal sounds and the second audio output representing the non-vocal sounds, an avatar animation temporally corresponding to the audio input; and rendering, in real time and temporally coincident with the audio input, the determined avatar animation.
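The pipeline above can be sketched end to end: separate a mixed signal into vocal and non-vocal parts, encode each part separately, and map the combined encodings to animation parameters. In this toy version the "separation model" is just a crude low/high-frequency split and the encoders are random projections; all of it is a placeholder for the trained models, not an implementation from the patent.

```python
import numpy as np

rng = np.random.default_rng(1)

def separate(mixture):
    """Crude stand-in for the trained source separation model:
    low frequencies -> 'vocal', remaining spectrum -> 'non-vocal'."""
    spectrum = np.fft.rfft(mixture)
    cut = len(spectrum) // 4
    vocal_spec = spectrum.copy()
    vocal_spec[cut:] = 0
    other_spec = spectrum - vocal_spec
    return (np.fft.irfft(vocal_spec, n=len(mixture)),
            np.fft.irfft(other_spec, n=len(mixture)))

def encode(signal, w):
    """Placeholder encoder: projects a window of audio to an embedding."""
    return np.tanh(signal @ w)

W_vocal = rng.normal(size=(256, 16))   # encoder for the vocal stream
W_other = rng.normal(size=(256, 16))   # separate encoder for non-vocal sounds
W_anim  = rng.normal(size=(32, 8))     # maps joint embedding to parameters

def animate(mixture):
    vocal, other = separate(mixture)
    emb = np.concatenate([encode(vocal, W_vocal), encode(other, W_other)])
    return np.tanh(emb @ W_anim)       # animation parameters for this window

frame = rng.normal(size=256)           # one window of mixed audio
params = animate(frame)
print(params.shape)                    # (8,)
```

Encoding the two streams separately, as the abstract describes, lets non-vocal sounds (music, ambient noise) influence the animation without corrupting the speech-driven lip motion.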
