MACHINE LEARNING-BASED APPROACH FOR AUDIO-DRIVEN AVATAR ANIMATION OR OTHER FUNCTIONS

    Publication (Announcement) Number: US20240203014A1

    Publication (Announcement) Date: 2024-06-20

    Application Number: US18299248

    Application Date: 2023-04-12

    CPC classification numbers: G06T13/205; G06T13/40; G10L17/02; G10L17/04; G10L17/18

    Abstract: A method includes obtaining, using at least one processing device of an electronic device, an audio input associated with a speaker. The method also includes extracting, using a feature extractor of a trained machine learning model, audio features from the audio input. The method further includes generating (i) one or more content parameter predictions using content embeddings extracted by a content encoder and decoded by a content decoder of the trained machine learning model and (ii) one or more style parameter predictions using style embeddings extracted by a style encoder and decoded by a style decoder of the trained machine learning model. The content embeddings and the style embeddings are based on the audio features of the audio input. The trained machine learning model is trained to generate the one or more content parameter predictions and the one or more style parameter predictions using disentangled content and style embeddings.
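The abstract above describes a shared feature extractor feeding two disentangled encoder/decoder branches, one for content and one for style. The following is a minimal toy sketch of that inference structure only; the layer shapes, parameter counts, and activation are illustrative placeholders, not details from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, w, b):
    """A single dense layer with tanh activation (stand-in for a learned network)."""
    return np.tanh(x @ w + b)

# Shared audio feature extractor (e.g. operating on mel-spectrogram frames).
W_feat = rng.normal(size=(80, 32)); b_feat = np.zeros(32)

# Separate content and style encoders produce disentangled embeddings.
W_c_enc = rng.normal(size=(32, 16)); b_c_enc = np.zeros(16)
W_s_enc = rng.normal(size=(32, 16)); b_s_enc = np.zeros(16)

# Separate decoders map each embedding to its own parameter predictions.
W_c_dec = rng.normal(size=(16, 8)); b_c_dec = np.zeros(8)
W_s_dec = rng.normal(size=(16, 4)); b_s_dec = np.zeros(4)

def predict(audio_frames):
    """audio_frames: (T, 80) array of per-frame audio features."""
    feats = layer(audio_frames, W_feat, b_feat)        # shared audio features
    content_emb = layer(feats, W_c_enc, b_c_enc)       # content branch
    style_emb = layer(feats, W_s_enc, b_s_enc)         # style branch
    content_params = layer(content_emb, W_c_dec, b_c_dec)
    style_params = layer(style_emb, W_s_dec, b_s_dec)
    return content_params, style_params

audio = rng.normal(size=(10, 80))    # 10 frames of dummy audio features
content, style = predict(audio)
print(content.shape, style.shape)    # (10, 8) (10, 4)
```

In a trained system the disentanglement would come from the training objective; this sketch only shows the parallel branch topology the abstract describes.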

    Light-weight machine learning models for lip sync animation on mobile devices or other devices

    Publication (Announcement) Number: US12154204B2

    Publication (Announcement) Date: 2024-11-26

    Application Number: US17673645

    Application Date: 2022-02-16

    Abstract: A method includes obtaining a speech segment. The method also includes generating, using at least one processing device of an electronic device, context-independent features and context-dependent features of the speech segment. The method further includes decoding, using the at least one processing device of the electronic device, a first viseme based on the context-independent features. The method also includes decoding, using the at least one processing device of the electronic device, a second viseme based on the context-dependent features and the first viseme. In addition, the method includes generating, using the at least one processing device of the electronic device, an output viseme based on the first and second visemes, where the output viseme is associated with a visual animation of the speech segment.
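The two-stage decoding described above can be sketched in miniature: a first viseme is decoded from context-independent features alone, a second viseme is decoded from context-dependent features conditioned on the first, and the two are fused into the output viseme. The viseme labels, scoring inputs, and fusion rule below are hypothetical placeholders, not details from the patent.

```python
VISEMES = ["sil", "PP", "FF", "TH", "AA"]  # hypothetical viseme inventory

def decode_first(ci_scores):
    """Context-independent stage: pick the best-scoring viseme."""
    return max(range(len(ci_scores)), key=ci_scores.__getitem__)

def decode_second(cd_scores, first_viseme):
    """Context-dependent stage: rescore, biased toward the first prediction."""
    biased = list(cd_scores)
    biased[first_viseme] += 0.5
    return max(range(len(biased)), key=biased.__getitem__)

def output_viseme(ci_scores, cd_scores):
    first = decode_first(ci_scores)
    second = decode_second(cd_scores, first)
    # Trivial fusion rule for the sketch: take the refined second stage.
    return VISEMES[second]

ci = [0.1, 0.7, 0.1, 0.05, 0.05]   # context-independent scores per viseme
cd = [0.2, 0.3, 0.4, 0.05, 0.05]   # context-dependent scores per viseme
print(output_viseme(ci, cd))       # "PP": the stage-1 bias outweighs "FF"
```

The point of the structure is that the cheap context-independent pass anchors the prediction, which keeps the context-dependent refinement small and light-weight enough for on-device use.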

    LIGHT-WEIGHT MACHINE LEARNING MODELS FOR LIP SYNC ANIMATION ON MOBILE DEVICES OR OTHER DEVICES

    Publication (Announcement) Number: US20230130287A1

    Publication (Announcement) Date: 2023-04-27

    Application Number: US17673645

    Application Date: 2022-02-16

    Abstract: A method includes obtaining a speech segment. The method also includes generating, using at least one processing device of an electronic device, context-independent features and context-dependent features of the speech segment. The method further includes decoding, using the at least one processing device of the electronic device, a first viseme based on the context-independent features. The method also includes decoding, using the at least one processing device of the electronic device, a second viseme based on the context-dependent features and the first viseme. In addition, the method includes generating, using the at least one processing device of the electronic device, an output viseme based on the first and second visemes, where the output viseme is associated with a visual animation of the speech segment.

    Real-Time Avatar Animation
    Invention Application

    Publication (Announcement) Number: US20250104318A1

    Publication (Announcement) Date: 2025-03-27

    Application Number: US18601097

    Application Date: 2024-03-11

    Abstract: In one embodiment, a method includes accessing an audio input that includes a mixture of vocal sounds and non-vocal sounds and separating, by a trained audio source separation model, the audio input into a first audio output representing the vocal sounds and a second audio output representing the non-vocal sounds. The method further includes determining, by one or more trained avatar animation models and by separately encoding the first audio output representing the vocal sounds and the second audio output representing the non-vocal sounds, an avatar animation temporally corresponding to the audio input; and rendering, in real time and temporally coincident with the audio input, the determined avatar animation.
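The pipeline above can be sketched end to end: separate a mixed signal into vocal and non-vocal parts, encode each part separately, and map the combined encodings to animation parameters. In this toy version the "separation model" is just a crude low/high-frequency split and the encoders are random projections; all of it is a placeholder for the trained models, not an implementation from the patent.

```python
import numpy as np

rng = np.random.default_rng(1)

def separate(mixture):
    """Crude stand-in for the trained source separation model:
    low frequencies -> 'vocal', remaining spectrum -> 'non-vocal'."""
    spectrum = np.fft.rfft(mixture)
    cut = len(spectrum) // 4
    vocal_spec = spectrum.copy()
    vocal_spec[cut:] = 0
    other_spec = spectrum - vocal_spec
    return (np.fft.irfft(vocal_spec, n=len(mixture)),
            np.fft.irfft(other_spec, n=len(mixture)))

def encode(signal, w):
    """Placeholder encoder: projects a window of audio to an embedding."""
    return np.tanh(signal @ w)

W_vocal = rng.normal(size=(256, 16))   # encoder for the vocal stream
W_other = rng.normal(size=(256, 16))   # separate encoder for non-vocal sounds
W_anim  = rng.normal(size=(32, 8))     # maps joint embedding to parameters

def animate(mixture):
    vocal, other = separate(mixture)
    emb = np.concatenate([encode(vocal, W_vocal), encode(other, W_other)])
    return np.tanh(emb @ W_anim)       # animation parameters for this window

frame = rng.normal(size=256)           # one window of mixed audio
params = animate(frame)
print(params.shape)                    # (8,)
```

Encoding the two streams separately, as the abstract describes, lets non-vocal sounds (music, ambient noise) influence the animation without corrupting the speech-driven lip motion.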
