-
1.
Publication number: US20240428493A1
Publication date: 2024-12-26
Application number: US18736552
Filing date: 2024-06-07
Applicant: UBTECH ROBOTICS CORP LTD
Inventor: WAN DING, Dongyan Huang, Xianjie Yang, Zehong Zheng, Penghul Li
Abstract: A method for synthesizing a talking head video includes: obtaining speech data to be synthesized and observation data, wherein the observation data is data obtained through observation other than the speech data; performing feature extraction on the speech data to obtain speech features corresponding to the speech data, and performing feature extraction on the observation data to obtain non-speech features corresponding to the observation data; performing temporal modeling on the speech features and first non-speech features to obtain low-dimensional representations, wherein the first non-speech features are non-speech features that are sensitive to temporal changes; and performing video synthesis based on the low-dimensional representations and second non-speech features, wherein the second non-speech features are non-speech features insensitive to temporal changes.
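The data flow in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: every function name, the frame length, and the toy feature definitions are hypothetical stand-ins, and the "temporal model" here is a simple averaging placeholder rather than a learned network. It only shows how time-sensitive features pass through temporal modeling while time-insensitive features are attached at the synthesis stage.

```python
from typing import List, Sequence

Vector = List[float]


def extract_speech_features(samples: Sequence[float], frame_len: int) -> List[Vector]:
    """Hypothetical extractor: one mean-absolute-amplitude value per frame."""
    n_frames = len(samples) // frame_len
    feats = []
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        feats.append([sum(abs(x) for x in frame) / frame_len])
    return feats


def temporal_model(speech: List[Vector], dynamic: List[Vector], window: int = 2) -> List[Vector]:
    """Placeholder temporal model (assumption, not from the abstract).

    Fuses speech features with the time-sensitive ("first") non-speech
    features per frame, reduces each fused vector to a single value, and
    smooths that value over the current and previous frames.
    """
    reduced = []
    for s, d in zip(speech, dynamic):
        fused = s + d
        reduced.append(sum(fused) / len(fused))
    low_dim = []
    for t in range(len(reduced)):
        past = reduced[max(0, t - window + 1):t + 1]
        low_dim.append([sum(past) / len(past)])
    return low_dim


def synthesize(low_dim: List[Vector], static: Vector) -> List[Vector]:
    """Placeholder synthesis: attach the time-insensitive ("second")
    non-speech features, unchanged, to every frame's representation."""
    return [rep + list(static) for rep in low_dim]


# Toy inputs: six audio samples, per-frame head-pose values, identity vector.
samples = [0.0, 1.0, -1.0, 0.5, 0.5, 0.5]
speech = extract_speech_features(samples, frame_len=2)
dynamic = [[0.1], [0.2], [0.3]]   # e.g. head pose, changes per frame
static = [1.0, 2.0]               # e.g. identity, constant across frames
frames = synthesize(temporal_model(speech, dynamic), static)
```

In this sketch each output frame is just a feature vector; a real system would decode such vectors into images.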
-
Publication number: US20230386116A1
Publication date: 2023-11-30
Application number: US18202291
Filing date: 2023-05-26
Applicant: UBTECH ROBOTICS CORP LTD
Inventor: WAN DING, Dongyan Huang, Linhuang Yan, Zhiyong Yang
CPC classification number: G06T13/40, G06T13/205, G06V40/20, G10L13/02
Abstract: A method for generating a talking head video includes: obtaining a text and an image containing a face of a user; determining a phoneme sequence that corresponds to the text and includes one or more phonemes; determining acoustic features corresponding to the text according to the phoneme sequence, and obtaining synthesized speech corresponding to the text according to the acoustic features; determining a first mouth movement sequence corresponding to the text according to the phoneme sequence, and determining a second mouth movement sequence corresponding to the text according to the acoustic features; creating a facial action video corresponding to the user according to the first mouth movement sequence, the second mouth movement sequence and the image; and processing the synthesized speech and the facial action video synchronously to obtain a talking head video corresponding to the user.
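The two-branch structure described in this abstract can be sketched as follows. This is a toy illustration under stated assumptions, not the claimed method: the lexicon, the mouth-openness table, the per-phoneme "acoustic energy", and the weighted-average fusion are all hypothetical, since the abstract does not say how the two mouth movement sequences are combined.

```python
from typing import Dict, List, Tuple

# Hypothetical toy tables, for illustration only.
LEXICON: Dict[str, List[str]] = {"hi": ["HH", "AY"], "bob": ["B", "AA", "B"]}
MOUTH_OPENNESS: Dict[str, float] = {"HH": 0.3, "AY": 0.8, "B": 0.0, "AA": 1.0}
VOWELS = {"AY", "AA"}


def text_to_phonemes(text: str) -> List[str]:
    """Look up each word in the toy lexicon and concatenate its phonemes."""
    phonemes: List[str] = []
    for word in text.lower().split():
        phonemes.extend(LEXICON[word])
    return phonemes


def phonemes_to_acoustic(phonemes: List[str]) -> List[float]:
    """Stand-in acoustic features: one energy value per phoneme."""
    return [1.0 if p in VOWELS else 0.2 for p in phonemes]


def mouth_from_phonemes(phonemes: List[str]) -> List[float]:
    """First mouth movement sequence, derived from the phoneme sequence."""
    return [MOUTH_OPENNESS[p] for p in phonemes]


def mouth_from_acoustic(acoustic: List[float]) -> List[float]:
    """Second mouth movement sequence, derived from the acoustic features."""
    return [min(1.0, max(0.0, e)) for e in acoustic]


def fuse(first: List[float], second: List[float], weight: float = 0.5) -> List[float]:
    """Assumed fusion rule: a weighted average of the two sequences."""
    return [weight * a + (1.0 - weight) * b for a, b in zip(first, second)]


def render(image_id: str, mouth: List[float]) -> List[Tuple[str, float]]:
    """Placeholder renderer: one (image, mouth openness) pair per step."""
    return [(image_id, m) for m in mouth]


phonemes = text_to_phonemes("hi bob")
acoustic = phonemes_to_acoustic(phonemes)
mouth = fuse(mouth_from_phonemes(phonemes), mouth_from_acoustic(acoustic))
video = render("user_face.png", mouth)
# Pairing each video step with its acoustic value stands in for the
# synchronization of synthesized speech and facial action video.
talking_head = list(zip(acoustic, video))
```

Here both sequences have one entry per phoneme, so alignment is trivial; a real system would need to align them on a common time base before fusing.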
-