Patent search ap:("UBTECH ROBOTICS CORP LTD") AND inv:"WAN DING"

1.

Invention application
METHOD AND DEVICE FOR SYNTHESIZING TALKING HEAD VIDEO AND COMPUTER-READABLE STORAGE MEDIUM (Active)

Publication No.: US20240428493A1

Publication date: 2024-12-26

Application No.: US18736552

Filing date: 2024-06-07

Applicant: UBTECH ROBOTICS CORP LTD

Inventor: WAN DING, Dongyan Huang, Xianjie Yang, Zehong Zheng, Penghul Li

IPC: G06T13/40, G06T7/73, G06V10/44, G06V40/16, G06V40/20, G10L15/02

Abstract: A method for synthesizing a talking head video includes: obtaining speech data to be synthesized and observation data, wherein the observation data is data obtained through observation other than the speech data; performing feature extraction on the speech data to obtain speech features corresponding to the speech data, and performing feature extraction on the observation data to obtain non-speech features corresponding to the observation data; performing temporal modeling on the speech features and first non-speech features to obtain low-dimensional representations, wherein the first non-speech features are non-speech features that are sensitive to temporal changes; and performing video synthesis based on the low-dimensional representations and second non-speech features, wherein the second non-speech features are non-speech features insensitive to temporal changes.
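
The data flow in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the moving-average "temporal model", the two-value latent, and the `Frame` container are all hypothetical stand-ins chosen only to show which inputs feed which stage (speech features and time-sensitive non-speech features go through temporal modeling; time-insensitive non-speech features bypass it and enter at synthesis).

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Frame:
    latent: List[float]  # low-dimensional representation for one time step
    static: List[float]  # time-insensitive non-speech features (assumed fixed per clip)


def temporal_model(speech: Sequence[Sequence[float]],
                   dynamic: Sequence[Sequence[float]],
                   window: int = 3) -> List[List[float]]:
    """Hypothetical stand-in for temporal modeling.

    Averages the speech features and the time-sensitive non-speech
    features over a causal window, yielding two values per time step.
    """
    if len(speech) != len(dynamic):
        raise ValueError("speech and dynamic features must be frame-aligned")
    latents = []
    for t in range(len(speech)):
        start = max(0, t - window + 1)
        s_vals = [v for frame in speech[start:t + 1] for v in frame]
        d_vals = [v for frame in dynamic[start:t + 1] for v in frame]
        latents.append([sum(s_vals) / len(s_vals), sum(d_vals) / len(d_vals)])
    return latents


def synthesize(latents: Sequence[Sequence[float]],
               static: Sequence[float]) -> List[Frame]:
    """Hypothetical stand-in for video synthesis: pairs each latent with
    the time-insensitive features instead of rendering pixels."""
    return [Frame(latent=list(z), static=list(static)) for z in latents]


# Toy inputs: two time steps of speech features, one time-sensitive
# non-speech value per step, and one time-insensitive value.
speech_feats = [[1.0, 3.0], [2.0, 4.0]]
dynamic_feats = [[0.0], [2.0]]
static_feats = [9.0]

frames = synthesize(temporal_model(speech_feats, dynamic_feats), static_feats)
```

In this sketch the time-insensitive features are attached unchanged to every frame, which mirrors the abstract's separation between features that go through temporal modeling and features that do not.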

2.

Invention publication
METHOD FOR GENERATING TALKING HEAD VIDEO, DEVICE AND COMPUTER-READABLE STORAGE MEDIUM (Under examination, published)

Publication No.: US20230386116A1

Publication date: 2023-11-30

Application No.: US18202291

Filing date: 2023-05-26

Applicant: UBTECH ROBOTICS CORP LTD

Inventor: WAN DING, Dongyan Huang, Linhuang Yan, Zhiyong Yang

IPC: G06T13/40, G06T13/20, G06V40/20, G10L13/02

CPC classification number: G06T13/40, G06T13/205, G06V40/20, G10L13/02

Abstract: A method for generating a talking head video includes: obtaining a text and an image containing a face of a user; determining a phoneme sequence that corresponds to the text and includes one or more phonemes; determining acoustic features corresponding to the text according to the phoneme sequence, and obtaining synthesized speech corresponding to the text according to the acoustic features; determining a first mouth movement sequence corresponding to the text according to the phoneme sequence, and determining a second mouth movement sequence corresponding to the text according to the acoustic features; creating a facial action video corresponding to the user according to the first mouth movement sequence, the second mouth movement sequence and the image; and processing the synthesized speech and the facial action video synchronously to obtain a talking head video corresponding to the user.
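
The sequence of steps in this abstract can be laid out as a toy pipeline. Everything below is an assumption made for illustration: letters stand in for phonemes, a single number per phoneme stands in for acoustic features, and the two mouth movement sequences are combined by a weighted average, which the abstract does not specify. No real speech or video is produced.

```python
from typing import Dict, List

VOWELS = set("aeiou")


def text_to_phonemes(text: str) -> List[str]:
    # Hypothetical: treat each ASCII letter as one phoneme.
    return [c for c in text.lower() if c.isascii() and c.isalpha()]


def acoustic_features(phonemes: List[str]) -> List[float]:
    # Hypothetical: one scalar in (0, 1] per phoneme.
    return [(ord(p) - ord("a") + 1) / 26 for p in phonemes]


def mouth_from_phonemes(phonemes: List[str]) -> List[float]:
    # First mouth movement sequence: wider opening for vowels (assumed rule).
    return [1.0 if p in VOWELS else 0.2 for p in phonemes]


def mouth_from_acoustics(features: List[float]) -> List[float]:
    # Second mouth movement sequence: derived from the acoustic features.
    return [min(1.0, f) for f in features]


def talking_head(text: str, image: str, weight: float = 0.5) -> List[Dict]:
    phonemes = text_to_phonemes(text)
    feats = acoustic_features(phonemes)
    speech = feats  # stand-in for synthesized speech samples
    m1 = mouth_from_phonemes(phonemes)
    m2 = mouth_from_acoustics(feats)
    # Assumed fusion rule: weighted average of the two sequences.
    mouth = [weight * a + (1 - weight) * b for a, b in zip(m1, m2)]
    # Synchronization: one video frame per speech sample, same index.
    return [{"image": image, "mouth": m, "audio": s}
            for m, s in zip(mouth, speech)]


video = talking_head("Hi", "face.png")
```

Each returned entry pairs one mouth-opening value with the speech sample at the same index, a simplified picture of processing the synthesized speech and the facial action video synchronously.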
