Patent search ap:("UBTECH ROBOTICS CORP LTD") AND inv:"Zehong Zheng"

1.

Invention application
METHOD AND DEVICE FOR SYNTHESIZING TALKING HEAD VIDEO AND COMPUTER-READABLE STORAGE MEDIUM (Granted)

Publication (announcement) number: US20240428493A1

Publication (announcement) date: 2024-12-26

Application number: US18736552

Application date: 2024-06-07

Applicant: UBTECH ROBOTICS CORP LTD

Inventor: Wan Ding, Dongyan Huang, Xianjie Yang, Zehong Zheng, Penghul Li

IPC: G06T13/40, G06T7/73, G06V10/44, G06V40/16, G06V40/20, G10L15/02

Abstract: A method for synthesizing a talking head video includes: obtaining speech data to be synthesized and observation data, wherein the observation data is data obtained through observation other than the speech data; performing feature extraction on the speech data to obtain speech features corresponding to the speech data, and performing feature extraction on the observation data to obtain non-speech features corresponding to the observation data; performing temporal modeling on the speech features and first non-speech features to obtain low-dimensional representations, wherein the first non-speech features are non-speech features that are sensitive to temporal changes; and performing video synthesis based on the low-dimensional representations and second non-speech features, wherein the second non-speech features are non-speech features insensitive to temporal changes.
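
The data flow in this abstract can be sketched roughly as follows. This is only an illustrative toy, not the patented method: every function name is hypothetical, the frame statistics stand in for real speech features, the bucket-sum projection and moving average stand in for a learned temporal model, and treating head pose as a time-sensitive feature and identity as a time-insensitive one is an assumption made for the example.

```python
from typing import List

Vector = List[float]


def extract_speech_features(samples: List[float], frame_len: int) -> List[Vector]:
    """Toy speech features: [mean, energy] for each non-overlapping frame."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    return [[sum(f) / len(f), sum(x * x for x in f) / len(f)] for f in frames]


def project(x: Vector, dim: int) -> Vector:
    """Toy dimensionality reduction: fold the vector into `dim` buckets by summing."""
    out = [0.0] * dim
    for i, value in enumerate(x):
        out[i % dim] += value
    return out


def temporal_model(speech_feats: List[Vector], temporal_feats: List[Vector],
                   dim: int = 2, alpha: float = 0.5) -> List[Vector]:
    """Fuse per-frame speech and time-sensitive features, reduce them, and smooth
    over time with an exponential moving average (stand-in for a learned model)."""
    state = None
    low_dim = []
    for s, t in zip(speech_feats, temporal_feats):
        z = project(s + t, dim)
        state = z if state is None else [alpha * a + (1 - alpha) * b
                                         for a, b in zip(z, state)]
        low_dim.append(state)
    return low_dim


def synthesize_frames(low_dim: List[Vector], static_feats: Vector) -> List[Vector]:
    """Placeholder renderer: condition every frame on the time-insensitive features."""
    return [z + static_feats for z in low_dim]


speech = [0.0, 1.0, 2.0, 3.0]   # speech data (toy samples)
head_pose = [[1.0], [3.0]]      # assumed "first" (time-sensitive) non-speech features
identity = [9.0]                # assumed "second" (time-insensitive) non-speech features

speech_feats = extract_speech_features(speech, frame_len=2)
frames = synthesize_frames(temporal_model(speech_feats, head_pose), identity)
```

The point of the split is visible in the signatures: only the time-sensitive features pass through `temporal_model`, while the time-insensitive ones are attached once per frame at synthesis time.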

2.

Invention publication
TEXT-TO-SPEECH SYNTHESIS METHOD, ELECTRONIC DEVICE, AND COMPUTER-READABLE STORAGE MEDIUM (Under examination, published)

Publication (announcement) number: US20230410791A1

Publication (announcement) date: 2023-12-21

Application number: US18212140

Application date: 2023-06-20

Applicant: UBTECH ROBOTICS CORP LTD

Inventor: Wan Ding, Dongyuan Huang, Zehong Zheng, Linhuang Yan, Zhiyong Yang

IPC: G10L13/10, G10L13/04

CPC classification number: G10L13/10, G10L13/04

Abstract: A text-to-speech synthesis method, an electronic device, and a computer-readable storage medium are provided. The method includes: obtaining prosodic pause features of an input text by performing a prosodic pause prediction processing on the input text, and dividing the input text into a plurality of prosodic phrases according to the prosodic pause features; synthesizing short sentence audios according to the prosodic phrases by performing a streamed speech synthesis processing on each of the prosodic phrases in the input text in a manner of asynchronous processing of a thread pool; and performing an audio playback operation of the input text according to the short sentence audios corresponding to the first prosodic phrase of the input text, in response to synthesizing the short sentence audio corresponding to the first prosodic phrase of the input text.
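
The streaming scheme in this abstract can be illustrated with a minimal sketch, under stated assumptions: `split_prosodic_phrases`, `synthesize`, and `stream_tts` are hypothetical names, splitting at punctuation stands in for the prosodic pause prediction step, and the "audio" is just the encoded phrase text. It is not the implementation described in the application.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, List


def split_prosodic_phrases(text: str) -> List[str]:
    """Stand-in for prosodic pause prediction: cut after each punctuation mark."""
    parts = re.findall(r"[^,.;!?]+[,.;!?]?", text)
    return [p.strip() for p in parts if p.strip()]


def synthesize(phrase: str) -> bytes:
    """Hypothetical synthesizer: returns placeholder bytes instead of real audio."""
    return phrase.encode("utf-8")


def stream_tts(text: str, max_workers: int = 4) -> Iterator[bytes]:
    """Submit every phrase to a thread pool at once, then yield the short
    audios in text order, so the first one is available as soon as it is done."""
    phrases = split_prosodic_phrases(text)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(synthesize, p) for p in phrases]
        for future in futures:
            # Blocks only until this phrase is ready; later phrases keep
            # synthesizing in the background while the caller plays this one.
            yield future.result()


played = []
for audio in stream_tts("Hello, world. How are you?"):
    played.append(audio)  # a real player would start playback here
```

Because the generator yields the first phrase's audio without waiting for the rest, the perceived latency depends on the first prosodic phrase rather than on the whole input text.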
