-
Publication number: US20230410791A1
Publication date: 2023-12-21
Application number: US18212140
Filing date: 2023-06-20
Applicant: UBTECH ROBOTICS CORP LTD
Inventor: Wan Ding, Dongyuan Huang, Zehong Zheng, Linhuang Yan, Zhiyong Yang
Abstract: A text-to-speech synthesis method, an electronic device, and a computer-readable storage medium are provided. The method includes: obtaining prosodic pause features of an input text by performing a prosodic pause prediction processing on the input text, and dividing the input text into a plurality of prosodic phrases according to the prosodic pause features; synthesizing short sentence audios according to the prosodic phrases by performing a streamed speech synthesis processing on each of the prosodic phrases in the input text in a manner of asynchronous processing of a thread pool; and performing an audio playback operation of the input text according to the short sentence audios corresponding to the first prosodic phrase of the input text, in response to synthesizing the short sentence audio corresponding to the first prosodic phrase of the input text.
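The flow described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the applicant's implementation: `split_into_phrases` stands in for prosodic pause prediction by splitting at punctuation, `synthesize` is a hypothetical placeholder for a speech model, and `play` is any callable that accepts audio bytes. All three names are invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_phrases(text, pause_marks=",.;!?"):
    """Hypothetical stand-in for prosodic pause prediction: split at punctuation."""
    phrases, current = [], []
    for ch in text:
        current.append(ch)
        if ch in pause_marks:
            phrase = "".join(current).strip()
            if phrase:
                phrases.append(phrase)
            current = []
    tail = "".join(current).strip()
    if tail:
        phrases.append(tail)
    return phrases


def synthesize(phrase):
    """Hypothetical stand-in for a TTS model: returns fake 'audio' bytes."""
    return phrase.encode("utf-8")


def stream_speech(text, play, max_workers=4):
    """Synthesize all phrases asynchronously and play them in text order."""
    phrases = split_into_phrases(text)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Every phrase is submitted up front, so later phrases are
        # synthesized while earlier ones are still being played.
        futures = [pool.submit(synthesize, p) for p in phrases]
        for future in futures:
            # Blocks only until this phrase is ready; playback therefore
            # starts as soon as the first phrase has been synthesized.
            play(future.result())
    return phrases


played = []
stream_speech("Hello there, this is a test. Goodbye", played.append)
```

Iterating over the futures in submission order keeps playback in text order even when a later phrase finishes synthesis first.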
-
Publication number: US20230386116A1
Publication date: 2023-11-30
Application number: US18202291
Filing date: 2023-05-26
Applicant: UBTECH ROBOTICS CORP LTD
Inventor: Wan Ding, Dongyan Huang, Linhuang Yan, Zhiyong Yang
CPC classification number: G06T13/40, G06T13/205, G06V40/20, G10L13/02
Abstract: A method for generating a talking head video includes: obtaining a text and an image containing a face of a user; determining a phoneme sequence that corresponds to the text and includes one or more phonemes; determining acoustic features corresponding to the text according to the phoneme sequence, and obtaining synthesized speech corresponding to the text according to the acoustic features; determining a first mouth movement sequence corresponding to the text according to the phoneme sequence, and determining a second mouth movement sequence corresponding to the text according to the acoustic features; creating a facial action video corresponding to the user according to the first mouth movement sequence, the second mouth movement sequence and the image; and processing the synthesized speech and the facial action video synchronously to obtain a talking head video corresponding to the user.
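The abstract does not say how the two mouth movement sequences are combined, so the following is only a sketch under assumptions. It models each sequence as a per-frame mouth-openness value between 0 and 1, derives the first from a hypothetical phoneme-to-openness table, derives the second from hypothetical per-frame acoustic energies, and blends them with a weighted average. The table, the function names, and the blending rule are all illustrative choices, not taken from the application.

```python
def mouth_from_phonemes(phonemes, frames_per_phoneme=2):
    """First sequence: openness looked up per phoneme (hypothetical table)."""
    openness = {"a": 1.0, "o": 0.8, "e": 0.6, "m": 0.0, "b": 0.0}
    sequence = []
    for phoneme in phonemes:
        # Unknown phonemes fall back to a neutral, slightly open mouth.
        sequence.extend([openness.get(phoneme, 0.3)] * frames_per_phoneme)
    return sequence


def mouth_from_acoustics(energies):
    """Second sequence: openness proportional to per-frame energy."""
    peak = max(energies, default=0.0) or 1.0  # avoid dividing by zero
    return [energy / peak for energy in energies]


def fuse(first, second, weight=0.5):
    """Assumed fusion rule: weighted average over the overlapping frames."""
    frames = min(len(first), len(second))
    return [weight * first[i] + (1 - weight) * second[i] for i in range(frames)]


first = mouth_from_phonemes(["m", "a"])
second = mouth_from_acoustics([0.0, 2.0, 4.0, 4.0])
fused = fuse(first, second)
```

In a full system the fused sequence would drive the facial animation of the input image, and the resulting frames would be time-aligned with the synthesized speech.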
-
Publication number: US20230206895A1
Publication date: 2023-06-29
Application number: US18089576
Filing date: 2022-12-28
Applicant: UBTECH ROBOTICS CORP LTD
Inventor: Wan Ding, Dongyan Huang, Zhiyuan Zhao, Zhiyong Yang
IPC: G10L13/047, G10L13/10
CPC classification number: G10L13/047, G10L13/10
Abstract: A speech synthesis method includes: obtaining an acoustic feature sequence of a text to be processed; processing the acoustic feature sequence by using a non-autoregressive computing model in parallel to obtain first audio information of the text to be processed, wherein the first audio information comprises audio corresponding to each segment; processing the acoustic feature sequence and the first audio information by using an autoregressive computing model to obtain a residual value corresponding to each segment; and obtaining second audio information corresponding to an i-th segment based on the first audio information corresponding to the i-th segment and the residual values corresponding to a first to an (i-1)-th segment, wherein a synthesized audio of the text to be processed comprises the second audio information of every segment, where i = 1, 2, ..., n and n is the total number of segments.
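The final combining step can be illustrated with a small sketch. The abstract states only that the second audio of segment i depends on the first audio of segment i and on the residuals of segments 1 to i-1; it does not give the combining rule. The code below assumes, purely for illustration, that each residual is a single number and that the accumulated residuals are added to every sample of the segment. The function name and data layout are hypothetical.

```python
def refine_segments(first_audio, residuals):
    """Correct each segment using residuals of the preceding segments only.

    first_audio: list of segments, each a list of samples (parallel model output).
    residuals:   one assumed scalar residual per segment (autoregressive model output).
    """
    second_audio = []
    carried = 0.0  # sum of residuals of segments 1 .. i-1
    for segment, residual in zip(first_audio, residuals):
        # Segment i uses only the residuals accumulated before it,
        # so the first segment is passed through unchanged.
        second_audio.append([sample + carried for sample in segment])
        carried += residual
    return second_audio


first_audio = [[1.0, 2.0], [3.0], [4.0]]
residuals = [0.5, 0.25, 9.0]
second_audio = refine_segments(first_audio, residuals)
```

Note that the residual of the last segment never affects the output under this rule, which matches the abstract's range of the first to the (i-1)-th segment.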
-