1.
Publication No.: US11417316B2
Publication date: 2022-08-16
Application No.: US17115729
Filing date: 2020-12-08
Applicant: UBTECH ROBOTICS CORP LTD
Inventor: Dongyan Huang, Leyuan Sheng, Youjun Xiong
IPC: G10L13/08, G10L13/047, G10L25/24
Abstract: The present disclosure provides a speech synthesis method as well as an apparatus and a computer readable storage medium using the same. The method includes: obtaining a to-be-synthesized text, and extracting to-be-processed Mel spectrum features of the to-be-synthesized text through a preset speech feature extraction algorithm; inputting the to-be-processed Mel spectrum features into a preset ResUnet network model to obtain first intermediate features; performing an average pooling and a first down sampling on the to-be-processed Mel spectrum features to obtain second intermediate features; taking the second intermediate features and the first intermediate features output by the ResUnet network model as an input to perform a deconvolution and a first up sampling so as to obtain target Mel spectrum features corresponding to the to-be-processed Mel spectrum features; and converting the target Mel spectrum features into a target speech corresponding to the to-be-synthesized text.
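The abstract names the operations in the second branch but not their parameters. As a rough illustration only, this plain-Python sketch traces the frame counts through that branch: average pooling with stride 2 halves the time axis, the pooled (second intermediate) features are merged with the first intermediate features, and a stride-2 transposed convolution ("deconvolution") restores the original number of frames. The pooling factor, the merge by element-wise addition, the fixed kernel, the assumption that the ResUnet output already sits at the pooled resolution, and the names `avg_pool_downsample`, `transposed_conv_1d`, and `refine_mel` are all hypothetical. The ResUnet itself is replaced by a placeholder input.

```python
# Hypothetical sketch of the pooling / deconvolution branch; no trained weights.
# Assumptions (not from the patent): time-major lists of Mel frames, pooling
# factor 2, merge by element-wise addition, fixed transposed-conv kernel.

Frame = list[float]  # one Mel frame with n_mels values


def avg_pool_downsample(frames: list[Frame], factor: int = 2) -> list[Frame]:
    """Average pooling over non-overlapping windows of `factor` frames (stride = factor)."""
    pooled = []
    for start in range(0, len(frames) - factor + 1, factor):
        window = frames[start:start + factor]
        pooled.append([sum(col) / factor for col in zip(*window)])
    return pooled


def transposed_conv_1d(frames: list[Frame], kernel: tuple[float, ...],
                       stride: int = 2) -> list[Frame]:
    """1-D transposed convolution along time, applied to each Mel bin independently."""
    n_mels = len(frames[0])
    out_len = (len(frames) - 1) * stride + len(kernel)
    out = [[0.0] * n_mels for _ in range(out_len)]
    for t, frame in enumerate(frames):
        for k, w in enumerate(kernel):
            for m in range(n_mels):
                out[t * stride + k][m] += w * frame[m]
    return out


def refine_mel(mel: list[Frame], first_intermediate: list[Frame],
               kernel: tuple[float, ...] = (1.0, 1.0)) -> list[Frame]:
    """Merge pooled features with the (placeholder) ResUnet output, then upsample."""
    second_intermediate = avg_pool_downsample(mel, factor=2)
    if len(second_intermediate) != len(first_intermediate):
        raise ValueError("ResUnet output must match the pooled time resolution")
    merged = [[a + b for a, b in zip(f1, f2)]
              for f1, f2 in zip(first_intermediate, second_intermediate)]
    return transposed_conv_1d(merged, kernel, stride=2)


mel = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]  # 4 frames, 2 Mel bins
resunet_out = [[0.0, 0.0], [0.0, 0.0]]  # placeholder for a trained ResUnet
print(refine_mel(mel, resunet_out))
# [[2.0, 3.0], [2.0, 3.0], [6.0, 7.0], [6.0, 7.0]]
```

With the kernel `(1.0, 1.0)` the transposed convolution reduces to nearest-neighbour upsampling, which makes the shape bookkeeping easy to check; a trained model would learn the kernel weights instead.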
2.
Publication No.: US11763796B2
Publication date: 2023-09-19
Application No.: US17117148
Filing date: 2020-12-10
Applicant: UBTECH ROBOTICS CORP LTD
Inventor: Dongyan Huang, Leyuan Sheng, Youjun Xiong
CPC classification number: G10L13/02, G06F17/14, G06N3/08, G06N20/10, G10L21/0324, G10L25/24, G10L25/30
Abstract: A computer-implemented method for speech synthesis, a computer device, and a non-transitory computer readable storage medium are provided. The method includes: obtaining a speech text to be synthesized; obtaining a Mel spectrum corresponding to the speech text to be synthesized according to the speech text to be synthesized; inputting the Mel spectrum into a complex neural network, and obtaining a complex spectrum corresponding to the speech text to be synthesized, wherein the complex spectrum comprises real component information and imaginary component information; and obtaining a synthetic speech corresponding to the speech text to be synthesized, according to the complex spectrum. The method can efficiently and simply complete speech synthesis.
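The abstract does not say how the complex spectrum is turned into a waveform. One common approach, assumed here and not confirmed by the patent, is an inverse short-time Fourier transform: each frame's real and imaginary components are recombined into complex bins, inverse-transformed, and the frames are overlap-added. This sketch uses a naive two-sided inverse DFT, a rectangular window, and a hop equal to the frame length (so the frames do not actually overlap and plain overlap-add reconstructs the signal exactly). The helper names are hypothetical, and the complex neural network is stood in for by a forward DFT of a known test signal.

```python
import cmath
import math

# Hypothetical sketch: naive two-sided DFT, rectangular window, hop == frame
# length. A practical back end would use a one-sided FFT, a matched
# analysis/synthesis window pair, and overlapping frames.


def dft(x: list[float]) -> list[complex]:
    """Forward DFT; stands in here for the complex network's output."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def frame_from_complex_spectrum(real: list[float], imag: list[float]) -> list[float]:
    """Recombine real/imaginary components and apply the inverse DFT to one frame."""
    n = len(real)
    bins = [complex(r, i) for r, i in zip(real, imag)]
    return [(sum(bins[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def overlap_add(frames: list[list[float]], hop: int) -> list[float]:
    """Place frames `hop` samples apart and sum any samples that overlap."""
    frame_len = len(frames[0])
    out = [0.0] * ((len(frames) - 1) * hop + frame_len)
    for i, frame in enumerate(frames):
        for t, sample in enumerate(frame):
            out[i * hop + t] += sample
    return out


frame_len = hop = 4
signal = [0.0, 1.0, 0.0, -1.0, 0.5, 0.5, -0.5, -0.5]
spectra = [dft(signal[i:i + frame_len]) for i in range(0, len(signal), hop)]
frames = [frame_from_complex_spectrum([b.real for b in s], [b.imag for b in s])
          for s in spectra]
reconstructed = overlap_add(frames, hop)
# Maximum absolute reconstruction error; should be floating-point noise only.
print(max(abs(a - b) for a, b in zip(reconstructed, signal)))
```

Keeping the phase (via the imaginary component) is what lets this path skip a separate phase-estimation step such as Griffin-Lim; whether the patented network relies on exactly this property is not stated in the abstract.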
3.
Publication No.: US20220189454A1
Publication date: 2022-06-16
Application No.: US17117148
Filing date: 2020-12-10
Applicant: UBTECH ROBOTICS CORP LTD
Inventor: Dongyan Huang, Leyuan Sheng, Youjun Xiong
Abstract: A computer-implemented method for speech synthesis, a computer device, and a non-transitory computer readable storage medium are provided. The method includes: obtaining a speech text to be synthesized; obtaining a Mel spectrum corresponding to the speech text to be synthesized according to the speech text to be synthesized; inputting the Mel spectrum into a complex neural network, and obtaining a complex spectrum corresponding to the speech text to be synthesized, wherein the complex spectrum comprises real component information and imaginary component information; and obtaining a synthetic speech corresponding to the speech text to be synthesized, according to the complex spectrum. The method can efficiently and simply complete speech synthesis.
4.
Publication No.: US20210193113A1
Publication date: 2021-06-24
Application No.: US17115729
Filing date: 2020-12-08
Applicant: UBTECH ROBOTICS CORP LTD
Inventor: Dongyan Huang, Leyuan Sheng, Youjun Xiong
IPC: G10L13/08, G10L25/24, G10L13/047
Abstract: The present disclosure provides a speech synthesis method as well as an apparatus and a computer readable storage medium using the same. The method includes: obtaining a to-be-synthesized text, and extracting to-be-processed Mel spectrum features of the to-be-synthesized text through a preset speech feature extraction algorithm; inputting the to-be-processed Mel spectrum features into a preset ResUnet network model to obtain first intermediate features; performing an average pooling and a first down sampling on the to-be-processed Mel spectrum features to obtain second intermediate features; taking the second intermediate features and the first intermediate features output by the ResUnet network model as an input to perform a deconvolution and a first up sampling so as to obtain target Mel spectrum features corresponding to the to-be-processed Mel spectrum features; and converting the target Mel spectrum features into a target speech corresponding to the to-be-synthesized text.