-
Publication No.: US20220189454A1
Publication date: 2022-06-16
Application No.: US17117148
Filing date: 2020-12-10
Applicant: UBTECH ROBOTICS CORP LTD
Inventor: Dongyan Huang, Leyuan Sheng, Youjun Xiong
Abstract: A computer-implemented method for speech synthesis, a computer device, and a non-transitory computer readable storage medium are provided. The method includes: obtaining a speech text to be synthesized; obtaining a Mel spectrum corresponding to the speech text to be synthesized according to the speech text to be synthesized; inputting the Mel spectrum into a complex neural network, and obtaining a complex spectrum corresponding to the speech text to be synthesized, wherein the complex spectrum comprises real component information and imaginary component information; and obtaining a synthetic speech corresponding to the speech text to be synthesized, according to the complex spectrum. The method can efficiently and simply complete speech synthesis.
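The abstract does not say how the complex spectrum becomes a waveform. One common route, assumed here, is an inverse discrete Fourier transform applied to each frame's real and imaginary components. The sketch below shows that step for a single frame in plain Python; `dft` and `frame_to_samples` are hypothetical names, and the complex neural network itself is not modeled.

```python
import cmath

def dft(samples):
    """Forward DFT, used here only to produce a test spectrum."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def frame_to_samples(real, imag):
    """Inverse DFT of one frame given separate real and imaginary components.

    Assumption: the frame holds a full (two-sided) spectrum of len(real) bins.
    """
    if len(real) != len(imag):
        raise ValueError("real and imag must have the same length")
    n = len(real)
    spectrum = [complex(r, i) for r, i in zip(real, imag)]
    samples = []
    for t in range(n):
        acc = sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n))
        samples.append((acc / n).real)  # a real signal has negligible imaginary part
    return samples

# Round trip: analyse a short frame, then rebuild it from its two components.
original = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
spec = dft(original)
rebuilt = frame_to_samples([c.real for c in spec], [c.imag for c in spec])
```

A full synthesizer would repeat this per frame and overlap-add the results; that part is omitted here.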
-
12.
Publication No.: US20210193113A1
Publication date: 2021-06-24
Application No.: US17115729
Filing date: 2020-12-08
Applicant: UBTECH ROBOTICS CORP LTD
Inventor: Dongyan Huang, Leyuan Sheng, Youjun Xiong
IPC: G10L13/08, G10L25/24, G10L13/047
Abstract: The present disclosure provides a speech synthesis method as well as an apparatus and a computer readable storage medium using the same. The method includes: obtaining a to-be-synthesized text, and extracting to-be-processed Mel spectrum features of the to-be-synthesized text through a preset speech feature extraction algorithm; inputting the to-be-processed Mel spectrum features into a preset ResUnet network model to obtain first intermediate features; performing an average pooling and a first down sampling on the to-be-processed Mel spectrum features to obtain second intermediate features; taking the second intermediate features and the first intermediate features output by the ResUnet network model as an input to perform a deconvolution and a first up sampling so as to obtain target Mel spectrum features corresponding to the to-be-processed Mel spectrum features; and converting the target Mel spectrum features into a target speech corresponding to the to-be-synthesized text.
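The pooling branch in this abstract can be illustrated with a small sketch. It averages non-overlapping groups of frames along the time axis, which both pools and downsamples the Mel features. The window size of 2, the frames-by-bins layout, and the name `avg_pool_time` are assumptions for illustration; the abstract gives no kernel sizes, and the ResUnet branch and the deconvolution are not shown.

```python
def avg_pool_time(mel, window=2):
    """Average non-overlapping groups of `window` frames.

    mel: list of frames, each a list of Mel-bin values (assumed layout).
    Trailing frames that do not fill a window are dropped.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    pooled = []
    for start in range(0, len(mel) - window + 1, window):
        group = mel[start:start + window]
        n_bins = len(group[0])
        pooled.append([sum(f[b] for f in group) / window for b in range(n_bins)])
    return pooled

# Five frames of two Mel bins each; the fifth frame has no partner and is dropped.
mel = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
half_rate = avg_pool_time(mel, window=2)
```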
-
13.
Publication No.: US20240428493A1
Publication date: 2024-12-26
Application No.: US18736552
Filing date: 2024-06-07
Applicant: UBTECH ROBOTICS CORP LTD
Inventor: Wan Ding, Dongyan Huang, Xianjie Yang, Zehong Zheng, Penghul Li
Abstract: A method for synthesizing a talking head video includes: obtaining speech data to be synthesized and observation data, wherein the observation data is data obtained through observation other than the speech data; performing feature extraction on the speech data to obtain speech features corresponding to the speech data, and performing feature extraction on the observation data to obtain non-speech features corresponding to the observation data; performing temporal modeling on the speech features and first non-speech features to obtain low-dimensional representations, wherein the first non-speech features are non-speech features that are sensitive to temporal changes; and performing video synthesis based on the low-dimensional representations and second non-speech features, wherein the second non-speech features are non-speech features insensitive to temporal changes.
-
Publication No.: US11941366B2
Publication date: 2024-03-26
Application No.: US17102395
Filing date: 2020-11-23
Applicant: UBTECH ROBOTICS CORP LTD
Inventor: Chi Shao, Dongyan Huang, Wan Ding, Youjun Xiong
IPC: G06F40/35, G06F40/284, G06N3/049
CPC classification number: G06F40/35, G06F40/284, G06N3/049
Abstract: The present disclosure discloses a context-based multi-turn dialogue method. The method includes: obtaining to-be-matched historical dialogue information; performing a word feature extraction based on the to-be-matched historical dialogue information to obtain a historical dialogue word embedding; obtaining candidate answer information; performing the word feature extraction based on the candidate answer information to obtain a candidate answer word embedding; obtaining a historical dialogue partial matching vector and a candidate answer partial matching vector by performing partial semantic relationship matching based on the historical dialogue word embedding and the candidate answer word embedding; obtaining a candidate answer matching probability by performing a matching probability calculation based on the historical dialogue partial matching vector and the candidate answer partial matching vector; and determining matched answer information based on the candidate answer information and the candidate answer matching probability.
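The abstract names the steps but not the operations behind them. As one possible reading, the sketch below treats partial matching as soft alignment (dot-product attention) between two word-embedding sequences, and treats answer selection as taking the highest-probability candidate. Both choices, the function names, and the toy vectors are assumptions; the matching probability calculation itself is left out.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def soft_align(queries, keys):
    """For each query vector, return the attention-weighted mean of `keys`."""
    aligned = []
    dim = len(keys[0])
    for q in queries:
        scores = [dot(q, k) for k in keys]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(weights)
        aligned.append([
            sum(w * k[d] for w, k in zip(weights, keys)) / total
            for d in range(dim)
        ])
    return aligned

def pick_answer(candidates, probabilities):
    """Return the candidate with the highest matching probability."""
    best = max(range(len(candidates)), key=lambda i: probabilities[i])
    return candidates[best]

history = [[1.0, 0.0], [0.0, 1.0]]    # toy word embeddings for the dialogue
candidate = [[1.0, 0.0], [1.0, 0.0]]  # toy word embeddings for one answer
history_partial = soft_align(history, candidate)
chosen = pick_answer(["answer A", "answer B"], [0.2, 0.7])
```

Calling `soft_align(candidate, history)` would give the corresponding vectors for the candidate side.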
-
Publication No.: US11763796B2
Publication date: 2023-09-19
Application No.: US17117148
Filing date: 2020-12-10
Applicant: UBTECH ROBOTICS CORP LTD
Inventor: Dongyan Huang, Leyuan Sheng, Youjun Xiong
CPC classification number: G10L13/02, G06F17/14, G06N3/08, G06N20/10, G10L21/0324, G10L25/24, G10L25/30
Abstract: A computer-implemented method for speech synthesis, a computer device, and a non-transitory computer readable storage medium are provided. The method includes: obtaining a speech text to be synthesized; obtaining a Mel spectrum corresponding to the speech text to be synthesized according to the speech text to be synthesized; inputting the Mel spectrum into a complex neural network, and obtaining a complex spectrum corresponding to the speech text to be synthesized, wherein the complex spectrum comprises real component information and imaginary component information; and obtaining a synthetic speech corresponding to the speech text to be synthesized, according to the complex spectrum. The method can efficiently and simply complete speech synthesis.
-
Publication No.: US11645474B2
Publication date: 2023-05-09
Application No.: US17133673
Filing date: 2020-12-24
Applicant: UBTECH ROBOTICS CORP LTD
Inventor: Zhongfa Feng, Dongyan Huang, Youjun Xiong
Abstract: A computer-implemented method for text conversion, a computer device, and a non-transitory computer readable storage medium are provided. The method includes: obtaining a text to be converted; performing a non-standard word recognition on the text to be converted, to determine whether the text to be converted includes a non-standard word; recognizing the non-standard word in the text to be converted by using an eXtreme Gradient Boosting model in response to the text to be converted including the non-standard word; and obtaining a target converted text corresponding to the text to be converted, according to a recognition result outputted by the eXtreme Gradient Boosting model. The method has a faster recognition speed and a higher recognition accuracy compared with the deep learning model.
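The two-stage flow in this abstract (detect first, classify only when a non-standard word is present) can be sketched as follows. The regular expression, the category names, the tagged output format, and `classify_nsw` are all hypothetical; in particular, `classify_nsw` is a rule-based stand-in for the trained eXtreme Gradient Boosting model, which is not reproduced here.

```python
import re

# Assumed notion of a non-standard word: ISO-style dates, numbers, percentages.
NSW_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}|\d+(?:\.\d+)?%?")

def contains_nsw(text):
    """Stage 1: decide whether the text holds any non-standard word."""
    return NSW_PATTERN.search(text) is not None

def classify_nsw(token):
    """Stage 2 stand-in for the trained gradient boosting classifier."""
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", token):
        return "date"
    if token.endswith("%"):
        return "percent"
    return "number"

def convert(text):
    """Tag each non-standard word with its predicted category."""
    if not contains_nsw(text):
        return text  # nothing to recognize, so the classifier is skipped
    return NSW_PATTERN.sub(
        lambda m: f"<{classify_nsw(m.group())}:{m.group()}>", text
    )

plain = convert("hello world")
tagged = convert("Sales rose 12.5% on 2020-12-24 across 3 regions")
```

A real system would go on to verbalize each tagged token according to its category; the sketch stops at the recognition result.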
-
Publication No.: US20210201890A1
Publication date: 2021-07-01
Application No.: US17095751
Filing date: 2020-11-12
Applicant: UBTECH ROBOTICS CORP LTD
Inventor: Ruotong Wang, Dongyan Huang, Xian Li, Jiebin Xie, Zhichao Tang, Wan Ding, Yang Liu, Bai Li, Youjun Xiong
Abstract: The present disclosure discloses a voice conversion training method. The method includes: forming a first training data set including a plurality of training voice data groups; selecting two of the training voice data groups from the first training data set to input into a voice conversion neural network for training; forming a second training data set including the first training data set and a first source speaker voice data group; inputting one of the training voice data groups selected from the first training data set and the first source speaker voice data group into the network for training; forming the third training data set including the second source speaker voice data group and the personalized voice data group that are parallel corpus with respect to each other; and inputting the second source speaker voice data group and the personalized voice data group into the network for training.
-
Publication No.: US20210200962A1
Publication date: 2021-07-01
Application No.: US17133673
Filing date: 2020-12-24
Applicant: UBTECH ROBOTICS CORP LTD
Inventor: Zhongfa Feng, Dongyan Huang, Youjun Xiong
IPC: G06F40/40
Abstract: A computer-implemented method for text conversion, a computer device, and a non-transitory computer readable storage medium are provided. The method includes: obtaining a text to be converted; performing a non-standard word recognition on the text to be converted, to determine whether the text to be converted includes a non-standard word; recognizing the non-standard word in the text to be converted by using an eXtreme Gradient Boosting model in response to the text to be converted including the non-standard word; and obtaining a target converted text corresponding to the text to be converted, according to a recognition result outputted by the eXtreme Gradient Boosting model. The method has a faster recognition speed and a higher recognition accuracy compared with the deep learning model.
-