-
Publication number: US20210200961A1
Publication date: 2021-07-01
Application number: US17102395
Filing date: 2020-11-23
Applicant: UBTECH ROBOTICS CORP LTD
Inventor: Chi Shao , Dongyan Huang , Wan Ding , Youjun Xiong
IPC: G06F40/35 , G06F40/284 , G06N3/04
Abstract: The present disclosure discloses a context-based multi-turn dialogue method. The method includes: obtaining to-be-matched historical dialogue information; performing a word feature extraction based on the to-be-matched historical dialogue information to obtain a historical dialogue word embedding; obtaining candidate answer information; performing the word feature extraction based on the candidate answer information to obtain a candidate answer word embedding; obtaining a historical dialogue partial matching vector and a candidate answer partial matching vector by performing partial semantic relationship matching based on the historical dialogue word embedding and the candidate answer word embedding; obtaining a candidate answer matching probability by performing a matching probability calculation based on the historical dialogue partial matching vector and the candidate answer partial matching vector; and determining matched answer information based on the candidate answer information and the candidate answer matching probability.
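The pipeline in this abstract (word embeddings → partial matching vectors → matching probabilities → answer selection) can be sketched as follows. This is a minimal illustration, not the patented network: the character-based `word_embedding`, the max-pooled matching vectors, and the softmax scoring are all stand-ins for the learned components.

```python
import numpy as np

def word_embedding(token, dim=8):
    # Toy deterministic embedding (stand-in for the learned word feature
    # extraction): accumulate centred character codes, then normalize.
    v = np.zeros(dim)
    for i, ch in enumerate(token.encode()):
        v[i % dim] += ch - 109.5
    return v / (np.linalg.norm(v) + 1e-9)

def match_answer(history, candidates, dim=8):
    # Embed the historical dialogue once; score each candidate answer.
    H = np.stack([word_embedding(t, dim) for t in history.split()])
    scores = []
    for cand in candidates:
        C = np.stack([word_embedding(t, dim) for t in cand.split()])
        S = H @ C.T                 # pairwise cosine similarities
        hist_match = S.max(axis=1)  # historical-dialogue partial matching vector
        cand_match = S.max(axis=0)  # candidate-answer partial matching vector
        scores.append(np.concatenate([hist_match, cand_match]).mean())
    # Matching probability per candidate via softmax over the scores.
    probs = np.exp(scores) / np.sum(np.exp(scores))
    return candidates[int(np.argmax(probs))], probs
```

With real embeddings the candidate sharing the most semantic content with the dialogue history receives the highest matching probability; here exact token overlap plays that role.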
-
Publication number: US20250133337A1
Publication date: 2025-04-24
Application number: US18911197
Filing date: 2024-10-09
Applicant: UBTECH ROBOTICS CORP LTD
Inventor: ZEHONG ZHENG , Dongyan Huang , Xianjie Yang , Wan Ding
IPC: H04R1/40
Abstract: A sound source localization method includes: obtaining a first audio frame and at least two second audio frames, wherein the first audio frame and the at least two second audio frames are synchronously sampled, the first audio frame is obtained by processing sound signals collected by the first microphone, the at least two second audio frames are obtained by processing sound signals collected by the second microphones; calculating a time delay estimation between the first audio frame and each of the at least two second audio frames; and determining a sound source orientation corresponding to the first audio frame and the at least two second audio frames through a preset time delay-orientation lookup table according to the time delay estimation between the first audio frame and each of the at least two second audio frames.
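The two steps in this abstract (time-delay estimation per microphone pair, then a delay-to-orientation table lookup) can be sketched as below. Plain cross-correlation stands in for the time delay estimator (a real system would likely use a GCC-style method), and the lookup-table delays and angles are invented for illustration, not taken from the patent.

```python
import numpy as np

def estimate_delay(ref, sig):
    # Time-delay estimation via cross-correlation: positive result means
    # the signal at this microphone lags the reference microphone.
    corr = np.correlate(sig, ref, mode="full")
    return int(np.argmax(corr)) - (len(ref) - 1)

def localize(ref_frame, second_frames, delay_table):
    # delay_table is the preset time delay-orientation lookup table:
    # {(delay_to_mic2, delay_to_mic3, ...): orientation_in_degrees}.
    delays = tuple(estimate_delay(ref_frame, f) for f in second_frames)
    # Nearest-neighbour lookup over the table keys.
    key = min(delay_table,
              key=lambda k: sum((a - b) ** 2 for a, b in zip(k, delays)))
    return delay_table[key]
```

A synchronously sampled pulse arriving at different offsets on each microphone recovers the expected delays and hence the tabulated orientation.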
-
Publication number: US11282503B2
Publication date: 2022-03-22
Application number: US17095751
Filing date: 2020-11-12
Applicant: UBTECH ROBOTICS CORP LTD
Inventor: Ruotong Wang , Dongyan Huang , Xian Li , Jiebin Xie , Zhichao Tang , Wan Ding , Yang Liu , Bai Li , Youjun Xiong
Abstract: The present disclosure discloses a voice conversion training method. The method includes: forming a first training data set including a plurality of training voice data groups; selecting two of the training voice data groups from the first training data set to input into a voice conversion neural network for training; forming a second training data set including the first training data set and a first source speaker voice data group; inputting one of the training voice data groups selected from the first training data set and the first source speaker voice data group into the network for training; forming a third training data set including a second source speaker voice data group and a personalized voice data group that are parallel corpora with respect to each other; and inputting the second source speaker voice data group and the personalized voice data group into the network for training.
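The three-stage curriculum in this abstract can be sketched as a driver loop. The trainer below only records which voice data groups are fed at each stage; the actual voice conversion neural network and its optimization are not modeled, and all names are illustrative.

```python
import random

class VoiceConversionTrainer:
    # Minimal stand-in for the voice conversion neural network:
    # train_pair records which data groups were fed and at which stage.
    def __init__(self):
        self.log = []

    def train_pair(self, src_group, tgt_group, stage):
        self.log.append((stage, src_group, tgt_group))

def three_stage_training(trainer, first_set, src1, src2, personalized, rng=random):
    # Stage 1: two groups drawn from the first training data set.
    a, b = rng.sample(first_set, 2)
    trainer.train_pair(a, b, stage=1)
    # Stage 2: one group from the first set together with the first
    # source speaker's voice data group.
    c = rng.choice(first_set)
    trainer.train_pair(c, src1, stage=2)
    # Stage 3: the second source speaker's group and the personalized
    # (target) group, which form a parallel corpus with each other.
    trainer.train_pair(src2, personalized, stage=3)
```

The staging moves from generic many-to-many conversion toward the specific source/target pair, which is the point of the three successive data sets.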
-
Publication number: US20230410791A1
Publication date: 2023-12-21
Application number: US18212140
Filing date: 2023-06-20
Applicant: UBTECH ROBOTICS CORP LTD
Inventor: Wan Ding , Dongyan Huang , Zehong Zheng , Linhuang Yan , Zhiyong Yang
Abstract: A text-to-speech synthesis method, an electronic device, and a computer-readable storage medium are provided. The method includes: obtaining prosodic pause features of an input text by performing a prosodic pause prediction processing on the input text, and dividing the input text into a plurality of prosodic phrases according to the prosodic pause features; synthesizing short sentence audios according to the prosodic phrases by performing a streamed speech synthesis processing on each of the prosodic phrases in the input text in a manner of asynchronous processing of a thread pool; and performing an audio playback operation of the input text according to the short sentence audios corresponding to the first prosodic phrase of the input text, in response to synthesizing the short sentence audio corresponding to the first prosodic phrase of the input text.
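The streaming scheme in this abstract (prosodic phrase splitting, asynchronous synthesis in a thread pool, playback starting with the first phrase) can be sketched as follows. Splitting at punctuation stands in for the prosodic pause prediction model, and the fake audio strings stand in for real synthesis.

```python
import re
from concurrent.futures import ThreadPoolExecutor

def predict_prosodic_phrases(text):
    # Stand-in for prosodic pause prediction: split at punctuation,
    # which roughly marks prosodic pause positions.
    return [p.strip() for p in re.split(r"[,.;!?]", text) if p.strip()]

def synthesize_phrase(phrase):
    # Stand-in for the streamed speech synthesis of one prosodic phrase.
    return f"<audio:{phrase}>"

def streamed_tts(text, play, workers=4):
    phrases = predict_prosodic_phrases(text)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # All phrases are submitted to the thread pool asynchronously;
        # playback starts as soon as the first phrase's audio is ready,
        # while later phrases are still being synthesized.
        futures = [pool.submit(synthesize_phrase, p) for p in phrases]
        for fut in futures:
            play(fut.result())
```

Consuming the futures in submission order keeps playback in text order even though the synthesis of later phrases may finish first.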
-
Publication number: US20230206895A1
Publication date: 2023-06-29
Application number: US18089576
Filing date: 2022-12-28
Applicant: UBTECH ROBOTICS CORP LTD
Inventor: Wan Ding , Dongyan Huang , Zhiyuan Zhao , Zhiyong Yang
IPC: G10L13/047 , G10L13/10
CPC classification number: G10L13/047 , G10L13/10
Abstract: A speech synthesis method includes: obtaining an acoustic feature sequence of a text to be processed; processing the acoustic feature sequence in parallel by using a non-autoregressive computing model to obtain first audio information of the text to be processed, wherein the first audio information comprises audio corresponding to each segment; processing the acoustic feature sequence and the first audio information by using an autoregressive computing model to obtain a residual value corresponding to each segment; and obtaining second audio information corresponding to an i-th segment based on the first audio information corresponding to the i-th segment and the residual values corresponding to the first to the (i-1)-th segments, wherein a synthesized audio of the text to be processed comprises each of the second audio information, i = 1, 2, …, n, and n is a total number of the segments.
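The two-pass scheme in this abstract (a parallel non-autoregressive first pass, then autoregressive residuals that refine each segment using the residuals of the preceding segments) can be sketched numerically. The two linear "models" below are invented placeholders; only the dataflow between them follows the abstract.

```python
import numpy as np

def nar_model(features):
    # Stand-in non-autoregressive model: one audio value per segment,
    # computed in parallel from the acoustic feature sequence.
    return features * 2.0

def ar_model(features, first_audio):
    # Stand-in autoregressive model: a per-segment residual correction
    # derived from the features and the first-pass audio.
    return 0.1 * (features - first_audio)

def synthesize(features):
    first = nar_model(features)           # parallel first audio information
    residual = ar_model(features, first)  # residual value per segment
    second = np.empty_like(first)
    for i in range(len(first)):
        # The i-th second audio combines the i-th first audio with the
        # residuals of the first to the (i-1)-th segments.
        second[i] = first[i] + residual[:i].sum()
    return second
```

The first segment is therefore available immediately from the parallel pass, while later segments pick up the accumulated autoregressive corrections.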
-
Publication number: US11941366B2
Publication date: 2024-03-26
Application number: US17102395
Filing date: 2020-11-23
Applicant: UBTECH ROBOTICS CORP LTD
Inventor: Chi Shao , Dongyan Huang , Wan Ding , Youjun Xiong
IPC: G06F40/35 , G06F40/284 , G06N3/049
CPC classification number: G06F40/35 , G06F40/284 , G06N3/049
Abstract: The present disclosure discloses a context-based multi-turn dialogue method. The method includes: obtaining to-be-matched historical dialogue information; performing a word feature extraction based on the to-be-matched historical dialogue information to obtain a historical dialogue word embedding; obtaining candidate answer information; performing the word feature extraction based on the candidate answer information to obtain a candidate answer word embedding; obtaining a historical dialogue partial matching vector and a candidate answer partial matching vector by performing partial semantic relationship matching based on the historical dialogue word embedding and the candidate answer word embedding; obtaining a candidate answer matching probability by performing a matching probability calculation based on the historical dialogue partial matching vector and the candidate answer partial matching vector; and determining matched answer information based on the candidate answer information and the candidate answer matching probability.
-
Publication number: US20210201890A1
Publication date: 2021-07-01
Application number: US17095751
Filing date: 2020-11-12
Applicant: UBTECH ROBOTICS CORP LTD
Inventor: Ruotong Wang , Dongyan Huang , Xian Li , Jiebin Xie , Zhichao Tang , Wan Ding , Yang Liu , Bai Li , Youjun Xiong
Abstract: The present disclosure discloses a voice conversion training method. The method includes: forming a first training data set including a plurality of training voice data groups; selecting two of the training voice data groups from the first training data set to input into a voice conversion neural network for training; forming a second training data set including the first training data set and a first source speaker voice data group; inputting one of the training voice data groups selected from the first training data set and the first source speaker voice data group into the network for training; forming a third training data set including a second source speaker voice data group and a personalized voice data group that are parallel corpora with respect to each other; and inputting the second source speaker voice data group and the personalized voice data group into the network for training.