-
1.
Publication No.: US11282503B2
Publication Date: 2022-03-22
Application No.: US17095751
Filing Date: 2020-11-12
Inventors: Ruotong Wang, Dongyan Huang, Xian Li, Jiebin Xie, Zhichao Tang, Wan Ding, Yang Liu, Bai Li, Youjun Xiong
IPC Classes: G10L15/06, G06N3/08, G10L15/16, G10L15/30, G10L21/01, G10L25/18, G10L25/24, G10L21/003
Abstract: The present disclosure discloses a voice conversion training method. The method includes: forming a first training data set including a plurality of training voice data groups; selecting two of the training voice data groups from the first training data set to input into a voice conversion neural network for training; forming a second training data set including the first training data set and a first source speaker voice data group; inputting one of the training voice data groups selected from the first training data set and the first source speaker voice data group into the network for training; forming a third training data set including a second source speaker voice data group and a personalized voice data group that form a parallel corpus with respect to each other; and inputting the second source speaker voice data group and the personalized voice data group into the network for training.
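A minimal sketch of the staged training schedule described above, assuming a placeholder VoiceConversionNet whose hypothetical train_step method stands in for one optimization pass; only the way the data sets are staged follows the abstract, not the patented network itself.

```python
import random


class VoiceConversionNet:
    """Stand-in for the voice conversion neural network; train_step is a placeholder."""

    def train_step(self, source_group, target_group):
        # Placeholder: one optimization pass on a (source, target) voice data group pair.
        pass


def train_in_three_stages(net, training_groups, source1_group,
                          source2_group, personalized_group, steps=100):
    # Stage 1: pairs of training voice data groups drawn from the first training data set.
    first_set = list(training_groups)
    for _ in range(steps):
        a, b = random.sample(first_set, 2)
        net.train_step(a, b)

    # Stage 2: the second training data set adds the first source speaker's group;
    # each step pairs a group from the first set with that source speaker group.
    for _ in range(steps):
        net.train_step(random.choice(first_set), source1_group)

    # Stage 3: the third training data set is the parallel corpus formed by the second
    # source speaker's group and the personalized voice data group.
    for _ in range(steps):
        net.train_step(source2_group, personalized_group)
    return net
```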
-
2.
Publication No.: US20210201925A1
Publication Date: 2021-07-01
Application No.: US17110323
Filing Date: 2020-12-03
Inventors: Jiebin Xie, Ruotong Wang, Dongyan Huang, Zhichao Tang, Yang Liu, Youjun Xiong
IPC Classes: G10L21/013, G10L15/02, G10L25/03, G10L25/69, G10L15/04, G10L13/033
Abstract: The present disclosure provides a streaming voice conversion method as well as an apparatus and a computer readable storage medium using the same. The method includes: obtaining to-be-converted voice data; partitioning the to-be-converted voice data, in order of data obtaining time, into a plurality of to-be-converted partition voices, where each to-be-converted partition voice carries a partition mark; performing a voice conversion on each of the to-be-converted partition voices to obtain a converted partition voice, where the converted partition voice carries a partition mark; performing a partition restoration on each of the converted partition voices to obtain a restored partition voice, where the restored partition voice carries a partition mark; and outputting each of the restored partition voices according to the partition mark carried by the restored partition voice. In this manner, the response time is shortened, and the conversion speed is improved.
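The partition mark is what lets each chunk be processed independently while the output order is preserved. Below is a minimal NumPy sketch of the partition/convert/restore/output flow; convert_partition and restore_partition are hypothetical no-op placeholders for the real conversion and restoration steps.

```python
import numpy as np


def convert_partition(chunk: np.ndarray) -> np.ndarray:
    return chunk  # placeholder for the real voice conversion of one partition


def restore_partition(chunk: np.ndarray) -> np.ndarray:
    return chunk  # placeholder for the partition restoration (e.g. overlap trimming)


def stream_convert(voice: np.ndarray, partition_len: int) -> np.ndarray:
    # Partition in order of data obtaining time; the mark is the partition's index.
    partitions = [(mark, voice[start:start + partition_len])
                  for mark, start in enumerate(range(0, len(voice), partition_len))]

    # Convert each partition; the partition mark travels with the converted data.
    converted = [(mark, convert_partition(chunk)) for mark, chunk in partitions]

    # Restore each converted partition; the mark is still carried along.
    restored = [(mark, restore_partition(chunk)) for mark, chunk in converted]

    # Output according to the carried marks, so ordering survives out-of-order processing.
    return np.concatenate([chunk for _, chunk in sorted(restored, key=lambda p: p[0])])


streamed = stream_convert(np.random.randn(16000), partition_len=4000)
```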
-
3.
Publication No.: US20210193160A1
Publication Date: 2021-06-24
Application No.: US17084672
Filing Date: 2020-10-30
Inventors: Ruotong Wang, Zhichao Tang, Dongyan Huang, Jiebin Xie, Zhiyuan Zhao, Yang Liu, Youjun Xiong
IPC Classes: G10L21/013, G10L25/03, G10L25/27, G10L19/02, G06N20/00
Abstract: The present disclosure discloses a voice conversion method. The method includes: obtaining a to-be-converted voice, and extracting acoustic features of the to-be-converted voice; obtaining a source vector corresponding to the to-be-converted voice from a source vector pool, and selecting a target vector corresponding to a target voice from a target vector pool; obtaining acoustic features of the target voice output by a voice conversion model, by using the acoustic features of the to-be-converted voice, the source vector corresponding to the to-be-converted voice, and the target vector corresponding to the target voice as an input of the voice conversion model; and obtaining the target voice by converting the acoustic features of the target voice using a vocoder. In addition, a voice conversion apparatus and a storage medium are also provided.
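A minimal sketch of the lookup-and-convert flow, with the vector pools modelled as plain dictionaries and the conversion model and vocoder passed in as hypothetical callables; none of this is the patented implementation, only the data flow of the abstract.

```python
import numpy as np

# Hypothetical vector pools: one embedding per known source speaker and target voice.
source_vector_pool = {"source_speaker": np.random.randn(16)}
target_vector_pool = {"target_voice": np.random.randn(16)}


def convert_voice(voice_acoustic_features, source_id, target_id, conversion_model, vocoder):
    source_vec = source_vector_pool[source_id]   # source vector for the to-be-converted voice
    target_vec = target_vector_pool[target_id]   # target vector for the desired target voice

    # The acoustic features plus both vectors form the input of the voice conversion model,
    # which outputs acoustic features of the target voice.
    target_features = conversion_model(voice_acoustic_features, source_vec, target_vec)

    # The vocoder converts the target acoustic features into the target voice waveform.
    return vocoder(target_features)
```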
-
4.
Publication No.: US11996112B2
Publication Date: 2024-05-28
Application No.: US17084672
Filing Date: 2020-10-30
Inventors: Ruotong Wang, Zhichao Tang, Dongyan Huang, Jiebin Xie, Zhiyuan Zhao, Yang Liu, Youjun Xiong
IPC Classes: G10L25/21, G06N20/00, G10L19/02, G10L21/013, G10L25/03, G10L25/24, G10L25/27, G10L25/75
CPC Classes: G10L21/013, G06N20/00, G10L19/02, G10L25/03, G10L25/27, G10L2021/0135
Abstract: The present disclosure discloses a voice conversion method. The method includes: obtaining a to-be-converted voice, and extracting acoustic features of the to-be-converted voice; obtaining a source vector corresponding to the to-be-converted voice from a source vector pool, and selecting a target vector corresponding to a target voice from a target vector pool; obtaining acoustic features of the target voice output by a voice conversion model, by using the acoustic features of the to-be-converted voice, the source vector corresponding to the to-be-converted voice, and the target vector corresponding to the target voice as an input of the voice conversion model; and obtaining the target voice by converting the acoustic features of the target voice using a vocoder. In addition, a voice conversion apparatus and a storage medium are also provided.
-
5.
Publication No.: US11417316B2
Publication Date: 2022-08-16
Application No.: US17115729
Filing Date: 2020-12-08
Inventors: Dongyan Huang, Leyuan Sheng, Youjun Xiong
IPC Classes: G10L13/08, G10L13/047, G10L25/24
Abstract: The present disclosure provides a speech synthesis method as well as an apparatus and a computer readable storage medium using the same. The method includes: obtaining a to-be-synthesized text, and extracting to-be-processed Mel spectrum features of the to-be-synthesized text through a preset speech feature extraction algorithm; inputting the to-be-processed Mel spectrum features into a preset ResUnet network model to obtain first intermediate features; performing an average pooling and a first downsampling on the to-be-processed Mel spectrum features to obtain second intermediate features; taking the second intermediate features and the first intermediate features output by the ResUnet network model as an input to perform a deconvolution and a first upsampling so as to obtain target Mel spectrum features corresponding to the to-be-processed Mel spectrum features; and converting the target Mel spectrum features into a target speech corresponding to the to-be-synthesized text.
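A rough PyTorch sketch of the data flow described above: a placeholder one-layer "ResUnet" produces the first intermediate features, average pooling with a first downsampling produces the second intermediate features, and a deconvolution with a first upsampling maps the combined features to the target Mel spectrum. Layer sizes and the way the two feature streams are aligned before concatenation are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MelRefiner(nn.Module):
    def __init__(self, channels=1):
        super().__init__()
        # Placeholder "ResUnet": a single conv block standing in for the real network.
        self.resunet = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        # Deconvolution used for the first upsampling back to the input resolution.
        self.deconv = nn.ConvTranspose2d(2 * channels, channels, kernel_size=2, stride=2)

    def forward(self, mel):                      # mel: (batch, 1, n_mels, frames)
        first = self.resunet(mel)                # first intermediate features
        second = F.avg_pool2d(mel, 2)            # average pooling + first downsampling
        first_down = F.avg_pool2d(first, 2)      # assumption: align resolutions before merging
        merged = torch.cat([first_down, second], dim=1)
        return self.deconv(merged)               # deconvolution + first upsampling -> target Mel


target_mel = MelRefiner()(torch.randn(1, 1, 80, 64))
```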
-
6.
Publication No.: US11367456B2
Publication Date: 2022-06-21
Application No.: US17110323
Filing Date: 2020-12-03
Inventors: Jiebin Xie, Ruotong Wang, Dongyan Huang, Zhichao Tang, Yang Liu, Youjun Xiong
IPC Classes: G10L21/013, G10L13/033, G10L15/02, G10L15/04, G10L25/03, G10L25/69
Abstract: The present disclosure provides a streaming voice conversion method as well as an apparatus and a computer readable storage medium using the same. The method includes: obtaining to-be-converted voice data; partitioning the to-be-converted voice data, in order of data obtaining time, into a plurality of to-be-converted partition voices, where each to-be-converted partition voice carries a partition mark; performing a voice conversion on each of the to-be-converted partition voices to obtain a converted partition voice, where the converted partition voice carries a partition mark; performing a partition restoration on each of the converted partition voices to obtain a restored partition voice, where the restored partition voice carries a partition mark; and outputting each of the restored partition voices according to the partition mark carried by the restored partition voice. In this manner, the response time is shortened, and the conversion speed is improved.
-
7.
Publication No.: US20210200961A1
Publication Date: 2021-07-01
Application No.: US17102395
Filing Date: 2020-11-23
Inventors: Chi Shao, Dongyan Huang, Wan Ding, Youjun Xiong
IPC Classes: G06F40/35, G06F40/284, G06N3/04
Abstract: The present disclosure discloses a context-based multi-turn dialogue method. The method includes: obtaining to-be-matched historical dialogue information; performing a word feature extraction based on the to-be-matched historical dialogue information to obtain a historical dialogue word embedding; obtaining candidate answer information; performing the word feature extraction based on the candidate answer information to obtain a candidate answer word embedding; obtaining a historical dialogue partial matching vector and a candidate answer partial matching vector by performing partial semantic relationship matching based on the historical dialogue word embedding and the candidate answer word embedding; obtaining a candidate answer matching probability by performing a matching probability calculation based on the historical dialogue partial matching vector and the candidate answer partial matching vector; and determining matched answer information based on the candidate answer information and the candidate answer matching probability.
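A minimal NumPy sketch of the matching pipeline: word embeddings for the historical dialogue and a candidate answer, word-level partial semantic matching via cosine similarity, and a matching probability. The random embedding table and the pooling/scoring choices are illustrative assumptions, not the trained model the abstract implies.

```python
import numpy as np

EMB = {}   # hypothetical word-embedding table: word -> vector
DIM = 32


def embed(tokens):
    # Word feature extraction: look up (or lazily create) an embedding per token.
    return np.stack([EMB.setdefault(t, np.random.randn(DIM)) for t in tokens])


def partial_match(history_emb, answer_emb):
    # Partial semantic relationship matching: cosine similarity between every
    # history word and every answer word, max-pooled into fixed-size vectors.
    h = history_emb / np.linalg.norm(history_emb, axis=1, keepdims=True)
    a = answer_emb / np.linalg.norm(answer_emb, axis=1, keepdims=True)
    sim = h @ a.T
    return sim.max(axis=1), sim.max(axis=0)   # history-side and answer-side matching vectors


def matching_probability(history_tokens, answer_tokens):
    h_vec, a_vec = partial_match(embed(history_tokens), embed(answer_tokens))
    score = h_vec.mean() + a_vec.mean()
    return 1.0 / (1.0 + np.exp(-score))       # squash into a matching probability


# Matched answer information: the candidate with the highest matching probability.
best = max(["fine thanks", "buy now"],
           key=lambda ans: matching_probability("how are you".split(), ans.split()))
```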
-
8.
Publication No.: US20230386116A1
Publication Date: 2023-11-30
Application No.: US18202291
Filing Date: 2023-05-26
Inventors: Wan Ding, Dongyan Huang, Linhuang Yan, Zhiyong Yang
CPC Classes: G06T13/40, G06T13/205, G06V40/20, G10L13/02
Abstract: A method for generating a talking head video includes: obtaining a text and an image containing a face of a user; determining a phoneme sequence that corresponds to the text and includes one or more phonemes; determining acoustic features corresponding to the text according to the phoneme sequence, and obtaining synthesized speech corresponding to the text according to the acoustic features; determining a first mouth movement sequence corresponding to the text according to the phoneme sequence, and determining a second mouth movement sequence corresponding to the text according to the acoustic features; creating a facial action video corresponding to the user according to the first mouth movement sequence, the second mouth movement sequence and the image; and processing the synthesized speech and the facial action video synchronously to obtain a talking head video corresponding to the user.
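A high-level orchestration sketch of the steps in the abstract; every helper below is a hypothetical stub, and only the ordering of the steps follows the abstract.

```python
def grapheme_to_phoneme(text):       return list(text)              # stub: phoneme sequence
def acoustic_model(phonemes):        return [0.0] * len(phonemes)   # stub: acoustic features
def vocoder(acoustic):               return b"speech"               # stub: synthesized speech
def phonemes_to_mouth(phonemes):     return ["m"] * len(phonemes)   # stub: 1st mouth sequence
def acoustic_to_mouth(acoustic):     return ["m"] * len(acoustic)   # stub: 2nd mouth sequence
def render_face(image, m1, m2):      return ["frame"] * len(m1)     # stub: facial action video
def mux_audio_video(speech, video):  return (speech, video)         # stub: synchronized output


def generate_talking_head(text, face_image):
    phonemes = grapheme_to_phoneme(text)                # phoneme sequence for the text
    acoustic = acoustic_model(phonemes)                 # acoustic features from the phonemes
    speech = vocoder(acoustic)                          # synthesized speech

    mouth_from_phonemes = phonemes_to_mouth(phonemes)   # first mouth movement sequence
    mouth_from_acoustic = acoustic_to_mouth(acoustic)   # second mouth movement sequence

    # Facial action video driven by both mouth sequences and the user's face image.
    face_video = render_face(face_image, mouth_from_phonemes, mouth_from_acoustic)

    # Process speech and video synchronously into the talking head video.
    return mux_audio_video(speech, face_video)
```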
-
9.
Publication No.: US20230206895A1
Publication Date: 2023-06-29
Application No.: US18089576
Filing Date: 2022-12-28
Inventors: Wan Ding, Dongyan Huang, Zhiyuan Zhao, Zhiyong Yang
IPC Classes: G10L13/047, G10L13/10
CPC Classes: G10L13/047, G10L13/10
Abstract: A speech synthesis method includes: obtaining an acoustic feature sequence of a text to be processed; processing the acoustic feature sequence by using a non-autoregressive computing model in parallel to obtain first audio information of the text to be processed, wherein the first audio information comprises audio corresponding to each segment; processing the acoustic feature sequence and the first audio information by using an autoregressive computing model to obtain a residual value corresponding to each segment; and obtaining second audio information corresponding to an i-th segment based on the first audio information corresponding to the i-th segment and the residual values corresponding to the first to the (i-1)-th segments, wherein a synthesized audio of the text to be processed comprises each of the second audio information, i = 1, 2, ..., n, and n is the total number of the segments.
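A minimal NumPy sketch of the two-pass scheme: a non-autoregressive model produces first audio for all segments in parallel, an autoregressive model produces per-segment residuals, and the second audio for segment i combines the first audio of segment i with the residuals of the preceding segments. How the residuals are combined (a plain sum here) and the stub models are assumptions, not the patented implementation.

```python
import numpy as np


def synthesize(acoustic_segments, non_ar_model, ar_model):
    # First pass: non-autoregressive, all segments can be processed in parallel.
    first_audio = [non_ar_model(seg) for seg in acoustic_segments]

    # Second pass: autoregressive residual value for each segment.
    residuals = [ar_model(seg, audio) for seg, audio in zip(acoustic_segments, first_audio)]

    # Second audio for segment i uses first_audio[i] and the residuals of segments 1..i-1.
    second_audio = [first_audio[i] + sum(residuals[:i]) for i in range(len(first_audio))]
    return np.concatenate(second_audio)   # the synthesized audio comprises all second audio


segments = [np.random.randn(10) for _ in range(4)]                  # acoustic features per segment
audio = synthesize(segments,
                   non_ar_model=lambda seg: seg.copy(),             # stub non-autoregressive model
                   ar_model=lambda seg, out: 0.1 * (seg - out))     # stub residual model
```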
-
10.
Publication No.: US20220189454A1
Publication Date: 2022-06-16
Application No.: US17117148
Filing Date: 2020-12-10
Inventors: Dongyan Huang, Leyuan Sheng, Youjun Xiong
Abstract: A computer-implemented method for speech synthesis, a computer device, and a non-transitory computer readable storage medium are provided. The method includes: obtaining a speech text to be synthesized; obtaining a Mel spectrum corresponding to the speech text to be synthesized; inputting the Mel spectrum into a complex neural network to obtain a complex spectrum corresponding to the speech text to be synthesized, wherein the complex spectrum comprises real component information and imaginary component information; and obtaining a synthetic speech corresponding to the speech text to be synthesized according to the complex spectrum. In this manner, speech synthesis can be completed efficiently and simply.
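A rough PyTorch sketch of the final stage described above: a placeholder "complex network" (two linear heads predicting real and imaginary components) maps the Mel spectrum to a complex spectrum, and an inverse STFT recovers the waveform. The network and all sizes are illustrative assumptions, not the patented model.

```python
import torch
import torch.nn as nn

N_FFT, HOP, N_MELS, FRAMES = 512, 128, 80, 100

# Placeholder "complex network": two linear heads predicting the real and imaginary
# components of the spectrum from the Mel spectrum, frame by frame.
real_head = nn.Linear(N_MELS, N_FFT // 2 + 1)
imag_head = nn.Linear(N_MELS, N_FFT // 2 + 1)

mel = torch.randn(FRAMES, N_MELS)                                  # Mel spectrum of the speech text
complex_spec = torch.complex(real_head(mel), imag_head(mel)).T     # (freq bins, frames)

# Inverse STFT turns the complex spectrum into the synthetic speech waveform.
speech = torch.istft(complex_spec, n_fft=N_FFT, hop_length=HOP,
                     window=torch.hann_window(N_FFT))
```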