Patent search: ap:("BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.") AND inv:"Yongguo Kang" (page 1)

1.

Granted invention patent
Method and apparatus for generating speech synthesis model [In force]

Publication No.: US10971131B2

Publication date: 2021-04-06

Application No.: US16053897

Filing date: 2018-08-03

Applicant: Baidu Online Network Technology (Beijing) Co., Ltd.

Inventor: Yongguo Kang

IPC classification: G10L13/027, G10L13/033, G06N3/08, G10L13/00, G06K9/62, G10L25/30

Abstract: The present disclosure discloses a method and apparatus for generating a speech synthesis model. A specific embodiment of the method comprises: acquiring a plurality of types of training samples, each of the plurality of types of training samples including a text of the type and a speech of the text, read by an announcer corresponding to the type in the style of speech corresponding to the type; and training a neural network corresponding to a speech synthesis model using the plurality of types of training samples and an annotation of the style of speech in each of the plurality of types of training samples to obtain the speech synthesis model, the speech synthesis model being used to synthesize speech of the announcer corresponding to each of the plurality of types having a plurality of styles.
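The training data described in this abstract can be pictured as records that pair a text and its recording with an announcer and a style annotation. The following is a minimal sketch under stated assumptions: the `TrainingSample` fields, the helper names, and the one-hot encoding of the style annotation are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSample:
    # All field names are hypothetical, chosen only for illustration.
    text: str         # text of a given type
    speech_path: str  # recording of that text
    announcer: str    # announcer who read this type of text
    style: str        # annotated style of speech for this type


def style_vocabulary(samples):
    """Map each annotated style to a stable integer index."""
    return {s: i for i, s in enumerate(sorted({x.style for x in samples}))}


def style_annotation(sample, vocab):
    """Encode a sample's style annotation as a one-hot vector (an assumed encoding)."""
    vec = [0.0] * len(vocab)
    vec[vocab[sample.style]] = 1.0
    return vec
```

A network trained on such records would receive the style vector alongside the text, which is one plausible way to let a single model produce each announcer's voice in several styles.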

2.

Granted invention patent
Speech synthesis method terminal and storage medium [In force]

Publication No.: US10789938B2

Publication date: 2020-09-29

Application No.: US16099257

Filing date: 2016-09-05

Applicant: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.

Inventors: Hao Li, Yongguo Kang

IPC classification: G10L13/047, G10L13/06, G06F17/28, G10L13/08, G10L13/033, G06F40/30, G10L13/04

Abstract: A speech synthesis method and device. The method comprises: determining language types of a statement to be synthesized; determining base models corresponding to the language types; determining a target timbre, performing adaptive transformation on the spectrum parameter models based on the target timbre, and training the statement to be synthesized based on the spectrum parameter models subjected to adaptive transformation to generate spectrum parameters; training the statement to be synthesized based on the fundamental frequency parameters to generate fundamental frequency parameters, and adjusting the fundamental frequency parameters based on the target timbre; and synthesizing the statement to be synthesized into a target speech based on the spectrum parameters and the adjusted fundamental frequency parameters.
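The abstract does not say how the fundamental frequency parameters are adjusted toward the target timbre. Purely as an assumption, the sketch below uses a technique common in voice conversion: normalizing log-F0 with the source statistics and rescaling it with the target speaker's statistics. The function name and all parameters are hypothetical.

```python
import math


def adjust_f0(f0_hz, src_mean, src_std, tgt_mean, tgt_std):
    """Map an F0 contour (Hz) onto target-speaker log-F0 statistics.

    The means and standard deviations are taken over log-F0 of voiced frames.
    Unvoiced frames, marked by F0 <= 0, are passed through as 0.0.
    """
    adjusted = []
    for f0 in f0_hz:
        if f0 <= 0.0:
            adjusted.append(0.0)
            continue
        z = (math.log(f0) - src_mean) / src_std  # normalize in the log domain
        adjusted.append(math.exp(tgt_mean + z * tgt_std))
    return adjusted
```

With equal standard deviations this reduces to a constant pitch ratio, so a contour centered on the source mean lands on the target mean.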

3.

Granted invention patent
Speech broadcasting method, device, apparatus and computer-readable storage medium [In force]

Publication No.: US11011175B2

Publication date: 2021-05-18

Application No.: US16563491

Filing date: 2019-09-06

Applicant: Baidu Online Network Technology (Beijing) Co., Ltd.

Inventor: Yongguo Kang

IPC classification: G01L17/00, G10L17/00, G10L17/02, G10L17/06, G10L25/90

Abstract: Embodiments of a speech broadcasting method, device, apparatus and a computer-readable storage medium are provided. The method can include: receiving recorded speech data from a plurality of speakers; extracting respective text features of the plurality of speakers from the recorded speech data, and allocating respective identifications to the plurality of speakers; inputting the text features and the identifications of the speakers to a text-acoustic mapping model, to output speech features of the plurality of speakers; and establishing a mapping relationship between the text feature and the speech feature of each speaker. In the embodiments of the present application, a broadcaster can be selected to broadcast a text, greatly improving the user experience of text broadcasting.
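One plausible way to feed both text features and speaker identifications to a single text-acoustic mapping model is to append a one-hot speaker vector to each frame of text features. This is a sketch under that assumption; the abstract does not specify the encoding, and the function names here are hypothetical.

```python
def allocate_speaker_ids(speaker_names):
    """Give each distinct speaker a stable integer identification, in first-seen order."""
    return {name: i for i, name in enumerate(dict.fromkeys(speaker_names))}


def model_input(text_features, speaker_id, n_speakers):
    """Append a one-hot speaker identification to one frame of text features."""
    one_hot = [0.0] * n_speakers
    one_hot[speaker_id] = 1.0
    return list(text_features) + one_hot
```

Selecting a broadcaster at synthesis time then amounts to choosing which identification is appended to the text features.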

4.

Granted invention patent
Method and apparatus for generating text-to-speech model [In force]

Publication No.: US11017762B2

Publication date: 2021-05-25

Application No.: US16236076

Filing date: 2018-12-28

Applicant: Baidu Online Network Technology (Beijing) Co., Ltd.

Inventors: Yongguo Kang, Yu Gu

IPC classification: G10L13/08, G06N3/08, G06Q20/00, G10L13/047, G10L25/30

Abstract: Embodiments of the present disclosure disclose a method and apparatus for generating a text-to-speech model. A specific implementation of the method includes: obtaining a training sample set, a training sample including sample text information, sample audio data corresponding to the sample text information, and a fundamental frequency of the sample audio data; obtaining an initial deep neural network; and using the sample text information of the training sample in the training sample set as an input, and using the sample audio data corresponding to the input sample text information and the fundamental frequency of the sample audio data as an output, to train the initial deep neural network using a machine learning method, and defining the trained initial deep neural network as the text-to-speech model.
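Training a network whose output is both the audio data and its fundamental frequency implies a training objective that covers both targets. The sketch below assumes a weighted sum of mean squared errors; the abstract names no loss function, so `joint_loss` and `f0_weight` are hypothetical.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    if len(pred) != len(target) or not pred:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def joint_loss(pred_audio, pred_f0, target_audio, target_f0, f0_weight=1.0):
    """Combine the audio error and the fundamental-frequency error into one scalar."""
    return mse(pred_audio, target_audio) + f0_weight * mse(pred_f0, target_f0)
```

The weight lets the two terms be balanced when audio samples and F0 values in Hz live on very different numeric scales.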

5.

Invention patent application
METHOD AND APPARATUS FOR GENERATING TEXT-TO-SPEECH MODEL [Pending, published]

Publication No.: US20190355344A1

Publication date: 2019-11-21

Application No.: US16236076

Filing date: 2018-12-28

Applicant: Baidu Online Network Technology (Beijing) Co., Ltd.

Inventors: Yongguo Kang, Yu Gu

IPC classification: G10L13/08, G10L13/047, G10L25/30, G06N3/08, G06Q20/00

Abstract: Embodiments of the present disclosure disclose a method and apparatus for generating a text-to-speech model. A specific implementation of the method includes: obtaining a training sample set, a training sample including sample text information, sample audio data corresponding to the sample text information, and a fundamental frequency of the sample audio data; obtaining an initial deep neural network; and using the sample text information of the training sample in the training sample set as an input, and using the sample audio data corresponding to the input sample text information and the fundamental frequency of the sample audio data as an output, to train the initial deep neural network using a machine learning method, and defining the trained initial deep neural network as the text-to-speech model.

6.

Granted invention patent
Computer-implemented method and apparatus for generating grapheme-to-phoneme model [In force]

Publication No.: US10181320B2

Publication date: 2019-01-15

Application No.: US15391907

Filing date: 2016-12-28

Applicant: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.

Inventors: Zhijie Chen, Yongguo Kang

IPC classification: G06N3/04, G06N3/08, G10L15/02, G10L15/06, G10L15/16, G06F17/22, G10L13/00

Abstract: A method and an apparatus for generating a grapheme-to-phoneme (g2p) model based on AI are provided. The method includes: while a neural network performs grapheme-to-phoneme conversion training on each word in training data, screening nodes in a hidden layer of the neural network randomly according to a preset node ratio so as to obtain retaining nodes for training each word; training each word with a sub-neural network corresponding to the retaining nodes and updating a weight of each retaining node of the sub-neural network; and performing mean processing on the weights of the retaining nodes of the respective sub-neural networks, so as to generate the grapheme-to-phoneme model.
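The screening and mean-processing steps resemble dropout-style training. The following is a minimal sketch of those two steps only, with the per-word weight update left out; the function names are hypothetical, and each hidden node is reduced to a single scalar weight for brevity, which is a simplification not found in the abstract.

```python
import random


def screen_hidden_nodes(n_hidden, keep_ratio, rng):
    """Randomly pick the hidden-node indices retained for one training word."""
    n_keep = max(1, round(n_hidden * keep_ratio))
    return sorted(rng.sample(range(n_hidden), n_keep))


def mean_retained_weights(sub_networks, n_hidden):
    """Average each node's weight over the sub-networks that retained it.

    sub_networks is a list of {node_index: weight} dicts, one per training word.
    Nodes that no sub-network retained are reported as None.
    """
    totals = [0.0] * n_hidden
    counts = [0] * n_hidden
    for weights in sub_networks:
        for node, weight in weights.items():
            totals[node] += weight
            counts[node] += 1
    return [t / c if c else None for t, c in zip(totals, counts)]
```

In a full implementation each dict would hold the weights produced by training one word on its sub-network, and the averaged result would form the final model.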