Patent search ap:("Beijing Baidu Netcom Science Technology Co., Ltd.") AND inv:"Yongguo KANG"

1.

Invention application
METHOD OF CONVERTING SPEECH, ELECTRONIC DEVICE, AND READABLE STORAGE MEDIUM (status: in force)

Publication No.: US20220383876A1

Publication date: 2022-12-01

Application No.: US17818609

Application date: 2022-08-09

Applicant: Beijing Baidu Netcom Science Technology Co., Ltd.

Inventor: Yixiang CHEN, Junchao WANG, Yongguo KANG

IPC: G10L15/26, G10L15/16, G10L15/02

Abstract: A method of converting a speech, an electronic device, and a readable storage medium are provided, which relate to a field of artificial intelligence technology such as speech and deep learning, in particular to speech converting technology. The method of converting a speech includes: acquiring a first speech of a target speaker; acquiring a speech of an original speaker; extracting a first feature parameter of the first speech of the target speaker; extracting a second feature parameter of the speech of the original speaker; processing the first feature parameter and the second feature parameter to obtain Mel spectrum information; and converting the Mel spectrum information to output a second speech of the target speaker having a tone identical to a tone of the first speech of the target speaker and a content identical to a content of the speech of the original speaker.

2.

Invention application
METHOD OF TRAINING DEEP LEARNING MODEL, AND METHOD OF SYNTHESIZING SPEECH (status: in force)

Publication No.: US20250157457A1

Publication date: 2025-05-15

Application No.: US19023572

Application date: 2025-01-16

Applicant: Beijing Baidu Netcom Science Technology Co., Ltd.

Inventor: Bin HUANG, Tao SUN, Ce ZHANG, Yongguo KANG, Xiaoyin FU, Lei JIA

IPC: G10L13/027

Abstract: A method of training a deep learning model and a method of synthesizing a speech are provided, which relate to a field of artificial intelligence technology, in particular to fields of large model, large language model, generative model, deep learning, and speech processing technologies. The method of training a deep learning model includes: determining a reference speech feature of a sample speech, the reference speech feature being associated with a prosodic feature of the sample speech; retrieving a speech library using a sample text corresponding to the sample speech, so as to obtain a pronunciation expression feature of the sample text; inputting the pronunciation expression feature into the deep learning model to obtain an output speech feature; determining a loss of the deep learning model according to the reference speech feature and the output speech feature; and adjusting a parameter of the deep learning model according to the loss.
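
Illustrative note (not part of the patent record): the training steps listed in this abstract can be sketched as a toy loop. This is only a minimal sketch under loud assumptions. The dictionary SPEECH_LIBRARY, the per-dimension linear "model", the mean-squared-error loss, and the plain gradient-descent update are all hypothetical stand-ins chosen for illustration; the abstract does not say which model, loss, or retrieval scheme the applicants use.

```python
# Toy sketch of the training steps named in the abstract.
# Every name and numeric value here is hypothetical.

# Hypothetical "speech library": sample text -> pronunciation expression feature.
SPEECH_LIBRARY = {
    "hello": [1.0, 0.5],
    "world": [0.2, 0.8],
}


def retrieve_expression_feature(text):
    """Retrieve the pronunciation expression feature for a sample text."""
    return SPEECH_LIBRARY[text]


def model_forward(weights, feature):
    """Stand-in for the deep learning model: one weight per feature dimension."""
    return [w * x for w, x in zip(weights, feature)]


def mse_loss(output, reference):
    """Mean squared error between output and reference speech features."""
    return sum((o - r) ** 2 for o, r in zip(output, reference)) / len(output)


def train_step(weights, text, reference, lr=0.1):
    """One step: retrieve, forward, compute the loss, adjust the parameters."""
    feature = retrieve_expression_feature(text)
    output = model_forward(weights, feature)
    loss = mse_loss(output, reference)
    n = len(output)
    # Gradient of the MSE with respect to each weight w_i.
    grads = [2 * (o - r) * x / n for o, r, x in zip(output, reference, feature)]
    new_weights = [w - lr * g for w, g in zip(weights, grads)]
    return new_weights, loss


weights = [0.0, 0.0]
reference = [2.0, 1.0]  # hypothetical reference speech feature of the sample speech
losses = []
for _ in range(50):
    weights, loss = train_step(weights, "hello", reference)
    losses.append(loss)
```

In this toy setup the recorded loss shrinks from step to step, which mirrors the abstract's final two steps: determine a loss from the reference and output speech features, then adjust the model parameter according to that loss.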

3.

Invention application
METHOD AND APPARATUS FOR SPEECH GENERATION (status: in force)

Publication No.: US20220301545A1

Publication date: 2022-09-22

Application No.: US17830130

Application date: 2022-06-01

Applicant: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.

Inventor: Yongguo KANG, Junchao WANG

IPC: G10L13/08, G10L13/04, G10L17/14, G10L17/02, G10L13/033

Abstract: A method for speech generation includes: acquiring speech information of an original speaker; performing text feature extraction on the speech information to obtain a text feature corresponding to the speech information; converting the text feature to an acoustic feature corresponding to a target speaker; and generating a target speech signal based on the acoustic feature.
