Patent search ap:("Beijing Baidu Netcom Science Technology Co. Ltd.") AND inv:"Lei JIA"

1.

Invention application
SPEECH SYNTHESIS METHOD AND APPARATUS, DEVICE AND COMPUTER STORAGE MEDIUM (Status: in force)

Publication number: US20230059882A1

Publication date: 2023-02-23

Application number: US17738186

Application date: 2022-05-06

Applicant: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.

Inventor: Liqiang ZHANG, Jiankang HOU, Tao SUN, Lei JIA

IPC: G10L13/10, G06F40/20, G10L13/047

Abstract: The present disclosure discloses a speech synthesis method and apparatus, a device and a computer storage medium, and relates to speech and deep learning technologies in the field of artificial intelligence technologies. A specific implementation solution involves: acquiring to-be-synthesized text; acquiring a prosody feature extracted from the text; inputting the text and the prosody feature into a speech synthesis model to obtain a vocoder feature; and inputting the vocoder feature into a vocoder to obtain synthesized speech.

2.

Invention application
METHOD AND APPARATUS OF SYNTHESIZING SPEECH, METHOD AND APPARATUS OF TRAINING SPEECH SYNTHESIS MODEL, ELECTRONIC DEVICE, AND STORAGE MEDIUM (Status: in force)

Publication number: US20220020356A1

Publication date: 2022-01-20

Application number: US17489616

Application date: 2021-09-29

Applicant: Beijing Baidu Netcom Science Technology Co., Ltd.

Inventor: Wenfu WANG, Tao SUN, Xilei WANG, Junteng ZHANG, Zhengkun GAO, Lei JIA

IPC: G10L13/10, G10L25/30

Abstract: The present disclosure provides a method and apparatus of synthesizing a speech, a method and apparatus of training a speech synthesis model, an electronic device, and a storage medium. The method of synthesizing a speech includes acquiring a style information of a speech to be synthesized, a tone information of the speech to be synthesized, and a content information of a text to be processed; generating an acoustic feature information of the text to be processed, by using a pre-trained speech synthesis model, based on the style information, the tone information, and the content information of the text to be processed; and synthesizing the speech for the text to be processed, based on the acoustic feature information of the text to be processed.

3.

Invention publication
VOICE RECOGNITION MODEL TRAINING METHOD, VOICE RECOGNITION METHOD, ELECTRONIC DEVICE, AND STORAGE MEDIUM (Status: under examination, published)

Publication number: US20240221727A1

Publication date: 2024-07-04

Application number: US18266432

Application date: 2022-09-01

Applicant: Beijing Baidu Netcom Science Technology Co., Ltd.

Inventor: Lanhua YOU, Lei JIA, Qi ZHANG, Zhengxiang JIANG

IPC: G10L15/06, G10L15/01, G10L15/02, G10L15/16

CPC classification number: G10L15/063, G10L15/01, G10L15/02, G10L15/16

Abstract: The present disclosure provides a voice recognition model training method and apparatus, an electronic device and a storage medium, relating to the field of artificial intelligence technology, and in particular to the fields such as deep learning and voice recognition. The specific implementation scheme includes constructing a negative sample according to a positive sample to obtain a target negative sample for constraining a voice decoding path; obtaining training data according to the positive sample and the target negative sample; and training a first voice recognition model according to the training data to obtain a second voice recognition model.

4.

Invention application
METHOD OF REGISTERING ATTRIBUTE IN SPEECH SYNTHESIS MODEL, APPARATUS OF REGISTERING ATTRIBUTE IN SPEECH SYNTHESIS MODEL, ELECTRONIC DEVICE, AND MEDIUM (Status: in force)

Publication number: US20220076657A1

Publication date: 2022-03-10

Application number: US17455156

Application date: 2021-11-16

Applicant: Beijing Baidu Netcom Science Technology Co., Ltd.

Inventor: Wenfu WANG, Xilei WANG, Tao SUN, Han YUAN, Zhengkun GAO, Lei JIA

IPC: G10L13/02

Abstract: A method of registering an attribute in a speech synthesis model, an apparatus of registering an attribute in a speech synthesis model, an electronic device, and a medium are provided, which relate to a field of an artificial intelligence technology such as a deep learning and intelligent speech technology. The method includes: acquiring a plurality of data associated with an attribute to be registered; and registering the attribute in the speech synthesis model by using the plurality of data associated with the attribute, wherein the speech synthesis model is trained in advance by using a training data in a training data set.

5.

Invention application
METHOD OF PROCESSING AUDIO DATA, ELECTRONIC DEVICE AND STORAGE MEDIUM (Status: in force)

Publication number: US20230087531A1

Publication date: 2023-03-23

Application number: US18071187

Application date: 2022-11-29

Applicant: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.

Inventor: Jiankang HOU, Zhipeng NIE, Liqiang ZHANG, Tao SUN, Lei JIA

IPC: G10L25/18, G10L25/30

Abstract: A method of processing audio data, an electronic device, and a storage medium are provided, which relate to the field of artificial intelligence, in particular to the field of speech processing technology. The method includes: processing spectral data of the audio data to obtain a first feature information; obtaining a fundamental frequency indication information according to the first feature information, wherein the fundamental frequency indication information indicates valid audio data of the first feature information and invalid audio data of the first feature information; obtaining a fundamental frequency information and a spectral energy information according to the first feature information and the fundamental frequency indication information; and obtaining a harmonic structure information of the audio data according to the fundamental frequency information and the spectral energy information.

6.

Invention application
SPEECH PROCESSING METHOD AND APPARATUS, DEVICE AND COMPUTER STORAGE MEDIUM (Status: in force)

Publication number: US20230056128A1

Publication date: 2023-02-23

Application number: US17736175

Application date: 2022-05-04

Applicant: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.

Inventor: Liqiang ZHANG, Jiankang HOU, Tao SUN, Lei JIA

IPC: G10L13/10, G10L25/21, G10L25/18

Abstract: The present disclosure discloses a speech processing method and apparatus, a device and a computer storage medium, and relates to speech and deep learning technologies in the field of artificial intelligence technologies. A specific implementation solution involves: acquiring a vocoder feature obtained for text; correcting a value of an unvoiced and voiced (UV) feature in the vocoder feature according to an energy feature and/or a speech spectrum feature in the vocoder feature; and providing the corrected vocoder feature for a vocoder, so as to obtain synthesized speech.
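
Illustrative sketch (not taken from the patent): the abstract of result 6 says only that the UV value is corrected "according to an energy feature and/or a speech spectrum feature" and does not give the rule. The Python below assumes one simple possibility, namely that a frame flagged as voiced is reset to unvoiced when its energy falls below a floor. The function name `correct_uv`, the per-frame list layout, and the `energy_floor` value are all hypothetical.

```python
from typing import List


def correct_uv(uv: List[int], energy: List[float], energy_floor: float = 0.1) -> List[int]:
    """Return a corrected copy of per-frame UV flags (1 = voiced, 0 = unvoiced).

    Assumed rule, for illustration only: a frame whose energy is below
    `energy_floor` is treated as unvoiced even if it was flagged as voiced.
    """
    if len(uv) != len(energy):
        raise ValueError("uv and energy must contain one value per frame")
    corrected = list(uv)  # copy so the caller's list is left untouched
    for i, frame_energy in enumerate(energy):
        if corrected[i] == 1 and frame_energy < energy_floor:
            corrected[i] = 0
    return corrected


if __name__ == "__main__":
    frames_uv = [0, 1, 1, 1, 0]
    frames_energy = [0.02, 0.05, 0.80, 0.75, 0.01]
    print(correct_uv(frames_uv, frames_energy))
```

In this sketch the corrected flags would then be passed to the vocoder together with the other, unchanged, vocoder features.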

7.

Invention publication
AUDIO SIGNAL PROCESSING METHOD, TRAINING METHOD, APPARATUS AND STORAGE MEDIUM (Status: under examination, published)

Publication number: US20230197096A1

Publication date: 2023-06-22

Application number: US17812784

Application date: 2022-07-15

Applicant: Beijing Baidu Netcom Science Technology Co., Ltd.

Inventor: Wenkai ZHANG, Ce ZHANG, Zheng LI, Lei JIA

IPC: G10L21/0224, G10L15/22, G10L25/30, G10L15/06

CPC classification number: G10L21/0224, G10L15/22, G10L15/063, G10L25/30, G10L2015/223, G10L2021/02082

Abstract: Provided are an audio signal processing method, a training method, an apparatus and a storage medium, relating to the field of data processing, in particular to, the field of voice. The audio signal processing method includes: eliminating at least part of a linear echo signal from a mixed voice signal, to obtain an intermediate processing signal, where the mixed voice signal is obtained by mixing a target voice signal with an echo signal, and the echo signal is generated in an environment where the target voice signal is located and includes the linear echo signal and a nonlinear echo signal; and removing the nonlinear echo signal and a residual part of the linear echo signal from the intermediate processing signal, by using a target full convolution neural network model, to obtain an approximate target voice signal, the target full convolution neural network model including at least two convolution layers.

8.

Invention application
SPEECH RECOGNITION AND CODEC METHOD AND APPARATUS, ELECTRONIC DEVICE AND STORAGE MEDIUM (Status: in force)

Publication number: US20230090590A1

Publication date: 2023-03-23

Application number: US17738651

Application date: 2022-05-06

Applicant: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.

Inventor: Xiaoyin FU, Zhijie CHEN, Mingxin LIANG, Mingshun YANG, Lei JIA, Haifeng WANG

IPC: G10L15/02, G10L15/26, G10L15/187, G06F16/683

Abstract: The present disclosure provides speech recognition and codec methods and apparatuses, an electronic device and a storage medium, and relates to the field of artificial intelligence such as intelligent speech, deep learning and natural language processing. The speech recognition method may include: acquiring an audio feature of to-be-recognized speech; encoding the audio feature to obtain an encoding feature; truncating the encoding feature to obtain N consecutive feature segments, N being a positive integer greater than one; and acquiring, for any one of the feature segments, corresponding historical feature abstraction information, encoding the feature segment in combination with the historical feature abstraction information, and decoding an encoding result to obtain a recognition result corresponding to the feature segment, wherein the historical feature abstraction information is information obtained by feature abstraction of recognized historical feature segments.
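
Illustrative sketch (not taken from the patent): the control flow described in the abstract of result 8 can be mimicked with plain Python lists. Everything concrete here is an assumption: the encoding feature is a flat list of floats, truncation is an even split, "feature abstraction" is a mean, and the encode-and-decode step is a placeholder sum. The names `truncate` and `recognize_segments` are hypothetical.

```python
from typing import List


def truncate(features: List[float], n_segments: int) -> List[List[float]]:
    """Split a feature sequence into n_segments consecutive, near-equal pieces."""
    if n_segments < 2:
        raise ValueError("n_segments must be greater than one")
    if len(features) < n_segments:
        raise ValueError("need at least one feature per segment")
    base, extra = divmod(len(features), n_segments)
    segments, start = [], 0
    for i in range(n_segments):
        end = start + base + (1 if i < extra else 0)  # spread the remainder
        segments.append(features[start:end])
        start = end
    return segments


def recognize_segments(features: List[float], n_segments: int) -> List[float]:
    """Process segments in order, combining each with a summary of earlier ones."""
    history: List[float] = []  # one abstraction (here: a mean) per finished segment
    results: List[float] = []
    for segment in truncate(features, n_segments):
        # Assumed history abstraction: average of the earlier segment summaries.
        context = sum(history) / len(history) if history else 0.0
        summary = sum(segment) / len(segment)
        # Placeholder for "encode with history, then decode": a simple sum.
        results.append(summary + context)
        history.append(summary)
    return results


if __name__ == "__main__":
    print(recognize_segments([1.0, 2.0, 3.0, 4.0, 5.0], 2))
```

The point of the sketch is only the ordering: each segment is handled as soon as it is available, and only compact summaries of earlier segments are carried forward rather than the segments themselves.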

9.

Invention application
SPEECH SYNTHESIS METHOD, AND ELECTRONIC DEVICE (Status: in force)

Publication number: US20230005466A1

Publication date: 2023-01-05

Application number: US17820339

Application date: 2022-08-17

Applicant: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.

Inventor: Zhengkun GAO, Junteng ZHANG, Tao SUN, Lei JIA

IPC: G10L13/08, G10L13/047

Abstract: The disclosure provides a speech synthesis method, and an electronic device. The technical solution is described as follows. A text to be synthesized and speech features of a target user are obtained. Predicted first acoustic features based on the text to be synthesized and the speech features are obtained. A target template audio is obtained from a template audio library based on the text to be synthesized. Second acoustic features of the target template audio are extracted. Target acoustic features are generated by splicing the first acoustic features and the second acoustic features. Speech synthesis is performed on the text to be synthesized based on the target acoustic features and the speech features, to generate a target speech of the text to be synthesized.

10.

Invention application
SPEECH RECOGNITION (Status: in force)

Publication number: US20250078839A1

Publication date: 2025-03-06

Application number: US18819018

Application date: 2024-08-29

Applicant: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.

Inventor: Xiaoyin FU, Qiguang ZANG, Fenfen SHENG, Haifeng WANG, Lei JIA

IPC: G10L15/32, G10L15/02, G10L15/04, G10L15/06, G10L15/183

Abstract: A speech recognition method and a method for training a deep learning model are provided. The speech recognition method includes: obtaining a first speech feature of a speech to be recognized, which includes a plurality of speech segment features corresponding to a plurality of speech segments; decoding the first speech feature using a first decoder to obtain a plurality of first decoding results corresponding to a plurality of words, indicating a first recognition result of the words; extracting a second speech feature from the first speech feature based on first a priori information, which includes the plurality of first decoding results, wherein the second speech feature includes first word-level audio features corresponding to the plurality of words; and decoding the second speech feature using a second decoder to obtain a plurality of second decoding results corresponding to the plurality of words, indicating a second recognition result of the words.
