Patent search ap:("Google LLC") AND inv:"Bhuvana Ramabhadran" Page 5

41.

Granted invention patent
Improving speech recognition with speech synthesis-based model adaptation (in force)

Publication (announcement) number: US11823697B2

Publication (announcement) day: 2023-11-21

Application number: US17445537

Application date: 2021-08-20

Applicant: Google LLC

Inventor: Andrew Rosenberg, Bhuvana Ramabhadran

IPC: G10L15/26, G10L21/007, G06N3/08, G10L25/30

CPC classification number: G10L21/007, G06N3/08, G10L15/26, G10L25/30

Abstract: A method for training a speech recognition model includes obtaining sample utterances of synthesized speech in a target domain, obtaining transcribed utterances of non-synthetic speech in the target domain, and pre-training the speech recognition model on the sample utterances of synthesized speech in the target domain to attain an initial state for warm-start training. After pre-training the speech recognition model, the method also includes warm-start training the speech recognition model on the transcribed utterances of non-synthetic speech in the target domain to teach the speech recognition model to learn to recognize real/human speech in the target domain.

42.

Published invention application
Alignment Prediction to Inject Text into Automatic Speech Recognition Training (under examination, published)

Publication (announcement) number: US20230317059A1

Publication (announcement) day: 2023-10-05

Application number: US18168470

Application date: 2023-02-13

Applicant: Google LLC

Inventor: Andrew M Rosenberg, Zhehuai Chen, Yu Zhang, Bhuvana Ramabhadran, Pedro J. Moreno Mengibar

IPC: G10L15/197, G06F40/289, G10L15/16, G10L15/06

CPC classification number: G10L15/063, G06F40/289, G10L15/16, G10L15/197, G10L2015/0635

Abstract: A method includes receiving training data that includes unspoken textual utterances, un-transcribed non-synthetic speech utterances, and transcribed non-synthetic speech utterances. Each unspoken textual utterance is not paired with any corresponding spoken utterance of non-synthetic speech. Each un-transcribed non-synthetic speech utterance is not paired with a corresponding transcription. Each transcribed non-synthetic speech utterance is paired with a corresponding transcription. The method also includes generating a corresponding alignment output for each unspoken textual utterance of the received training data using an alignment model. The method also includes pre-training an audio encoder on the alignment outputs generated for the corresponding unspoken textual utterances, the un-transcribed non-synthetic speech utterances, and the transcribed non-synthetic speech utterances to teach the audio encoder to jointly learn shared speech and text representations.

43.

Published invention application
Speech Recognition Using Unspoken Text and Speech Synthesis (under examination, published)

Publication (announcement) number: US20230197057A1

Publication (announcement) day: 2023-06-22

Application number: US18168969

Application date: 2023-02-14

Applicant: Google LLC

Inventor: Zhehuai Chen, Andrew M. Rosenberg, Bhuvana Ramabhadran, Pedro J. Moreno Mengibar

IPC: G10L13/00, G10L13/08, G10L15/06

CPC classification number: G10L13/00, G10L13/08, G10L15/063

Abstract: A method for training a generative adversarial network (GAN)-based text-to-speech (TTS) model and a speech recognition model in unison includes obtaining a plurality of training text utterances. At each of a plurality of output steps for each training text utterance, the method also includes generating, for output by the GAN-based TTS model, a synthetic speech representation of the corresponding training text utterance, and determining, using an adversarial discriminator of the GAN, an adversarial loss term indicative of an amount of acoustic noise disparity in one of the non-synthetic speech representations selected from the set of spoken training utterances relative to the corresponding synthetic speech representation of the corresponding training text utterance. The method also includes updating parameters of the GAN-based TTS model based on the adversarial loss term determined at each of the plurality of output steps for each training text utterance of the plurality of training text utterances.

44.

Granted invention patent
Language-agnostic multilingual modeling using effective script normalization (in force)

Publication (announcement) number: US11615779B2

Publication (announcement) day: 2023-03-28

Application number: US17152760

Application date: 2021-01-19

Applicant: Google LLC

Inventor: Arindrima Datta, Bhuvana Ramabhadran, Jesse Emond, Brian Roark

IPC: G10L15/00, G06F40/58, G06N3/04, G10L15/06, G10L15/16, G10L15/26, G06N3/049

Abstract: A method includes obtaining a plurality of training data sets, each associated with a respective native language and including a plurality of respective training data samples. For each respective training data sample of each training data set in the respective native language, the method includes transliterating the corresponding transcription in the respective native script into corresponding transliterated text representing the respective native language of the corresponding audio in a target script and associating the corresponding transliterated text in the target script with the corresponding audio in the respective native language to generate a respective normalized training data sample. The method also includes training, using the normalized training data samples, a multilingual end-to-end speech recognition model to predict speech recognition results in the target script for corresponding speech utterances spoken in any of the different native languages associated with the plurality of training data sets.

45.

Granted invention patent
Multilingual speech synthesis and cross-language voice cloning (in force)

Publication (announcement) number: US11580952B2

Publication (announcement) day: 2023-02-14

Application number: US16855042

Application date: 2020-04-22

Applicant: Google LLC

Inventor: Yu Zhang, Ron J. Weiss, Byungha Chun, Yonghui Wu, Zhifeng Chen, Russell John Wyatt Skerry-Ryan, Ye Jia, Andrew M. Rosenberg, Bhuvana Ramabhadran

IPC: G10L13/00, G10L13/047

Abstract: A method includes receiving an input text sequence to be synthesized into speech in a first language and obtaining a speaker embedding, the speaker embedding specifying specific voice characteristics of a target speaker for synthesizing the input text sequence into speech that clones a voice of the target speaker. The target speaker includes a native speaker of a second language different than the first language. The method also includes generating, using a text-to-speech (TTS) model, an output audio feature representation of the input text by processing the input text sequence and the speaker embedding. The output audio feature representation includes the voice characteristics of the target speaker specified by the speaker embedding.

46.

Invention application
Supervised and Unsupervised Training with Contrastive Loss Over Sequences (in force)

Publication (announcement) number: US20220310065A1

Publication (announcement) day: 2022-09-29

Application number: US17655903

Application date: 2022-03-22

Applicant: Google LLC

Inventor: Andrew Rosenberg, Bhuvana Ramabhadran, Zhehuai Chen, Gary Wang, Yu Zhang, Jesse Emond

IPC: G10L15/06, G10L15/16, G10L15/22, G10L13/02, G06N3/08

Abstract: A method includes receiving audio data corresponding to an utterance and generating a pair of positive audio data examples. Here, each positive audio data example includes a respective augmented copy of the received audio data. For each respective positive audio data example, the method includes generating a respective sequence of encoder outputs and projecting the respective sequence of encoder outputs for the positive data example into a contrastive loss space. The method also includes determining an L2 distance between each corresponding encoder output in the projected sequences of encoder outputs for the positive audio data examples and determining a per-utterance consistency loss by averaging the L2 distances. The method also includes generating corresponding speech recognition results for each respective positive audio data example. The method also includes updating parameters of the speech recognition model based on a respective supervised loss term and the per-utterance consistency loss.
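
The averaging step in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the plain-list representation of projected encoder outputs, and the toy values are all assumptions made for illustration, and the two sequences are assumed to have equal length.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def per_utterance_consistency_loss(proj_a, proj_b):
    """Average the L2 distances between corresponding projected encoder outputs.

    proj_a and proj_b are hypothetical stand-ins for the projected encoder
    output sequences of the two positive (augmented) examples of one utterance.
    """
    if len(proj_a) != len(proj_b) or not proj_a:
        raise ValueError("sequences must be non-empty and of equal length")
    distances = [l2_distance(u, v) for u, v in zip(proj_a, proj_b)]
    return sum(distances) / len(distances)


# Toy projected sequences: two time steps, two dimensions each.
seq_a = [[0.0, 0.0], [1.0, 1.0]]
seq_b = [[3.0, 4.0], [1.0, 1.0]]
loss = per_utterance_consistency_loss(seq_a, seq_b)  # distances 5.0 and 0.0
```

In the toy input the first pair of outputs is 5.0 apart and the second pair is identical, so the averaged loss is 2.5. In the abstract this term is combined with a supervised loss, which the sketch omits.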

47.

Invention application
Conformer-based Speech Conversion Model (in force)

Publication (announcement) number: US20220310056A1

Publication (announcement) day: 2022-09-29

Application number: US17655030

Application date: 2022-03-16

Applicant: Google LLC

Inventor: Bhuvana Ramabhadran, Zhehuai Chen, Fadi Biadsy, Pedro J. Moreno Mengibar

IPC: G10L13/027, G10L25/18, G10L15/22, G10L15/16, G10L13/047

Abstract: A method for speech conversion includes receiving, as input to an encoder of a speech conversion model, an input spectrogram corresponding to an utterance, the encoder including a stack of self-attention blocks. The method further includes generating, as output from the encoder, an encoded spectrogram and receiving, as input to a spectrogram decoder of the speech conversion model, the encoded spectrogram generated as output from the encoder. The method further includes generating, as output from the spectrogram decoder, an output spectrogram corresponding to a synthesized speech representation of the utterance.

48.

Invention application
Instantaneous Learning in Text-To-Speech During Dialog (in force)

Publication (announcement) number: US20220284882A1

Publication (announcement) day: 2022-09-08

Application number: US17190456

Application date: 2021-03-03

Applicant: Google LLC

Inventor: Vijayaditya Peddinti, Bhuvana Ramabhadran, Andrew Rosenberg, Mateusz Golebiewski

IPC: G10L13/08, G10L15/187

Abstract: A method for instantaneous learning in text-to-speech (TTS) during dialog includes receiving a user pronunciation of a particular word present in a query spoken by a user. The method also includes receiving a TTS pronunciation of the same particular word that is present in a TTS input, where the TTS pronunciation of the particular word is different than the user pronunciation of the particular word. The method also includes obtaining user pronunciation-related features and TTS pronunciation-related features associated with the particular word. The method also includes generating a pronunciation decision selecting whichever of the user pronunciation or the TTS pronunciation of the particular word is associated with the highest confidence. The method also includes providing TTS audio that includes a synthesized speech representation of the response to the query using the user pronunciation or the TTS pronunciation for the particular word.

49.

Invention application
Using Speech Recognition to Improve Cross-Language Speech Synthesis (in force)

Publication (announcement) number: US20220122581A1

Publication (announcement) day: 2022-04-21

Application number: US17451613

Application date: 2021-10-20

Applicant: Google LLC

Inventor: Zhehuai Chen, Bhuvana Ramabhadran, Andrew Rosenberg, Yu Zhang, Pedro J. Moreno Mengibar

IPC: G10L13/047, G10L13/08, G10L13/10

Abstract: A method for training a speech recognition model includes obtaining a multilingual text-to-speech (TTS) model. The method also includes generating a native synthesized speech representation for an input text sequence in a first language that is conditioned on speaker characteristics of a native speaker of the first language. The method also includes generating a cross-lingual synthesized speech representation for the input text sequence in the first language that is conditioned on speaker characteristics of a native speaker of a different second language. The method also includes generating a first speech recognition result for the native synthesized speech representation and a second speech recognition result for the cross-lingual synthesized speech representation. The method also includes determining a consistent loss term based on the first speech recognition result and the second speech recognition result and updating parameters of the speech recognition model based on the consistent loss term.

50.

Invention application
Consistency Prediction On Streaming Sequence Models (in force)

Publication (announcement) number: US20210280170A1

Publication (announcement) day: 2021-09-09

Application number: US17170836

Application date: 2021-02-08

Applicant: Google LLC

Inventor: Zhehuai Chen, Andrew Rosenberg, Bhuvana Ramabhadran, Pedro Jose Moreno Mengibar

IPC: G10L15/06, G10L15/197, G10L13/02, G10L15/16, G06N3/04, G06N3/08

Abstract: A method for training a speech recognition model includes receiving a set of training utterance pairs each including a non-synthetic speech representation and a synthetic speech representation of a same corresponding utterance. At each of a plurality of output steps for each training utterance pair in the set of training utterance pairs, the method also includes determining a consistent loss term for the corresponding training utterance pair based on a first probability distribution over possible non-synthetic speech recognition hypotheses generated for the corresponding non-synthetic speech representation and a second probability distribution over possible synthetic speech recognition hypotheses generated for the corresponding synthetic speech representation. The first and second probability distributions are generated for output by the speech recognition model. The method also includes updating parameters of the speech recognition model based on the consistent loss term determined at each of the plurality of output steps for each training utterance pair.
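
One way to picture a loss term computed from two probability distributions at each output step is sketched below. The abstract does not say which divergence is used, so the choice of KL divergence here is an assumption, as are the function names, the averaging across steps, and the toy distributions.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same hypotheses."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )


def consistent_loss(real_steps, synth_steps):
    """Average a per-step divergence over all output steps of one utterance pair.

    real_steps and synth_steps are hypothetical per-step distributions produced
    by the recognizer for the non-synthetic and synthetic representations.
    """
    if len(real_steps) != len(synth_steps) or not real_steps:
        raise ValueError("step sequences must be non-empty and of equal length")
    per_step = [kl_divergence(p, q) for p, q in zip(real_steps, synth_steps)]
    return sum(per_step) / len(per_step)


# Toy distributions over three hypotheses at two output steps.
real = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
synth = [[0.6, 0.3, 0.1], [0.1, 0.8, 0.1]]
loss_value = consistent_loss(real, synth)
```

The term is zero when the recognizer produces identical distributions for the real and synthetic versions of an utterance and grows as they diverge, which is the behavior a consistency objective needs regardless of the exact divergence chosen.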
