-
Publication No.: US12272363B2
Publication Date: 2025-04-08
Application No.: US17722264
Filing Date: 2022-04-15
Applicant: Google LLC
Inventor: Andrew Rosenberg, Zhehuai Chen, Bhuvana Ramabhadran, Pedro J. Moreno Mengibar, Yuan Wang, Yu Zhang
Abstract: A method includes receiving training data that includes unspoken text utterances, un-transcribed non-synthetic speech utterances, and transcribed non-synthetic speech utterances. Each unspoken text utterance is not paired with any corresponding spoken utterance of non-synthetic speech. Each un-transcribed non-synthetic speech utterance is not paired with a corresponding transcription. Each transcribed non-synthetic speech utterance is paired with a corresponding transcription. The method also includes generating a corresponding synthetic speech representation for each unspoken textual utterance of the received training data using a text-to-speech model. The method also includes pre-training an audio encoder on the synthetic speech representations generated for the unspoken textual utterances, the un-transcribed non-synthetic speech utterances, and the transcribed non-synthetic speech utterances to teach the audio encoder to jointly learn shared speech and text representations.
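The sketch below is purely illustrative of the data flow this abstract describes: unspoken text is run through a TTS model to produce synthetic speech, which is then pooled with un-transcribed and transcribed real speech for encoder pre-training. All names (`tts_synthesize`, `pretrain_audio_encoder`) and the dummy feature vectors are assumptions, not the patent's implementation.

```python
# Illustrative sketch only: model functions are stand-ins, not the patent's method.
import random

def tts_synthesize(text: str) -> list[float]:
    """Stand-in TTS: maps an unspoken text utterance to a synthetic
    speech representation (here, a dummy feature vector)."""
    random.seed(hash(text) % (2**32))
    return [random.random() for _ in range(8)]

def pretrain_audio_encoder(speech_batches) -> None:
    """Placeholder for pre-training the audio encoder on synthetic and
    non-synthetic speech so it learns shared speech/text representations."""
    for batch in speech_batches:
        pass  # one optimizer step per batch would go here

# The three training-data sources named in the abstract.
unspoken_texts = ["turn on the lights", "play some jazz"]
untranscribed_speech = [[0.1] * 8, [0.2] * 8]            # audio only, no transcription
transcribed_speech = [([0.3] * 8, "set a timer")]        # (audio, transcription) pairs

# 1) Generate a synthetic speech representation for each unspoken text utterance.
synthetic_speech = [tts_synthesize(t) for t in unspoken_texts]

# 2) Pre-train the audio encoder on all three speech sources.
pretrain_audio_encoder([
    synthetic_speech,
    untranscribed_speech,
    [audio for audio, _ in transcribed_speech],
])
```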
-
Publication No.: US12254875B2
Publication Date: 2025-03-18
Application No.: US18589220
Filing Date: 2024-02-27
Applicant: Google LLC
Inventor: Neeraj Gaur, Tongzhou Chen, Ehsan Variani, Bhuvana Ramabhadran, Parisa Haghani, Pedro J. Moreno Mengibar
IPC: G10L15/197, G10L15/00, G10L15/16, G10L15/22
Abstract: A method includes receiving a sequence of acoustic frames extracted from audio data corresponding to an utterance. During a first pass, the method includes processing the sequence of acoustic frames to generate N candidate hypotheses for the utterance. During a second pass, and for each candidate hypothesis, the method includes: generating a respective un-normalized likelihood score; generating a respective external language model score; generating a standalone score that models prior statistics of the corresponding candidate hypothesis; and generating a respective overall score for the candidate hypothesis based on the un-normalized likelihood score, the external language model score, and the standalone score. The method also includes selecting the candidate hypothesis having the highest respective overall score from among the N candidate hypotheses as a final transcription of the utterance.
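A minimal sketch of the second-pass rescoring step follows. The linear weighting of the three scores, the weight values, and all names (`Hypothesis`, `rescore`) are assumptions for illustration only; the abstract states only that the overall score is based on the un-normalized likelihood, the external LM score, and the standalone prior score.

```python
# Illustrative sketch: score combination and weights are hypothetical.
from dataclasses import dataclass

@dataclass
class Hypothesis:
    text: str
    likelihood: float   # un-normalized likelihood score from the first pass
    lm_score: float     # external language model score
    prior_score: float  # standalone score modeling prior statistics of the hypothesis

def rescore(hypotheses, lm_weight=0.3, prior_weight=-0.1):
    """Second pass: combine the three per-hypothesis scores into an overall
    score and return the candidate with the highest overall score."""
    def overall(h):
        return h.likelihood + lm_weight * h.lm_score + prior_weight * h.prior_score
    return max(hypotheses, key=overall)

n_best = [
    Hypothesis("recognize speech", likelihood=-4.1, lm_score=-2.0, prior_score=-3.5),
    Hypothesis("wreck a nice beach", likelihood=-4.3, lm_score=-6.2, prior_score=-7.0),
]
print(rescore(n_best).text)  # -> "recognize speech"
```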
-
Publication No.: US20240404506A1
Publication Date: 2024-12-05
Application No.: US18797760
Filing Date: 2024-08-08
Applicant: Google LLC
Inventor: Yu Zhang, Ron J. Weiss, Byungha Chun, Yonghui Wu, Zhifeng Chen, Russell John Wyatt Skerry-Ryan, Ye Jia, Andrew M. Rosenberg, Bhuvana Ramabhadran
IPC: G10L13/047
Abstract: A method includes receiving an input text sequence to be synthesized into speech in a first language and obtaining a speaker embedding, the speaker embedding specifying specific voice characteristics of a target speaker for synthesizing the input text sequence into speech that clones a voice of the target speaker. The target speaker includes a native speaker of a second language different than the first language. The method also includes generating, using a text-to-speech (TTS) model, an output audio feature representation of the input text by processing the input text sequence and the speaker embedding. The output audio feature representation includes the voice characteristics of the target speaker specified by the speaker embedding.
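As a rough sketch of the interface the abstract describes, the snippet below conditions a stand-in TTS call on a speaker embedding so English text is voiced with the characteristics of a native speaker of another language. The embedding lookup, speaker ID, and placeholder generator are invented for illustration and are not the patent's architecture.

```python
# Illustrative sketch only: lookup table and TTS stub are hypothetical.
def get_speaker_embedding(speaker_id: str) -> list[float]:
    """Stand-in lookup of a speaker embedding capturing the target speaker's voice."""
    table = {"native_es_speaker": [0.2, -0.1, 0.7, 0.05]}
    return table[speaker_id]

def tts_generate(text: str, speaker_embedding: list[float]) -> list[list[float]]:
    """Stand-in TTS forward pass: consumes the input text sequence plus the
    speaker embedding and returns output audio feature frames."""
    # A real model would condition each decoded frame on the encoded text
    # and the speaker embedding; here we just emit placeholder frames.
    return [list(speaker_embedding) for _ in text.split()]

# English input text, voiced with the embedding of a native Spanish speaker,
# so the synthesized speech clones that speaker's voice in a new language.
embedding = get_speaker_embedding("native_es_speaker")
audio_features = tts_generate("good morning everyone", embedding)
print(len(audio_features), "frames of output audio features")
```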
-
Publication No.: US12087279B2
Publication Date: 2024-09-10
Application No.: US17656225
Filing Date: 2022-03-23
Applicant: Google LLC
Inventor: Bhuvana Ramabhadran, Hainan Xu, Kartik Audhkhasi, Yinghui Huang
CPC classification number: G10L15/04, G06F40/284, G06N3/04, G10L15/063, G10L15/16, G10L25/30, G10L15/02
Abstract: A method for subword segmentation includes receiving an input word to be segmented into a plurality of subword units. The method also includes executing a subword segmentation routine to segment the input word into a plurality of subword units by accessing a trained vocabulary set of subword units and selecting the plurality of subword units from the input word by greedily finding a longest subword unit from the input word that is present in the trained vocabulary set until an end of the input word is reached.
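The greedy longest-match routine described here maps cleanly to a short sketch. The example vocabulary and the single-character fallback for out-of-vocabulary characters are assumptions added for illustration; the greedy longest-prefix loop itself follows the abstract.

```python
# Sketch of greedy longest-match subword segmentation; the vocabulary is invented.
def segment(word: str, vocab: set[str]) -> list[str]:
    """Greedily take the longest subword unit in `vocab` that prefixes the
    remaining input, until the end of the word is reached."""
    units, start = [], 0
    while start < len(word):
        for end in range(len(word), start, -1):   # try the longest candidate first
            piece = word[start:end]
            if piece in vocab:
                units.append(piece)
                start = end
                break
        else:
            units.append(word[start])             # fallback: emit a single character
            start += 1
    return units

vocab = {"un", "believ", "able", "a", "b", "le"}
print(segment("unbelievable", vocab))  # -> ['un', 'believ', 'able']
```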
-
Publication No.: US11837216B2
Publication Date: 2023-12-05
Application No.: US18168969
Filing Date: 2023-02-14
Applicant: Google LLC
Inventor: Zhehuai Chen, Andrew M. Rosenberg, Bhuvana Ramabhadran, Pedro J. Moreno Mengibar
CPC classification number: G10L13/00, G10L13/08, G10L15/063
Abstract: A method for training a generative adversarial network (GAN)-based text-to-speech (TTS) model and a speech recognition model in unison includes obtaining a plurality of training text utterances. At each of a plurality of output steps for each training text utterance, the method also includes generating, for output by the GAN-based TTS model, a synthetic speech representation of the corresponding training text utterance, and determining, using an adversarial discriminator of the GAN, an adversarial loss term indicative of an amount of acoustic noise disparity in one of the non-synthetic speech representations selected from the set of spoken training utterances relative to the corresponding synthetic speech representation of the corresponding training text utterance. The method also includes updating parameters of the GAN-based TTS model based on the adversarial loss term determined at each of the plurality of output steps for each training text utterance of the plurality of training text utterances.
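A rough sketch of the training loop shape follows. The TTS stub, the discriminator, and the loss formula are hypothetical stand-ins chosen only to show where the adversarial loss term (real vs. synthetic disparity) enters the update; none of them come from the patent.

```python
# Illustrative sketch only: models, discriminator, and update rule are stand-ins.
import random

def gan_tts_synthesize(text: str) -> list[float]:
    """Stand-in GAN-based TTS: text utterance -> synthetic speech representation."""
    random.seed(hash(text) % (2**32))
    return [random.random() for _ in range(8)]

def discriminator_score(speech: list[float]) -> float:
    """Stand-in adversarial discriminator: higher means 'looks non-synthetic'."""
    return sum(speech) / len(speech)

def adversarial_loss(real: list[float], synthetic: list[float]) -> float:
    """Loss term indicating the acoustic disparity between a selected
    non-synthetic utterance and the corresponding synthetic representation."""
    return discriminator_score(real) - discriminator_score(synthetic)

training_texts = ["open the garage", "what time is it"]
spoken_training_utterances = [[0.4] * 8, [0.6] * 8]   # non-synthetic speech

for text in training_texts:
    synthetic = gan_tts_synthesize(text)
    real = random.choice(spoken_training_utterances)  # one selected non-synthetic utterance
    loss = adversarial_loss(real, synthetic)
    # A gradient step updating the GAN-based TTS parameters on `loss` would go here.
```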
-
Publication No.: US20230298570A1
Publication Date: 2023-09-21
Application No.: US18187222
Filing Date: 2023-03-21
Applicant: Google LLC
Inventor: Weiran Wang, Tongzhou Chen, Tara N. Sainath, Ehsan Variani, Rohit Prakash Prabhavalkar, Ronny Huang, Bhuvana Ramabhadran, Neeraj Gaur, Sepand Mavandadi, Charles Caleb Peyser, Trevor Strohman, Yangzhang He, David Rybach
CPC classification number: G10L15/063, G10L15/19, G10L15/22, G10L15/16, G10L15/02
Abstract: A method includes generating, using an audio encoder, a higher-order feature representation for each acoustic frame in a sequence of acoustic frames; generating, using a decoder, based on the higher-order feature representation, a plurality of speech recognition hypotheses, each hypothesis corresponding to a candidate transcription of an utterance and having an associated first likelihood score; generating, using an external language model, for each speech recognition hypothesis, a second likelihood score; determining, using a learnable fusion module, for each speech recognition hypothesis, a set of fusion weights based on the higher-order feature representation and the speech recognition hypothesis; and generating, using the learnable fusion module, for each speech recognition hypothesis, a third likelihood score based on the first likelihood score, the second likelihood score, and the set of fusion weights, the audio encoder and decoder trained using minimum additive error rate training in the presence of the external language model.
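The fusion step can be sketched as below. The hard-coded weights, the per-score linear combination, and all function names are placeholders for illustration only; in the abstract the fusion weights are predicted by a learnable module from the higher-order features and the hypothesis, not fixed.

```python
# Illustrative sketch: the fusion-weight computation is an invented placeholder.
def fusion_weights(encoder_features, hypothesis_text):
    """Stand-in for a learnable module mapping the higher-order feature
    representation and a hypothesis to per-score fusion weights."""
    return {"asr": 1.0, "lm": 0.4}   # would be predicted, not hard-coded

def fused_score(first_score, lm_score, weights):
    """Third likelihood score combining the first-pass and external LM scores."""
    return weights["asr"] * first_score + weights["lm"] * lm_score

encoder_features = [0.1, 0.3, 0.2]                    # per-utterance summary (hypothetical)
hypotheses = [("play the next song", -3.2, -1.8),
              ("played the next song", -3.5, -2.9)]   # (text, first-pass score, LM score)

best = max(
    hypotheses,
    key=lambda h: fused_score(h[1], h[2], fusion_weights(encoder_features, h[0])),
)
print(best[0])  # -> "play the next song"
```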
-
Publication No.: US20230223009A1
Publication Date: 2023-07-13
Application No.: US18187330
Filing Date: 2023-03-21
Applicant: Google LLC
Inventor: Arindrima Datta, Bhuvana Ramabhadran, Jesse Emond, Brian Roark
CPC classification number: G10L15/005, G10L15/16, G10L15/26, G06F40/58, G10L15/063, G06N3/049
Abstract: A method includes obtaining a plurality of training data sets, each associated with a respective native language and including a plurality of respective training data samples. For each respective training data sample of each training data set in the respective native language, the method includes transliterating the corresponding transcription in the respective native script into corresponding transliterated text representing the respective native language of the corresponding audio in a target script and associating the corresponding transliterated text in the target script with the corresponding audio in the respective native language to generate a respective normalized training data sample. The method also includes training, using the normalized training data samples, a multilingual end-to-end speech recognition model to predict speech recognition results in the target script for corresponding speech utterances spoken in any of the different native languages associated with the plurality of training data sets.
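The normalization step lends itself to a small sketch. The toy word-level transliteration table and file names are invented; a real system would use a proper transliteration model, but the flow of pairing each audio clip with target-script text mirrors the abstract.

```python
# Illustrative sketch: toy transliteration table stands in for a real transliterator.
toy_transliteration = {"नमस्ते": "namaste", "спасибо": "spasibo"}

def transliterate(native_text: str) -> str:
    """Map a native-script transcription into the common target script (Latin here)."""
    return " ".join(toy_transliteration.get(token, token) for token in native_text.split())

# Each training sample pairs audio (placeholder path) with a native-script transcription.
training_sets = {
    "hi": [("audio_hi_001.wav", "नमस्ते")],
    "ru": [("audio_ru_001.wav", "спасибо")],
}

normalized = [
    (audio, transliterate(text))
    for samples in training_sets.values()
    for audio, text in samples
]
# `normalized` now pairs every audio clip with target-script text, ready for
# training a single multilingual end-to-end recognizer.
print(normalized)
```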
-
Publication No.: US20230178068A1
Publication Date: 2023-06-08
Application No.: US18161217
Filing Date: 2023-01-30
Applicant: Google LLC
Inventor: Yu Zhang, Ron J. Weiss, Byungha Chun, Yonghui Wu, Zhifeng Chen, Russell John Wyatt Skerry-Ryan, Ye Jia, Andrew M. Rosenberg, Bhuvana Ramabhadran
IPC: G10L13/047
CPC classification number: G10L13/047
Abstract: A method includes receiving an input text sequence to be synthesized into speech in a first language and obtaining a speaker embedding, the speaker embedding specifying specific voice characteristics of a target speaker for synthesizing the input text sequence into speech that clones a voice of the target speaker. The target speaker includes a native speaker of a second language different than the first language. The method also includes generating, using a text-to-speech (TTS) model, an output audio feature representation of the input text by processing the input text sequence and the speaker embedding. The output audio feature representation includes the voice characteristics of the target speaker specified by the speaker embedding.
-
Publication No.: US11605368B2
Publication Date: 2023-03-14
Application No.: US17454536
Filing Date: 2021-11-11
Applicant: Google LLC
Inventor: Zhehuai Chen, Andrew M. Rosenberg, Bhuvana Ramabhadran, Pedro J. Moreno Mengibar
Abstract: A method for training a generative adversarial network (GAN)-based text-to-speech (TTS) model and a speech recognition model in unison includes obtaining a plurality of training text utterances. At each of a plurality of output steps for each training text utterance, the method also includes generating, for output by the GAN-based TTS model, a synthetic speech representation of the corresponding training text utterance, and determining, using an adversarial discriminator of the GAN, an adversarial loss term indicative of an amount of acoustic noise disparity in one of the non-synthetic speech representations selected from the set of spoken training utterances relative to the corresponding synthetic speech representation of the corresponding training text utterance. The method also includes updating parameters of the GAN-based TTS model based on the adversarial loss term determined at each of the plurality of output steps for each training text utterance of the plurality of training text utterances.
-
Publication No.: US20230058447A1
Publication Date: 2023-02-23
Application No.: US17445537
Filing Date: 2021-08-20
Applicant: Google LLC
Inventor: Andrew Rosenberg, Bhuvana Ramabhadran
IPC: G10L21/007, G10L15/26, G10L25/30, G06N3/08
Abstract: A method for training a speech recognition model includes obtaining sample utterances of synthesized speech in a target domain, obtaining transcribed utterances of non-synthetic speech in the target domain, and pre-training the speech recognition model on the sample utterances of synthesized speech in the target domain to attain an initial state for warm-start training. After pre-training the speech recognition model, the method also includes warm-start training the speech recognition model on the transcribed utterances of non-synthetic speech in the target domain to teach the speech recognition model to learn to recognize real/human speech in the target domain.
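A minimal sketch of the two-stage recipe follows. The model state, training functions, epoch counts, and example utterances are all hypothetical placeholders; only the ordering (pre-train on synthesized in-domain speech, then warm-start on real transcribed in-domain speech) reflects the abstract.

```python
# Illustrative sketch only: placeholder model and training steps.
def train_step(model_state: dict, utterance, transcript) -> dict:
    """Stand-in for one supervised ASR update on an (audio, transcript) pair."""
    model_state["steps"] = model_state.get("steps", 0) + 1
    return model_state

def run_epochs(model_state: dict, paired_data, epochs: int) -> dict:
    for _ in range(epochs):
        for utterance, transcript in paired_data:
            model_state = train_step(model_state, utterance, transcript)
    return model_state

synthesized_in_domain = [([0.1] * 8, "check my schedule")]       # TTS audio + text
real_transcribed_in_domain = [([0.5] * 8, "check my schedule")]  # human speech + text

# 1) Pre-train on synthesized speech in the target domain to reach an initial state.
model = run_epochs({}, synthesized_in_domain, epochs=3)

# 2) Warm-start training on transcribed non-synthetic speech in the same target
#    domain, continuing from the pre-trained state rather than from scratch.
model = run_epochs(model, real_transcribed_in_domain, epochs=3)
print(model["steps"])
```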