-
Publication No.: US20210350786A1
Publication Date: 2021-11-11
Application No.: US16869552
Filing Date: 2020-05-07
Applicant: Google LLC
Inventor: Zhehuai Chen , Andrew M. Rosenberg , Bhuvana Ramabhadran , Pedro J. Moreno Mengibar
Abstract: A method for training a generative adversarial network (GAN)-based text-to-speech (TTS) model and a speech recognition model in unison includes obtaining a plurality of training text utterances. At each of a plurality of output steps for each training text utterance, the method also includes generating, for output by the GAN-based TTS model, a synthetic speech representation of the corresponding training text utterance, and determining, using an adversarial discriminator of the GAN, an adversarial loss term indicative of an amount of acoustic noise disparity in one of the non-synthetic speech representations selected from a set of spoken training utterances relative to the corresponding synthetic speech representation of the corresponding training text utterance. The method also includes updating parameters of the GAN-based TTS model based on the adversarial loss term determined at each of the plurality of output steps for each training text utterance of the plurality of training text utterances.
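The generator-side update described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the logistic `discriminator` and its weights are hypothetical stand-ins for the GAN's adversarial discriminator, and the loss shown is the standard generator-side GAN objective, which grows when the discriminator easily detects that the synthetic speech differs acoustically from real speech.

```python
import math

def discriminator(features, weights, bias):
    """Toy adversarial discriminator: probability that the input acoustic
    features come from a non-synthetic (real) spoken utterance."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def adversarial_loss_term(synthetic_features, weights, bias):
    """Generator-side GAN loss for the TTS model: large when the
    discriminator confidently flags the synthetic representation as fake,
    i.e. when the acoustic disparity from real speech is easy to detect."""
    p_real = discriminator(synthetic_features, weights, bias)
    return -math.log(max(p_real, 1e-12))
```

Updating the TTS parameters to reduce this term pushes the synthetic speech representations toward the acoustic characteristics of the non-synthetic ones.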
-
Publication No.: US20210233510A1
Publication Date: 2021-07-29
Application No.: US17152760
Filing Date: 2021-01-19
Applicant: Google LLC
Inventor: Arindrima Datta , Bhuvana Ramabhadran , Jesse Emond , Brian Roark
Abstract: A method includes obtaining a plurality of training data sets, each associated with a respective native language and each including a plurality of respective training data samples. For each respective training data sample of each training data set in the respective native language, the method includes transliterating the corresponding transcription in the respective native script into corresponding transliterated text representing the respective native language of the corresponding audio in a target script and associating the corresponding transliterated text in the target script with the corresponding audio in the respective native language to generate a respective normalized training data sample. The method also includes training, using the normalized training data samples, a multilingual end-to-end speech recognition model to predict speech recognition results in the target script for corresponding speech utterances spoken in any of the different native languages associated with the plurality of training data sets.
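The normalization step above can be illustrated with a small sketch. The character-level mapping table and the Cyrillic-to-Latin example below are hypothetical; real transliteration systems handle context-dependent and many-to-many mappings, but the pooling of per-language data into one target-script training set follows the abstract directly.

```python
def transliterate(text, table):
    """Map each character of a native-script transcription into the target
    script; characters without a mapping pass through unchanged."""
    return "".join(table.get(ch, ch) for ch in text)

def normalize_training_sets(training_sets, table):
    """training_sets: {language: [(audio, native_script_transcription), ...]}.
    Returns one pooled list of (audio, target_script_transcription) pairs
    for training a single multilingual end-to-end recognizer."""
    normalized = []
    for language, samples in training_sets.items():
        for audio, transcription in samples:
            normalized.append((audio, transliterate(transcription, table)))
    return normalized
```

Because every normalized sample pairs audio in its native language with text in the shared target script, one model can be trained across all of the languages at once.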
-
Publication No.: US20250095639A1
Publication Date: 2025-03-20
Application No.: US18962686
Filing Date: 2024-11-27
Applicant: Google LLC
Inventor: Andrew M. Rosenberg , Gary Wang , Bhuvana Ramabhadran , Fadi Biadsy
IPC: G10L15/06 , G10L13/02 , G10L15/16 , G10L15/197 , G10L15/22 , G10L19/00 , G10L19/038 , G10L21/003
Abstract: A method includes receiving a set of training utterances each including a non-synthetic speech representation of a corresponding utterance, and for each training utterance, generating a corresponding synthetic speech representation by using a voice conversion model. The non-synthetic speech representation and the synthetic speech representation form a corresponding training utterance pair. At each of a plurality of output steps for each training utterance pair, the method also includes generating, for output by a speech recognition model, a first probability distribution over possible non-synthetic speech recognition hypotheses for the non-synthetic speech representation and a second probability distribution over possible synthetic speech recognition hypotheses for the synthetic speech representation. The method also includes determining a consistent loss term for the corresponding training utterance pair based on the first and second probability distributions and updating parameters of the speech recognition model based on the consistent loss term.
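A consistency term over the two hypothesis distributions can be sketched as below. The symmetric KL divergence shown is one common choice, used here as an assumed stand-in; the patent's exact loss formulation may differ.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) over two discrete distributions with equal support."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def consistent_loss(p_non_synthetic, p_synthetic):
    """Symmetric divergence between the hypothesis distribution for the
    non-synthetic utterance and the one for its synthetic counterpart;
    zero when the recognizer treats both representations identically."""
    return 0.5 * (kl_divergence(p_non_synthetic, p_synthetic)
                  + kl_divergence(p_synthetic, p_non_synthetic))
```

Minimizing this term encourages the speech recognition model to behave the same on real speech and on voice-converted synthetic speech of the same utterance.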
-
Publication No.: US20250078813A1
Publication Date: 2025-03-06
Application No.: US18817181
Filing Date: 2024-08-27
Applicant: Google LLC
Inventor: Kartik Audhkhasi , Gowtham Ramesh , Bhuvana Ramabhadran
IPC: G10L15/06
Abstract: A method includes training, using an un-supervised learning technique, an auxiliary ASR model based on a first set of un-transcribed source task speech utterances to determine a first task vector, training, using the un-supervised learning technique, the auxiliary ASR model based on a second set of un-transcribed speech utterances to determine a second task vector, and training, using the un-supervised learning technique, the auxiliary ASR model based on un-transcribed target task speech utterances to determine a target task vector. The method also includes determining a first correlation between the first and target task vectors, determining a second correlation between the second and target task vectors, and adapting parameters of a trained primary ASR model based on the first and second task vectors and the first and second correlations to teach the primary ASR model to learn how to recognize speech associated with the target task.
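The adaptation step can be sketched as correlation-weighted task arithmetic. Using cosine similarity as the correlation measure is an assumption for illustration; the idea is that source task vectors more correlated with the target task contribute more to the adapted parameters.

```python
import math

def cosine(u, v):
    """Cosine similarity between two task vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def adapt_parameters(primary, task_vectors, target_vector):
    """Weight each source task vector by its correlation with the target
    task vector, then add the weighted combination to the primary model's
    parameters."""
    weights = [cosine(tv, target_vector) for tv in task_vectors]
    return [p + sum(w * tv[i] for w, tv in zip(weights, task_vectors))
            for i, p in enumerate(primary)]
```

A source task orthogonal to the target receives weight zero and leaves the primary parameters untouched in its direction.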
-
Publication No.: US20240420692A1
Publication Date: 2024-12-19
Application No.: US18818010
Filing Date: 2024-08-28
Applicant: Google LLC
Inventor: Neeraj Gaur , Tongzhou Chen , Ehsan Variani , Bhuvana Ramabhadran , Parisa Haghani , Pedro J. Moreno Mengibar
IPC: G10L15/197 , G10L15/00 , G10L15/16 , G10L15/22
Abstract: A method includes receiving a sequence of acoustic frames extracted from audio data corresponding to an utterance. During a first pass, the method includes processing the sequence of acoustic frames to generate N candidate hypotheses for the utterance. During a second pass, and for each candidate hypothesis, the method includes: generating a respective un-normalized likelihood score; generating a respective external language model score; generating a standalone score that models prior statistics of the corresponding candidate hypothesis; and generating a respective overall score for the candidate hypothesis based on the un-normalized likelihood score, the external language model score, and the standalone score. The method also includes selecting the candidate hypothesis having the highest respective overall score from among the N candidate hypotheses as a final transcription of the utterance.
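The second-pass selection can be sketched as a log-linear combination of the three scores. The specific combination below (adding the weighted external LM score and subtracting the weighted standalone prior, a common way to discount an internal language model) and the weight values are assumptions for illustration, not the patented scoring function.

```python
def rescore(candidates, lm_weight, prior_weight):
    """candidates: list of (hypothesis, likelihood, lm_score, prior_score)
    tuples, all scores in log space. Computes an overall score per
    candidate and returns the hypothesis with the highest overall score."""
    def overall(candidate):
        _, likelihood, lm_score, prior_score = candidate
        return likelihood + lm_weight * lm_score - prior_weight * prior_score
    return max(candidates, key=overall)[0]
```

With a strong external LM, a hypothesis with a slightly lower first-pass likelihood can overtake the first-pass best, as in the test below.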
-
Publication No.: US20240304178A1
Publication Date: 2024-09-12
Application No.: US18439630
Filing Date: 2024-02-12
Applicant: Google LLC
Inventor: Andrew M Rosenberg , Yacob Yochai Blau , Bhuvana Ramabhadran , Genady Beryozkin , Gary Wang , Zhehuai Chen , Rohan Agrawal , Parisa Haghani
CPC classification number: G10L15/063 , G10L15/22 , G10L15/26
Abstract: A method includes receiving training data including transcribed speech utterances spoken in a general domain, modified speech utterances in a target domain, and unspoken textual utterances corresponding to the transcriptions of the modified speech utterances in the target domain. The modified speech utterances include utterances spoken in the target domain that have been modified to obfuscate one or more classes of sensitive information recited in the utterances. The method also includes generating a corresponding alignment output for each unspoken textual utterance of the received training data using an alignment model. The method also includes training a speech recognition model on the alignment outputs corresponding to the unspoken textual utterances, the un-transcribed speech utterances, and the transcribed speech utterances to teach the speech recognition model to learn to recognize speech in the target domain and phrases within the one or more classes of sensitive information.
-
Publication No.: US12087273B2
Publication Date: 2024-09-10
Application No.: US18161217
Filing Date: 2023-01-30
Applicant: Google LLC
Inventor: Yu Zhang , Ron J. Weiss , Byungha Chun , Yonghui Wu , Zhifeng Chen , Russell John Wyatt Skerry-Ryan , Ye Jia , Andrew M. Rosenberg , Bhuvana Ramabhadran
IPC: G10L21/00 , G10L13/00 , G10L13/047
CPC classification number: G10L13/047
Abstract: A method includes receiving an input text sequence to be synthesized into speech in a first language and obtaining a speaker embedding, the speaker embedding specifying specific voice characteristics of a target speaker for synthesizing the input text sequence into speech that clones a voice of the target speaker. The target speaker includes a native speaker of a second language different than the first language. The method also includes generating, using a text-to-speech (TTS) model, an output audio feature representation of the input text by processing the input text sequence and the speaker embedding. The output audio feature representation includes the voice characteristics of the target speaker specified by the speaker embedding.
-
Publication No.: US12087272B2
Publication Date: 2024-09-10
Application No.: US17756995
Filing Date: 2019-12-13
Applicant: Google LLC
Inventor: Andrew Rosenberg , Bhuvana Ramabhadran , Fadi Biadsy , Yu Zhang
IPC: G10L15/16 , G10L13/047 , G10L13/08 , G10L15/06
CPC classification number: G10L13/047 , G10L13/086 , G10L15/063 , G10L15/16
Abstract: A method (800) of training a text-to-speech (TTS) model (108) includes obtaining training data (150) including reference input text (104) that includes a sequence of characters, a sequence of reference audio features (402) representative of the sequence of characters, and a sequence of reference phone labels (502) representative of distinct speech sounds of the reference audio features. For each of a plurality of time steps, the method includes generating a corresponding predicted audio feature (120) based on a respective portion of the reference input text for the time step and generating, using a phone label mapping network (510), a corresponding predicted phone label (520) associated with the predicted audio feature. The method also includes aligning the predicted phone label with the reference phone label to determine a corresponding predicted phone label loss (622) and updating the TTS model based on the corresponding predicted phone label loss.
-
Publication No.: US20240296832A1
Publication Date: 2024-09-05
Application No.: US18590918
Filing Date: 2024-02-28
Applicant: Google LLC
Inventor: Andrew M. Rosenberg , Murali Karthick Baskar , Bhuvana Ramabhadran
IPC: G10L15/06 , G10L15/01 , G10L15/16 , G10L15/197
CPC classification number: G10L15/063 , G10L15/01 , G10L15/16 , G10L15/197
Abstract: A method includes, for each training sample of a plurality of training samples, processing, using an RNN-T model, a corresponding sequence of acoustic frames to obtain an n-best list of speech recognition hypotheses, and, for each speech recognition hypothesis of the n-best list, determining a corresponding number of word errors relative to a corresponding ground-truth transcription. For a top-ranked hypothesis from the n-best list, the method includes determining a first loss based on the corresponding ground-truth transcription. The method includes identifying, as an oracle hypothesis, the speech recognition hypothesis from the n-best list having the smallest corresponding number of word errors relative to the corresponding ground-truth transcription, and determining a second loss for the oracle hypothesis based on the corresponding ground-truth transcription. The method includes determining a corresponding self-training combined loss based on the first and second losses, and training the RNN-T model based on the corresponding self-training combined loss.
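The oracle selection and loss combination can be sketched as below. The word-error count is standard word-level Levenshtein distance; the linear mixing of the two losses with a weight `alpha` is an assumption for illustration, as the abstract does not specify how the first and second losses are combined.

```python
def word_errors(hyp, ref):
    """Word-level Levenshtein distance between a hypothesis and the
    ground-truth transcription (substitutions, insertions, deletions)."""
    h, r = hyp.split(), ref.split()
    d = list(range(len(r) + 1))
    for i, hw in enumerate(h, 1):
        prev, d[0] = d[0], i
        for j, rw in enumerate(r, 1):
            cur = min(d[j] + 1, d[j - 1] + 1, prev + (hw != rw))
            prev, d[j] = d[j], cur
    return d[len(r)]

def combined_loss(nbest, ref, loss_fn, alpha=0.5):
    """nbest is ranked best-first. Mixes the loss of the top-ranked
    hypothesis with the loss of the oracle (fewest-word-errors) hypothesis
    into a single self-training combined loss."""
    top = nbest[0]
    oracle = min(nbest, key=lambda hyp: word_errors(hyp, ref))
    return alpha * loss_fn(top, ref) + (1 - alpha) * loss_fn(oracle, ref)
```

For simplicity the test below reuses `word_errors` as the per-hypothesis loss; in training, `loss_fn` would be a differentiable model loss against the ground truth.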
-
Publication No.: US11990117B2
Publication Date: 2024-05-21
Application No.: US17451613
Filing Date: 2021-10-20
Applicant: Google LLC
Inventor: Zhehuai Chen , Bhuvana Ramabhadran , Andrew Rosenberg , Yu Zhang , Pedro J. Moreno Mengibar
IPC: G10L13/047 , G10L13/08 , G10L13/10
CPC classification number: G10L13/047 , G10L13/086 , G10L13/10
Abstract: A method for training a speech recognition model includes obtaining a multilingual text-to-speech (TTS) model. The method also includes generating a native synthesized speech representation for an input text sequence in a first language that is conditioned on speaker characteristics of a native speaker of the first language. The method also includes generating a cross-lingual synthesized speech representation for the input text sequence in the first language that is conditioned on speaker characteristics of a native speaker of a different second language. The method also includes generating a first speech recognition result for the native synthesized speech representation and a second speech recognition result for the cross-lingual synthesized speech representation. The method also includes determining a consistent loss term based on the first speech recognition result and the second speech recognition result and updating parameters of the speech recognition model based on the consistent loss term.