-
Publication Number: US11475874B2
Publication Date: 2022-10-18
Application Number: US17163007
Application Date: 2021-01-29
Applicant: Google LLC
Inventor: Yu Zhang , Bhuvana Ramabhadran , Andrew Rosenberg , Yonghui Wu , Byungha Chun , Ron Weiss , Yuan Cao
Abstract: A method of generating diverse and natural text-to-speech (TTS) samples includes receiving a text and generating a speech sample based on the text using a TTS model. A training process trains the TTS model to generate the speech sample by receiving training samples. Each training sample includes a spectrogram and a training text corresponding to the spectrogram. For each training sample, the training process identifies speech units associated with the training text. For each speech unit, the training process generates a speech embedding, aligns the speech embedding with a portion of the spectrogram, extracts a latent feature from the aligned portion of the spectrogram, and assigns a quantized embedding to the latent feature. The training process generates the speech sample by decoding a concatenation of the speech embeddings and the quantized embeddings for the speech units associated with the training text corresponding to the spectrogram.
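The abstract above combines per-unit speech embeddings with quantized latent features before decoding. Below is a minimal sketch of that quantize-and-concatenate step, assuming nearest-neighbor vector quantization against a learned codebook; the dimensions and the quantization rule are assumptions for illustration, not specified by the patent.

```python
import torch

# Hypothetical dimensions; the abstract does not fix any of these.
NUM_UNITS, EMB_DIM, LATENT_DIM, CODEBOOK_SIZE = 50, 256, 64, 512

unit_embedding = torch.nn.Embedding(NUM_UNITS, EMB_DIM)  # per-speech-unit embeddings
codebook = torch.randn(CODEBOOK_SIZE, LATENT_DIM)        # learned VQ codebook (toy init)

def quantize(latents):
    """Assign each latent feature to its nearest codebook entry."""
    dists = torch.cdist(latents, codebook)   # (units, CODEBOOK_SIZE) L2 distances
    return codebook[dists.argmin(dim=-1)]    # quantized embeddings, (units, LATENT_DIM)

def decoder_input(unit_ids, latents):
    """Concatenate each speech-unit embedding with its quantized latent,
    forming the sequence the decoder consumes to generate the sample."""
    return torch.cat([unit_embedding(unit_ids), quantize(latents)], dim=-1)

# Toy call: 7 speech units with latent features already extracted and aligned.
out = decoder_input(torch.randint(0, NUM_UNITS, (7,)), torch.randn(7, LATENT_DIM))
print(out.shape)  # torch.Size([7, 320])
```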
-
Publication Number: US20220310081A1
Publication Date: 2022-09-29
Application Number: US17701635
Application Date: 2022-03-22
Applicant: Google LLC
Inventor: Neeraj Gaur , Tongzhou Chen , Ehsan Variani , Bhuvana Ramabhadran , Parisa Haghani , Pedro J. Moreno Mengibar
IPC: G10L15/197 , G10L15/16 , G10L15/22 , G10L15/00
Abstract: A method includes receiving a sequence of acoustic frames extracted from audio data corresponding to an utterance. During a first pass, the method includes processing the sequence of acoustic frames to generate N candidate hypotheses for the utterance. During a second pass, and for each candidate hypothesis, the method includes: generating a respective un-normalized likelihood score; generating a respective external language model score; generating a standalone score that models prior statistics of the corresponding candidate hypothesis; and generating a respective overall score for the candidate hypothesis based on the un-normalized likelihood score, the external language model score, and the standalone score. The method also includes selecting the candidate hypothesis having the highest respective overall score from among the N candidate hypotheses as a final transcription of the utterance.
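As a rough illustration of the second-pass selection, here is a sketch that combines the three scores log-linearly with hypothetical weights; the abstract only states that the overall score is based on the three scores, not how they are weighted or combined.

```python
import math

def rescore(candidates, lm_weight=0.5, prior_weight=0.3):
    """Second-pass rescoring sketch: combine the first-pass likelihood,
    an external LM score, and a standalone prior score per hypothesis,
    then pick the candidate with the highest overall score. The additive
    combination and the weights are assumptions for illustration."""
    best, best_score = None, -math.inf
    for hyp in candidates:
        overall = (hyp["likelihood"]                 # un-normalized likelihood score
                   + lm_weight * hyp["lm_score"]     # external language model score
                   + prior_weight * hyp["prior"])    # standalone prior-statistics score
        if overall > best_score:
            best, best_score = hyp["text"], overall
    return best

# N candidate hypotheses from the first pass (toy log-scores).
n_best = [
    {"text": "recognize speech", "likelihood": -4.1, "lm_score": -2.0, "prior": -1.2},
    {"text": "wreck a nice beach", "likelihood": -4.0, "lm_score": -6.5, "prior": -3.8},
]
print(rescore(n_best))  # -> "recognize speech"
```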
-
Publication Number: US12230249B2
Publication Date: 2025-02-18
Application Number: US17655903
Application Date: 2022-03-22
Applicant: Google LLC
Inventor: Andrew Rosenberg , Bhuvana Ramabhadran , Zhehuai Chen , Yuan Wang , Yu Zhang , Jesse Emond
Abstract: A method includes receiving audio data corresponding to an utterance and generating a pair of positive audio data examples. Here, each positive audio data example includes a respective augmented copy of the received audio data. For each respective positive audio data example, the method includes generating a respective sequence of encoder outputs and projecting the respective sequence of encoder outputs for the positive audio data example into a contrastive loss space. The method also includes determining an L2 distance between each corresponding encoder output in the projected sequences of encoder outputs for the positive audio data examples and determining a per-utterance consistency loss by averaging the L2 distances. The method also includes generating corresponding speech recognition results for each respective positive audio data example. The method also includes updating parameters of the speech recognition model based on a respective supervised loss term and the per-utterance consistency loss.
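The per-utterance consistency loss lends itself to a short sketch: project both encoder-output sequences of the augmented pair, take per-frame L2 distances, and average. The linear projection head and the tensor shapes below are assumptions for illustration.

```python
import torch

def per_utterance_consistency_loss(enc_a, enc_b, projection):
    """Project the encoder outputs of two augmented copies of the same
    utterance into a contrastive loss space, compute the L2 distance
    between corresponding frames, and average over the utterance.
    `projection` is a hypothetical module; the patent does not fix its form."""
    proj_a = projection(enc_a)  # (frames, proj_dim)
    proj_b = projection(enc_b)
    l2 = torch.linalg.vector_norm(proj_a - proj_b, dim=-1)  # per-frame L2 distance
    return l2.mean()                                        # average over the utterance

# Toy usage with a linear projection head (an assumption, not the patent's choice).
projection = torch.nn.Linear(256, 128)
enc_a, enc_b = torch.randn(80, 256), torch.randn(80, 256)
loss = per_utterance_consistency_loss(enc_a, enc_b, projection)
```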
-
Publication Number: US12136415B2
Publication Date: 2024-11-05
Application Number: US17644343
Application Date: 2021-12-15
Applicant: Google LLC
Inventor: Kartik Audhkhasi , Bhuvana Ramabhadran , Tongzhou Chen , Pedro J. Moreno Mengibar
IPC: G10L15/16 , G06F1/03 , G06N3/04 , G06N3/0455 , G10L19/16
Abstract: A method for unifying streaming and non-streaming speech recognition with an automated speech recognition (ASR) model includes receiving a sequence of acoustic frames. The method includes generating, using an audio encoder of the ASR model, a higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The method further includes generating, using a joint encoder of the ASR model, a probability distribution over possible speech recognition hypotheses at the corresponding time step based on the higher order feature representation generated by the audio encoder at the corresponding time step. The audio encoder comprises a neural network that applies mixture model (MiMo) attention to compute an attention probability distribution function (PDF) using a set of mixture components of softmaxes over a context window.
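A minimal sketch of an attention PDF formed as a mixture of softmaxes over a context window, in the spirit of the MiMo attention named above; the component count, window size, and shapes are assumptions for illustration, not the patent's specifics.

```python
import torch
import torch.nn.functional as F

def mixture_of_softmaxes_attention(scores, mix_logits, window=16):
    """Compute an attention PDF as a convex combination of per-component
    softmaxes over a limited context window. `scores` holds per-component
    attention logits of shape (components, time); `mix_logits` weights
    the components."""
    scores = scores[:, -window:]               # restrict to the context window
    per_component = F.softmax(scores, dim=-1)  # one softmax per mixture component
    weights = F.softmax(mix_logits, dim=0)     # mixture weights sum to 1
    # Convex combination of component softmaxes -> still a valid PDF.
    return (weights[:, None] * per_component).sum(dim=0)

attn = mixture_of_softmaxes_attention(torch.randn(4, 32), torch.randn(4))
assert torch.isclose(attn.sum(), torch.tensor(1.0))
```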
-
Publication Number: US12080283B2
Publication Date: 2024-09-03
Application Number: US17701635
Application Date: 2022-03-22
Applicant: Google LLC
Inventor: Neeraj Gaur , Tongzhou Chen , Ehsan Variani , Bhuvana Ramabhadran , Parisa Haghani , Pedro J. Moreno Mengibar
IPC: G10L15/197 , G10L15/00 , G10L15/16 , G10L15/22
CPC classification number: G10L15/197 , G10L15/005 , G10L15/16 , G10L15/22
Abstract: A method includes receiving a sequence of acoustic frames extracted from audio data corresponding to an utterance. During a first pass, the method includes processing the sequence of acoustic frames to generate N candidate hypotheses for the utterance. During a second pass, and for each candidate hypothesis, the method includes: generating a respective un-normalized likelihood score; generating a respective external language model score; generating a standalone score that models prior statistics of the corresponding candidate hypothesis; and generating a respective overall score for the candidate hypothesis based on the un-normalized likelihood score, the external language model score, and the standalone score. The method also includes selecting the candidate hypothesis having the highest respective overall score from among the N candidate hypotheses as a final transcription of the utterance.
-
Publication Number: US20240203409A1
Publication Date: 2024-06-20
Application Number: US18589220
Application Date: 2024-02-27
Applicant: Google LLC
Inventor: Neeraj Gaur , Tongzhou Chen , Ehsan Variani , Bhuvana Ramabhadran , Parisa Haghani , Pedro J. Moreno Mengibar
IPC: G10L15/197 , G10L15/00 , G10L15/16 , G10L15/22
CPC classification number: G10L15/197 , G10L15/005 , G10L15/16 , G10L15/22
Abstract: A method includes receiving a sequence of acoustic frames extracted from audio data corresponding to an utterance. During a first pass, the method includes processing the sequence of acoustic frames to generate N candidate hypotheses for the utterance. During a second pass, and for each candidate hypothesis, the method includes: generating a respective un-normalized likelihood score; generating a respective external language model score; generating a standalone score that models prior statistics of the corresponding candidate hypothesis; and generating a respective overall score for the candidate hypothesis based on the un-normalized likelihood score, the external language model score, and the standalone score. The method also includes selecting the candidate hypothesis having the highest respective overall score from among the N candidate hypotheses as a final transcription of the utterance.
-
Publication Number: US12014729B2
Publication Date: 2024-06-18
Application Number: US17644344
Application Date: 2021-12-15
Applicant: Google LLC
Inventor: Kartik Audhkhasi , Bhuvana Ramabhadran , Tongzhou Chen , Pedro J. Moreno Mengibar
IPC: G10L15/16 , G06F1/03 , G06N3/04 , G06N3/0455 , G10L19/16
CPC classification number: G10L15/16 , G06F1/03 , G06N3/04 , G06N3/0455 , G10L19/167
Abstract: A method for unifying streaming and non-streaming speech recognition with an automated speech recognition (ASR) model includes receiving a sequence of acoustic frames. The method includes generating, using an audio encoder of the ASR model, a higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The method further includes generating, using a joint encoder of the ASR model, a probability distribution over possible speech recognition hypotheses at the corresponding time step based on the higher order feature representation generated by the audio encoder at the corresponding time step. The audio encoder comprises a neural network that applies mixture model (MiMo) attention to compute an attention probability distribution function (PDF) using a set of mixture components of softmaxes over a context window.
-
Publication Number: US11929060B2
Publication Date: 2024-03-12
Application Number: US17170836
Application Date: 2021-02-08
Applicant: Google LLC
Inventor: Zhehuai Chen , Andrew Rosenberg , Bhuvana Ramabhadran , Pedro Jose Moreno Mengibar
IPC: G10L15/06 , G06N3/04 , G06N3/044 , G06N3/045 , G06N3/08 , G06N3/088 , G10L13/02 , G10L15/16 , G10L15/197
CPC classification number: G10L15/063 , G06N3/044 , G06N3/045 , G06N3/088 , G10L13/02 , G10L15/16 , G10L15/197 , G10L2015/0635
Abstract: A method for training a speech recognition model includes receiving a set of training utterance pairs each including a non-synthetic speech representation and a synthetic speech representation of a same corresponding utterance. At each of a plurality of output steps for each training utterance pair in the set of training utterance pairs, the method also includes determining a consistent loss term for the corresponding training utterance pair based on a first probability distribution over possible non-synthetic speech recognition hypotheses generated for the corresponding non-synthetic speech representation and a second probability distribution over possible synthetic speech recognition hypotheses generated for the corresponding synthetic speech representation. The first and second probability distributions are generated for output by the speech recognition model. The method also includes updating parameters of the speech recognition model based on the consistent loss term determined at each of the plurality of output steps for each training utterance pair.
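One way to realize the consistent loss term described above is a divergence between the two per-step probability distributions; the sketch below uses KL divergence as an assumed choice, since the abstract does not name the measure.

```python
import torch
import torch.nn.functional as F

def consistent_loss(real_logits, synth_logits):
    """Consistency term between the distribution over hypotheses for the
    non-synthetic speech representation and the one for its synthetic
    counterpart, averaged over output steps. KL divergence is one common
    choice; the patent only says the term is based on the two distributions."""
    log_p_synth = F.log_softmax(synth_logits, dim=-1)
    p_real = F.softmax(real_logits, dim=-1)
    # KL(p_real || p_synth), averaged over output steps.
    return F.kl_div(log_p_synth, p_real, reduction="batchmean")

# Toy logits of shape (output_steps, vocabulary).
loss = consistent_loss(torch.randn(20, 100), torch.randn(20, 100))
```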
-
Publication Number: US20220068255A1
Publication Date: 2022-03-03
Application Number: US17454536
Application Date: 2021-11-11
Applicant: Google LLC
Inventor: Zhehuai Chen , Andrew M. Rosenberg , Bhuvana Ramabhadran , Pedro J. Moreno Mengibar
Abstract: A method for training a generative adversarial network (GAN)-based text-to-speech (TTS) model and a speech recognition model in unison includes obtaining a plurality of training text utterances. At each of a plurality of output steps for each training text utterance, the method also includes generating, for output by the GAN-based TTS model, a synthetic speech representation of the corresponding training text utterance, and determining, using an adversarial discriminator of the GAN, an adversarial loss term indicative of an amount of acoustic noise disparity in one of the non-synthetic speech representations selected from the set of spoken training utterances relative to the corresponding synthetic speech representation of the corresponding training text utterance. The method also includes updating parameters of the GAN-based TTS model based on the adversarial loss term determined at each of the plurality of output steps for each training text utterance of the plurality of training text utterances.
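A minimal sketch of the adversarial terms, assuming a binary real-vs-synthetic discriminator over spectrogram-like inputs; the abstract describes the loss only as measuring acoustic noise disparity, so the exact formulation below is an assumption.

```python
import torch
import torch.nn.functional as F

def adversarial_losses(discriminator, real_spec, synth_spec):
    """Compute the discriminator and generator (TTS) adversarial terms.
    The discriminator scores how non-synthetic a representation looks; the
    TTS model is updated to reduce the disparity the discriminator detects."""
    real_logit = discriminator(real_spec)    # higher => judged non-synthetic
    synth_logit = discriminator(synth_spec)
    # Discriminator: separate non-synthetic from synthetic representations.
    d_loss = (F.binary_cross_entropy_with_logits(real_logit, torch.ones_like(real_logit))
              + F.binary_cross_entropy_with_logits(synth_logit, torch.zeros_like(synth_logit)))
    # Generator (TTS): make synthetic speech indistinguishable from non-synthetic.
    g_loss = F.binary_cross_entropy_with_logits(synth_logit, torch.ones_like(synth_logit))
    return d_loss, g_loss

# Toy usage: spectrograms of shape (frames, mel_bins) scored by a linear probe.
disc = torch.nn.Sequential(torch.nn.Flatten(0), torch.nn.Linear(100 * 80, 1))
d_loss, g_loss = adversarial_losses(disc, torch.randn(100, 80), torch.randn(100, 80))
```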
-
Publication Number: US11222620B2
Publication Date: 2022-01-11
Application Number: US16869552
Application Date: 2020-05-07
Applicant: Google LLC
Inventor: Zhehuai Chen , Andrew M. Rosenberg , Bhuvana Ramabhadran , Pedro J. Moreno Mengibar
Abstract: A method for training a generative adversarial network (GAN)-based text-to-speech (TTS) model and a speech recognition model in unison includes obtaining a plurality of training text utterances. At each of a plurality of output steps for each training text utterance, the method also includes generating, for output by the GAN-based TTS model, a synthetic speech representation of the corresponding training text utterance, and determining, using an adversarial discriminator of the GAN, an adversarial loss term indicative of an amount of acoustic noise disparity in one of the non-synthetic speech representations selected from the set of spoken training utterances relative to the corresponding synthetic speech representation of the corresponding training text utterance. The method also includes updating parameters of the GAN-based TTS model based on the adversarial loss term determined at each of the plurality of output steps for each training text utterance of the plurality of training text utterances.
-