Patent search ap:("Google LLC") AND inv:"Bhuvana Ramabhadran" Page 7

61.

Patent application
Mixture Model Attention for Flexible Streaming and Non-Streaming Automatic Speech Recognition (status: active)

Publication No.: US20220310074A1

Publication date: 2022-09-29

Application No.: US17644344

Filing date: 2021-12-15

Applicant: Google LLC

Inventor: Kartik Audhkhasi, Bhuvana Ramabhadran, Tongzhou Chen, Pedro J. Moreno Mengibar

IPC: G10L15/16, G10L19/16, G06N3/04, G06F1/03

Abstract: A method for an automatic speech recognition (ASR) model that unifies streaming and non-streaming speech recognition includes receiving a sequence of acoustic frames. The method includes generating, using an audio encoder of the ASR model, a higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The method further includes generating, using a joint encoder of the ASR model, a probability distribution over possible speech recognition hypotheses at the corresponding time step based on the higher order feature representation generated by the audio encoder at the corresponding time step. The audio encoder comprises a neural network that applies mixture model (MiMo) attention to compute an attention probability distribution function (PDF) using a set of mixture components of softmaxes over a context window.
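
Illustrative sketch (not taken from the patent): the abstract describes an attention PDF built from mixture components of softmaxes over a context window, but does not say how the windows or mixture weights are defined. The Python below assumes each component is a softmax restricted to its own window of frame indices and that the mixture weights are given and sum to 1. The names `softmax_over_window` and `mimo_attention_pdf`, the window convention, and the score values are all hypothetical.

```python
import math

def softmax_over_window(scores, window):
    """Softmax of `scores` restricted to `window`; probability is zero elsewhere."""
    lo, hi = window  # assumed convention: inclusive start, exclusive end
    m = max(scores[lo:hi])  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores[lo:hi]]
    total = sum(exps)
    pdf = [0.0] * len(scores)
    for i, e in enumerate(exps):
        pdf[lo + i] = e / total
    return pdf

def mimo_attention_pdf(scores, windows, mixture_weights):
    """Weighted sum of per-window softmaxes (weights assumed to sum to 1)."""
    pdf = [0.0] * len(scores)
    for window, weight in zip(windows, mixture_weights):
        component = softmax_over_window(scores, window)
        for i, p in enumerate(component):
            pdf[i] += weight * p
    return pdf

# Hypothetical attention scores of one query frame against six frames.
scores = [0.1, 0.4, 0.2, 0.9, 0.3, 0.5]
# Assumed setup: one component sees only frames 0..3 (streaming-like),
# the other sees the full sequence (non-streaming-like).
windows = [(0, 4), (0, 6)]
pdf = mimo_attention_pdf(scores, windows, [0.5, 0.5])
causal_pdf = mimo_attention_pdf(scores, windows, [1.0, 0.0])
```

Under these assumptions, putting all the weight on the first component yields a PDF with no mass on frames 4 and 5, while a mixed weighting spreads mass over the whole sequence. This is one possible reading of how a single attention layer could serve both modes; the claims of the publication define the actual method.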

62.

Patent application
Regularizing Word Segmentation (status: active)

Publication No.: US20220310061A1

Publication date: 2022-09-29

Application No.: US17656225

Filing date: 2022-03-23

Applicant: Google LLC

Inventor: Bhuvana Ramabhadran, Hainan Xu, Kartik Audhkhasi, Yinghui Huang

IPC: G10L15/04, G10L25/30, G06N3/04

Abstract: A method for subword segmentation includes receiving an input word to be segmented into a plurality of subword units. The method also includes executing a subword segmentation routine to segment the input word into a plurality of subword units by accessing a trained vocabulary set of subword units and selecting the plurality of subword units from the input word by greedily finding a longest subword unit from the input word that is present in the trained vocabulary set until an end of the input word is reached.
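
Illustrative sketch (not taken from the patent): the greedy longest-match step described in the abstract can be written in a few lines of Python. The function name `greedy_segment`, the example vocabulary, and the single-character fallback for unmatched positions are assumptions; the abstract does not say what happens when no vocabulary entry matches.

```python
def greedy_segment(word, vocab):
    """Split `word` left to right, always taking the longest prefix found in `vocab`."""
    pieces = []
    start = 0
    while start < len(word):
        # Try the longest remaining candidate first, then shrink it.
        for end in range(len(word), start, -1):
            if word[start:end] in vocab:
                pieces.append(word[start:end])
                start = end
                break
        else:
            # Assumption: emit a single character when nothing in vocab matches.
            pieces.append(word[start])
            start += 1
    return pieces

# Hypothetical trained vocabulary of subword units.
vocab = {"un", "break", "able", "b", "r", "e", "a", "k"}
print(greedy_segment("unbreakable", vocab))  # prints ['un', 'break', 'able']
```

The loop stops when the end of the input word is reached, matching the termination condition in the abstract.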

63.

Patent application
Self-Adaptive Distillation (status: active)

Publication No.: US20220309340A1

Publication date: 2022-09-29

Application No.: US17544570

Filing date: 2021-12-07

Applicant: Google LLC

Inventor: Isabel Leal, Neeraj Gaur, Parisa Haghani, Brian Farris, Bhuvana Ramabhadran, Manasa Prasad, Pedro J. Moreno Mengibar, Yun Zhu

IPC: G06N3/08, G06N3/04, G10L15/06

Abstract: A method for distilling one or more trained teacher automatic speech recognition (ASR) models into a multilingual student model includes receiving a plurality of teacher training examples and a plurality of student training examples. The method also includes training one or more teacher ASR models using the plurality of teacher training examples. Each teacher ASR model is configured to output a respective textual representation of a respective audio input. The method further includes generating a multilingual student ASR model by training the multilingual student ASR model using the plurality of student training examples and distilling the trained one or more teacher ASR models into the multilingual student ASR model using a tunable distillation loss weight. Each student ASR model is configured to receive an audio input and output a corresponding textual representation of the received audio input.
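
Illustrative sketch (not taken from the patent): the abstract mentions a tunable distillation loss weight but does not give the loss formula. The Python below assumes a common convention in which the weight interpolates between a supervised cross-entropy term and a teacher-matching cross-entropy term. The function names, the interpolation form, and the probability values are all hypothetical, and how the weight is tuned is not covered.

```python
import math

def cross_entropy(target_probs, predicted_probs, eps=1e-12):
    """Cross-entropy between a target distribution and a predicted distribution."""
    return -sum(t * math.log(p + eps) for t, p in zip(target_probs, predicted_probs))

def combined_loss(student_probs, label_one_hot, teacher_probs, distill_weight):
    """Assumed form: (1 - w) * supervised loss + w * distillation loss."""
    supervised = cross_entropy(label_one_hot, student_probs)
    distill = cross_entropy(teacher_probs, student_probs)
    return (1.0 - distill_weight) * supervised + distill_weight * distill

# Hypothetical output distributions over three tokens.
student = [0.7, 0.2, 0.1]
label = [1.0, 0.0, 0.0]
teacher = [0.6, 0.3, 0.1]

for w in (0.0, 0.5, 1.0):
    print(w, combined_loss(student, label, teacher, w))
```

With a weight of 0 the student is trained on labels only; with a weight of 1 it is trained only to match the teacher.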

64.

Granted patent
Transliteration for speech recognition training and scoring (status: active)

Publication No.: US11417322B2

Publication date: 2022-08-16

Application No.: US16712492

Filing date: 2019-12-12

Applicant: Google LLC

Inventor: Bhuvana Ramabhadran, Min Ma, Pedro J. Moreno Mengibar, Jesse Emond, Brian E. Roark

IPC: G10L15/19, G10L15/06, G10L15/16, G10L15/22, G06N3/08, G06N3/04

Abstract: Methods, systems, and apparatus, including computer programs stored on a computer-readable storage medium, for transliteration for speech recognition training and scoring. In some implementations, language examples are accessed, some of which include words in a first script and words in one or more other scripts. At least portions of some of the language examples are transliterated to the first script to generate a training data set. A language model is generated based on occurrences of the different sequences of words in the training data set in the first script. The language model is used to perform speech recognition for an utterance.
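
Illustrative sketch (not taken from the patent): the pipeline in the abstract (transliterate mixed-script examples into one script, then build a language model from word-sequence occurrences) can be mimicked with a lookup table and bigram counts. The table `TRANSLIT`, the example sentences, and the choice of bigrams are assumptions made for illustration; a real system would use a trained transliteration model and a full language-model toolkit.

```python
from collections import Counter

# Hypothetical transliteration table: other-script word -> first-script (Latin) word.
TRANSLIT = {"नमस्ते": "namaste", "दोस्त": "dost"}

def transliterate(sentence, table):
    """Map each word into the first script; words not in the table are kept as-is."""
    return " ".join(table.get(word, word) for word in sentence.split())

def bigram_counts(sentences):
    """Count adjacent word pairs, with sentence boundary markers."""
    counts = Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        counts.update(zip(words, words[1:]))
    return counts

# Hypothetical language examples mixing two scripts.
examples = ["hello दोस्त", "नमस्ते दोस्त", "hello dost"]
training_set = [transliterate(s, TRANSLIT) for s in examples]
counts = bigram_counts(training_set)
print(counts[("hello", "dost")])  # prints 2
```

After transliteration, "hello दोस्त" and "hello dost" contribute to the same bigram count instead of being treated as different word sequences, which is the effect the abstract attributes to normalizing the training data to a single script.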

65.

Patent application
MULTILINGUAL SPEECH SYNTHESIS AND CROSS-LANGUAGE VOICE CLONING (status: pending, published)

Publication No.: US20200380952A1

Publication date: 2020-12-03

Application No.: US16855042

Filing date: 2020-04-22

Applicant: Google LLC

Inventor: Yu Zhang, Ron J. Weiss, Byungha Chun, Yonghui Wu, Zhifeng Chen, Russell John Wyatt Skerry-Ryan, Ye Jia, Andrew M. Rosenberg, Bhuvana Ramabhadran

IPC: G10L13/047

Abstract: A method includes receiving an input text sequence to be synthesized into speech in a first language and obtaining a speaker embedding, the speaker embedding specifying specific voice characteristics of a target speaker for synthesizing the input text sequence into speech that clones a voice of the target speaker. The target speaker includes a native speaker of a second language different than the first language. The method also includes generating, using a text-to-speech (TTS) model, an output audio feature representation of the input text by processing the input text sequence and the speaker embedding. The output audio feature representation includes the voice characteristics of the target speaker specified by the speaker embedding.
