Patent search ap:("GOOGLE LLC") AND inv:"Tara N. Sainath" Page 3

21.

Granted invention patent
Joint unsupervised and supervised training for multilingual ASR (Status: in force)

Publication (announcement) No.: US12249317B2

Publication (announcement) date: 2025-03-11

Application No.: US17929934

Application date: 2022-09-06

Applicant: Google LLC

Inventor: Bo Li, Junwen Bai, Yu Zhang, Ankur Bapna, Nikhil Siddhartha, Khe Chai Sim, Tara N. Sainath

IPC: G10L15/16, G10L15/02, G10L15/06, G10L15/187, G10L15/19

Abstract: A method includes receiving audio features and generating a latent speech representation based on the audio features. The method also includes generating a target quantized vector token and a target token index for a corresponding latent speech representation. The method also includes generating a contrastive context vector for a corresponding unmasked or masked latent speech representation and deriving a contrastive self-supervised loss based on the corresponding contrastive context vector and the corresponding target quantized vector token. The method also includes generating a high-level context vector based on the contrastive context vector and, for each high-level context vector, learning to predict the target token index at the corresponding time step using a cross-entropy loss based on the target token index. The method also includes predicting speech recognition hypotheses for the utterance and training a multilingual automatic speech recognition (ASR) model using an unsupervised loss and a supervised loss.
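
Illustrative sketch (not part of the patent record): the abstract combines a contrastive self-supervised loss, a cross-entropy loss on target token indices, and a supervised loss. The snippet below shows one common way such a contrastive term is written (an InfoNCE-style loss over cosine similarities) and one possible way the terms could be summed. The function names, the `temperature` value, the `unsup_weight` factor, and the exact form of both formulas are assumptions made for illustration; the abstract does not specify them.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def contrastive_loss(context, target, distractors, temperature=0.1):
    """Negative log-probability of picking the true target among distractors.

    Assumed InfoNCE-style form; index 0 of the candidate list is the target.
    """
    candidates = [target] + distractors
    logits = [cosine(context, c) / temperature for c in candidates]
    m = max(logits)  # subtract the max for numerical stability
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]


def joint_loss(supervised, contrastive, cross_entropy, unsup_weight=1.0):
    """Hypothetical combination: supervised loss plus weighted unsupervised terms."""
    return supervised + unsup_weight * (contrastive + cross_entropy)


# A context vector aligned with its target yields a much smaller loss
# than one aligned with a distractor.
aligned = contrastive_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
misaligned = contrastive_loss([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
print(aligned < misaligned)
```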

22.

Invention application
QUANTIZATION AND SPARSITY AWARE FINE-TUNING FOR SPEECH RECOGNITION WITH UNIVERSAL SPEECH MODELS (Status: in force)

Publication (announcement) No.: US20250078815A1

Publication (announcement) date: 2025-03-06

Application No.: US18826135

Application date: 2024-09-05

Applicant: Google LLC

Inventor: Shaojin Ding, David Qiu, David Rim, Amir Yazdanbakhsh, Yanzhang He, Zhonglin Han, Rohit Prakash Prabhavalkar, Weiran Wang, Bo Li, Jian Li, Tara N. Sainath, Shivani Agrawal, Oleg Rybakov

IPC: G10L15/06

Abstract: A method includes obtaining a plurality of training samples that each include a respective speech utterance and a respective textual utterance representing a transcription of the respective speech utterance. The method also includes fine-tuning, using quantization and sparsity aware training with native integer operations, a pre-trained automatic speech recognition (ASR) model on the plurality of training samples. Here, the pre-trained ASR model includes a plurality of weights and the fine-tuning includes pruning one or more weights of the plurality of weights using a sparsity mask and quantizing each weight of the plurality of weights based on an integer with a fixed-bit width. The method also includes providing the fine-tuned ASR model to a user device.
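
Illustrative sketch (not part of the patent record): the abstract describes pruning weights with a sparsity mask and quantizing each weight to a fixed-bit-width integer. The snippet below applies both steps once to a flat list of weights. Choosing the mask by weight magnitude and using symmetric per-tensor scaling are assumptions for illustration, as are all names and parameter values; the fine-tuning loop itself is not shown.

```python
def prune_and_quantize(weights, sparsity=0.5, bits=8):
    """Zero out the smallest-magnitude weights, then map the rest to signed integers."""
    # Sparsity mask (assumed magnitude-based): 0 for pruned weights, 1 for kept ones.
    n_prune = int(len(weights) * sparsity)
    by_magnitude = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    mask = [1] * len(weights)
    for i in by_magnitude[:n_prune]:
        mask[i] = 0
    pruned = [w * m for w, m in zip(weights, mask)]

    # Symmetric quantization to a fixed bit width (assumed per-tensor scale).
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in pruned)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(w / scale))) for w in pruned]
    return quantized, scale, mask


weights = [0.6, -0.1, 0.25, -1.0]
quantized, scale, mask = prune_and_quantize(weights, sparsity=0.5, bits=8)
# Multiplying an integer by the scale recovers an approximate float weight.
dequantized = [q * scale for q in quantized]
print(mask, quantized)
```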

23.

Granted invention patent
Optimizing inference performance for conformer (Status: in force)

Publication (announcement) No.: US12190869B2

Publication (announcement) date: 2025-01-07

Application No.: US17936547

Application date: 2022-09-29

Applicant: Google LLC

Inventor: Tara N. Sainath, Rami Botros, Anmol Gulati, Krzysztof Choromanski, Ruoming Pang, Trevor Strohman, Weiran Wang, Jiahui Yu

IPC: G10L15/16, G10L15/06, G10L15/22

Abstract: A computer-implemented method includes receiving a sequence of acoustic frames as input to an automatic speech recognition (ASR) model. Here, the ASR model includes a causal encoder and a decoder. The method also includes generating, by the causal encoder, a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The method also includes generating, by the decoder, a first probability distribution over possible speech recognition hypotheses. Here, the causal encoder includes a stack of causal encoder layers each including a Recurrent Neural Network (RNN) Attention-Performer module that applies linear attention.
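
Illustrative sketch (not part of the patent record): the abstract mentions causal encoder layers that apply linear attention in an RNN-like module. The snippet below shows the general idea of causal linear attention computed recurrently, where a running state replaces the full attention matrix so the cost per frame stays constant. The `elu(x) + 1` feature map is an assumption chosen for simplicity (Performer-style models typically use random feature maps), and the function names are hypothetical.

```python
import math


def feature_map(x):
    """Positive feature map elu(x) + 1 (an assumed stand-in for the real kernel features)."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def causal_linear_attention(queries, keys, values):
    """Process frames one at a time, carrying a running state like an RNN."""
    d_k, d_v = len(keys[0]), len(values[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # running sum of phi(k) * v^T
    key_sum = [0.0] * d_k                      # running sum of phi(k)
    outputs = []
    for q, k, v in zip(queries, keys, values):
        phi_q, phi_k = feature_map(q), feature_map(k)
        # Update the state with the current frame only (no look-ahead).
        for i in range(d_k):
            key_sum[i] += phi_k[i]
            for j in range(d_v):
                state[i][j] += phi_k[i] * v[j]
        # Read out: a normalized, positively weighted average of past values.
        norm = sum(a * b for a, b in zip(phi_q, key_sum))
        outputs.append(
            [sum(phi_q[i] * state[i][j] for i in range(d_k)) / norm for j in range(d_v)]
        )
    return outputs


queries = [[0.1, -0.2], [0.3, 0.4]]
keys = [[0.5, 0.1], [-0.3, 0.2]]
values = [[1.0, 2.0], [3.0, 4.0]]
print(causal_linear_attention(queries, keys, values))
```

Because the first frame can only attend to itself, its output equals its own value vector; later outputs are weighted averages of the values seen so far.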

24.

Invention application
CONTEXTUAL BIASING FOR SPEECH RECOGNITION (Status: in force)

Publication (announcement) No.: US20240379095A1

Publication (announcement) date: 2024-11-14

Application No.: US18782001

Application date: 2024-07-23

Applicant: Google LLC

Inventor: Rohit Prakash Prabhavalkar, Golan Pundak, Tara N. Sainath

IPC: G10L15/16, G10L15/26

Abstract: A method includes receiving audio data encoding an utterance and obtaining a set of bias phrases corresponding to a context of the utterance. Each bias phrase includes one or more words. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data to generate an output from the speech recognition model. The speech recognition model includes a first encoder configured to receive the acoustic features, a first attention module, a bias encoder configured to receive data indicating the obtained set of bias phrases, a bias attention module, and a decoder configured to determine likelihoods of sequences of speech elements based on output of the first attention module and output of the bias attention module. The method also includes determining a transcript for the utterance based on the likelihoods of sequences of speech elements.

25.

Invention publication
Emitting Word Timings with End-to-End Models (Status: under examination, published)

Publication (announcement) No.: US20240321263A1

Publication (announcement) date: 2024-09-26

Application No.: US18680797

Application date: 2024-05-31

Applicant: Google LLC

Inventor: Tara N. Sainath, Basilio Garcia Castillo, David Rybach, Trevor Strohman, Ruoming Pang

IPC: G10L15/06, G10L25/30, G10L25/78

CPC classification number: G10L15/063, G10L25/30, G10L25/78

Abstract: A method includes receiving a training example that includes audio data representing a spoken utterance and a ground truth transcription. For each word in the spoken utterance, the method also includes inserting a placeholder symbol before the respective word, identifying a respective ground truth alignment for a beginning and an end of the respective word, determining a beginning word piece and an ending word piece, and generating a first constrained alignment for the beginning word piece and a second constrained alignment for the ending word piece. The first constrained alignment is aligned with the ground truth alignment for the beginning of the respective word and the second constrained alignment is aligned with the ground truth alignment for the ending of the respective word. The method also includes constraining an attention head of a second pass decoder by applying the first and second constrained alignments.
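
Illustrative sketch (not part of the patent record): two of the per-word steps in the abstract, inserting a placeholder symbol before each word and picking the beginning and ending word pieces, can be pictured as below. The placeholder string `<wb>`, the function names, and the example word-piece split are hypothetical; the alignment and attention-constraint steps are not shown.

```python
def insert_placeholders(words, placeholder="<wb>"):
    """Return the word sequence with a placeholder symbol before every word."""
    out = []
    for word in words:
        out.extend([placeholder, word])
    return out


def boundary_pieces(word_pieces):
    """Return the (beginning, ending) word pieces of one word.

    For a single-piece word both entries are the same piece.
    """
    return word_pieces[0], word_pieces[-1]


print(insert_placeholders(["play", "some", "music"]))
# Hypothetical word-piece split of the word "playing".
print(boundary_pieces(["_play", "ing"]))
```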

26.

Invention publication
MIXTURE-OF-EXPERT CONFORMER FOR STREAMING MULTILINGUAL ASR (Status: under examination, published)

Publication (announcement) No.: US20240304185A1

Publication (announcement) date: 2024-09-12

Application No.: US18598885

Application date: 2024-03-07

Applicant: Google LLC

Inventor: Ke Hu, Bo Li, Tara N. Sainath, Yu Zhang, Francoise Beaufays

IPC: G10L15/197, G10L15/02, G10L15/06

CPC classification number: G10L15/197, G10L15/02, G10L15/063

Abstract: A method of a multilingual ASR model includes receiving a sequence of acoustic frames characterizing an utterance of speech. At a plurality of output steps, the method further includes generating a first higher order feature representation for an acoustic frame by a first encoder that includes a first plurality of multi-head attention layers; generating a second higher order feature representation for a corresponding first higher order feature representation by a second encoder that includes a second plurality of multi-head attention layers; and generating, by a first decoder, a first probability distribution over possible speech recognition hypotheses based on the second higher order feature representation and a sequence of N previous non-blank symbols. A gating layer of each respective MoE layer is configured to dynamically route an output from a previous multi-head attention layer at each of the plurality of output steps to a respective pair of feed-forward expert networks.
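
Illustrative sketch (not part of the patent record): the abstract's gating layer routes each step's input to a pair of feed-forward expert networks. The snippet below shows a generic top-2 routing scheme. The softmax gate, the renormalization of the two selected gate weights, and all names are assumptions for illustration; the experts here are toy scaling functions rather than real feed-forward networks.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top2(gate_logits):
    """Pick the two highest-scoring experts and renormalize their gate weights."""
    probs = softmax(gate_logits)
    top2 = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:2]
    total = sum(probs[i] for i in top2)
    return [(i, probs[i] / total) for i in top2]


def moe_layer(x, gate_weights, experts):
    """Route one input vector to a pair of experts and mix their outputs."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    out = [0.0] * len(x)
    for index, weight in route_top2(logits):
        expert_out = experts[index](x)
        out = [o + weight * y for o, y in zip(out, expert_out)]
    return out


# Toy experts that scale their input by 1, 2, and 3.
experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0)]
gate_weights = [[0.0, 0.0], [5.0, 0.0], [5.0, 0.0]]
print(moe_layer([1.0, 0.0], gate_weights, experts))
```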

27.

Invention publication
CONNECTING DIFFERENT ASR APPLICATION DOMAINS WITH SPEAKER-TAGS (Status: under examination, published)

Publication (announcement) No.: US20240304181A1

Publication (announcement) date: 2024-09-12

Application No.: US18598523

Application date: 2024-03-07

Applicant: Google LLC

Inventor: Guru Prakash Arumugam, Shuo-yiin Chang, Shaan Jagdeep Patrick Bijwadia, Weiran Wang, Quan Wang, Rohit Prakash Prabhavalkar, Tara N. Sainath

IPC: G10L15/06

CPC classification number: G10L15/063

Abstract: A method includes receiving a plurality of training samples spanning multiple different domains. Each corresponding training sample includes audio data characterizing an utterance paired with a corresponding transcription of the utterance. The method also includes re-labeling each corresponding training sample of the plurality of training samples by annotating the corresponding transcription of the utterance with one or more speaker tags. Each speaker tag indicates a respective segment of the transcription for speech that was spoken by a particular type of speaker. The method also includes training a multi-domain speech recognition model on the re-labeled training samples to teach the multi-domain speech recognition model to learn to share parameters for recognizing speech across each of the multiple different domains.
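
Illustrative sketch (not part of the patent record): re-labeling a transcription with speaker tags, as the abstract describes, could look like the snippet below. The tag syntax (`<doctor>`, `<patient>`), the speaker types, and the function name are hypothetical; the abstract only states that each tag marks a segment spoken by a particular type of speaker.

```python
def relabel_with_speaker_tags(segments):
    """Build one tagged transcription from (speaker_type, text) pairs in spoken order."""
    parts = []
    for speaker_type, text in segments:
        # Prefix each segment with a tag naming the type of speaker.
        parts.append(f"<{speaker_type}> {text.strip()}")
    return " ".join(parts)


sample = [("doctor", "how are you feeling"), ("patient", "much better thanks")]
print(relabel_with_speaker_tags(sample))
```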

28.

Invention publication
PROPER NOUN RECOGNITION IN END-TO-END SPEECH RECOGNITION (Status: under examination, published)

Publication (announcement) No.: US20230377564A1

Publication (announcement) date: 2023-11-23

Application No.: US18362273

Application date: 2023-07-31

Applicant: Google LLC

Inventor: Charles Caleb Peyser, Tara N. Sainath, Golan Pundak

IPC: G10L15/06, G06N3/049, G10L15/16, G10L15/18, G10L15/187

CPC classification number: G10L15/063, G06N3/049, G10L15/16, G10L15/1815, G10L15/187

Abstract: A method for training a speech recognition model with a minimum word error rate loss function includes receiving a training example comprising a proper noun and generating a plurality of hypotheses corresponding to the training example. Each hypothesis of the plurality of hypotheses represents the proper noun and includes a corresponding probability that indicates a likelihood that the hypothesis represents the proper noun. The method also includes determining that the corresponding probability associated with one of the plurality of hypotheses satisfies a penalty criteria. The penalty criteria indicate that the corresponding probability satisfies a probability threshold and that the associated hypothesis incorrectly represents the proper noun. The method also includes applying a penalty to the minimum word error rate loss function.
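
Illustrative sketch (not part of the patent record): one way to picture a minimum word error rate (MWER) loss with a proper-noun penalty is shown below. The loss form (probability-weighted word errors relative to the n-best average) is a common MWER formulation, but how the penalty is applied here, by adding a constant to the error count of a sufficiently probable hypothesis that misses the proper noun, is purely an assumption. The function name, `threshold`, `penalty`, and the substring check are hypothetical.

```python
def mwer_loss_with_penalty(hypotheses, proper_noun, threshold=0.3, penalty=2.0):
    """hypotheses: list of (text, probability, word_errors) tuples from an n-best list."""
    total = sum(p for _, p, _ in hypotheses)
    normalized = [p / total for _, p, _ in hypotheses]

    errors = []
    for (text, _, word_errors), prob in zip(hypotheses, normalized):
        # Assumed penalty criteria: probable enough AND the proper noun is wrong.
        if prob >= threshold and proper_noun not in text:
            word_errors = word_errors + penalty
        errors.append(word_errors)

    mean_errors = sum(errors) / len(errors)
    # Expected word errors relative to the n-best average.
    return sum(p * (e - mean_errors) for p, e in zip(normalized, errors))


n_best = [
    ("call tara sainath", 0.5, 0),
    ("call terra sign off", 0.4, 2),
    ("call tara", 0.1, 1),
]
with_penalty = mwer_loss_with_penalty(n_best, "tara sainath")
without_penalty = mwer_loss_with_penalty(n_best, "tara sainath", penalty=0.0)
print(with_penalty > without_penalty)
```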

29.

Invention publication
Streaming End-to-end Multilingual Speech Recognition with Joint Language Identification (Status: under examination, published)

Publication (announcement) No.: US20230306958A1

Publication (announcement) date: 2023-09-28

Application No.: US18188632

Application date: 2023-03-23

Applicant: Google LLC

Inventor: Chao Zhang, Bo Li, Tara N. Sainath, Trevor Strohman, Sepand Mavandadi, Shuo-yiin Chang, Parisa Haghani

IPC: G10L15/00, G10L15/16, G10L15/06

CPC classification number: G10L15/005, G10L15/16, G10L15/063

Abstract: A method includes receiving a sequence of acoustic frames as input to an automatic speech recognition (ASR) model. The method also includes generating, by a first encoder, a first higher order feature representation for a corresponding acoustic frame. The method also includes generating, by a second encoder, a second higher order feature representation for a corresponding first higher order feature representation. The method also includes generating, by a language identification (ID) predictor, a language prediction representation based on a concatenation of the first higher order feature representation and the second higher order feature representation. The method also includes generating, by a first decoder, a first probability distribution over possible speech recognition hypotheses based on a concatenation of the second higher order feature representation and the language prediction representation.

30.

Invention publication
Contextual Biasing for Speech Recognition (Status: under examination, published)

Publication (announcement) No.: US20230274736A1

Publication (announcement) date: 2023-08-31

Application No.: US18311964

Application date: 2023-05-04

Applicant: Google LLC

Inventor: Rohit Prakash Prabhavalkar, Golan Pundak, Tara N. Sainath, Antoine Jean Bruguier

IPC: G10L15/187, G06N20/10, G10L19/04

CPC classification number: G10L15/187, G06N20/10, G10L19/04, G10L2015/088

Abstract: A method of biasing speech recognition includes receiving audio data encoding an utterance and obtaining a set of one or more biasing phrases corresponding to a context of the utterance. Each biasing phrase in the set of one or more biasing phrases includes one or more words. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data and grapheme and phoneme data derived from the set of one or more biasing phrases to generate an output of the speech recognition model. The method also includes determining a transcription for the utterance based on the output of the speech recognition model.
