-
Publication No.: US09984683B2
Publication Date: 2018-05-29
Application No.: US15217457
Filing Date: 2016-07-22
Applicant: Google LLC
Inventor: Bo Li , Tara N. Sainath
CPC classification number: G10L15/16 , G06N3/08 , G10L15/02 , G10L15/26 , G10L2015/025
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for automatic speech recognition using multi-dimensional models. In some implementations, audio data that describes an utterance is received. A transcription for the utterance is determined using an acoustic model that includes a neural network having first memory blocks for time information and second memory blocks for frequency information. The transcription for the utterance is provided as output of an automated speech recognizer.
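The abstract describes an acoustic model with separate memory blocks over time and over frequency. Below is a minimal illustrative sketch, assuming PyTorch and hypothetical layer sizes (not the patented implementation): one LSTM scans the frequency bins of each frame, and a second LSTM scans the resulting per-frame summaries over time.
```python
# Sketch of a frequency-then-time LSTM acoustic model (hypothetical sizes).
import torch
import torch.nn as nn

class FreqTimeLSTM(nn.Module):
    def __init__(self, freq_hidden=64, time_hidden=256, n_states=42):
        super().__init__()
        # Memory blocks for frequency information: scan each frame's bins.
        self.freq_lstm = nn.LSTM(input_size=1, hidden_size=freq_hidden, batch_first=True)
        # Memory blocks for time information: scan frame summaries over time.
        self.time_lstm = nn.LSTM(input_size=freq_hidden, hidden_size=time_hidden, batch_first=True)
        self.output = nn.Linear(time_hidden, n_states)

    def forward(self, features):                      # features: (batch, frames, freq_bins)
        b, t, f = features.shape
        bins = features.reshape(b * t, f, 1)          # treat bins as a sequence per frame
        _, (h_freq, _) = self.freq_lstm(bins)         # h_freq: (1, b*t, freq_hidden)
        frame_summary = h_freq[-1].reshape(b, t, -1)  # per-frame frequency summary
        time_out, _ = self.time_lstm(frame_summary)   # temporal modeling
        return self.output(time_out)                  # per-frame acoustic scores

scores = FreqTimeLSTM()(torch.randn(2, 100, 80))      # (2, 100, 42)
```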
-
Publication No.: US20250140239A1
Publication Date: 2025-05-01
Application No.: US19010299
Filing Date: 2025-01-06
Applicant: Google LLC
Inventor: Shuo-yiin Chang , Bo Li , Tara N. Sainath , Trevor Strohman , Chao Zhang
Abstract: A method includes receiving a sequence of acoustic frames characterizing one or more utterances. At each of a plurality of output steps, the method also includes generating, by an encoder network of a speech recognition model, a higher order feature representation for a corresponding acoustic frame of the sequence of acoustic frames, generating, by a prediction network of the speech recognition model, a dense representation for a corresponding sequence of non-blank symbols output by a final softmax layer of the speech recognition model, and generating, by a first joint network of the speech recognition model that receives the higher order feature representation generated by the encoder network and the dense representation generated by the prediction network, a probability distribution indicating whether the corresponding output step corresponds to a pause or an end of speech.
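A minimal sketch, assuming PyTorch and hypothetical dimensions, of the kind of joint network the abstract describes: it fuses an encoder feature with the prediction-network representation and scores pause versus end of speech. It illustrates the stated data flow, not the patented model.
```python
# Sketch of a transducer-style joint network scoring pause / end of speech
# from an encoder feature and a prediction-network representation.
import torch
import torch.nn as nn

class PauseEosJoint(nn.Module):
    def __init__(self, enc_dim=512, pred_dim=640, joint_dim=256):
        super().__init__()
        self.enc_proj = nn.Linear(enc_dim, joint_dim)
        self.pred_proj = nn.Linear(pred_dim, joint_dim)
        self.out = nn.Linear(joint_dim, 2)   # two classes: pause, end of speech

    def forward(self, enc_feat, pred_feat):
        # enc_feat: (batch, enc_dim) for one output step;
        # pred_feat: (batch, pred_dim) from the prediction network.
        joint = torch.tanh(self.enc_proj(enc_feat) + self.pred_proj(pred_feat))
        return torch.log_softmax(self.out(joint), dim=-1)

log_probs = PauseEosJoint()(torch.randn(4, 512), torch.randn(4, 640))  # (4, 2)
```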
-
Publication No.: US20250095637A1
Publication Date: 2025-03-20
Application No.: US18886581
Filing Date: 2024-09-16
Applicant: Google LLC
Inventor: Ke Hu , Tara N. Sainath , Bo Li , Yu Zhang , Yong Cheng , Tao Wang , Yujing Zhang , Frederick Liu
Abstract: A method includes receiving a textual prompt in a first language and obtaining a fine-tuned prompt embedding configured to guide a large language model (LLM) to generate text in a target language from textual prompts in the first language. The method also includes processing, using the LLM, the textual prompt conditioned on the fine-tuned prompt embedding to generate output text in the target language and concatenating the textual prompt and the generated output text to provide an unspoken textual utterance. The method also includes training a multilingual automatic speech recognition (ASR) model to learn how to recognize speech in the target language by injecting the unspoken textual utterance into a text encoder associated with the multilingual ASR model.
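One common way to realize a "fine-tuned prompt embedding" is soft prompting: learned embedding vectors are prepended to the token embeddings of the textual prompt before the LM decodes. The sketch below (PyTorch, a toy stand-in LM, hypothetical sizes; an interpretation of the abstract, not the actual system) shows that conditioning step.
```python
# Sketch of soft-prompt conditioning: prepend fine-tuned prompt embeddings to the
# token embeddings of a textual prompt before running a (toy stand-in) LM.
import torch
import torch.nn as nn

vocab_size, embed_dim, prompt_len = 1000, 128, 8
token_embed = nn.Embedding(vocab_size, embed_dim)
soft_prompt = nn.Parameter(torch.randn(prompt_len, embed_dim) * 0.02)  # fine-tuned prompt embedding
toy_lm = nn.LSTM(embed_dim, embed_dim, batch_first=True)               # stand-in for the LLM
lm_head = nn.Linear(embed_dim, vocab_size)

prompt_ids = torch.randint(0, vocab_size, (1, 12))   # textual prompt in the first language
inputs = torch.cat([soft_prompt.unsqueeze(0), token_embed(prompt_ids)], dim=1)
hidden, _ = toy_lm(inputs)
next_token_logits = lm_head(hidden[:, -1])           # decode target-language text from here;
# the generated text would then be concatenated with the prompt to form an
# unspoken textual utterance for ASR training.
```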
-
Publication No.: US20250078830A1
Publication Date: 2025-03-06
Application No.: US18826743
Filing Date: 2024-09-06
Applicant: Google LLC
Inventor: Junwen Bai , Bo Li , Qiujia Li , Tara N. Sainath , Trevor Strohman
IPC: G10L15/197 , G10L15/00 , G10L15/02 , G10L15/06 , G10L15/30
Abstract: A method includes receiving a sequence of acoustic frames characterizing a spoken utterance in a particular native language. The method also includes generating a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames by a causal encoder that includes an initial stack of multi-head attention layers. The method also includes generating a second higher order feature representation for a corresponding first higher order feature representation by a non-causal encoder that includes a final stack of multi-head attention layers. The method also includes receiving, as input at each corresponding language-dependent adapter (LDA) module, a language ID vector identifying the particular native language to activate corresponding language-dependent weights specific to the particular native language. The method also includes generating a first probability distribution over possible speech recognition hypotheses by a decoder.
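A minimal sketch, assuming PyTorch and hypothetical sizes, of a language-dependent adapter: a one-hot language ID vector activates per-language bottleneck weights applied residually to encoder activations. This illustrates the adapter idea named in the abstract, not the patented module.
```python
# Sketch of a language-dependent adapter (LDA): a one-hot language ID selects
# per-language bottleneck weights applied residually to encoder activations.
import torch
import torch.nn as nn

class LanguageDependentAdapter(nn.Module):
    def __init__(self, num_languages=4, model_dim=512, bottleneck=64):
        super().__init__()
        self.down = nn.ModuleList([nn.Linear(model_dim, bottleneck) for _ in range(num_languages)])
        self.up = nn.ModuleList([nn.Linear(bottleneck, model_dim) for _ in range(num_languages)])

    def forward(self, x, lang_id_vector):
        # x: (batch, frames, model_dim); lang_id_vector: (num_languages,) one-hot.
        lang = int(torch.argmax(lang_id_vector))   # activate language-specific weights
        return x + self.up[lang](torch.relu(self.down[lang](x)))

adapter = LanguageDependentAdapter()
out = adapter(torch.randn(2, 50, 512), torch.tensor([0., 1., 0., 0.]))  # (2, 50, 512)
```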
-
Publication No.: US20240428786A1
Publication Date: 2024-12-26
Application No.: US18826655
Filing Date: 2024-09-06
Applicant: Google LLC
Inventor: Ke Hu , Tara N. Sainath , Arun Narayanan , Ruoming Pang , Trevor Strohman
IPC: G10L15/197 , G06F40/126 , G10L15/02 , G10L15/06 , G10L15/08 , G10L15/22
Abstract: A method includes receiving a sequence of acoustic frames and generating, by a first encoder, a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The method also includes generating, by a first pass transducer decoder, a first pass speech recognition hypothesis for a corresponding first higher order feature representation and generating, by a text encoder, a text encoding for a corresponding first pass speech recognition hypothesis. The method also includes generating, by a second encoder, a second higher order feature representation for a corresponding first higher order feature representation. The method also includes generating, by a second pass transducer decoder, a second pass speech recognition hypothesis using a corresponding second higher order feature representation and a corresponding text encoding.
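A minimal sketch of the two-pass wiring the abstract walks through, using PyTorch with stand-in modules and hypothetical sizes (a real system would use transducer decoders rather than these placeholders): first encoder, first-pass hypothesis, text encoding of that hypothesis, cascaded second encoder, and a second pass that consumes both.
```python
# Sketch of the two-pass data flow: acoustic encoder -> first-pass hypothesis ->
# text encoder, plus a cascaded second encoder feeding the second-pass decoder.
import torch
import torch.nn as nn

dim, vocab = 256, 128
first_encoder = nn.LSTM(80, dim, batch_first=True)
second_encoder = nn.LSTM(dim, dim, batch_first=True)    # cascades on first-encoder output
first_pass_decoder = nn.Linear(dim, vocab)              # stand-in for a transducer decoder
text_encoder = nn.Embedding(vocab, dim)
second_pass_decoder = nn.Linear(2 * dim, vocab)         # consumes acoustic + text encodings

frames = torch.randn(1, 100, 80)                        # sequence of acoustic frames
h1, _ = first_encoder(frames)                           # first higher order representation
hyp_ids = first_pass_decoder(h1).argmax(dim=-1)         # first-pass hypothesis (greedy)
text_enc = text_encoder(hyp_ids)                        # text encoding of the hypothesis
h2, _ = second_encoder(h1)                              # second higher order representation
second_pass_logits = second_pass_decoder(torch.cat([h2, text_enc], dim=-1))
```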
-
Publication No.: US20240290320A1
Publication Date: 2024-08-29
Application No.: US18585020
Filing Date: 2024-02-22
Applicant: Google LLC
Inventor: Wenqian Huang , Hao Zhang , Shankar Kumar , Shuo-yiin Chang , Tara N. Sainath
CPC classification number: G10L15/063 , G06F40/30 , G10L15/26
Abstract: A joint segmenting and ASR model includes an encoder to receive a sequence of acoustic frames and generate, at each of a plurality of output steps, a higher order feature representation for a corresponding acoustic frame. The model also includes a decoder to generate, based on the higher order feature representation at each of the plurality of output steps, a probability distribution over possible speech recognition hypotheses and an indication of whether the corresponding output step corresponds to an end of segment (EOS). The model is trained on a set of training samples, each training sample including audio data characterizing multiple segments of long-form speech and a corresponding transcription of the long-form speech, the corresponding transcription annotated with ground-truth EOS labels obtained via distillation from a language model teacher that receives the corresponding transcription as input and injects the ground-truth EOS labels into the corresponding transcription between semantically complete segments.
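A minimal sketch, assuming PyTorch and hypothetical sizes, of a decoder output head that emits both a token distribution and an end-of-segment probability at each output step, which is the joint behaviour the abstract describes; the distillation-based labeling is omitted.
```python
# Sketch of a decoder head emitting a token distribution and an end-of-segment
# (EOS) probability at each output step.
import torch
import torch.nn as nn

class TokenAndEosHead(nn.Module):
    def __init__(self, enc_dim=512, vocab=4096):
        super().__init__()
        self.token_head = nn.Linear(enc_dim, vocab)  # distribution over recognition symbols
        self.eos_head = nn.Linear(enc_dim, 1)        # indication of end of segment

    def forward(self, enc_feat):                     # enc_feat: (batch, steps, enc_dim)
        token_log_probs = torch.log_softmax(self.token_head(enc_feat), dim=-1)
        eos_prob = torch.sigmoid(self.eos_head(enc_feat)).squeeze(-1)
        return token_log_probs, eos_prob

tokens, eos = TokenAndEosHead()(torch.randn(2, 10, 512))  # (2, 10, 4096), (2, 10)
```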
-
Publication No.: US12051407B2
Publication Date: 2024-07-30
Application No.: US17815049
Filing Date: 2022-07-26
Applicant: Google LLC
Inventor: Rohit Prakash Prabhavalkar , Golan Pundak , Tara N. Sainath
Abstract: A method includes receiving audio data encoding an utterance and obtaining a set of bias phrases corresponding to a context of the utterance. Each bias phrase includes one or more words. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data to generate an output from the speech recognition model. The speech recognition model includes a first encoder configured to receive the acoustic features, a first attention module, a bias encoder configured to receive data indicating the obtained set of bias phrases, a bias attention module, and a decoder configured to determine likelihoods of sequences of speech elements based on output of the first attention module and output of the bias attention module. The method also includes determining a transcript for the utterance based on the likelihoods of sequences of speech elements.
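A minimal sketch, assuming PyTorch, stand-in modules, and hypothetical sizes, of the bias path: a bias encoder embeds the bias phrases and a bias attention module lets the decoder state attend over those embeddings; the resulting context is combined with the acoustic attention output inside the decoder.
```python
# Sketch of bias attention: a decoder query attends over bias-phrase embeddings
# produced by a bias encoder.
import torch
import torch.nn as nn

dim = 256
bias_encoder = nn.Embedding(5000, dim)              # stand-in: embeds bias-phrase tokens
bias_attention = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

bias_phrase_ids = torch.randint(0, 5000, (1, 10))   # tokens of the obtained bias phrases
bias_keys = bias_encoder(bias_phrase_ids)           # (1, 10, dim)
decoder_query = torch.randn(1, 1, dim)              # current decoder state
bias_context, _ = bias_attention(decoder_query, bias_keys, bias_keys)
# bias_context would be combined with the acoustic attention output when the
# decoder scores sequences of speech elements.
```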
-
Publication No.: US20240028829A1
Publication Date: 2024-01-25
Application No.: US18346232
Filing Date: 2023-07-01
Applicant: Google LLC
Inventor: Tara N. Sainath , Zhouyuan Huo , Zhehuai Chen , Yu Zhang , Weiran Wang , Trevor Strohman , Rohit Prakash Prabhavalkar , Bo Li , Ankur Bapna
IPC: G06F40/284 , G06F40/40
CPC classification number: G06F40/284 , G06F40/40
Abstract: A method includes receiving training data that includes a set of unspoken textual utterances. For each respective unspoken textual utterance, the method includes tokenizing the respective unspoken textual utterance into a sequence of sub-word units, generating a first higher order textual feature representation for a corresponding sub-word unit tokenized from the respective unspoken textual utterance, receiving the first higher order textual feature representation generated by a text encoder, and generating a first probability distribution over possible text units. The method also includes training an encoder based on the first probability distribution over possible text units generated by a first-pass decoder for each respective unspoken textual utterance in the set of unspoken textual utterances.
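A minimal sketch, assuming PyTorch, stand-in modules, and hypothetical sizes, of the text-only training path the abstract lists: sub-word IDs pass through a text encoder, a first-pass decoder produces a distribution over text units, and the resulting loss drives encoder training.
```python
# Sketch of the text-only path: sub-word IDs -> text encoder -> first-pass
# decoder -> distribution over text units -> loss that trains the encoder.
import torch
import torch.nn as nn

vocab, dim = 1024, 256
text_encoder = nn.Sequential(nn.Embedding(vocab, dim), nn.Linear(dim, dim), nn.ReLU())
first_pass_decoder = nn.Linear(dim, vocab)

subword_ids = torch.randint(0, vocab, (8, 20))       # tokenized unspoken textual utterances
textual_features = text_encoder(subword_ids)         # first higher order textual features
logits = first_pass_decoder(textual_features)        # distribution over possible text units
loss = nn.functional.cross_entropy(logits.transpose(1, 2), subword_ids)
loss.backward()                                      # gradients update the shared encoder
```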
-
Publication No.: US20230298570A1
Publication Date: 2023-09-21
Application No.: US18187222
Filing Date: 2023-03-21
Applicant: Google LLC
Inventor: Weiran Wang , Tongzhou Chen , Tara N. Sainath , Ehsan Variani , Rohit Prakash Prabhavalkar , Ronny Huang , Bhuvana Ramabhadran , Neeraj Gaur , Sepand Mavandadi , Charles Caleb Peyser , Trevor Strohman , Yangzhang He , David Rybach
CPC classification number: G10L15/063 , G10L15/19 , G10L15/22 , G10L15/16 , G10L15/02
Abstract: A method includes generating, using an audio encoder, a higher-order feature representation for each acoustic frame in a sequence of acoustic frames; generating, using a decoder, based on the higher-order feature representation, a plurality of speech recognition hypotheses, each hypothesis corresponding to a candidate transcription of an utterance and having an associated first likelihood score; generating, using an external language model, for each speech recognition hypothesis, a second likelihood score; determining, using a learnable fusion module, for each speech recognition hypothesis, a set of fusion weights based on the higher-order feature representation and the speech recognition hypothesis; and generating, using the learnable fusion module, for each speech recognition hypothesis, a third likelihood score based on the first likelihood score, the second likelihood score, and the set of fusion weights, the audio encoder and decoder trained using minimum additive error rate training in the presence of the external language model.
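A minimal sketch, assuming PyTorch and hypothetical feature sizes, of a learnable fusion module: it predicts per-hypothesis weights from audio and hypothesis features, then combines the first (ASR) and second (external LM) likelihood scores into a third, fused score.
```python
# Sketch of learnable fusion: predict weights from audio and hypothesis features,
# then fuse the ASR and external-LM likelihood scores.
import torch
import torch.nn as nn

class LearnableFusion(nn.Module):
    def __init__(self, audio_dim=512, hyp_dim=256):
        super().__init__()
        self.weight_net = nn.Linear(audio_dim + hyp_dim, 2)  # one weight per score

    def forward(self, audio_feat, hyp_feat, asr_score, lm_score):
        w = torch.softmax(self.weight_net(torch.cat([audio_feat, hyp_feat], dim=-1)), dim=-1)
        return w[..., 0] * asr_score + w[..., 1] * lm_score  # fused (third) likelihood score

fusion = LearnableFusion()
fused = fusion(torch.randn(4, 512), torch.randn(4, 256),
               torch.randn(4), torch.randn(4))               # one fused score per hypothesis
```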
-
Publication No.: US11756534B2
Publication Date: 2023-09-12
Application No.: US17649058
Filing Date: 2022-01-26
Applicant: Google LLC
Inventor: Bo Li , Ron Weiss , Michiel A. U. Bacchiani , Tara N. Sainath , Kevin William Wilson
IPC: G10L15/00 , G10L15/16 , G10L15/20 , G10L21/0224 , G10L15/26 , G10L21/0216
CPC classification number: G10L15/16 , G10L15/20 , G10L21/0224 , G10L15/26 , G10L2021/02166
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for neural network adaptive beamforming for multichannel speech recognition are disclosed. In one aspect, a method includes the actions of receiving a first channel of audio data corresponding to an utterance and a second channel of audio data corresponding to the utterance. The actions further include generating a first set of filter parameters for a first filter based on the first channel of audio data and the second channel of audio data and a second set of filter parameters for a second filter based on the first channel of audio data and the second channel of audio data. The actions further include generating a single combined channel of audio data. The actions further include inputting the audio data to a neural network. The actions further include providing a transcription for the utterance.
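A minimal sketch, assuming PyTorch, a stand-in filter predictor, and hypothetical sizes, of the adaptive beamforming idea in the abstract: predict per-channel filter taps from the two input channels, filter each channel, and sum them into a single combined channel for the recognizer (which is omitted here).
```python
# Sketch of neural adaptive beamforming: predict per-channel FIR taps from the
# two input channels, filter each channel, and sum into one combined channel.
import torch
import torch.nn as nn
import torch.nn.functional as F

taps, frame = 25, 400
filter_predictor = nn.Linear(2 * frame, 2 * taps)   # stand-in for the filter-prediction network

ch1 = torch.randn(1, frame)                          # first channel of audio for one frame
ch2 = torch.randn(1, frame)                          # second channel of audio
filters = filter_predictor(torch.cat([ch1, ch2], dim=-1)).view(2, 1, taps)

# Filter-and-sum: apply each predicted filter to its channel, then add.
y1 = F.conv1d(ch1.view(1, 1, frame), filters[0:1], padding=taps // 2)
y2 = F.conv1d(ch2.view(1, 1, frame), filters[1:2], padding=taps // 2)
combined = (y1 + y2).squeeze()                       # single combined channel for the ASR network
```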
-