Patent search ap:("GOOGLE LLC") AND inv:"Tara N. Sainath" Page 10

91.

Granted invention patent
Deliberation model-based two-pass end-to-end speech recognition (status: active)

Publication number: US12027158B2

Publication date: 2024-07-02

Application number: US18164923

Application date: 2023-02-06

Applicant: Google LLC

Inventor: Ke Hu, Tara N. Sainath, Ruoming Pang, Rohit Prakash Prabhavalkar

IPC: G10L15/18, G06N3/049, G10L15/06, G10L15/16, G10L15/187, G10L19/00

CPC classification number: G10L15/1815, G06N3/049, G10L15/063, G10L15/16, G10L15/187, G10L19/0018

Abstract: A method of performing speech recognition using a two-pass deliberation architecture includes receiving a first-pass hypothesis and an encoded acoustic frame and encoding the first-pass hypothesis at a hypothesis encoder. The first-pass hypothesis is generated by a recurrent neural network (RNN) decoder model for the encoded acoustic frame. The method also includes generating, using a first attention mechanism attending to the encoded acoustic frame, a first context vector, and generating, using a second attention mechanism attending to the encoded first-pass hypothesis, a second context vector. The method also includes decoding the first context vector and the second context vector at a context vector decoder to form a second-pass hypothesis.
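The two-attention step in this abstract can be illustrated with a minimal sketch. It assumes plain dot-product attention, toy two-dimensional vectors, and concatenation as the way the two context vectors are combined; the abstract specifies none of these, and the names `attend` and `deliberation_step` are hypothetical. The hypothesis encoder and the context vector decoder are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys):
    """Dot-product attention (an assumption): weighted average of `keys`."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(keys[0])
    return [sum(w * key[d] for w, key in zip(weights, keys)) for d in range(dim)]


def deliberation_step(decoder_state, encoded_frames, encoded_hypothesis):
    """Build the two context vectors and concatenate them (assumed combination)."""
    # First attention mechanism: attends to the encoded acoustic frames.
    acoustic_context = attend(decoder_state, encoded_frames)
    # Second attention mechanism: attends to the encoded first-pass hypothesis.
    hypothesis_context = attend(decoder_state, encoded_hypothesis)
    return acoustic_context + hypothesis_context


# Toy inputs, invented for illustration only.
encoded_frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
encoded_hypothesis = [[0.5, 0.5], [2.0, 0.0]]
decoder_state = [1.0, 0.0]

combined = deliberation_step(decoder_state, encoded_frames, encoded_hypothesis)
```

In a full system, a decoder would consume `combined` at each output step to produce the second-pass hypothesis; that part is outside this sketch.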

92.

Published invention application
PARAMETER-EFFICIENT MODEL REPROGRAMMING FOR CROSS-LINGUAL SPEECH RECOGNITION (status: under examination, published)

Publication number: US20240185841A1

Publication date: 2024-06-06

Application number: US18490808

Application date: 2023-10-20

Applicant: Google LLC

Inventor: Bo Li, Yu Zhang, Nanxin Chen, Rohit Prakash Prabhavalkar, Chao-Han Huck Yang, Tara N. Sainath, Trevor Strohman

IPC: G10L15/065, G10L15/00

CPC classification number: G10L15/065, G10L15/005

Abstract: A method includes obtaining an ASR model trained to recognize speech in a first language and receiving transcribed training utterances in a second language. The method also includes integrating the ASR model with an input reprogramming module and a latent reprogramming module. The method also includes adapting the ASR model to learn how to recognize speech in the second language by training the input reprogramming module and the latent reprogramming module while parameters of the ASR model are frozen.

93.

Published invention application
End-To-End Segmentation in a Two-Pass Cascaded Encoder Automatic Speech Recognition Model (status: under examination, published)

Publication number: US20240169981A1

Publication date: 2024-05-23

Application number: US18512110

Application date: 2023-11-17

Applicant: Google LLC

Inventor: Wenqian Ronny Huang, Shuo-yiin Chang, Tara N. Sainath, Yanzhang He

IPC: G10L15/197, G10L15/02, G10L15/05, G10L15/06, G10L15/16

CPC classification number: G10L15/197, G10L15/02, G10L15/05, G10L15/063, G10L15/16, G10L2015/025, G10L15/22

Abstract: A unified end-to-end segmenter and two-pass automatic speech recognition (ASR) model includes a first encoder, a first decoder, a second encoder, and a second decoder. The first encoder is configured to receive a sequence of acoustic frames and generate a first higher order feature representation. The first decoder is configured to receive the first higher order feature representation and generate, at each of a plurality of output steps, a first probability distribution and an indication of whether the output step corresponds to an end of speech segment, and emit an end of speech timestamp. The second encoder is configured to receive the first higher order feature representation and the end of speech timestamp, and generate a second higher order feature representation. The second decoder is configured to receive the second higher order feature representation and generate a second probability distribution.

94.

Published invention application
Contextual Biasing With Text Injection (status: under examination, published)

Publication number: US20240153498A1

Publication date: 2024-05-09

Application number: US18490861

Application date: 2023-10-20

Applicant: Google LLC

Inventor: Tara N. Sainath, Rohit Prakash Prabhavalkar, Diamantino Antonio Caseiro, Patrick Maxim Rondon, Cyril Allauzen

IPC: G10L15/16, G10L15/06, G10L15/183

CPC classification number: G10L15/16, G10L15/063, G10L15/183

Abstract: A method includes receiving context biasing data that includes a set of unspoken textual utterances corresponding to a particular context. The method also includes obtaining a list of carrier phrases associated with the particular context. For each respective unspoken textual utterance, the method includes generating a corresponding training data pair that includes the respective unspoken textual utterance and a carrier phrase. For each respective training data pair, the method includes tokenizing the respective training data pair into a sequence of sub-word units, generating a first higher order textual feature representation for a corresponding sub-word unit, receiving the first higher order textual feature representation, and generating a first probability distribution over possible text units. The method also includes training a speech recognition model based on the first probability distribution over possible text units.

95.

Published invention application
EXPORTING MODULAR ENCODER FEATURES FOR STREAMING AND DELIBERATION ASR (status: under examination, published)

Publication number: US20240144917A1

Publication date: 2024-05-02

Application number: US18494763

Application date: 2023-10-25

Applicant: Google LLC

Inventor: Rami Magdi Fahmi Botros, Rohit Prakash Prabhavalkar, Johan Schalkwyk, Tara N. Sainath, Ciprian Ioan Chelba, Francoise Beaufays

IPC: G10L15/16

CPC classification number: G10L15/16

Abstract: A method includes obtaining a base encoder from a pre-trained model, and receiving training data comprising a sequence of acoustic frames characterizing an utterance paired with a ground-truth transcription of the utterance. At each of a plurality of output steps, the method includes: generating, by the base encoder, a first encoded representation for a corresponding acoustic frame; generating, by an exporter network configured to receive a continuous sequence of first encoded representations generated by the base encoder, a second encoded representation for a corresponding acoustic frame; generating, by an exporter decoder, a probability distribution over possible logits; and determining an exporter decoder loss based on the probability distribution over possible logits generated by the exporter decoder at the corresponding output step and the ground-truth transcription. The method also includes training the exporter network based on the exporter decoder losses while parameters of the base encoder are frozen.

96.

Granted invention patent
Multi-dialect and multilingual speech recognition (status: active)

Publication number: US11900915B2

Publication date: 2024-02-13

Application number: US17572238

Application date: 2022-01-10

Applicant: Google LLC

Inventor: Zhifeng Chen, Bo Li, Eugene Weinstein, Yonghui Wu, Pedro J. Moreno Mengibar, Ron J. Weiss, Khe Chai Sim, Tara N. Sainath, Patrick An Phu Nguyen

IPC: G10L15/00, G10L15/16, G10L15/07, G10L15/06

CPC classification number: G10L15/005, G10L15/07, G10L15/16, G10L2015/0631

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer-readable media, for speech recognition using multi-dialect and multilingual models. In some implementations, audio data indicating audio characteristics of an utterance is received. Input features determined based on the audio data are provided to a speech recognition model that has been trained to output scores indicating the likelihood of linguistic units for each of multiple different languages or dialects. The speech recognition model can be one that has been trained using cluster adaptive training. Output that the speech recognition model generated in response to receiving the input features determined based on the audio data is received. A transcription of the utterance generated based on the output of the speech recognition model is provided.

97.

Granted invention patent
Enhanced multi-channel acoustic models (status: active)

Publication number: US11783849B2

Publication date: 2023-10-10

Application number: US17303822

Application date: 2021-06-08

Applicant: Google LLC

Inventor: Ehsan Variani, Kevin William Wilson, Ron J. Weiss, Tara N. Sainath, Arun Narayanan

IPC: G10L15/16, G10L25/30, G10L21/028, G10L21/0388, G10L19/008, G10L15/20, G10L21/0208, G10L21/0216

CPC classification number: G10L25/30, G10L15/16, G10L15/20, G10L19/008, G10L21/028, G10L21/0388, G10L2021/02087, G10L2021/02166

Abstract: This specification describes computer-implemented methods and systems. One method includes receiving, by a neural network of a speech recognition system, first data representing a first raw audio signal and second data representing a second raw audio signal. The first raw audio signal and the second raw audio signal describe audio occurring at a same period of time. The method further includes generating, by a spatial filtering layer of the neural network, a spatial filtered output using the first data and the second data, and generating, by a spectral filtering layer of the neural network, a spectral filtered output using the spatial filtered output. Generating the spectral filtered output comprises processing frequency-domain data representing the spatial filtered output. The method still further includes processing, by one or more additional layers of the neural network, the spectral filtered output to predict sub-word units encoded in both the first raw audio signal and the second raw audio signal.

98.

Published invention application
Deliberation Model-Based Two-Pass End-To-End Speech Recognition (status: under examination, published)

Publication number: US20230186907A1

Publication date: 2023-06-15

Application number: US18164923

Application date: 2023-02-06

Applicant: Google LLC

Inventor: Ke Hu, Tara N. Sainath, Ruoming Pang, Rohit Prakash Prabhavalkar

IPC: G10L15/18, G06N3/049, G10L15/06, G10L15/16, G10L15/187, G10L19/00

CPC classification number: G10L15/1815, G06N3/049, G10L15/063, G10L15/16, G10L15/187, G10L19/0018

Abstract: A method of performing speech recognition using a two-pass deliberation architecture includes receiving a first-pass hypothesis and an encoded acoustic frame and encoding the first-pass hypothesis at a hypothesis encoder. The first-pass hypothesis is generated by a recurrent neural network (RNN) decoder model for the encoded acoustic frame. The method also includes generating, using a first attention mechanism attending to the encoded acoustic frame, a first context vector, and generating, using a second attention mechanism attending to the encoded first-pass hypothesis, a second context vector. The method also includes decoding the first context vector and the second context vector at a context vector decoder to form a second-pass hypothesis.

99.

Published invention application
Attention-Based Joint Acoustic and Text On-Device End-to-End Model (status: under examination, published)

Publication number: US20230186901A1

Publication date: 2023-06-15

Application number: US18167454

Application date: 2023-02-10

Applicant: Google LLC

Inventor: Tara N. Sainath, Ruoming Pang, Ron Weiss, Yanzhang He, Chung-Cheng Chiu, Trevor Strohman

IPC: G10L15/06, G06N3/08, G10L15/16, G10L15/197

CPC classification number: G10L15/063, G06N3/08, G10L15/16, G10L15/197, G10L2015/0635

Abstract: A method includes receiving a training example for a listen-attend-spell (LAS) decoder of a two-pass streaming neural network model and determining whether the training example corresponds to a supervised audio-text pair or an unpaired text sequence. When the training example corresponds to an unpaired text sequence, the method also includes determining a cross entropy loss based on a log probability associated with a context vector of the training example. The method also includes updating the LAS decoder and the context vector based on the determined cross entropy loss.

100.

Invention application
Optimizing Inference Performance for Conformer (status: active)

Publication number: US20230130634A1

Publication date: 2023-04-27

Application number: US17936547

Application date: 2022-09-29

Applicant: Google LLC

Inventor: Tara N. Sainath, Rami Botros, Anmol Gulati, Krzysztof Choromanski, Ruoming Pang, Trevor Strohman, Weiran Wang, Jiahui Yu

IPC: G10L15/16, G10L15/22, G10L15/06

Abstract: A computer-implemented method includes receiving a sequence of acoustic frames as input to an automatic speech recognition (ASR) model. Here, the ASR model includes a causal encoder and a decoder. The method also includes generating, by the causal encoder, a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The method also includes generating, by the decoder, a first probability distribution over possible speech recognition hypotheses. Here, the causal encoder includes a stack of causal encoder layers each including a Recurrent Neural Network (RNN) Attention-Performer module that applies linear attention.
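The phrase "linear attention" in a causal, RNN-like form can be illustrated with a minimal sketch. It assumes the feature map elu(x) + 1 as a simple stand-in, not the Performer's random-feature map, and toy inputs; the abstract gives no such details, and all function names here are hypothetical. The point is that causal attention can be computed from running sums, so the cost per step does not grow with the number of earlier frames.

```python
import math


def feature_map(x):
    """elu(x) + 1, a positive feature map (assumed stand-in for the kernel)."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def causal_linear_attention(queries, keys, values):
    """Recurrent form: keep running sums instead of revisiting past frames."""
    d_k, d_v = len(keys[0]), len(values[0])
    kv_sum = [[0.0] * d_v for _ in range(d_k)]  # running sum of phi(k) v^T
    k_sum = [0.0] * d_k                         # running sum of phi(k)
    outputs = []
    for q, k, v in zip(queries, keys, values):
        phi_k = feature_map(k)
        for i in range(d_k):
            k_sum[i] += phi_k[i]
            for j in range(d_v):
                kv_sum[i][j] += phi_k[i] * v[j]
        phi_q = feature_map(q)
        denom = sum(a * b for a, b in zip(phi_q, k_sum))
        outputs.append(
            [sum(phi_q[i] * kv_sum[i][j] for i in range(d_k)) / denom
             for j in range(d_v)]
        )
    return outputs


def causal_quadratic_reference(queries, keys, values):
    """Same attention computed explicitly over all past frames, for comparison."""
    d_v = len(values[0])
    outputs = []
    for t, q in enumerate(queries):
        phi_q = feature_map(q)
        weights = [sum(a * b for a, b in zip(phi_q, feature_map(k)))
                   for k in keys[:t + 1]]
        total = sum(weights)
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values[:t + 1])) / total
             for j in range(d_v)]
        )
    return outputs


# Toy inputs, invented for illustration only.
queries = [[0.1, -0.2], [0.3, 0.4], [-0.5, 0.6]]
keys = [[0.2, 0.1], [-0.3, 0.5], [0.7, -0.1]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

fast = causal_linear_attention(queries, keys, values)
slow = causal_quadratic_reference(queries, keys, values)
```

Both functions produce the same outputs up to floating-point error; the recurrent version only carries `kv_sum` and `k_sum` from one frame to the next.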
