-
Publication Number: US11145293B2
Publication Date: 2021-10-12
Application Number: US16516390
Filing Date: 2019-07-19
Applicant: Google LLC
Inventor: Rohit Prakash Prabhavalkar , Zhifeng Chen , Bo Li , Chung-Cheng Chiu , Kanury Kanishka Rao , Yonghui Wu , Ron J. Weiss , Navdeep Jaitly , Michiel A. U. Bacchiani , Tara N. Sainath , Jan Kazimierz Chorowski , Anjuli Patricia Kannan , Ekaterina Gonina , Patrick An Phu Nguyen
Abstract: Methods, systems, and apparatus, including computer-readable media, for performing speech recognition using sequence-to-sequence models. An automated speech recognition (ASR) system receives audio data for an utterance and provides features indicative of acoustic characteristics of the utterance as input to an encoder. The system processes an output of the encoder using an attender to generate a context vector and generates speech recognition scores using the context vector and a decoder trained using a training process that selects at least one input to the decoder with a predetermined probability. An input to the decoder during training is selected between input data based on a known value for an element in a training example, and input data based on an output of the decoder for the element in the training example. A transcription is generated for the utterance using word elements selected based on the speech recognition scores. The transcription is provided as an output of the ASR system.
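The training-time input selection described in this abstract (often called scheduled sampling in the literature) lends itself to a short sketch. The Python snippet below is a minimal illustration only: with a predetermined probability the decoder is fed the known (ground-truth) previous element, otherwise its own previous prediction. All function names, token ids, and the probability value are hypothetical and not taken from the patent.

```python
import numpy as np

def select_decoder_input(ground_truth_prev, decoder_output_prev, teacher_prob, rng):
    """Pick the decoder's next input during training.

    With probability `teacher_prob`, use the known value for the previous element
    of the training example; otherwise feed back the decoder's own output for it.
    """
    if rng.random() < teacher_prob:
        return ground_truth_prev           # input data based on the known value
    return decoder_output_prev             # input data based on the decoder's output

# Toy walk through one training example.
rng = np.random.default_rng(0)
target_tokens = [12, 7, 33, 5]             # hypothetical word-element ids
prev_prediction = 0                        # hypothetical start-of-sequence id
for t, gold in enumerate(target_tokens):
    gold_prev = target_tokens[t - 1] if t > 0 else 0
    decoder_input = select_decoder_input(gold_prev, prev_prediction,
                                          teacher_prob=0.9, rng=rng)
    # ... run one decoder step on `decoder_input` and the attention context ...
    prev_prediction = gold                 # placeholder for the decoder's argmax output
```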
-
Publication Number: US20210295859A1
Publication Date: 2021-09-23
Application Number: US17303822
Filing Date: 2021-06-08
Applicant: Google LLC
Inventor: Ehsan Variani , Kevin William Wilson , Ron J. Weiss , Tara N. Sainath , Arun Narayanan
IPC: G10L25/30 , G10L21/028 , G10L21/0388 , G10L15/16 , G10L19/008 , G10L15/20
Abstract: This specification describes computer-implemented methods and systems. One method includes receiving, by a neural network of a speech recognition system, first data representing a first raw audio signal and second data representing a second raw audio signal. The first raw audio signal and the second raw audio signal describe audio occurring at a same period of time. The method further includes generating, by a spatial filtering layer of the neural network, a spatial filtered output using the first data and the second data, and generating, by a spectral filtering layer of the neural network, a spectral filtered output using the spatial filtered output. Generating the spectral filtered output comprises processing frequency-domain data representing the spatial filtered output. The method still further includes processing, by one or more additional layers of the neural network, the spectral filtered output to predict sub-word units encoded in both the first raw audio signal and the second raw audio signal.
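One plausible reading of the two-stage filtering above is a filter-and-sum operation over the two raw channels (spatial filtering) followed by a learned per-bin weighting of the frequency-domain result (spectral filtering). The numpy sketch below illustrates only that reading; the filter lengths, window, feature dimensions, and names are assumptions, not the patented layers.

```python
import numpy as np

def spatial_filter(ch1, ch2, w1, w2):
    """Filter-and-sum over two time-aligned raw audio channels (spatial filtering)."""
    return np.convolve(ch1, w1, mode="same") + np.convolve(ch2, w2, mode="same")

def spectral_filter(spatial_out, spectral_weights, frame_len=512, hop=256):
    """Frame the spatial output, move to the frequency domain, and apply a
    learned per-bin weighting, returning log-compressed features."""
    frames = np.stack([spatial_out[i:i + frame_len]
                       for i in range(0, len(spatial_out) - frame_len + 1, hop)])
    spectra = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=-1))
    return np.log1p(spectra @ spectral_weights)        # (num_frames, num_features)

# Toy example with random stand-ins for the learned parameters.
rng = np.random.default_rng(0)
ch1, ch2 = rng.standard_normal(16000), rng.standard_normal(16000)   # 1 s at 16 kHz
w1, w2 = rng.standard_normal(128), rng.standard_normal(128)         # spatial filters
spectral_weights = np.abs(rng.standard_normal((257, 40)))           # 257 rfft bins -> 40 features
features = spectral_filter(spatial_filter(ch1, ch2, w1, w2), spectral_weights)
```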
-
Publication Number: US20210225369A1
Publication Date: 2021-07-22
Application Number: US17149018
Filing Date: 2021-01-14
Applicant: Google LLC
Inventor: Ke Hu , Tara N. Sainath , Ruoming Pang , Rohit Prakash Prabhavalkar
Abstract: A method of performing speech recognition using a two-pass deliberation architecture includes receiving a first-pass hypothesis and an encoded acoustic frame and encoding the first-pass hypothesis at a hypothesis encoder. The first-pass hypothesis is generated by a recurrent neural network (RNN) decoder model for the encoded acoustic frame. The method also includes generating, using a first attention mechanism attending to the encoded acoustic frame, a first context vector, and generating, using a second attention mechanism attending to the encoded first-pass hypothesis, a second context vector. The method also includes decoding the first context vector and the second context vector at a context vector decoder to form a second-pass hypothesis.
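As a rough sketch of the deliberation step, the snippet below shows two attention mechanisms each producing a context vector, one over the encoded acoustic frames and one over the encoded first-pass hypothesis, which are then combined for the context vector decoder. The dot-product attention, dimensions, and names are assumptions rather than the patented formulation.

```python
import numpy as np

def attend(query, keys, values):
    """Dot-product attention: weight `values` by the similarity of `keys` to `query`."""
    scores = keys @ query / np.sqrt(query.shape[-1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ values                        # context vector

rng = np.random.default_rng(0)
d = 64
acoustic_enc = rng.standard_normal((120, d))       # encoded acoustic frames
hypothesis_enc = rng.standard_normal((12, d))      # encoded first-pass hypothesis tokens
decoder_state = rng.standard_normal(d)             # current decoder state (query)

ctx_acoustic = attend(decoder_state, acoustic_enc, acoustic_enc)        # first context vector
ctx_hypothesis = attend(decoder_state, hypothesis_enc, hypothesis_enc)  # second context vector
decoder_input = np.concatenate([ctx_acoustic, ctx_hypothesis])          # fed to the context vector decoder
```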
-
Publication Number: US10714078B2
Publication Date: 2020-07-14
Application Number: US16171629
Filing Date: 2018-10-26
Applicant: Google LLC
Inventor: Samuel Bengio , Mirkó Visontai , Christopher Walter George Thornton , Michiel A. U. Bacchiani , Tara N. Sainath , Ehsan Variani , Izhak Shafran
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition using complex linear projection are disclosed. In one aspect, a method includes the actions of receiving audio data corresponding to an utterance. The method further includes generating frequency domain data using the audio data. The method further includes processing the frequency domain data using complex linear projection. The method further includes providing the processed frequency domain data to a neural network trained as an acoustic model. The method further includes generating a transcription for the utterance that is determined based at least on output that the neural network provides in response to receiving the processed frequency domain data.
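A minimal sketch of one reading of "complex linear projection": complex-valued FFT frames are multiplied by a complex weight matrix and the magnitudes are log-compressed before being passed to the acoustic model. Frame sizes, the window, and the output dimension below are illustrative assumptions.

```python
import numpy as np

def complex_linear_projection(audio, weights, frame_len=400, hop=160):
    """Project complex frequency-domain frames with a complex weight matrix,
    then log-compress the magnitudes to produce acoustic-model features."""
    frames = np.stack([audio[i:i + frame_len]
                       for i in range(0, len(audio) - frame_len + 1, hop)])
    spectra = np.fft.rfft(frames * np.hanning(frame_len), axis=-1)   # complex, (T, 201)
    projected = spectra @ weights                                    # complex linear projection
    return np.log1p(np.abs(projected))                               # (T, num_features)

rng = np.random.default_rng(0)
num_bins = 400 // 2 + 1                                              # 201 rfft bins
weights = rng.standard_normal((num_bins, 80)) + 1j * rng.standard_normal((num_bins, 80))
features = complex_linear_projection(rng.standard_normal(16000), weights)
print(features.shape)                                                # (num_frames, 80)
```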
-
Publication Number: US10515307B2
Publication Date: 2019-12-24
Application Number: US15172457
Filing Date: 2016-06-03
Applicant: Google LLC
Inventor: Tara N. Sainath , Vikas Sindhwani
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for implementing long short-term memory layers with compressed gating functions. One of the systems includes a first long short-term memory (LSTM) layer, wherein the first LSTM layer is configured to, for each of a plurality of time steps, generate a new layer state and a new layer output by applying a plurality of gates to a current layer input, a current layer state, and a current layer output, each of the plurality of gates being configured to, for each of the plurality of time steps, generate a respective intermediate gate output vector by multiplying a gate input vector and a gate parameter matrix. The gate parameter matrix for at least one of the plurality of gates is a structured matrix or is defined by a compressed parameter matrix and a projection matrix.
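The compression idea for a single gate can be sketched as follows: instead of a full gate parameter matrix, the gate output is computed through a compressed parameter matrix times a projection matrix (a low-rank factorization), with a structured matrix (a Toeplitz matrix is shown as one example) as an alternative. All sizes and names below are assumptions for illustration.

```python
import numpy as np
from scipy.linalg import toeplitz

rng = np.random.default_rng(0)
input_dim, hidden_dim, rank = 256, 512, 64

# Compressed alternative: compressed parameter matrix times a projection matrix.
compressed = rng.standard_normal((input_dim, rank))
projection = rng.standard_normal((rank, hidden_dim))      # could be shared across gates

def gate_output(gate_input, compressed, projection):
    """Intermediate gate output via the factored gate matrix: x @ (compressed @ projection)."""
    return (gate_input @ compressed) @ projection

# Structured alternative: a Toeplitz gate matrix defined by its first column and row.
structured_gate = toeplitz(rng.standard_normal(input_dim), rng.standard_normal(hidden_dim))

x = rng.standard_normal(input_dim)                         # gate input vector
print(gate_output(x, compressed, projection).shape)        # (512,)
print((x @ structured_gate).shape)                         # (512,)
```

The factored form stores `input_dim * rank + rank * hidden_dim` parameters instead of `input_dim * hidden_dim`, which is the source of the compression.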
-
Publication Number: US12211509B2
Publication Date: 2025-01-28
Application Number: US17821160
Filing Date: 2022-08-19
Applicant: Google LLC
Inventor: Chao Zhang , Bo Li , Zhiyun Lu , Tara N. Sainath , Shuo-yiin Chang
Abstract: A speech recognition model includes an encoder network, a prediction network, and a joint network. The encoder network is configured to receive a sequence of acoustic frames characterizing an input utterance; and generate, at each of a plurality of output steps, a higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The prediction network is configured to: receive a sequence of non-blank symbols output by a final Softmax layer; and generate, at each of the plurality of output steps, a dense representation. The joint network is configured to generate, at each of the plurality of output steps based on the higher order feature representation and the dense representation, a probability distribution over possible speech recognition hypotheses. The joint network includes a stack of gating and bilinear pooling to fuse the dense representation and the higher order feature representation.
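As a rough illustration of fusing the encoder's higher order feature representation with the prediction network's dense representation via gating and bilinear pooling, the sketch below gates each branch on the other and then applies a low-rank bilinear interaction. The exact stacking and parameterization in the patent may differ; all shapes and names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_bilinear_fusion(enc, pred, Wg_e, Wg_p, U, V):
    """Gate each branch on the other, then fuse with a low-rank bilinear pooling."""
    enc_gated = enc * sigmoid(Wg_e @ pred)      # encoder features gated by the prediction network
    pred_gated = pred * sigmoid(Wg_p @ enc)     # prediction features gated by the encoder
    return (U @ enc_gated) * (V @ pred_gated)   # low-rank bilinear interaction

rng = np.random.default_rng(0)
d_enc, d_pred, d_joint = 512, 640, 384
enc = rng.standard_normal(d_enc)                # higher order feature representation
pred = rng.standard_normal(d_pred)              # dense representation from the prediction network

fused = gated_bilinear_fusion(
    enc, pred,
    Wg_e=rng.standard_normal((d_enc, d_pred)),
    Wg_p=rng.standard_normal((d_pred, d_enc)),
    U=rng.standard_normal((d_joint, d_enc)),
    V=rng.standard_normal((d_joint, d_pred)),
)
logits = rng.standard_normal((4096, d_joint)) @ fused   # scores over hypotheses (then Softmax)
```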
-
Publication Number: US20240420686A1
Publication Date: 2024-12-19
Application Number: US18815200
Filing Date: 2024-08-26
Applicant: Google LLC
Inventor: Rohit Prakash Prabhavalkar , Zhifeng Chen , Bo Li , Chung-Cheng Chiu , Kanury Kanishka Rao , Yonghui Wu , Ron J. Weiss , Navdeep Jaitly , Michiel A. U. Bacchiani , Tara N. Sainath , Jan Kazimierz Chorowski , Anjuli Patricia Kannan , Ekaterina Gonina , Patrick An Phu Nguyen
Abstract: A method for performing speech recognition using sequence-to-sequence models includes receiving audio data for an utterance and providing features indicative of acoustic characteristics of the utterance as input to an encoder. The method also includes processing an output of the encoder using an attender to generate a context vector, generating speech recognition scores using the context vector and a decoder trained using a training process, and generating a transcription for the utterance using word elements selected based on the speech recognition scores. The transcription is provided as an output of the automated speech recognition (ASR) system.
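The final step, selecting word elements from the speech recognition scores and assembling them into a transcription, can be sketched with a simple greedy pass (production systems typically use beam search instead). The word-piece vocabulary, end-of-sequence id, and scores below are toy assumptions.

```python
import numpy as np

def greedy_transcription(score_matrix, wordpieces, eos_id=0):
    """Pick the highest-scoring word element at each step and join them into a transcription."""
    pieces = []
    for step_scores in score_matrix:            # one score vector per decoder step
        best = int(np.argmax(step_scores))
        if best == eos_id:
            break
        pieces.append(wordpieces[best])
    return "".join(pieces).replace("▁", " ").strip()

# Toy vocabulary of word elements; "▁" marks a word boundary.
wordpieces = ["</s>", "▁hel", "lo", "▁wor", "ld"]
scores = np.full((5, len(wordpieces)), -10.0)
for t, idx in enumerate([1, 2, 3, 4, 0]):       # pretend the decoder prefers "hello world"
    scores[t, idx] = 1.0
print(greedy_transcription(scores, wordpieces))  # -> "hello world"
```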
-
Publication Number: US12118988B2
Publication Date: 2024-10-15
Application Number: US17933307
Filing Date: 2022-09-19
Applicant: Google LLC
Inventor: Ke Hu , Tara N. Sainath , Arun Narayanan , Ruoming Pang , Trevor Strohman
IPC: G10L15/197 , G06F40/126 , G10L15/02 , G10L15/06 , G10L15/08 , G10L15/22
CPC classification number: G10L15/197 , G06F40/126 , G10L15/02 , G10L15/063 , G10L15/083 , G10L15/22
Abstract: A method includes receiving a sequence of acoustic frames and generating, by a first encoder, a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The method also includes generating, by a first pass transducer decoder, a first pass speech recognition hypothesis for a corresponding first higher order feature representation and generating, by a text encoder, a text encoding for a corresponding first pass speech recognition hypothesis. The method also includes generating, by a second encoder, a second higher order feature representation for a corresponding first higher order feature representation. The method also includes generating, by a second pass transducer decoder, a second pass speech recognition hypothesis using a corresponding second higher order feature representation and a corresponding text encoding.
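The data flow described above (first encoder, first-pass transducer decoder, text encoder, second encoder, second-pass transducer decoder) is sketched below with each trained component reduced to a placeholder linear map or stub. This is only a shape-level illustration under assumed dimensions, not the patented model.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d_audio, d_enc, d_text, vocab = 100, 80, 512, 256, 32

# Placeholder "networks": random linear maps standing in for trained components.
first_encoder = rng.standard_normal((d_audio, d_enc))
second_encoder = rng.standard_normal((d_enc, d_enc))
text_embedding = rng.standard_normal((vocab, d_text))        # stands in for the text encoder

acoustic_frames = rng.standard_normal((T, d_audio))           # sequence of acoustic frames

# First pass: first higher order features, then a stubbed first-pass transducer decoder.
first_higher_order = acoustic_frames @ first_encoder          # (T, d_enc)
first_pass_hyp_ids = np.argmax(first_higher_order[:, :vocab], axis=-1)   # stub hypothesis ids

# Text encoding of the first-pass hypothesis.
text_encoding = text_embedding[first_pass_hyp_ids]            # (T, d_text)

# Second pass: second higher order features plus the text encoding feed the second decoder.
second_higher_order = first_higher_order @ second_encoder     # (T, d_enc)
second_pass_inputs = np.concatenate([second_higher_order, text_encoding], axis=-1)
print(second_pass_inputs.shape)                               # (T, d_enc + d_text)
```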
-
Publication Number: US20240296840A1
Publication Date: 2024-09-05
Application Number: US18592590
Filing Date: 2024-03-01
Applicant: Google LLC
Inventor: Shaan Jagdeep Patrick Bijwadia , Shuo-yiin Chang , Tara N. Sainath , Weiran Wang , Zhong Meng
IPC: G10L15/197 , G10L15/02 , G10L15/06
CPC classification number: G10L15/197 , G10L15/02 , G10L15/063
Abstract: A joint auxiliary task and ASR model includes an encoder to receive a sequence of acoustic frames and generate, at each of a plurality of output steps, a higher-order feature representation for a corresponding acoustic frame. The model also includes a multi-output HAT decoder to generate at each of the plurality of output steps a probability distribution over possible speech recognition hypotheses, and an indication of whether the output step corresponds to an auxiliary token associated with a particular auxiliary task. The model is trained by a JEIT training process based on: a paired training data set including paired audio data and transcriptions, the transcriptions annotated with ground-truth auxiliary tokens associated with the particular auxiliary task; and an unpaired training data set including textual utterances not paired with any corresponding audio data, the textual utterances annotated with the ground-truth auxiliary tokens associated with the particular auxiliary task.
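One way to picture the JEIT training objective is as a weighted combination of a loss over paired (audio, transcription) data and a text-only loss over unpaired textual utterances, both with targets annotated by auxiliary tokens. The sketch below uses placeholder loss functions, a made-up auxiliary token, and an assumed weighting; it is not the patented procedure.

```python
import numpy as np

def jeit_loss(paired_batch, unpaired_batch, asr_loss_fn, text_loss_fn, text_weight=0.3):
    """Combine the ASR loss on paired (audio, transcription) data with a text-only
    loss on unpaired textual utterances; both target sequences carry auxiliary tokens."""
    paired = np.mean([asr_loss_fn(audio, tokens) for audio, tokens in paired_batch])
    unpaired = np.mean([text_loss_fn(tokens) for tokens in unpaired_batch])
    return paired + text_weight * unpaired

# Toy stand-ins: "<aux>" is a hypothetical ground-truth auxiliary token.
paired_batch = [(np.zeros(16000), ["<aux>", "hello", "world"])]
unpaired_batch = [["<aux>", "good", "morning"]]
loss = jeit_loss(
    paired_batch, unpaired_batch,
    asr_loss_fn=lambda audio, tokens: float(len(tokens)),   # placeholder losses
    text_loss_fn=lambda tokens: float(len(tokens)),
)
print(loss)
```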
-
Publication Number: US20240290323A1
Publication Date: 2024-08-29
Application Number: US18660655
Filing Date: 2024-05-10
Applicant: Google LLC
Inventor: Wenqian Ronny Huang , Tara N. Sainath
IPC: G10L15/06 , G06N3/02 , G10L15/16 , G10L15/197 , G10L15/22
CPC classification number: G10L15/063 , G06N3/02 , G10L15/16 , G10L15/197 , G10L15/22
Abstract: A method of training an external language model for rare-word speech recognition includes obtaining a set of training text samples and obtaining a set of training utterances used for training a speech recognition model. Each training utterance in the set of training utterances includes audio data corresponding to an utterance and a corresponding transcription of the utterance. The method also includes applying rare word filtering on the set of training text samples to identify a subset of rare-word training text samples that include words that do not appear in the transcriptions from the set of training utterances or appear in those transcriptions fewer than a threshold number of times. The method further includes training the external language model on the transcriptions from the set of training utterances and the identified subset of rare-word training text samples.
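The filtering step lends itself to a short sketch: count how often each word appears in the training-utterance transcriptions, then keep only the text samples that contain a word appearing fewer than a threshold number of times (or not at all). The threshold, helper name, and toy data below are illustrative.

```python
from collections import Counter

def filter_rare_word_samples(text_samples, transcriptions, min_count=1):
    """Keep text samples containing at least one word that appears in the
    training-utterance transcriptions fewer than `min_count` times (or never)."""
    counts = Counter(word for t in transcriptions for word in t.lower().split())
    return [s for s in text_samples
            if any(counts[w] < min_count for w in s.lower().split())]

transcriptions = ["turn on the lights", "turn off the lights", "play some music"]
text_samples = ["turn on the chandelier", "play some music", "dim the lights"]
# With min_count=1, only samples containing a word unseen in the transcriptions are kept:
# "chandelier" and "dim" never appear there, so those two samples survive the filter.
print(filter_rare_word_samples(text_samples, transcriptions, min_count=1))
# -> ['turn on the chandelier', 'dim the lights']
```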