-
Publication No.: US11741366B2
Publication Date: 2023-08-29
Application No.: US16726119
Filing Date: 2019-12-23
Applicant: Google LLC
Inventor: Tara N. Sainath , Vikas Sindhwani
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for implementing long-short term memory layers with compressed gating functions. One of the systems includes a first long short-term memory (LSTM) layer, wherein the first LSTM layer is configured to, for each of the plurality of time steps, generate a new layer state and a new layer output by applying a plurality of gates to a current layer input, a current layer state, and a current layer output, each of the plurality of gates being configured to, for each of the plurality of time steps, generate a respective intermediate gate output vector by multiplying a gate input vector and a gate parameter matrix. The gate parameter matrix for at least one of the plurality of gates is a structured matrix or is defined by a compressed parameter matrix and a projection matrix.
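The compressed-gate idea can be sketched as a low-rank factorization: instead of storing a full gate parameter matrix W, store a compressed parameter matrix Z and a projection matrix P with W ≈ ZP, and multiply the gate input through the factors. A minimal sketch follows; the sizes, names, and the sigmoid nonlinearity are illustrative assumptions, not details quoted from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, rank = 8, 8, 2  # hypothetical sizes; rank << d_out compresses the gate

# A full gate parameter matrix would be d_out x d_in (64 parameters).
# The compressed form stores a compressed parameter matrix Z (d_out x rank)
# and a projection matrix P (rank x d_in): W ~= Z @ P (32 parameters).
Z = rng.standard_normal((d_out, rank))
P = rng.standard_normal((rank, d_in))

def gate(x, Z, P):
    """Intermediate gate output via the factored multiply, then a sigmoid."""
    pre_activation = Z @ (P @ x)  # cost O(rank * (d_in + d_out)) per step
    return 1.0 / (1.0 + np.exp(-pre_activation))

x = rng.standard_normal(d_in)
g = gate(x, Z, P)
```

Multiplying through the factors as `Z @ (P @ x)` avoids ever materializing the full W, which is the point of the compression.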
-
Publication No.: US11646019B2
Publication Date: 2023-05-09
Application No.: US17443557
Filing Date: 2021-07-27
Applicant: Google LLC
Inventor: Rohit Prakash Prabhavalkar , Tara N. Sainath , Yonghui Wu , Patrick An Phu Nguyen , Zhifeng Chen , Chung-Cheng Chiu , Anjuli Patricia Kannan
IPC: G10L15/197 , G10L15/16 , G10L15/06 , G10L15/02 , G10L15/22
CPC classification number: G10L15/197 , G10L15/02 , G10L15/063 , G10L15/16 , G10L15/22 , G10L2015/025
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer-readable storage media, for speech recognition using attention-based sequence-to-sequence models. In some implementations, audio data indicating acoustic characteristics of an utterance is received. A sequence of feature vectors indicative of the acoustic characteristics of the utterance is generated. The sequence of feature vectors is processed using a speech recognition model that has been trained using a loss function that uses N-best lists of decoded hypotheses, the speech recognition model including an encoder, an attention module, and a decoder. The encoder and decoder each include one or more recurrent neural network layers. A sequence of output vectors representing distributions over a predetermined set of linguistic units is obtained. A transcription for the utterance is obtained based on the sequence of output vectors. Data indicating the transcription of the utterance is provided.
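A loss function over N-best lists of decoded hypotheses is commonly realized as an expected word error rate: renormalize the hypothesis probabilities over the N-best list, subtract the mean error count as a baseline, and take the probability-weighted sum. The sketch below shows that standard construction; it is a plausible instance of the technique named in the abstract, not code from the patent.

```python
import numpy as np

def mwer_loss(log_probs, word_errors):
    """Expected word error rate over an N-best list, with the common
    variance-reducing baseline (mean error count subtracted)."""
    p = np.exp(log_probs - np.max(log_probs))
    p = p / p.sum()                      # probabilities renormalized over the N-best
    relative = word_errors - word_errors.mean()
    return float(np.sum(p * relative))

# Four hypotheses: concentrating probability on low-error ones lowers the loss.
log_probs = np.array([-1.0, -2.0, -3.0, -4.0])
errors = np.array([0.0, 1.0, 2.0, 3.0])
loss = mwer_loss(log_probs, errors)
```

Because the most probable hypothesis here also has the fewest errors, the loss comes out negative; a model that put high probability on high-error hypotheses would be penalized with a positive value.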
-
Publication No.: US20220310072A1
Publication Date: 2022-09-29
Application No.: US17616129
Filing Date: 2020-06-03
Applicant: Google LLC
Inventor: Tara N. Sainath , Ruoming Pang , David Rybach , Yanzhang He , Rohit Prabhavalkar , Wei Li , Mirkó Visontai , Qiao Liang , Trevor Strohman , Yonghui Wu , Ian C. McGraw , Chung-Cheng Chiu
Abstract: Two-pass automatic speech recognition (ASR) models can be used to perform streaming on-device ASR to generate a text representation of an utterance captured in audio data. Various implementations include a first-pass portion of the ASR model used to generate streaming candidate recognition(s) of an utterance captured in audio data. For example, the first-pass portion can include a recurrent neural network transducer (RNN-T) decoder. Various implementations include a second-pass portion of the ASR model used to revise the streaming candidate recognition(s) of the utterance and generate a text representation of the utterance. For example, the second-pass portion can include a Listen, Attend and Spell (LAS) decoder. Various implementations include a shared encoder shared between the RNN-T decoder and the LAS decoder.
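The two-pass control flow can be sketched as: a shared encoder feeds a streaming first pass that emits partial candidates frame by frame, and a second pass that revises them once the full encoding is available. The stand-in "decoders" below are toy string operations, purely to show the data flow; the real model uses trained RNN-T and LAS neural decoders.

```python
def shared_encoder(audio_frames):
    # Stand-in for the shared neural encoding both decoders consume.
    return [f.upper() for f in audio_frames]

def rnnt_first_pass(encoded):
    # Streaming pass: emit a candidate as each encoded frame arrives.
    candidates = []
    prefix = ""
    for frame in encoded:
        prefix += frame
        candidates.append(prefix)      # streaming partial results
    return candidates

def las_second_pass(encoded, candidates):
    # Rescoring pass: attend over the full encoding and revise/select
    # the final transcription (here: simply keep the last candidate).
    return candidates[-1]

frames = ["he", "ll", "o"]
encoded = shared_encoder(frames)
partials = rnnt_first_pass(encoded)
final = las_second_pass(encoded, partials)
```

Sharing one encoder between the passes is what keeps the second pass cheap enough for on-device use: only the decoder work is duplicated, not the encoding.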
-
Publication No.: US20220101836A1
Publication Date: 2022-03-31
Application No.: US17643423
Filing Date: 2021-12-09
Applicant: Google LLC
Inventor: Rohit Prakash Prabhavalkar , Golan Pundak , Tara N. Sainath , Antoine Jean Bruguier
IPC: G10L15/187 , G06N20/10 , G10L19/04
Abstract: A method of biasing speech recognition includes receiving audio data encoding an utterance and obtaining a set of one or more biasing phrases corresponding to a context of the utterance. Each biasing phrase in the set of one or more biasing phrases includes one or more words. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data and grapheme and phoneme data derived from the set of one or more biasing phrases to generate an output of the speech recognition model. The method also includes determining a transcription for the utterance based on the output of the speech recognition model.
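One plausible way the biasing-phrase data could enter the model is through an attention mechanism: embed each biasing phrase, attend over the embeddings using the recognizer's current acoustic state, and feed the resulting context vector into decoding. The dot-product attention below is a hypothetical sketch of that pattern; the patent's model also uses phoneme-level renderings of each phrase, which are omitted here.

```python
import numpy as np

rng = np.random.default_rng(1)

def bias_attention(acoustic_state, phrase_embeddings):
    """Attend over embeddings of the biasing phrases and return a context
    vector to combine with the acoustic features (hypothetical
    dot-product attention with a softmax over phrases)."""
    scores = phrase_embeddings @ acoustic_state
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()
    return weights @ phrase_embeddings, weights

state = rng.standard_normal(4)
phrases = rng.standard_normal((3, 4))   # embeddings for 3 biasing phrases
context, weights = bias_attention(state, phrases)
```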
-
Publication No.: US20210233512A1
Publication Date: 2021-07-29
Application No.: US17150491
Filing Date: 2021-01-15
Applicant: Google LLC
Inventor: Charles Caleb Peyser , Tara N. Sainath , Golan Pundak
IPC: G10L15/06 , G06N3/04 , G10L15/16 , G10L15/18 , G10L15/187
Abstract: A method for training a speech recognition model with a minimum word error rate loss function includes receiving a training example comprising a proper noun and generating a plurality of hypotheses corresponding to the training example. Each hypothesis of the plurality of hypotheses represents the proper noun and includes a corresponding probability that indicates a likelihood that the hypothesis represents the proper noun. The method also includes determining that the corresponding probability associated with one of the plurality of hypotheses satisfies a penalty criteria, the penalty criteria indicating that the corresponding probability satisfies a probability threshold and that the associated hypothesis incorrectly represents the proper noun. The method also includes applying a penalty to the minimum word error rate loss function.
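The penalty criteria reduce to a simple conjunction: a hypothesis is penalized when it is both confident (probability above a threshold) and wrong about the proper noun. The function below sketches that check on top of a precomputed loss value; the function name, threshold, and penalty magnitude are illustrative assumptions, not values from the patent.

```python
def penalized_mwer_loss(base_loss, hypotheses, threshold=0.5, penalty=1.0):
    """Add a penalty whenever a hypothesis is confident (probability at or
    above the threshold) yet incorrectly represents the proper noun.
    hypotheses: list of (probability, represents_noun_correctly) pairs."""
    loss = base_loss
    for prob, correct_noun in hypotheses:
        if prob >= threshold and not correct_noun:
            loss += penalty            # confident-but-wrong: apply penalty
    return loss

# One confident-but-wrong hypothesis triggers the penalty.
hyps = [(0.7, False), (0.2, True), (0.1, False)]
loss = penalized_mwer_loss(0.3, hyps)
```

Low-confidence mistakes (the 0.1 hypothesis above) are left to the ordinary loss; only the confident error pays the extra cost.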
-
Publication No.: US10930270B2
Publication Date: 2021-02-23
Application No.: US16541982
Filing Date: 2019-08-15
Applicant: Google LLC
Inventor: Tara N. Sainath , Ron J. Weiss , Andrew W. Senior , Kevin William Wilson
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing audio waveforms. In some implementations, a time-frequency feature representation is generated based on audio data. The time-frequency feature representation is input to an acoustic model comprising a trained artificial neural network. The trained artificial neural network comprises a frequency convolution layer, a memory layer, and one or more hidden layers. An output that is based on output of the trained artificial neural network is received. A transcription is provided, where the transcription is determined based on the output of the acoustic model.
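A common concrete choice for the time-frequency feature representation is a log-magnitude spectrogram: window the waveform, take a short-time FFT of each frame, and log-compress the magnitudes. The sketch below computes one under assumed frame and hop sizes; the abstract covers such representations generally and does not prescribe these parameters.

```python
import numpy as np

def time_frequency_features(waveform, frame_len=64, hop=32):
    """Log-magnitude spectrogram: one plausible time-frequency
    representation (frame/hop sizes here are illustrative)."""
    frames = []
    for start in range(0, len(waveform) - frame_len + 1, hop):
        frame = waveform[start:start + frame_len] * np.hanning(frame_len)
        spectrum = np.abs(np.fft.rfft(frame))       # one-sided spectrum
        frames.append(np.log(spectrum + 1e-6))      # log-compress magnitudes
    return np.stack(frames)        # shape: (num_frames, frame_len // 2 + 1)

t = np.arange(512) / 512.0
waveform = np.sin(2 * np.pi * 40 * t)  # test tone at exactly bin 5 per frame
feats = time_frequency_features(waveform)
```

For the pure tone above, each frame's spectrum peaks in the FFT bin matching the tone's frequency, which is what the frequency convolution layer downstream would exploit.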
-
Publication No.: US20200349923A1
Publication Date: 2020-11-05
Application No.: US16861190
Filing Date: 2020-04-28
Applicant: Google LLC
Inventor: Ke Hu , Antoine Jean Bruguier , Tara N. Sainath , Rohit Prakash Prabhavalkar , Golan Pundak
IPC: G10L15/06 , G10L15/187 , G10L15/193 , G10L15/32 , G10L15/28 , G10L25/30 , G10L15/02
Abstract: A method includes receiving audio data encoding an utterance spoken by a native speaker of a first language, and receiving a biasing term list including one or more terms in a second language different than the first language. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data to generate speech recognition scores for both wordpieces and corresponding phoneme sequences in the first language. The method also includes rescoring the speech recognition scores for the phoneme sequences based on the one or more terms in the biasing term list, and executing, using the speech recognition scores for the wordpieces and the rescored speech recognition scores for the phoneme sequences, a decoding graph to generate a transcription for the utterance.
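The rescoring step can be sketched as a score adjustment: phoneme sequences that match a phoneme rendering of a term on the biasing list get their recognition score boosted before decoding. The exact-match lookup and bonus value below are hypothetical simplifications; the patent's rescoring operates inside a decoding graph rather than on a flat list.

```python
def rescore_phonemes(scored_sequences, biasing_phonemes, bonus=2.0):
    """Boost the score of any phoneme sequence matching a phoneme
    rendering of a biasing term (hypothetical exact-match rescoring)."""
    return [
        (seq, score + bonus if seq in biasing_phonemes else score)
        for seq, score in scored_sequences
    ]

# Scores before rescoring; "m ä r k" stands in for a second-language name
# whose phoneme sequence appears on the biasing term list.
hyps = [("m ä r k", 1.0), ("m a r k", 1.5)]
biasing = {"m ä r k"}
rescored = rescore_phonemes(hyps, biasing)
```

After the boost, the foreign-language rendering outscores the first-language one, which is the intended effect of biasing toward out-of-language terms.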
-
Publication No.: US20200335091A1
Publication Date: 2020-10-22
Application No.: US16809403
Filing Date: 2020-03-04
Applicant: Google LLC
Inventor: Shuo-yiin Chang , Rohit Prakash Prabhavalkar , Gabor Simko , Tara N. Sainath , Bo Li , Yanzhang He
Abstract: A method includes receiving audio data of an utterance and processing the audio data to obtain, as output from a speech recognition model configured to jointly perform speech decoding and endpointing of utterances: partial speech recognition results for the utterance; and an endpoint indication indicating when the utterance has ended. While processing the audio data, the method also includes detecting, based on the endpoint indication, the end of the utterance. In response to detecting the end of the utterance, the method also includes terminating the processing of any subsequent audio data received after the end of the utterance was detected.
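The joint decode-and-endpoint loop reduces to: consume frames, accumulate partial results, and break out as soon as the model signals the endpoint, so no subsequent audio is processed. The `model_step` interface and the `</s>` marker below are hypothetical stand-ins for the jointly trained model's outputs.

```python
def decode_with_endpointing(frames, model_step):
    """Consume audio frames until the model emits an end-of-utterance
    indication, then stop (hypothetical interface: model_step returns
    a (partial_text, is_endpoint) pair per frame)."""
    transcript = ""
    consumed = 0
    for frame in frames:
        consumed += 1
        partial, is_endpoint = model_step(frame)
        transcript += partial
        if is_endpoint:
            break                      # terminate: ignore subsequent audio
    return transcript, consumed

# Stand-in model: a "</s>" frame signals the endpoint.
def toy_step(frame):
    return ("" if frame == "</s>" else frame, frame == "</s>")

text, used = decode_with_endpointing(["hi", " there", "</s>", "noise"], toy_step)
```

The trailing "noise" frame is never touched, mirroring the abstract's point that processing terminates once the endpoint is detected.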
-
Publication No.: US10783900B2
Publication Date: 2020-09-22
Application No.: US14847133
Filing Date: 2015-09-08
Applicant: Google LLC
Inventor: Tara N. Sainath , Andrew W. Senior , Oriol Vinyals , Hasim Sak
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for identifying the language of a spoken utterance. One of the methods includes receiving input features of an utterance; and processing the input features using an acoustic model that comprises one or more convolutional neural network (CNN) layers, one or more long short-term memory network (LSTM) layers, and one or more fully connected neural network layers to generate a transcription for the utterance.
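The layer ordering described (CNN layers, then LSTM layers, then fully connected layers) can be shown as a shape-flow sketch. The toy layers below stand in for trained ones and only demonstrate how tensor shapes move through the stack; the recurrence is a simplification, not a real LSTM cell.

```python
import numpy as np

rng = np.random.default_rng(2)

def conv1d(x, kernel):
    # Convolve along the frequency axis of each time frame.
    return np.stack([np.convolve(frame, kernel, mode="valid") for frame in x])

def lstm_like(x, w):
    # Stand-in recurrence: a running state mixed with each frame
    # (a real LSTM layer would use gated cell updates).
    state = np.zeros(x.shape[1])
    outs = []
    for frame in x:
        state = np.tanh(w * state + frame)
        outs.append(state)
    return np.stack(outs)

def fully_connected(x, w):
    return x @ w

frames = rng.standard_normal((10, 16))          # (time, frequency)
h = conv1d(frames, np.ones(3) / 3.0)            # -> (10, 14) after 'valid' conv
h = lstm_like(h, 0.5)                           # -> (10, 14), bounded by tanh
logits = fully_connected(h, rng.standard_normal((14, 5)))  # -> (10, 5) outputs
```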
-
Publication No.: US20200043483A1
Publication Date: 2020-02-06
Application No.: US16529252
Filing Date: 2019-08-01
Applicant: Google LLC
Inventor: Rohit Prakash Prabhavalkar , Tara N. Sainath , Yonghui Wu , Patrick An Phu Nguyen , Zhifeng Chen , Chung-Cheng Chiu , Anjuli Patricia Kannan
IPC: G10L15/197 , G10L15/16 , G10L15/22 , G10L15/06 , G10L15/02
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer-readable storage media, for speech recognition using attention-based sequence-to-sequence models. In some implementations, audio data indicating acoustic characteristics of an utterance is received. A sequence of feature vectors indicative of the acoustic characteristics of the utterance is generated. The sequence of feature vectors is processed using a speech recognition model that has been trained using a loss function that uses N-best lists of decoded hypotheses, the speech recognition model including an encoder, an attention module, and a decoder. The encoder and decoder each include one or more recurrent neural network layers. A sequence of output vectors representing distributions over a predetermined set of linguistic units is obtained. A transcription for the utterance is obtained based on the sequence of output vectors. Data indicating the transcription of the utterance is provided.