Patent search ap:("GOOGLE LLC") AND inv:"Tara N. Sainath" Page 7

61.

Invention application

SPEECH RECOGNITION WITH SEQUENCE-TO-SEQUENCE MODELS

Legal status: Pending (published)

Publication (announcement) No.: US20200027444A1

Publication (announcement) date: 2020-01-23

Application No.: US16516390

Application date: 2019-07-19

Applicant: Google LLC

Inventor: Rohit Prakash Prabhavalkar , Zhifeng Chen , Bo Li , Chung-Cheng Chiu , Kanury Kanishka Rao , Yonghui Wu , Ron J. Weiss , Navdeep Jaitly , Michiel A.U. Bacchiani , Tara N. Sainath , Jan Kazimierz Chorowski , Anjuli Patricia Kannan , Ekaterina Gonina , Patrick An Phu Nguyen

IPC: G10L15/16 , G10L15/22 , G10L15/06 , G10L15/02 , G06N3/08

Abstract: Methods, systems, and apparatus, including computer-readable media, for performing speech recognition using sequence-to-sequence models. An automated speech recognition (ASR) system receives audio data for an utterance and provides features indicative of acoustic characteristics of the utterance as input to an encoder. The system processes an output of the encoder using an attender to generate a context vector and generates speech recognition scores using the context vector and a decoder trained using a training process that selects at least one input to the decoder with a predetermined probability. An input to the decoder during training is selected between input data based on a known value for an element in a training example, and input data based on an output of the decoder for the element in the training example. A transcription is generated for the utterance using word elements selected based on the speech recognition scores. The transcription is provided as an output of the ASR system.

62.

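Invention grant

Multichannel raw-waveform neural networks

Legal status: In force

Publication (announcement) No.: US10339921B2

Publication (announcement) date: 2019-07-02

Application No.: US14987146

Application date: 2016-01-04

Applicant: Google LLC

Inventor: Tara N. Sainath , Ron J. Weiss , Kevin William Wilson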

IPC: G10L15/00 , G10L15/16 , G10L15/34 , G06N3/04 , G06N3/08 , G10L15/20 , G10L21/0208

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for using neural networks. One of the methods includes receiving, by a neural network in a speech recognition system, first data representing a first raw audio signal and second data representing a second raw audio signal, the first raw audio signal and the second raw audio signal for the same period of time, generating, by a spatial filtering convolutional layer in the neural network, a spatial filtered output using the first data and the second data, generating, by a spectral filtering convolutional layer in the neural network, a spectral filtered output using the spatial filtered output, and processing, by one or more additional layers in the neural network, the spectral filtered output to predict sub-word units encoded in both the first raw audio signal and the second raw audio signal.

63.

Invention grant

Voice activity detection

Legal status: In force

Publication (announcement) No.: US10229700B2

Publication (announcement) date: 2019-03-12

Application No.: US14986985

Application date: 2016-01-04

Applicant: Google LLC

Inventor: Tara N. Sainath , Gabor Simko , Maria Carolina Parada San Martin , Ruben Zazo Candil

IPC: G10L15/16 , G10L25/30 , G10L25/78

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for detecting voice activity. In one aspect, a method includes actions of receiving, by a neural network included in an automated voice activity detection system, a raw audio waveform, processing, by the neural network, the raw audio waveform to determine whether the audio waveform includes speech, and providing, by the neural network, a classification of the raw audio waveform indicating whether the raw audio waveform includes speech.

64.

Invention application

ADAPTIVE AUDIO ENHANCEMENT FOR MULTICHANNEL SPEECH RECOGNITION

Legal status: Pending (published)

Publication (announcement) No.: US20180197534A1

Publication (announcement) date: 2018-07-12

Application No.: US15848829

Application date: 2017-12-20

Applicant: Google LLC

Inventor: Bo Li , Ron J. Weiss , Michiel A.U. Bacchiani , Tara N. Sainath , Kevin William Wilson

IPC: G10L15/16 , G10L21/0224 , G10L15/26 , G10L21/0216

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for neural network adaptive beamforming for multichannel speech recognition are disclosed. In one aspect, a method includes the actions of receiving a first channel of audio data corresponding to an utterance and a second channel of audio data corresponding to the utterance. The actions further include generating a first set of filter parameters for a first filter based on the first channel of audio data and the second channel of audio data and a second set of filter parameters for a second filter based on the first channel of audio data and the second channel of audio data. The actions further include generating a single combined channel of audio data. The actions further include inputting the audio data to a neural network. The actions further include providing a transcription for the utterance.

65.

Invention grant

Language agnostic multilingual end-to-end streaming on-device ASR system

Legal status: In force

Publication (announcement) No.: US12183322B2

Publication (announcement) date: 2024-12-31

Application No.: US17934555

Application date: 2022-09-22

Applicant: Google LLC

Inventor: Bo Li , Tara N. Sainath , Ruoming Pang , Shuo-yiin Chang , Qiumin Xu , Trevor Strohman , Vince Chen , Qiao Liang , Heguang Liu , Yanzhang He , Parisa Haghani , Sameer Bidichandani

IPC: G10L15/00 , G10L15/06 , G10L15/22 , G10L15/30

Abstract: A method includes receiving a sequence of acoustic frames characterizing one or more utterances as input to a multilingual automated speech recognition (ASR) model. The method also includes generating a higher order feature representation for a corresponding acoustic frame. The method also includes generating a hidden representation based on a sequence of non-blank symbols output by a final softmax layer. The method also includes generating a probability distribution over possible speech recognition hypotheses based on the hidden representation generated by the prediction network at each of the plurality of output steps and the higher order feature representation generated by the encoder at each of the plurality of output steps. The method also includes predicting an end of utterance (EOU) token at an end of each utterance. The method also includes classifying each acoustic frame as either speech, initial silence, intermediate silence, or final silence.

66.

Invention application

TWO-PASS END TO END SPEECH RECOGNITION

Legal status: In force

Publication (announcement) No.: US20240420687A1

Publication (announcement) date: 2024-12-19

Application No.: US18815537

Application date: 2024-08-26

Applicant: Google LLC

Inventor: Tara N. Sainath , Yanzhang He , Bo Li , Arun Narayanan , Ruoming Pang , Antoine Jean Bruguier , Shuo-yiin Chang , Wei Li

IPC: G10L15/16 , G06N3/08 , G10L15/05 , G10L15/06 , G10L15/22

Abstract: Two-pass automatic speech recognition (ASR) models can be used to perform streaming on-device ASR to generate a text representation of an utterance captured in audio data. Various implementations include a first-pass portion of the ASR model used to generate streaming candidate recognition(s) of an utterance captured in audio data. For example, the first-pass portion can include a recurrent neural network transducer (RNN-T) decoder. Various implementations include a second-pass portion of the ASR model used to revise the streaming candidate recognition(s) of the utterance and generate a text representation of the utterance. For example, the second-pass portion can include a listen attend spell (LAS) decoder. Various implementations include a shared encoder shared between the RNN-T decoder and the LAS decoder.

67.

Invention grant

Speech recognition with sequence-to-sequence models

Legal status: In force

Publication (announcement) No.: US12106749B2

Publication (announcement) date: 2024-10-01

Application No.: US17448119

Application date: 2021-09-20

Applicant: Google LLC

Inventor: Rohit Prakash Prabhavalkar , Zhifeng Chen , Bo Li , Chung-Cheng Chiu , Kanury Kanishka Rao , Yonghui Wu , Ron J. Weiss , Navdeep Jaitly , Michiel A.U. Bacchiani , Tara N. Sainath , Jan Kazimierz Chorowski , Anjuli Patricia Kannan , Ekaterina Gonina , Patrick An Phu Nguyen

IPC: G10L15/00 , G06N3/08 , G10L15/02 , G10L15/06 , G10L15/16 , G10L15/22 , G10L25/30 , G10L15/26

CPC classification number: G10L15/16 , G06N3/08 , G10L15/02 , G10L15/063 , G10L15/22 , G10L25/30 , G10L2015/025 , G10L15/26

Abstract: A method for performing speech recognition using sequence-to-sequence models includes receiving audio data for an utterance and providing features indicative of acoustic characteristics of the utterance as input to an encoder. The method also includes processing an output of the encoder using an attender to generate a context vector, generating speech recognition scores using the context vector and a decoder trained using a training process, and generating a transcription for the utterance using word elements selected based on the speech recognition scores. The transcription is provided as an output of an automated speech recognition (ASR) system.

68.

Invention publication

Multi-Output Decoders for Multi-Task Learning of ASR and Auxiliary Tasks

Legal status: Pending (published)

Publication (announcement) No.: US20240153495A1

Publication (announcement) date: 2024-05-09

Application No.: US18494984

Application date: 2023-10-26

Applicant: Google LLC

Inventor: Weiran Wang , Ding Zhao , Shaojin Ding , Hao Zhang , Shuo-yiin Chang , David Johannes Rybach , Tara N. Sainath , Yanzhang He , Ian McGraw , Shankar Kumar

IPC: G10L15/06 , G06F40/284 , G10L15/26

CPC classification number: G10L15/063 , G06F40/284 , G10L15/26

Abstract: A method includes receiving a training dataset that includes one or more spoken training utterances for training an automatic speech recognition (ASR) model. Each spoken training utterance in the training dataset is paired with a corresponding transcription and a corresponding target sequence of auxiliary tokens. For each spoken training utterance, the method includes generating a speech recognition hypothesis for a corresponding spoken training utterance, determining a speech recognition loss based on the speech recognition hypothesis and the corresponding transcription, generating a predicted auxiliary token for the corresponding spoken training utterance, and determining an auxiliary task loss based on the predicted auxiliary token and the corresponding target sequence of auxiliary tokens. The method also includes training the ASR model jointly on the speech recognition loss and the auxiliary task loss determined for each spoken training utterance.

69.

Invention publication

Universal Monolingual Output Layer for Multilingual Speech Recognition

Legal status: Pending (published)

Publication (announcement) No.: US20240135923A1

Publication (announcement) date: 2024-04-25

Application No.: US18485271

Application date: 2023-10-11

Applicant: Google LLC

Inventor: Chao Zhang , Bo Li , Tara N. Sainath , Trevor Strohman , Shuo-yiin Chang

IPC: G10L15/197 , G10L15/00 , G10L15/02

CPC classification number: G10L15/197 , G10L15/005 , G10L15/02

Abstract: A method includes receiving a sequence of acoustic frames as input to a multilingual automated speech recognition (ASR) model configured to recognize speech in a plurality of different supported languages and generating, by an audio encoder of the multilingual ASR model, a higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The method also includes generating, by a language identification (LID) predictor of the multilingual ASR model, a language prediction representation for a corresponding higher order feature representation. The method also includes generating, by a decoder of the multilingual ASR model, a probability distribution over possible speech recognition results based on the corresponding higher order feature representation, a sequence of non-blank symbols, and a corresponding language prediction representation. The decoder includes a monolingual output layer having a plurality of output nodes each sharing a plurality of language-specific wordpiece models.

70.

Invention grant

Deliberation model-based two-pass end-to-end speech recognition

Legal status: In force

Publication (announcement) No.: US11908461B2

Publication (announcement) date: 2024-02-20

Application No.: US17149018

Application date: 2021-01-14

Applicant: Google LLC

Inventor: Ke Hu , Tara N. Sainath , Ruoming Pang , Rohit Prakash Prabhavalkar

IPC: G10L15/18 , G06N3/049 , G10L15/06 , G10L15/16 , G10L15/187 , G10L19/00

CPC classification number: G10L15/1815 , G06N3/049 , G10L15/063 , G10L15/16 , G10L15/187 , G10L19/0018

Abstract: A method of performing speech recognition using a two-pass deliberation architecture includes receiving a first-pass hypothesis and an encoded acoustic frame and encoding the first-pass hypothesis at a hypothesis encoder. The first-pass hypothesis is generated by a recurrent neural network (RNN) decoder model for the encoded acoustic frame. The method also includes generating, using a first attention mechanism attending to the encoded acoustic frame, a first context vector, and generating, using a second attention mechanism attending to the encoded first-pass hypothesis, a second context vector. The method also includes decoding the first context vector and the second context vector at a context vector decoder to form a second-pass hypothesis.
