Systems and Methods for Training Dual-Mode Machine-Learned Speech Recognition Models

    Publication Number: US20230237993A1

    Publication Date: 2023-07-27

    Application Number: US18011571

    Filing Date: 2021-10-01

    Applicant: Google LLC

    CPC classification number: G10L15/16 G10L15/32 G10L15/22

    Abstract: Systems and methods of the present disclosure are directed to a computing system, including one or more processors and a machine-learned multi-mode speech recognition model configured to operate in a streaming recognition mode or a contextual recognition mode. The computing system can perform operations including obtaining speech data and a ground truth label and processing the speech data using the contextual recognition mode to obtain contextual prediction data. The operations can include evaluating a difference between the contextual prediction data and the ground truth label and processing the speech data using the streaming recognition mode to obtain streaming prediction data. The operations can include evaluating a difference between the streaming prediction data and the ground truth label, as well as a difference between the streaming prediction data and the contextual prediction data. The operations can include adjusting parameters of the speech recognition model.
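The training objective described in this abstract can be sketched as a combined loss: the contextual (full-context) mode is scored against the ground truth only, while the streaming mode is scored against both the ground truth and the contextual mode's prediction. The sketch below is a minimal plain-Python illustration under the assumption that the second comparison is a KL-style distillation term; the function names and the exact weighting are illustrative, not the claimed implementation:

```python
import math

def cross_entropy(pred, target_idx):
    # Negative log-likelihood of the target class under a probability vector.
    return -math.log(pred[target_idx])

def kl_divergence(p, q):
    # KL(p || q) between two probability vectors.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def dual_mode_loss(contextual_pred, streaming_pred, target_idx, distill_weight=0.5):
    # Contextual (full-context) mode is evaluated against the ground truth only.
    contextual_loss = cross_entropy(contextual_pred, target_idx)
    # Streaming mode is evaluated against the ground truth AND pulled toward
    # the contextual prediction (a form of in-place distillation).
    streaming_loss = (cross_entropy(streaming_pred, target_idx)
                      + distill_weight * kl_divergence(contextual_pred, streaming_pred))
    # Both terms would drive a single shared set of model parameters.
    return contextual_loss + streaming_loss
```

When the streaming prediction matches the contextual one, the distillation term vanishes and only the two ground-truth terms remain.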

    Lookup-Table Recurrent Language Model

    Publication Number: US20220310067A1

    Publication Date: 2022-09-29

    Application Number: US17650566

    Filing Date: 2022-02-10

    Applicant: Google LLC

    Abstract: A computer-implemented method includes receiving audio data that corresponds to an utterance spoken by a user and captured by a user device. The method also includes processing the audio data to determine a candidate transcription that includes a sequence of tokens for the spoken utterance. For each token in the sequence of tokens, the method includes determining a token embedding for the corresponding token, determining an n-gram token embedding for a previous sequence of n-gram tokens, and concatenating the token embedding and the n-gram token embedding to generate a concatenated output for the corresponding token. The method also includes rescoring the candidate transcription for the spoken utterance by processing the concatenated output generated for each corresponding token in the sequence of tokens.
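The per-token feature construction described here (token embedding concatenated with an embedding looked up for the preceding n-gram) can be sketched in plain Python. The table names and the zero-vector fallback for unseen keys are illustrative assumptions; in the patent the lookups feed a recurrent rescoring model:

```python
def rescore_features(tokens, token_table, ngram_table, n=2, dim=4):
    # For each token, look up its own embedding and the embedding of the
    # preceding n-gram context, then concatenate the two vectors.
    zero = [0.0] * dim
    features = []
    for i, tok in enumerate(tokens):
        tok_emb = token_table.get(tok, zero)
        context = tuple(tokens[max(0, i - n):i])  # previous n-gram tokens
        ngram_emb = ngram_table.get(context, zero)
        features.append(tok_emb + ngram_emb)  # list concatenation = vector concat
    return features
```

Each concatenated vector would then be scored in sequence to produce a replacement language-model score for the candidate transcription.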

    Multichannel speech recognition using neural networks

    Publication Number: US11062725B2

    Publication Date: 2021-07-13

    Application Number: US16278830

    Filing Date: 2019-02-19

    Applicant: Google LLC

    Abstract: This specification describes computer-implemented methods and systems. One method includes receiving, by a neural network of a speech recognition system, first data representing a first raw audio signal and second data representing a second raw audio signal. The first raw audio signal and the second raw audio signal describe audio occurring at a same period of time. The method further includes generating, by a spatial filtering layer of the neural network, a spatial filtered output using the first data and the second data, and generating, by a spectral filtering layer of the neural network, a spectral filtered output using the spatial filtered output. Generating the spectral filtered output comprises processing frequency-domain data representing the spatial filtered output. The method still further includes processing, by one or more additional layers of the neural network, the spectral filtered output to predict sub-word units encoded in both the first raw audio signal and the second raw audio signal.
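The two-stage front end described in this abstract, spatial filtering across raw channels followed by spectral filtering on frequency-domain data, can be sketched as filter-and-sum beamforming plus a DFT magnitude step. This is a minimal stdlib sketch; the filter shapes are tiny illustrative examples, and in the patent both stages are learned layers of the neural network:

```python
import cmath

def spatial_filter(ch1, ch2, w1, w2):
    # Filter-and-sum: apply a per-channel FIR filter to each raw
    # audio channel, then sum the filtered signals sample-by-sample.
    def fir(x, w):
        return [sum(w[k] * x[i - k] for k in range(len(w)) if i - k >= 0)
                for i in range(len(x))]
    y1, y2 = fir(ch1, w1), fir(ch2, w2)
    return [a + b for a, b in zip(y1, y2)]

def spectral_magnitudes(signal):
    # Naive DFT magnitudes: the frequency-domain representation of the
    # spatially filtered output that the spectral layer would process.
    n = len(signal)
    return [abs(sum(signal[t] * cmath.exp(-2j * cmath.pi * f * t / n)
                    for t in range(n)))
            for f in range(n)]
```

Additional layers would then map these spectral features to sub-word unit predictions.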

    Large-Scale Multilingual Speech Recognition With A Streaming End-To-End Model

    Publication Number: US20200380215A1

    Publication Date: 2020-12-03

    Application Number: US16834342

    Filing Date: 2020-03-30

    Applicant: Google LLC

    Abstract: A method of transcribing speech using a multilingual end-to-end (E2E) speech recognition model includes receiving audio data for an utterance spoken in a particular native language, obtaining a language vector identifying the particular language, and processing, using the multilingual E2E speech recognition model, the language vector and acoustic features derived from the audio data to generate a transcription for the utterance. The multilingual E2E speech recognition model includes a plurality of language-specific adaptor modules that include one or more adaptor modules specific to the particular native language and one or more other adaptor modules specific to at least one other native language different than the particular native language. The method also includes providing the transcription for output.
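The routing described here, a language identifier selecting language-specific adaptor modules inside a shared model, can be sketched with a standard bottleneck-adapter shape (down-project, nonlinearity, up-project, residual add). The matrix shapes and the dictionary keyed by language are illustrative assumptions, not the patented architecture:

```python
def adapter(x, down, up):
    # Bottleneck adapter: down-project, ReLU, up-project, residual add.
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in down]
    out = [sum(w * hi for w, hi in zip(row, h)) for row in up]
    return [xi + oi for xi, oi in zip(x, out)]

def multilingual_forward(features, language, adapters):
    # The language vector (here reduced to a language key) routes every
    # acoustic frame through the adaptor module for that language,
    # while all other parameters stay shared across languages.
    down, up = adapters[language]
    return [adapter(frame, down, up) for frame in features]
```

Adapters for other languages sit unused in the same model, so adding a language only adds a small module rather than a whole network.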

    Processing audio waveforms
    Invention Grant

    Publication Number: US10403269B2

    Publication Date: 2019-09-03

    Application Number: US15080927

    Filing Date: 2016-03-25

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing audio waveforms. In some implementations, a time-frequency feature representation is generated based on audio data. The time-frequency feature representation is input to an acoustic model comprising a trained artificial neural network. The trained artificial neural network comprises a frequency convolution layer, a memory layer, and one or more hidden layers. An output that is based on output of the trained artificial neural network is received. A transcription is provided, where the transcription is determined based on the output of the acoustic model.
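The pipeline in this abstract, building a time-frequency representation and then convolving along the frequency axis, can be sketched in plain Python. Framing parameters and the kernel are illustrative; in the patent the convolution weights are learned and the result feeds a memory (recurrent) layer and hidden layers:

```python
import cmath

def time_frequency(waveform, frame_len, hop):
    # Slice the waveform into overlapping frames and take DFT magnitudes
    # per frame, yielding a (time x frequency) feature representation.
    frames = [waveform[i:i + frame_len]
              for i in range(0, len(waveform) - frame_len + 1, hop)]
    def dft_mag(frame):
        n = len(frame)
        return [abs(sum(frame[t] * cmath.exp(-2j * cmath.pi * f * t / n)
                        for t in range(n)))
                for f in range(n // 2 + 1)]
    return [dft_mag(f) for f in frames]

def frequency_convolution(tf_map, kernel):
    # 1-D convolution along the frequency axis of each time frame,
    # the role the frequency convolution layer plays in the model.
    k = len(kernel)
    return [[sum(kernel[j] * row[i + j] for j in range(k))
             for i in range(len(row) - k + 1)]
            for row in tf_map]
```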

    Enhanced multi-channel acoustic models

    Publication Number: US10224058B2

    Publication Date: 2019-03-05

    Application Number: US15350293

    Filing Date: 2016-11-14

    Applicant: Google LLC

    Abstract: This specification describes computer-implemented methods and systems. One method includes receiving, by a neural network of a speech recognition system, first data representing a first raw audio signal and second data representing a second raw audio signal. The first raw audio signal and the second raw audio signal describe audio occurring at a same period of time. The method further includes generating, by a spatial filtering layer of the neural network, a spatial filtered output using the first data and the second data, and generating, by a spectral filtering layer of the neural network, a spectral filtered output using the spatial filtered output. Generating the spectral filtered output comprises processing frequency-domain data representing the spatial filtered output. The method still further includes processing, by one or more additional layers of the neural network, the spectral filtered output to predict sub-word units encoded in both the first raw audio signal and the second raw audio signal.
