-
Publication number: US20230298563A1
Publication date: 2023-09-21
Application number: US18186157
Filing date: 2023-03-18
Applicant: Google LLC
Inventor: Ke Hu , Tara N. Sainath , Yanzhang He , Rohit Prabhavalkar , Sepand Mavandadi , Weiran Wang , Trevor Strohman
CPC classification number: G10L13/08 , G10L15/16 , G10L15/063
Abstract: A method of text-only and semi-supervised training for deliberation includes receiving training data including unspoken textual utterances that are each not paired with any corresponding spoken utterance of non-synthetic speech, and training a deliberation model that includes a text encoder and a deliberation decoder on the unspoken textual utterances. The method also includes receiving, at the trained deliberation model, first-pass hypotheses and non-causal acoustic embeddings. The first-pass hypotheses are generated by a recurrent neural network-transducer (RNN-T) decoder for the non-causal acoustic embeddings encoded by a non-causal encoder. The method also includes encoding, using the text encoder, the first-pass hypotheses generated by the RNN-T decoder, and generating, using the deliberation decoder attending to both the first-pass hypotheses and the non-causal acoustic embeddings, second-pass hypotheses.
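A minimal sketch of the two-source attention this abstract describes: a deliberation decoder step attends separately to the text-encoded first-pass hypotheses and to the non-causal acoustic embeddings, then fuses the two contexts. The function names and random stand-in embeddings below are illustrative assumptions, not the patent's actual parameterization.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(query, keys, values):
    # scaled dot-product attention: one query over a sequence of keys/values
    scores = keys @ query / np.sqrt(query.shape[-1])
    return softmax(scores) @ values

def deliberation_step(query, text_enc, acoustic_enc):
    # the deliberation decoder attends to BOTH sources and fuses the contexts
    c_text = attend(query, text_enc, text_enc)
    c_audio = attend(query, acoustic_enc, acoustic_enc)
    return np.concatenate([c_text, c_audio])

rng = np.random.default_rng(0)
d = 8
query = rng.standard_normal(d)
text_enc = rng.standard_normal((5, d))     # encoded first-pass hypotheses
audio_enc = rng.standard_normal((12, d))   # non-causal acoustic embeddings
ctx = deliberation_step(query, text_enc, audio_enc)
assert ctx.shape == (2 * d,)
```

In the method itself, the second-pass hypotheses come from decoding over such fused contexts rather than from a single forward step.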
-
Publication number: US20220366897A1
Publication date: 2022-11-17
Application number: US17815049
Filing date: 2022-07-26
Applicant: Google LLC
Inventor: Rohit Prakash Prabhavalkar , Golan Pundak , Tara N. Sainath
Abstract: A method includes receiving audio data encoding an utterance and obtaining a set of bias phrases corresponding to a context of the utterance. Each bias phrase includes one or more words. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data to generate an output from the speech recognition model. The speech recognition model includes a first encoder configured to receive the acoustic features, a first attention module, a bias encoder configured to receive data indicating the obtained set of bias phrases, a bias attention module, and a decoder configured to determine likelihoods of sequences of speech elements based on output of the first attention module and output of the bias attention module. The method also includes determining a transcript for the utterance based on the likelihoods of sequences of speech elements.
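The bias attention module can be pictured as attention over one embedding per bias phrase, typically with an extra "no-bias" row so the decoder can ignore context when nothing matches. The sketch below is a hypothetical illustration of that idea (the `bias_attention` name and the zero no-bias row are assumptions, not the patent's design):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def bias_attention(dec_state, bias_embs):
    # bias_embs: one vector per bias phrase (from the bias encoder), plus a
    # "no-bias" row so the model can attend to nothing relevant
    no_bias = np.zeros((1, bias_embs.shape[1]))
    table = np.vstack([bias_embs, no_bias])
    weights = softmax(table @ dec_state)
    return weights @ table, weights

rng = np.random.default_rng(1)
d = 6
phrases = rng.standard_normal((3, d))   # embeddings for 3 bias phrases
state = rng.standard_normal(d)
ctx, w = bias_attention(state, phrases)
assert w.shape == (4,) and abs(w.sum() - 1.0) < 1e-9
```

The decoder would then condition on this bias context alongside the acoustic context from the first attention module.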
-
Publication number: US11468244B2
Publication date: 2022-10-11
Application number: US16834342
Filing date: 2020-03-30
Applicant: Google LLC
Inventor: Anjuli Patricia Kannan , Tara N. Sainath , Yonghui Wu , Ankur Bapna , Arindrima Datta
Abstract: A method of transcribing speech using a multilingual end-to-end (E2E) speech recognition model includes receiving audio data for an utterance spoken in a particular native language, obtaining a language vector identifying the particular language, and processing, using the multilingual E2E speech recognition model, the language vector and acoustic features derived from the audio data to generate a transcription for the utterance. The multilingual E2E speech recognition model includes a plurality of language-specific adaptor modules that include one or more adaptor modules specific to the particular native language and one or more other adaptor modules specific to at least one other native language different than the particular native language. The method also includes providing the transcription for output.
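One common way to realize language-specific adaptor modules is a residual bottleneck adapter selected by language ID. The sketch below assumes that design for illustration; the patent does not specify this exact parameterization, and the class and parameter names are hypothetical.

```python
import numpy as np

class LanguageAdapters:
    """Residual bottleneck adapters, one per language: down-project, ReLU,
    up-project, add back to the input (a common adapter pattern)."""
    def __init__(self, langs, d_model, d_bottleneck, seed=0):
        rng = np.random.default_rng(seed)
        self.params = {
            lang: (rng.standard_normal((d_model, d_bottleneck)) * 0.1,
                   rng.standard_normal((d_bottleneck, d_model)) * 0.1)
            for lang in langs
        }

    def __call__(self, x, lang):
        w_down, w_up = self.params[lang]   # pick the adaptor for this language
        h = np.maximum(x @ w_down, 0.0)    # down-project + ReLU
        return x + h @ w_up                # residual connection

adapters = LanguageAdapters(["en", "fr"], d_model=16, d_bottleneck=4)
x = np.ones((3, 16))                       # stand-in encoder activations
y_en = adapters(x, "en")
y_fr = adapters(x, "fr")
assert y_en.shape == x.shape and not np.allclose(y_en, y_fr)
```

The shared backbone stays fixed across languages; only the small per-language adapter weights differ, which is what makes the approach parameter-efficient.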
-
Publication number: US20220148582A1
Publication date: 2022-05-12
Application number: US17649058
Filing date: 2022-01-26
Applicant: Google LLC
Inventor: Bo Li , Ron Weiss , Michiel A.U. Bacchiani , Tara N. Sainath , Kevin William Wilson
IPC: G10L15/16 , G10L15/20 , G10L21/0224
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for neural network adaptive beamforming for multichannel speech recognition are disclosed. In one aspect, a method includes the actions of receiving a first channel of audio data corresponding to an utterance and a second channel of audio data corresponding to the utterance. The actions further include generating a first set of filter parameters for a first filter based on the first channel of audio data and the second channel of audio data and a second set of filter parameters for a second filter based on the first channel of audio data and the second channel of audio data. The actions further include generating a single combined channel of audio data. The actions further include inputting the audio data to a neural network. The actions further include providing a transcription for the utterance.
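The filter-and-sum step described above can be sketched directly: each channel is filtered with its predicted taps and the results are summed into a single combined channel. In the method, a neural network predicts the taps from both channels; here fixed random taps stand in for that prediction, so this is an illustration of the signal path only.

```python
import numpy as np

def filter_and_sum(ch1, ch2, taps1, taps2):
    # apply each channel's predicted FIR filter, then sum into one channel
    y1 = np.convolve(ch1, taps1, mode="same")
    y2 = np.convolve(ch2, taps2, mode="same")
    return y1 + y2

rng = np.random.default_rng(2)
ch1 = rng.standard_normal(160)     # channel 1 samples for one frame
ch2 = rng.standard_normal(160)     # channel 2 samples for one frame
# stand-ins for the network-predicted filter parameters
taps1 = rng.standard_normal(5)
taps2 = rng.standard_normal(5)
combined = filter_and_sum(ch1, ch2, taps1, taps2)
assert combined.shape == (160,)
```

The single combined channel is what gets passed on to the recognition neural network.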
-
Publication number: US11270687B2
Publication date: 2022-03-08
Application number: US16861190
Filing date: 2020-04-28
Applicant: Google LLC
Inventor: Ke Hu , Antoine Jean Bruguier , Tara N. Sainath , Rohit Prakash Prabhavalkar , Golan Pundak
IPC: G10L15/30 , G10L15/06 , G10L15/02 , G10L15/187 , G10L15/193 , G10L15/28 , G10L15/32 , G10L25/30
Abstract: A method includes receiving audio data encoding an utterance spoken by a native speaker of a first language, and receiving a biasing term list including one or more terms in a second language different than the first language. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data to generate speech recognition scores for both wordpieces and corresponding phoneme sequences in the first language. The method also includes rescoring the speech recognition scores for the phoneme sequences based on the one or more terms in the biasing term list, and executing, using the speech recognition scores for the wordpieces and the rescored speech recognition scores for the phoneme sequences, a decoding graph to generate a transcription for the utterance.
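The rescoring idea, reduced to its core: boost any phoneme-sequence hypothesis whose pronunciation matches a term in the biasing list. The patent executes this inside a decoding graph; the flat-list version below is only a hypothetical illustration of the scoring adjustment.

```python
def rescore(hyps, bias_pronunciations, boost=2.0):
    """hyps: list of (phoneme_tuple, score) pairs.
    Add `boost` to any hypothesis whose phonemes match the pronunciation
    of a biasing term, then re-rank by score (illustrative only)."""
    rescored = []
    for phones, score in hyps:
        if phones in bias_pronunciations:
            score += boost
        rescored.append((phones, score))
    return sorted(rescored, key=lambda p: -p[1])

hyps = [(("k", "a", "t"), 1.0), (("k", "o", "t"), 1.2)]
bias = {("k", "a", "t")}          # pronunciation of a biasing term
best = rescore(hyps, bias)[0]
assert best[0] == ("k", "a", "t")
```

Rescoring at the phoneme level is what lets out-of-language terms (second-language biasing words the model rarely saw in training) win over acoustically similar first-language wordpieces.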
-
Publication number: US20220005465A1
Publication date: 2022-01-06
Application number: US17448119
Filing date: 2021-09-20
Applicant: Google LLC
Inventor: Rohit Prakash Prabhavalkar , Zhifeng Chen , Bo Li , Chung-cheng Chiu , Kanury Kanishka Rao , Yonghui Wu , Ron J. Weiss , Navdeep Jaitly , Michiel A.U. Bacchiani , Tara N. Sainath , Jan Kazimierz Chorowski , Anjuli Patricia Kannan , Ekaterina Gonina , Patrick An Phu Nguyen
Abstract: A method for performing speech recognition using sequence-to-sequence models includes receiving audio data for an utterance and providing features indicative of acoustic characteristics of the utterance as input to an encoder. The method also includes processing an output of the encoder using an attender to generate a context vector, generating speech recognition scores using the context vector and a decoder trained using a training process, and generating a transcription for the utterance using word elements selected based on the speech recognition scores. The transcription is provided as an output of the ASR system.
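The encoder/attender/decoder loop can be sketched as a greedy decode: at each step the attender summarizes the encoder output into a context vector, the decoder scores word elements, and the best-scoring element is emitted. All weights below are random stand-ins rather than a trained model, and the function names are assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def greedy_decode(enc_out, embed, w_out, max_len=5, sos=0, eos=1):
    """Minimal attend-and-decode loop over encoder output `enc_out`."""
    tokens = [sos]
    state = embed[sos]
    for _ in range(max_len):
        attn = softmax(enc_out @ state)      # attender: weights over encoder frames
        context = attn @ enc_out             # context vector
        logits = w_out @ np.concatenate([state, context])
        tok = int(np.argmax(logits))         # word element with the best score
        tokens.append(tok)
        if tok == eos:
            break
        state = embed[tok]                   # feed the emitted element back in
    return tokens

rng = np.random.default_rng(3)
d, vocab = 4, 6
enc_out = rng.standard_normal((7, d))        # encoded acoustic features
embed = rng.standard_normal((vocab, d))
w_out = rng.standard_normal((vocab, 2 * d))
out = greedy_decode(enc_out, embed, w_out)
assert out[0] == 0 and len(out) <= 6
```

A beam search would keep several candidate sequences at each step instead of only the argmax, but the data flow is the same.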
-
Publication number: US20200286468A1
Publication date: 2020-09-10
Application number: US16879322
Filing date: 2020-05-20
Applicant: Google LLC
Inventor: Samuel Bengio , Mirko Visontai , Christopher Walter George Thornton , Tara N. Sainath , Ehsan Variani , Izhak Shafran , Michiel A.U. Bacchiani
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition using complex linear projection are disclosed. In one aspect, a method includes the actions of receiving audio data corresponding to an utterance. The method further includes generating frequency domain data using the audio data. The method further includes processing the frequency domain data using complex linear projection. The method further includes providing the processed frequency domain data to a neural network trained as an acoustic model. The method further includes generating a transcription for the utterance that is determined based at least on output that the neural network provides in response to receiving the processed frequency domain data.
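Complex linear projection in its simplest form: project each complex FFT frame through a complex weight matrix, then take a real-valued magnitude feature for the acoustic model. The weights are learned in the method; the random weights and the `log1p` compression below are illustrative assumptions.

```python
import numpy as np

def complex_linear_projection(frames, w):
    """Project complex frequency-domain frames through complex weights,
    then return log-magnitude features for the acoustic model."""
    projected = frames @ w              # complex matmul: (T, F) @ (F, P) -> (T, P)
    return np.log1p(np.abs(projected))  # real-valued features

rng = np.random.default_rng(4)
audio = rng.standard_normal(400)
# frequency-domain data: 4 frames of 100 samples -> 51 rFFT bins each
frames = np.fft.rfft(audio.reshape(4, 100), axis=-1)
w = rng.standard_normal((51, 8)) + 1j * rng.standard_normal((51, 8))
feats = complex_linear_projection(frames, w)
assert feats.shape == (4, 8) and np.isrealobj(feats)
```

The appeal of the approach is that the projection acts directly on frequency-domain data, replacing a fixed filterbank with a learnable one.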
-
Publication number: US20200051551A1
Publication date: 2020-02-13
Application number: US16654041
Filing date: 2019-10-16
Applicant: Google LLC
Inventor: Tara N. Sainath , Maria Carolina Parada San Martin
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for keyword spotting. One of the methods includes training, by a keyword detection system, a convolutional neural network for keyword detection by providing a two-dimensional set of input values to the convolutional neural network, the input values including a first dimension in time and a second dimension in frequency, and performing convolutional multiplication on the two-dimensional set of input values for a filter using a frequency stride greater than one to generate a feature map.
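The key detail above is striding the convolution in frequency while keeping stride 1 in time, which shrinks the feature map (and the compute) along the frequency axis. A minimal sketch of that convolution, written out explicitly so the striding is visible:

```python
import numpy as np

def conv2d_freq_stride(x, kernel, freq_stride=2):
    """Valid 2-D convolution over a (time, frequency) input with stride 1
    in time and `freq_stride` > 1 in frequency."""
    T, F = x.shape
    kt, kf = kernel.shape
    out_t = T - kt + 1
    out_f = (F - kf) // freq_stride + 1
    out = np.empty((out_t, out_f))
    for i in range(out_t):
        for j in range(out_f):
            patch = x[i:i + kt, j * freq_stride:j * freq_stride + kf]
            out[i, j] = np.sum(patch * kernel)
    return out

x = np.arange(32, dtype=float).reshape(8, 4)   # (time=8, freq=4) input values
kernel = np.ones((3, 2))                        # one 3x2 filter
fmap = conv2d_freq_stride(x, kernel, freq_stride=2)
assert fmap.shape == (6, 2)                     # frequency axis halved
```

With stride 1 in frequency the same input would produce a (6, 3) map; the stride of 2 trades a little frequency resolution for a smaller model, which matters for always-on keyword detection.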
-
Publication number: US10515626B2
Publication date: 2019-12-24
Application number: US15848829
Filing date: 2017-12-20
Applicant: Google LLC
Inventor: Bo Li , Ron J. Weiss , Michiel A. U. Bacchiani , Tara N. Sainath , Kevin William Wilson
IPC: G10L15/00 , G10L15/16 , G10L21/0224 , G10L15/20 , G10L15/26 , G10L21/0216
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for neural network adaptive beamforming for multichannel speech recognition are disclosed. In one aspect, a method includes the actions of receiving a first channel of audio data corresponding to an utterance and a second channel of audio data corresponding to the utterance. The actions further include generating a first set of filter parameters for a first filter based on the first channel of audio data and the second channel of audio data and a second set of filter parameters for a second filter based on the first channel of audio data and the second channel of audio data. The actions further include generating a single combined channel of audio data. The actions further include inputting the audio data to a neural network. The actions further include providing a transcription for the utterance.
-
Publication number: US20190115013A1
Publication date: 2019-04-18
Application number: US16171629
Filing date: 2018-10-26
Applicant: Google LLC
Inventor: Samuel Bengio , Mirko Visontai , Christopher Walter George Thornton , Michiel A.U. Bacchiani , Tara N. Sainath , Ehsan Variani , Izhak Shafran
CPC classification number: G10L15/16 , G10H1/00 , G10H2210/036 , G10H2210/046 , G10H2250/235 , G10H2250/311 , G10L15/02 , G10L17/18 , G10L19/0212
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition using complex linear projection are disclosed. In one aspect, a method includes the actions of receiving audio data corresponding to an utterance. The method further includes generating frequency domain data using the audio data. The method further includes processing the frequency domain data using complex linear projection. The method further includes providing the processed frequency domain data to a neural network trained as an acoustic model. The method further includes generating a transcription for the utterance that is determined based at least on output that the neural network provides in response to receiving the processed frequency domain data.
-