Patent search ap:("Google LLC") AND inv:"Tara N. Sainath" Page 8

71.

发明公开
Joint Segmenting and Automatic Speech Recognition 审中-公开

公开(公告)号：US20230343332A1

公开(公告)日：2023-10-26

申请号：US18304064

申请日：2023-04-20

Applicant: Google LLC

Inventor： Ronny Huang , Shuo-yiin Chang , David Rybach , Rohit Prakash Prabhavalkar , Tara N. Sainath , Cyril Allauzen , Charles Caleb Peyser , Zhiyun Lu

IPC: G10L15/04 , G10L25/93 , G10L15/197 , G10L15/06 , G10L15/22 , G10L15/02

CPC classification number: G10L15/197 , G10L15/02 , G10L15/04 , G10L15/063 , G10L15/22 , G10L25/93 , G10L2015/025 , G10L2025/932

Abstract: A joint segmenting and ASR model includes an encoder and decoder. The encoder configured to: receive a sequence of acoustic frames characterizing one or more utterances; and generate, at each output step, a higher order feature representation for a corresponding acoustic frame. The decoder configured to: receive the higher order feature representation and generate, at each output step: a probability distribution over possible speech recognition hypotheses, and an indication of whether the corresponding output step corresponds to an end of speech segment. The j oint segmenting and ASR model trained on a set of training samples, each training sample including: audio data characterizing a spoken utterance; and a corresponding transcription of the spoken utterance, the corresponding transcription having an end of speech segment ground truth token inserted into the corresponding transcription automatically based on a set of heuristic-based rules and exceptions applied to the training sample.

72.

发明公开
Intended Query Detection using E2E Modeling for continued Conversation 审中-公开

公开(公告)号：US20230335117A1

公开(公告)日：2023-10-19

申请号：US18186872

申请日：2023-03-20

Applicant: Google LLC

Inventor： Shuo-yiin Chang , Guru Prakash Arumugam , Zelin Wu , Tara N. Sainath , Bo LI , Qiao Liang , Adam Stambler , Shyam Upadhyay , Manaal Faruqui , Trevor Strohman

IPC: G10L15/16 , G10L15/22 , G10L15/06

CPC classification number: G10L15/16 , G10L15/22 , G10L15/063 , G10L2015/223

Abstract: A method includes receiving, as input to a speech recognition model, audio data corresponding to a spoken utterance. The method also includes performing, using the speech recognition model, speech recognition on the audio data by, at each of a plurality of time steps, encoding, using an audio encoder, the audio data corresponding to the spoken utterance into a corresponding audio encoding, and decoding, using a speech recognition joint network, the corresponding audio encoding into a probability distribution over possible output labels. At each of the plurality of time steps, the method also includes determining, using an intended query (IQ) joint network configured to receive a label history representation associated with a sequence of non-blank symbols output by a final softmax layer, an intended query decision indicating whether or not the spoken utterance includes a query intended for a digital assistant.

73.

发明授权
Proper noun recognition in end-to-end speech recognition 有权

公开(公告)号：US11749259B2

公开(公告)日：2023-09-05

申请号：US17150491

申请日：2021-01-15

Applicant: Google LLC

Inventor： Charles Caleb Peyser , Tara N. Sainath , Golan Pundak

IPC: G10L15/06 , G10L15/16 , G10L15/18 , G10L15/187 , G06N3/049

CPC classification number: G10L15/063 , G06N3/049 , G10L15/16 , G10L15/187 , G10L15/1815

Abstract: A method for training a speech recognition model with a minimum word error rate loss function includes receiving a training example comprising a proper noun and generating a plurality of hypotheses corresponding to the training example. Each hypothesis of the plurality of hypotheses represents the proper noun and includes a corresponding probability that indicates a likelihood that the hypothesis represents the proper noun. The method also includes determining that the corresponding probability associated with one of the plurality of hypotheses satisfies a penalty criteria. The penalty criteria indicating that the corresponding probability satisfies a probability threshold, and the associated hypothesis incorrectly represents the proper noun. The method also includes applying a penalty to the minimum word error rate loss function.

74.

发明授权
Attention-based joint acoustic and text on-device end-to-end model 有权

公开(公告)号：US11594212B2

公开(公告)日：2023-02-28

申请号：US17155010

申请日：2021-01-21

Applicant: Google LLC

Inventor： Tara N. Sainath , Ruoming Pang , Ron Weiss , Yanzhang He , Chung-Cheng Chiu , Trevor Strohman

IPC: G10L15/06 , G06N3/08 , G10L15/16 , G10L15/197

Abstract: A method includes receiving a training example for a listen-attend-spell (LAS) decoder of a two-pass streaming neural network model and determining whether the training example corresponds to a supervised audio-text pair or an unpaired text sequence. When the training example corresponds to an unpaired text sequence, the method also includes determining a cross entropy loss based on a log probability associated with a context vector of the training example. The method also includes updating the LAS decoder and the context vector based on the determined cross entropy loss.

75.

发明授权
Using context information with end-to-end models for speech recognition 有权

公开(公告)号：US11545142B2

公开(公告)日：2023-01-03

申请号：US16827937

申请日：2020-03-24

Applicant: Google LLC

Inventor： Ding Zhao , Bo Li , Ruoming Pang , Tara N. Sainath , David Rybach , Deepti Bhatia , Zelin Wu

IPC: G10L15/183 , G10L15/16 , G06N20/00 , G06K9/62 , G06N3/08

Abstract: A method includes receiving audio data encoding an utterance, processing, using a speech recognition model, the audio data to generate speech recognition scores for speech elements, and determining context scores for the speech elements based on context data indicating a context for the utterance. The method also includes executing, using the speech recognition scores and the context scores, a beam search decoding process to determine one or more candidate transcriptions for the utterance. The method also includes selecting a transcription for the utterance from the one or more candidate transcriptions.

76.

发明授权
End-to-end automated speech recognition on numeric sequences 有权

公开(公告)号：US11367432B2

公开(公告)日：2022-06-21

申请号：US16830996

申请日：2020-03-26

Applicant: Google LLC

Inventor： Charles Caleb Peyser , Hao Zhang , Tara N. Sainath , Zelin Wu

IPC: G10L15/22 , G10L15/26 , G10L15/06 , G06N3/08 , G10L13/00 , G10L15/16 , G10L15/197

Abstract: A method for generating final transcriptions representing numerical sequences of utterances in a written domain includes receiving audio data for an utterance containing a numeric sequence, and decoding, using a sequence-to-sequence speech recognition model, the audio data for the utterance to generate, as output from the sequence-to-sequence speech recognition model, an intermediate transcription of the utterance. The method also includes processing, using a neural corrector/denormer, the intermediate transcription to generate a final transcription that represents the numeric sequence of the utterance in a written domain. The neural corrector/denormer is trained on a set of training samples, where each training sample includes a speech recognition hypothesis for a training utterance and a ground-truth transcription of the training utterance. The ground-truth transcription of the training utterance is in the written domain. The method also includes providing the final transcription representing the numeric sequence of the utterance in the written domain for output.

77.

发明申请
PHONEME-BASED CONTEXTUALIZATION FOR CROSS-LINGUAL SPEECH RECOGNITION IN END-TO-END MODELS 有权

公开(公告)号：US20220172706A1

公开(公告)日：2022-06-02

申请号：US17651315

申请日：2022-02-16

Applicant: Google LLC

Inventor： Ke Hu , Golan Pundak , Rohit Prakash Prabhavalkar , Antoine Jean Bruguier , Tara N. Sainath

IPC: G10L15/06 , G10L15/02 , G10L15/187 , G10L15/193 , G10L15/28 , G10L15/32 , G10L25/30

Abstract: A method includes receiving audio data encoding an utterance spoken by a native speaker of a first language, and receiving a biasing term list including one or more terms in a second language different than the first language. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data to generate speech recognition scores for both wordpieces and corresponding phoneme sequences in the first language. The method also includes rescoring the speech recognition scores for the phoneme sequences based on the one or more terms in the biasing term list, and executing, using the speech recognition scores for the wordpieces and the rescored speech recognition scores for the phoneme sequences, a decoding graph to generate a transcription for the utterance.

78.

发明申请
MULTI-DIALECT AND MULTILINGUAL SPEECH RECOGNITION 有权

公开(公告)号：US20220130374A1

公开(公告)日：2022-04-28

申请号：US17572238

申请日：2022-01-10

Applicant: Google LLC

Inventor： Zhifeng Chen , Bo Li , Eugene Weinstein , Yonghui Wu , Pedro J. Moreno Mengibar , Ron J. Weiss , Khe Chai Sim , Tara N. Sainath , Patrick An Phu Nguyen

IPC: G10L15/00 , G10L15/16 , G10L15/07

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer-readable media, for speech recognition using multi-dialect and multilingual models. In some implementations, audio data indicating audio characteristics of an utterance is received. Input features determined based on the audio data are provided to a speech recognition model that has been trained to output score indicating the likelihood of linguistic units for each of multiple different language or dialects. The speech recognition model can be one that has been trained using cluster adaptive training. Output that the speech recognition model generated in response to receiving the input features determined based on the audio data is received. A transcription of the utterance generated based on the output of the speech recognition model is provided.

79.

发明授权
Adaptive audio enhancement for multichannel speech recognition 有权

公开(公告)号：US11257485B2

公开(公告)日：2022-02-22

申请号：US16708930

申请日：2019-12-10

Applicant: Google LLC

Inventor： Bo Li , Ron J. Weiss , Michiel A. U. Bacchiani , Tara N. Sainath , Kevin William Wilson

IPC: G10L15/00 , G10L15/16 , G10L15/20 , G10L21/0224 , G10L15/26 , G10L21/0216

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for neural network adaptive beamforming for multichannel speech recognition are disclosed. In one aspect, a method includes the actions of receiving a first channel of audio data corresponding to an utterance and a second channel of audio data corresponding to the utterance. The actions further include generating a first set of filter parameters for a first filter based on the first channel of audio data and the second channel of audio data and a second set of filter parameters for a second filter based on the first channel of audio data and the second channel of audio data. The actions further include generating a single combined channel of audio data. The actions further include inputting the audio data to a neural network. The actions further include providing a transcription for the utterance.

80.

发明申请
Emitting Word Timings with End-to-End Models 有权

公开(公告)号：US20210350794A1

公开(公告)日：2021-11-11

申请号：US17204852

申请日：2021-03-17

Applicant: Google LLC

Inventor： Tara N. Sainath , Basi Garcia , David Rybach , Trevor Strohman , Ruoming Pang

IPC: G10L15/06 , G10L25/30 , G10L25/78

Abstract: A method includes receiving a training example that includes audio data representing a spoken utterance and a ground truth transcription. For each word in the spoken utterance, the method also includes inserting a placeholder symbol before the respective word identifying a respective ground truth alignment for a beginning and an end of the respective word, determining a beginning word piece and an ending word piece, and generating a first constrained alignment for the beginning word piece and a second constrained alignment for the ending word piece. The first constrained alignment is aligned with the ground truth alignment for the beginning of the respective word and the second constrained alignment is aligned with the ground truth alignment for the ending of the respective word. The method also includes constraining an attention head of a second pass decoder by applying the first and second constrained alignments.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification