Patent search ap:("Google LLC") AND inv:"Trevor Strohman" Page 4

31.

发明申请
Transducer-Based Streaming Deliberation for Cascaded Encoders 有权

公开(公告)号：US20240428786A1

公开(公告)日：2024-12-26

申请号：US18826655

申请日：2024-09-06

Applicant: Google LLC

Inventor： Ke Hu , Tara N. Sainath , Arun Narayanan , Ruoming Pang , Trevor Strohman

IPC: G10L15/197 , G06F40/126 , G10L15/02 , G10L15/06 , G10L15/08 , G10L15/22

Abstract: A method includes receiving a sequence of acoustic frames and generating, by a first encoder, a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The method also includes generating, by a first pass transducer decoder, a first pass speech recognition hypothesis for a corresponding first higher order feature representation and generating, by a text encoder, a text encoding for a corresponding first pass speech recognition hypothesis. The method also includes generating, by a second encoder, a second higher order feature representation for a corresponding first higher order feature representation. The method also includes generating, by a second pass transducer decoder, a second pass speech recognition hypothesis using a corresponding second higher order feature representation and a corresponding text encoding.

32.

发明公开
DIALOG MANAGEMENT FOR LARGE LANGUAGE MODEL-BASED (LLM-BASED) DIALOGS 审中-公开

公开(公告)号：US20240311575A1

公开(公告)日：2024-09-19

申请号：US18123141

申请日：2023-03-17

Applicant: GOOGLE LLC

Inventor： Martin Baeuml , Alexander Bailey , Jonas Bragagnolo , Florent D'Halluin , Trevor Strohman

IPC: G06F40/35 , G06N20/00

CPC classification number: G06F40/35 , G06N20/00

Abstract: Implementations relate to dialog management of a large language model (LLM) utilized in generating natural language (NL) output during an ongoing dialog. Processor(s) of a system can: receive NL based input as part of the ongoing dialog, generate NL based output utilizing the LLM, and cause the NL based output to be rendered. Further, the processor(s) can receive subsequent NL based input as part of the ongoing dialog. In some implementations, the processor(s) can determine whether to modify a corresponding dialog context in generating subsequent NL based output, and modify the corresponding dialog context accordingly. For example, the processor(s) can restrict the corresponding dialog context, or supplant the corresponding dialog context with a corresponding curated dialog context. In additional or alternative implementations, the processor(s) can modify a corresponding NL based output threshold utilized in generating the subsequent NL based response to ensure the resulting NL based output is desirable.

33.

发明公开
STREAMING OF NATURAL LANGUAGE (NL) BASED OUTPUT GENERATED USING A LARGE LANGUAGE MODEL (LLM) TO REDUCE LATENCY IN RENDERING THEREOF 审中-公开

公开(公告)号：US20240311402A1

公开(公告)日：2024-09-19

申请号：US18136634

申请日：2023-04-19

Applicant: GOOGLE LLC

Inventor： Martin Baeuml , Yanping Huang , Wenhao Jia , Chang Lan , Yuanzhong Xu , Junwhan Ahn , Alexander Bailey , Leif Schelin , Trevor Strohman , Emanuel Taropa , Sidharth Mudgal , Yanyan Zheng , Zhifeng Chen , Ahmad Beirami

IPC: G06F16/332 , G06F40/40

CPC classification number: G06F16/3322 , G06F16/3329 , G06F40/40

Abstract: Implementations relate to reducing latency in generating and/or rendering natural language (NL) output generated using a large language model (LLM). Processor(s) of a system can: receive NL based input associated with a client device, and generate the NL based output utilizing the LLM. The NL based output can be a stream of NL based output in that it includes a plurality of segments, and is generated on a segment-by-segment basis. In some implementations, a first segment of the stream of NL based output is selected for inclusion in the stream of NL based output as a second segment (and any subsequent segment) is being generated to reduce latency in evaluating the NL based output as a whole prior to rendering thereof. In some versions of those implementations, the first segment is rendered as the second segment (and any subsequent segment) is being generated to further reduce latency in rendering thereof.

34.

发明公开
Joint Speech and Text Streaming Model for ASR 审中-公开

公开(公告)号：US20240028829A1

公开(公告)日：2024-01-25

申请号：US18346232

申请日：2023-07-01

Applicant: Google LLC

Inventor： Tara N. Sainath , Zhouyuan Huo , Zhehuai Chen , Yu Zhang , Weiran Wang , Trevor Strohman , Rohit Prakash Prabhavalkar , Bo Li , Ankur Bapna

IPC: G06F40/284 , G06F40/40

CPC classification number: G06F40/284 , G06F40/40

Abstract: A method includes receiving training data that includes a set of unspoken textual utterances. For each respective unspoken textual utterance, the method includes, tokenizing the respective textual utterance into a sequence of sub-word units, generating a first higher order textual feature representation for a corresponding sub-word unit tokenized from the respective unspoken textual utterance, receiving the first higher order textual feature representation generated by a text encoder, and generating a first probability distribution over possible text units. The method also includes training an encoder based on the first probability distribution over possible text units generated by a first-pass decoder for each respective unspoken textual utterance in the set of unspoken textual utterances.

35.

发明公开
METHODS AND SYSTEMS FOR REDUCING LATENCY IN AUTOMATED ASSISTANT INTERACTIONS 审中-公开

公开(公告)号：US20230410803A1

公开(公告)日：2023-12-21

申请号：US18228948

申请日：2023-08-01

Applicant: GOOGLE LLC

Inventor： Lior Alon , Rafael Goldfarb , Dekel Auster , Dan Rasin , Michael Andrew Goodman , Trevor Strohman , Nino Tasca , Valerie Nygaard , Jaclyn Konzelmann

IPC: G10L15/22 , G06F3/16 , G10L15/08 , G10L15/18 , G10L15/28

CPC classification number: G10L15/22 , G06F3/167 , G10L2015/223 , G10L15/1815 , G10L15/285 , G10L15/083

Abstract: Implementations described herein relate to reducing latency in automated assistant interactions. In some implementations, a client device can receive audio data that captures a spoken utterance of a user. The audio data can be processed to determine an assistant command to be performed by an automated assistant. The assistant command can be processed, using a latency prediction model, to generate a predicted latency to fulfill the assistant command. Further, the client device (or the automated assistant) can determine, based on the predicted latency, whether to audibly render pre-cached content for presentation to the user prior to audibly rendering content that is responsive to the spoken utterance. The pre-cached content can be tailored to the assistant command and audibly rendered for presentation to the user while the content is being obtained, and the content can be audibly rendered for presentation to the user subsequent to the pre-cached content.

36.

发明公开
Rare Word Recognition with LM-aware MWER Training 审中-公开

公开(公告)号：US20230298570A1

公开(公告)日：2023-09-21

申请号：US18187222

申请日：2023-03-21

Applicant: Google LLC

Inventor： Weiran Wang , Tongzhou Chen , Tara N. Sainath , Ehsan Variani , Rohit Prakash Prabhavalkar , Ronny Huang , Bhuvana Ramabhadran , Neeraj Gaur , Sepand Mavandadi , Charles Caleb Peyser , Trevor Strohman , Yangzhang He , David Rybach

IPC: G10L15/06 , G10L15/19 , G10L15/22 , G10L15/16 , G10L15/02

CPC classification number: G10L15/063 , G10L15/19 , G10L15/22 , G10L15/16 , G10L15/02

Abstract: A method includes generating, using an audio encoder, a higher-order feature representation for each acoustic frame in a sequence of acoustic frames; generating, using a decoder, based on the higher-order feature representation, a plurality of speech recognition hypotheses, each hypotheses corresponding to a candidate transcription of an utterance and having an associated first likelihood score; generating, using an external language model, for each speech recognition hypothesis, a second likelihood score; determining, using a learnable fusion module, for each speech recognition hypothesis, a set of fusion weights based on the higher-order feature representation and the speech recognition hypothesis; and generating, using the learnable fusion module, for each speech recognition hypothesis, a third likelihood score based on the first likelihood score, the second likelihood score, and the set of fusion weights, the audio encoder and decoder trained using minimum additive error rate training in the presence of the external language model.

37.

发明公开
EPHEMERAL LEARNING OF MACHINE LEARNING MODEL(S) 审中-公开

公开(公告)号：US20230156248A1

公开(公告)日：2023-05-18

申请号：US17533779

申请日：2021-11-23

Applicant: GOOGLE LLC

Inventor： Françoise Beaufays , Khe Chai Sim , Trevor Strohman , Oren Litvin

IPC: H04N21/233 , G06N20/00 , G06K9/62 , H04N21/232

CPC classification number: H04N21/233 , G06N20/00 , G06K9/6256 , H04N21/232

Abstract: Implementations disclosed herein are directed to ephemeral learning of machine learning (“ML”) model(s) based on gradient(s) generated at a remote system (e.g., remote server(s)). Processor(s) of the remote system can receive stream(s) of audio data capturing spoken utterance(s) from a client device of a user. A fulfillment pipeline can process the stream(s) of audio data to cause certain fulfillment(s) of the spoken utterance(s) to be performed. Meanwhile, a training pipeline can process the stream(s) of audio data to generate gradient(s) using unsupervised learning techniques. Subsequent to the processing by the fulfillment pipeline and/or the training pipeline, the stream(s) of audio data are discarded by the remote system. Accordingly, the ML model(s) can be trained at the remote system without storing or logging of the stream(s) of audio data by non-transient memory thereof, thereby providing more efficient training mechanisms for training the ML model(s) and also increasing security of user data.

38.

发明申请
TWO-PASS END TO END SPEECH RECOGNITION 有权

公开(公告)号：US20220310072A1

公开(公告)日：2022-09-29

申请号：US17616129

申请日：2020-06-03

Applicant: GOOGLE LLC

Inventor： Tara N. Sainath , Ruoming Pang , David Rybach , Yanzhang He , Rohit Prabhavalkar , Wei Li , Mirkó Visontai , Qiao Liang , Trevor Strohman , Yonghui Wu , Ian C. McGraw , Chung-Cheng Chiu

IPC: G10L15/16 , G10L15/32 , G10L15/05

Abstract: Two-pass automatic speech recognition (ASR) models can be used to perform streaming on-device ASR to generate a text representation of an utterance captured in audio data. Various implementations include a first-pass portion of the ASR model used to generate streaming candidate recognition(s) of an utterance captured in audio data. For example, the first-pass portion can include a recurrent neural network transformer (RNN-T) decoder. Various implementations include a second-pass portion of the ASR model used to revise the streaming candidate recognition(s) of the utterance and generate a text representation of the utterance. For example, the second-pass portion can include a listen attend spell (LAS) decoder. Various implementations include a shared encoder shared between the RNN-T decoder and the LAS decoder.

39.

发明授权
Optimizing inference performance for conformer 有权

公开(公告)号：US12190869B2

公开(公告)日：2025-01-07

申请号：US17936547

申请日：2022-09-29

Applicant: Google LLC

Inventor： Tara N. Sainath , Rami Botros , Anmol Gulati , Krzysztof Choromanski , Ruoming Pang , Trevor Strohman , Weiran Wang , Jiahui Yu

IPC: G10L15/16 , G10L15/06 , G10L15/22

Abstract: A computer-implemented method includes receiving a sequence of acoustic frames as input to an automatic speech recognition (ASR) model. Here, the ASR model includes a causal encoder and a decoder. The method also includes generating, by the causal encoder, a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The method also includes generating, by the decoder, a first probability distribution over possible speech recognition hypotheses. Here, the causal encoder includes a stack of causal encoder layers each including a Recurrent Neural Network (RNN) Attention-Performer module that applies linear attention.

40.

发明申请
EFFICIENT STREAMING NON-RECURRENT ON-DEVICE END-TO-END MODEL 有权

公开(公告)号：US20240371363A1

公开(公告)日：2024-11-07

申请号：US18772263

申请日：2024-07-15

Applicant: Google LLC

Inventor： Tara Sainath , Arun Narayanan , Rami Botros , Yanzhang He , Ehsan Variani , Cyril Allauzen , David Rybach , Ruoming Pang , Trevor Strohman

IPC: G10L15/06 , G10L15/02 , G10L15/22 , G10L15/30

Abstract: An ASR model includes a first encoder configured to receive a sequence of acoustic frames and generate a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The ASR model also includes a second encoder configured to receive the first higher order feature representation generated by the first encoder at each of the plurality of output steps and generate a second higher order feature representation for a corresponding first higher order feature frame. The ASR model also includes a decoder configured to receive the second higher order feature representation generated by the second encoder at each of the plurality of output steps and generate a first probability distribution over possible speech recognition hypothesis. The ASR model also includes a language model configured to receive the first probability distribution over possible speech hypothesis and generate a rescored probability distribution.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification