Patent search ap:("GOOGLE LLC") AND inv:"Trevor Strohman" Page 5

41.

Granted invention patent
Ephemeral learning of machine learning model(s) [In force]

Publication number: US12126845B2

Publication date: 2024-10-22

Application number: US17533779

Application date: 2021-11-23

Applicant: GOOGLE LLC

Inventor: Françoise Beaufays, Khe Chai Sim, Trevor Strohman, Oren Litvin

IPC: H04N21/233, G06F18/214, G06N20/00, H04N21/232

CPC classification number: H04N21/233, G06F18/214, G06N20/00, H04N21/232

Abstract: Implementations disclosed herein are directed to ephemeral learning of machine learning (“ML”) model(s) based on gradient(s) generated at a remote system (e.g., remote server(s)). Processor(s) of the remote system can receive stream(s) of audio data capturing spoken utterance(s) from a client device of a user. A fulfillment pipeline can process the stream(s) of audio data to cause certain fulfillment(s) of the spoken utterance(s) to be performed. Meanwhile, a training pipeline can process the stream(s) of audio data to generate gradient(s) using unsupervised learning techniques. Subsequent to the processing by the fulfillment pipeline and/or the training pipeline, the stream(s) of audio data are discarded by the remote system. Accordingly, the ML model(s) can be trained at the remote system without storing or logging of the stream(s) of audio data by non-transient memory thereof, thereby providing more efficient training mechanisms for training the ML model(s) and also increasing security of user data.

42.

Published invention application
BLOCKWISE CONTROLLED DECODING OF NATURAL LANGUAGE (NL) BASED OUTPUT GENERATED USING A LARGE LANGUAGE MODEL (LLM) TO REDUCE LATENCY IN RENDERING THEREOF [Under examination, published]

Publication number: US20240330334A1

Publication date: 2024-10-03

Application number: US18225990

Application date: 2023-07-25

Applicant: GOOGLE LLC

Inventor: Sidharth Mudgal, Ahmad Beirami, Jilin Chen, Alex Beutel, Harish Ganapathy, YaGuang Li, Tao Wang, Yanping Huang, Trevor Strohman

IPC: G06F16/332, G06F40/284

CPC classification number: G06F16/3329, G06F40/284

Abstract: Implementations relate to reducing latency in generating and/or rendering a given stream of natural language (NL) based output generated using a large language model (LLM). Processor(s) of a system can: receive NL based input associated with a client device, generate the stream of NL based output utilizing the LLM that is responsive to the NL based input and that is for a given dialog context of an ongoing dialog, and cause the stream of NL based output to be rendered at the client device. Notably, the processor(s) can employ attribute classifier(s) and a multi-objective scorer to implement a blockwise controlled decoding technique in generating the stream of NL based output utilizing the LLM. By implementing the blockwise controlled decoding technique in generating the stream of NL based output utilizing the LLM, the processor(s) can reduce latency in generating and/or rendering the stream of NL based output generated utilizing the LLM.
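
The abstract above names a blockwise controlled decoding technique but does not spell out its steps. The following is only a minimal sketch of one plausible reading: the output is built one block at a time, several candidate blocks are proposed per step, and a scorer picks which one to keep. The function names (`blockwise_controlled_decode`, `toy_propose`, `toy_score`), the candidate count, and the scoring rule are all hypothetical stand-ins; in particular `toy_score` stands in for the attribute classifier(s) and multi-objective scorer, whose actual form the abstract does not describe.

```python
from itertools import cycle
from typing import Callable, List

Block = List[str]


def blockwise_controlled_decode(
    propose_block: Callable[[List[str]], Block],
    score_block: Callable[[List[str], Block], float],
    prompt: List[str],
    num_blocks: int = 2,
    candidates_per_block: int = 4,
) -> List[str]:
    """Build the output block by block, keeping the best-scoring candidate each step."""
    output: List[str] = []
    for _ in range(num_blocks):
        context = prompt + output
        # Ask the (hypothetical) model for several candidate continuations.
        candidates = [propose_block(context) for _ in range(candidates_per_block)]
        # Keep the candidate block that the scorer rates highest.
        best = max(candidates, key=lambda block: score_block(context, block))
        output.extend(best)
    return output


# Toy stand-ins (assumptions): a fixed rotation of blocks instead of an LLM,
# and a scorer that rewards "good" tokens and penalizes "bad" ones.
_fixed_blocks = cycle(
    [["bad", "word"], ["good", "word"], ["ok", "bad"], ["fine", "text"]]
)


def toy_propose(context: List[str]) -> Block:
    return list(next(_fixed_blocks))


def toy_score(context: List[str], block: Block) -> float:
    return float(block.count("good") - block.count("bad"))


decoded = blockwise_controlled_decode(toy_propose, toy_score, prompt=["hello"])
print(decoded)
```

In this sketch the scoring happens once per block rather than once per token, which is the only property taken from the abstract's wording; everything else is illustrative.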

43.

Published invention application
Emitting Word Timings with End-to-End Models [Under examination, published]

Publication number: US20240321263A1

Publication date: 2024-09-26

Application number: US18680797

Application date: 2024-05-31

Applicant: Google LLC

Inventor: Tara N. Sainath, Basilio Garcia Castillo, David Rybach, Trevor Strohman, Ruoming Pang

IPC: G10L15/06, G10L25/30, G10L25/78

CPC classification number: G10L15/063, G10L25/30, G10L25/78

Abstract: A method includes receiving a training example that includes audio data representing a spoken utterance and a ground truth transcription. For each word in the spoken utterance, the method also includes inserting a placeholder symbol before the respective word identifying a respective ground truth alignment for a beginning and an end of the respective word, determining a beginning word piece and an ending word piece, and generating a first constrained alignment for the beginning word piece and a second constrained alignment for the ending word piece. The first constrained alignment is aligned with the ground truth alignment for the beginning of the respective word and the second constrained alignment is aligned with the ground truth alignment for the ending of the respective word. The method also includes constraining an attention head of a second pass decoder by applying the first and second constrained alignments.

44.

Published invention application
FACTUALITY OF GENERATED RESPONSES [Under examination, published]

Publication number: US20240289395A1

Publication date: 2024-08-29

Application number: US18528142

Application date: 2023-12-04

Applicant: Google LLC

Inventor: Hao Zhou, Shrestha Basu Mallick, Trevor Strohman, Patricia Luisa Romero Domingo, Amirhossein Kiani, Yu Du, Xinying Song, Heng-Tze Cheng, Quoc V. Le, Ed Huai-Hsin Chi, Christopher Jamie Maclean Hall

IPC: G06F16/9532, G06F16/955, G06F40/40

CPC classification number: G06F16/9532, G06F16/955, G06F40/40

Abstract: Implementations relate to helping a large language model generate factual responses to prompts that request factual content. The large language model may receive a prompt context and a plurality of encoded context passages as input. The large language model is trained to determine whether or not to utilize the encoded context passages in generating the response. Implementations also relate to different methods of fine-tuning the responses generated by the large language model through query refinements, response re-writes, and evaluation of factual accuracy.

45.

Granted invention patent
Efficient streaming non-recurrent on-device end-to-end model [In force]

Publication number: US12051404B2

Publication date: 2024-07-30

Application number: US18336211

Application date: 2023-06-16

Applicant: Google LLC

Inventor: Tara Sainath, Arun Narayanan, Rami Botros, Yanzhang He, Ehsan Variani, Cyril Allauzen, David Rybach, Ruoming Pang, Trevor Strohman

IPC: G10L15/00, G10L15/02, G10L15/06, G10L15/22, G10L15/30

CPC classification number: G10L15/063, G10L15/02, G10L15/22, G10L15/30

Abstract: An ASR model includes a first encoder configured to receive a sequence of acoustic frames and generate a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The ASR model also includes a second encoder configured to receive the first higher order feature representation generated by the first encoder at each of the plurality of output steps and generate a second higher order feature representation for a corresponding first higher order feature frame. The ASR model also includes a decoder configured to receive the second higher order feature representation generated by the second encoder at each of the plurality of output steps and generate a first probability distribution over possible speech recognition hypotheses. The ASR model also includes a language model configured to receive the first probability distribution over possible speech recognition hypotheses and generate a rescored probability distribution.

46.

Granted invention patent
Enabling natural conversations with soft endpointing for an automated assistant [In force]

Publication number: US12020703B2

Publication date: 2024-06-25

Application number: US17532819

Application date: 2021-11-22

Applicant: GOOGLE LLC

Inventor: Jaclyn Konzelmann, Trevor Strohman, Jonathan Bloom, Johan Schalkwyk, Joseph Smarr

IPC: G10L15/22, G06N20/00, G08B5/36, G10L15/08, G10L15/18

CPC classification number: G10L15/22, G06N20/00, G08B5/36, G10L15/18, G10L2015/088, G10L2015/223

Abstract: As part of a dialog session between a user and an automated assistant, implementations can process, using a streaming ASR model, a stream of audio data that captures a portion of a spoken utterance to generate ASR output, process, using an NLU model, the ASR output to generate NLU output, and cause, based on the NLU output, a stream of fulfillment data to be generated. Further, implementations can determine, based on processing the stream of audio data, audio-based characteristics associated with the portion of the spoken utterance captured in the stream of audio data. Based on the audio-based characteristics and/or the stream of NLU output, implementations can determine whether the user has paused in providing the spoken utterance or has completed providing the spoken utterance. If the user has paused, implementations can cause natural conversation output to be provided for presentation to the user.

47.

Published invention application
Streaming End-to-end Multilingual Speech Recognition with Joint Language Identification [Under examination, published]

Publication number: US20230306958A1

Publication date: 2023-09-28

Application number: US18188632

Application date: 2023-03-23

Applicant: Google LLC

Inventor: Chao Zhang, Bo Li, Tara N. Sainath, Trevor Strohman, Sepand Mavandadi, Shuo-yiin Chang, Parisa Haghani

IPC: G10L15/00, G10L15/16, G10L15/06

CPC classification number: G10L15/005, G10L15/16, G10L15/063

Abstract: A method includes receiving a sequence of acoustic frames as input to an automatic speech recognition (ASR) model. The method also includes generating, by a first encoder, a first higher order feature representation for a corresponding acoustic frame. The method also includes generating, by a second encoder, a second higher order feature representation for a corresponding first higher order feature representation. The method also includes generating, by a language identification (ID) predictor, a language prediction representation based on a concatenation of the first higher order feature representation and the second higher order feature representation. The method also includes generating, by a first decoder, a first probability distribution over possible speech recognition hypotheses based on a concatenation of the second higher order feature representation and the language prediction representation.

48.

Granted invention patent
Methods and systems for reducing latency in automated assistant interactions [In force]

Publication number: US11763813B2

Publication date: 2023-09-19

Application number: US17243232

Application date: 2021-04-28

Applicant: Google LLC

Inventor: Lior Alon, Rafael Goldfarb, Dekel Auster, Dan Rasin, Michael Andrew Goodman, Trevor Strohman, Nino Tasca, Valerie Nygaard, Jaclyn Konzelmann

IPC: G10L15/22, G06F3/16, G10L15/08, G10L15/18, G10L15/28

CPC classification number: G10L15/22, G06F3/167, G10L15/083, G10L15/1815, G10L15/285, G10L2015/223

Abstract: Implementations described herein relate to reducing latency in automated assistant interactions. In some implementations, a client device can receive audio data that captures a spoken utterance of a user. The audio data can be processed to determine an assistant command to be performed by an automated assistant. The assistant command can be processed, using a latency prediction model, to generate a predicted latency to fulfill the assistant command. Further, the client device (or the automated assistant) can determine, based on the predicted latency, whether to audibly render pre-cached content for presentation to the user prior to audibly rendering content that is responsive to the spoken utterance. The pre-cached content can be tailored to the assistant command and audibly rendered for presentation to the user while the content is being obtained, and the content can be audibly rendered for presentation to the user subsequent to the pre-cached content.
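
The decision described in the abstract above (render pre-cached content first when the predicted latency is high) can be illustrated with a short sketch. This is not the patented method: the function name `plan_audio_response`, the fixed threshold, and the string labels are assumptions made for illustration, and the latency prediction model itself is replaced by a plain number passed in by the caller.

```python
from typing import List


def plan_audio_response(predicted_latency_s: float, threshold_s: float = 1.0) -> List[str]:
    """Return the order in which audio segments would be rendered.

    predicted_latency_s: output of a (hypothetical) latency prediction model.
    threshold_s: assumed cut-off above which pre-cached content is played first.
    """
    if predicted_latency_s >= threshold_s:
        # Slow fulfillment expected: play pre-cached content while waiting,
        # then the content that actually answers the utterance.
        return ["pre_cached_content", "responsive_content"]
    # Fast fulfillment expected: render the responsive content directly.
    return ["responsive_content"]


print(plan_audio_response(2.5))
print(plan_audio_response(0.2))
```

A single threshold is the simplest possible rule; the abstract only states that the decision is based on the predicted latency, so other rules would fit the description equally well.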

49.

Published invention application
Emitting Word Timings with End-to-End Models [Under examination, published]

Publication number: US20230206907A1

Publication date: 2023-06-29

Application number: US18167050

Application date: 2023-02-09

Applicant: Google LLC

Inventor: Tara N Sainath, Basilio Garcia Castillo, David Rybach, Trevor Strohman, Ruoming Pang

IPC: G10L15/06, G10L25/30, G10L25/78

CPC classification number: G10L15/063, G10L25/30, G10L25/78

Abstract: A method includes receiving a training example that includes audio data representing a spoken utterance and a ground truth transcription. For each word in the spoken utterance, the method also includes inserting a placeholder symbol before the respective word identifying a respective ground truth alignment for a beginning and an end of the respective word, determining a beginning word piece and an ending word piece, and generating a first constrained alignment for the beginning word piece and a second constrained alignment for the ending word piece. The first constrained alignment is aligned with the ground truth alignment for the beginning of the respective word and the second constrained alignment is aligned with the ground truth alignment for the ending of the respective word. The method also includes constraining an attention head of a second pass decoder by applying the first and second constrained alignments.

50.

Invention application
Lookup-Table Recurrent Language Model [In force]

Publication number: US20220310067A1

Publication date: 2022-09-29

Application number: US17650566

Application date: 2022-02-10

Applicant: Google LLC

Inventor: Ronny Huang, Tara N. Sainath, Trevor Strohman, Shankar Kumar

IPC: G10L15/08, G10L15/26, G10L15/187, G06N3/04, G10L15/16

Abstract: A computer-implemented method includes receiving audio data that corresponds to an utterance spoken by a user and captured by a user device. The method also includes processing the audio data to determine a candidate transcription that includes a sequence of tokens for the spoken utterance. For each token in the sequence of tokens, the method includes determining a token embedding for the corresponding token, determining an n-gram token embedding for a previous sequence of n-gram tokens, and concatenating the token embedding and the n-gram token embedding to generate a concatenated output for the corresponding token. The method also includes rescoring the candidate transcription for the spoken utterance by processing the concatenated output generated for each corresponding token in the sequence of tokens.
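
The per-token step in the abstract above (look up a token embedding, look up an n-gram embedding for the preceding tokens, concatenate the two) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the helper names `ngram_row` and `concat_embeddings`, the modular hashing used to map an n-gram to a table row, the left padding with -1, and the tiny tables are all hypothetical. The rescoring model that would consume the concatenated outputs is omitted.

```python
from typing import List, Sequence


def ngram_row(prev_tokens: Sequence[int], vocab_size: int, num_rows: int) -> int:
    """Map the previous n tokens to a row of the n-gram table (assumed hashing scheme)."""
    index = 0
    for token in prev_tokens:
        # +1 so that the padding id -1 maps to 0.
        index = (index * vocab_size + token + 1) % num_rows
    return index


def concat_embeddings(
    tokens: Sequence[int],
    token_table: Sequence[Sequence[float]],
    ngram_table: Sequence[Sequence[float]],
    n: int = 2,
) -> List[List[float]]:
    """For each token, concatenate its embedding with the embedding of the previous n tokens."""
    vocab_size = len(token_table)
    outputs: List[List[float]] = []
    for i, token in enumerate(tokens):
        prev = list(tokens[max(0, i - n):i])
        prev = [-1] * (n - len(prev)) + prev  # left-pad short histories
        row = ngram_row(prev, vocab_size, len(ngram_table))
        outputs.append(list(token_table[token]) + list(ngram_table[row]))
    return outputs


# Toy tables (assumptions): 3-token vocabulary, 4-row n-gram table.
token_table = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1]]
ngram_table = [[10.0], [11.0], [12.0], [13.0]]
concatenated = concat_embeddings([0, 1, 2], token_table, ngram_table, n=2)
print(concatenated)
```

Each concatenated vector has the token embedding first and the n-gram embedding second; a rescoring network would take these vectors as its per-token input.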
