-
21.
Publication No.: US20240311405A1
Publication Date: 2024-09-19
Application No.: US18337316
Application Date: 2023-06-19
Applicant: GOOGLE LLC
Inventor: Seungyeon Kim , Ankit Singh Rawat , Wittawat Jitkrittum , Hari Narasimhan , Sashank Reddi , Neha Gupta , Srinadh Bhojanapalli , Aditya Menon , Manzil Zaheer , Tal Schuster , Sanjiv Kumar , Toby Boyd , Zhifeng Chen , Emanuel Taropa , Vikram Kasivajhula , Trevor Strohman , Martin Baeuml , Leif Schelin , Yanping Huang
IPC: G06F16/332
CPC classification number: G06F16/3329
Abstract: Implementations disclose selecting, in response to receiving a request and from among multiple candidate generative models (e.g., multiple candidate large language models (LLMs)) with differing computational efficiencies, a particular generative model to utilize in generating a response to the request. Those implementations reduce latency and/or conserve computational resource(s) through selection, for various requests, of a more computationally efficient generative model for utilization in lieu of a less computationally efficient generative model. Further, those implementations seek to achieve such benefits, through utilization of more computationally efficient generative models, while also still selectively utilizing less computationally efficient generative models for certain requests to mitigate occurrences of a generated response being inaccurate and/or under-specified. This, in turn, can mitigate occurrences of computational and/or network inefficiencies that result from a user issuing a follow-up request to cure the inaccuracies and/or under-specification of a generated response.
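Illustrative sketch (not from the patent record): one way such routing could look in Python, assuming a lightweight quality predictor decides whether a cheaper candidate model is likely good enough for a request; the names, scoring heuristic, and threshold below are assumptions for illustration, not the disclosed implementation.

```python
# Hypothetical routing sketch: try candidates from most to least computationally
# efficient and pick the first whose predicted quality clears a threshold.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CandidateModel:
    name: str
    relative_cost: float                  # lower = more computationally efficient
    generate: Callable[[str], str]        # produces a response for a request


def predict_quality(model: CandidateModel, request: str) -> float:
    """Placeholder quality predictor (e.g., a small router model)."""
    # Assumption: longer, more open-ended requests favor the larger model.
    return 1.0 - min(len(request) / 500.0, 1.0) / model.relative_cost


def select_model(candidates: List[CandidateModel], request: str,
                 quality_threshold: float = 0.5) -> CandidateModel:
    for model in sorted(candidates, key=lambda m: m.relative_cost):
        if predict_quality(model, request) >= quality_threshold:
            return model
    # Fall back to the least efficient (most capable) model for hard requests.
    return max(candidates, key=lambda m: m.relative_cost)


if __name__ == "__main__":
    small = CandidateModel("small-llm", 1.0, lambda r: f"[small] {r}")
    large = CandidateModel("large-llm", 4.0, lambda r: f"[large] {r}")
    request = "What is the capital of France?"
    chosen = select_model([small, large], request)
    print(chosen.name, "->", chosen.generate(request))
```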
-
22.
Publication No.: US20240185841A1
Publication Date: 2024-06-06
Application No.: US18490808
Application Date: 2023-10-20
Applicant: Google LLC
Inventor: Bo Li , Yu Zhang , Nanxin Chen , Rohit Prakash Prabhavalkar , Chao-Han Huck Yang , Tara N. Sainath , Trevor Strohman
IPC: G10L15/065 , G10L15/00
CPC classification number: G10L15/065 , G10L15/005
Abstract: A method includes obtaining an ASR model trained to recognize speech in a first language and receiving transcribed training utterances in a second language. The method also includes integrating the ASR model with an input reprogramming module and a latent reprogramming module. The method also includes adapting the ASR model to learn how to recognize speech in the second language by training the input reprogramming module and the latent reprogramming module while parameters of the ASR model are frozen.
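Illustrative sketch (not from the patent record): a minimal PyTorch rendering of the idea of training small reprogramming layers around a frozen ASR encoder so only the new layers learn the second language; the module names, shapes, and additive/residual forms are assumptions.

```python
import torch
import torch.nn as nn


class InputReprogramming(nn.Module):
    """Learnable additive transform applied to the input features."""
    def __init__(self, feat_dim: int):
        super().__init__()
        self.delta = nn.Parameter(torch.zeros(feat_dim))

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return features + self.delta


class LatentReprogramming(nn.Module):
    """Small learnable residual projection applied to the frozen model's latents."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, latents: torch.Tensor) -> torch.Tensor:
        return latents + self.proj(latents)


class ReprogrammedASR(nn.Module):
    def __init__(self, pretrained_asr: nn.Module, feat_dim: int, hidden_dim: int):
        super().__init__()
        self.asr = pretrained_asr
        for p in self.asr.parameters():      # freeze the pretrained ASR parameters
            p.requires_grad = False
        self.input_reprog = InputReprogramming(feat_dim)
        self.latent_reprog = LatentReprogramming(hidden_dim)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        latents = self.asr(self.input_reprog(features))
        return self.latent_reprog(latents)


if __name__ == "__main__":
    dummy_asr = nn.Sequential(nn.Linear(80, 256), nn.ReLU())   # stand-in for a trained encoder
    model = ReprogrammedASR(dummy_asr, feat_dim=80, hidden_dim=256)
    trainable = [p for p in model.parameters() if p.requires_grad]
    print(model(torch.randn(2, 100, 80)).shape, len(trainable))  # only reprogramming params train
```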
-
23.
Publication No.: US20240029718A1
Publication Date: 2024-01-25
Application No.: US18352211
Application Date: 2023-07-13
Applicant: Google LLC
Inventor: Antoine Jean Bruguier , David Qiu , Yanzhang He , Trevor Strohman
Abstract: A method includes processing, using a speech recognizer, a first portion of audio data to generate a first lattice, and generating a first partial transcription for an utterance based on the first lattice. The method includes processing, using the speech recognizer, a second portion of the audio data to generate, based on the first lattice, a second lattice representing a plurality of partial speech recognition hypotheses for the utterance and a plurality of corresponding speech recognition scores. For each particular partial speech recognition hypothesis, the method includes generating a corresponding re-ranked score based on the corresponding speech recognition score and whether the particular partial speech recognition hypothesis shares a prefix with the first partial transcription. The method includes generating a second partial transcription for the utterance by selecting, from the plurality of partial speech recognition hypotheses, the partial speech recognition hypothesis having the highest corresponding re-ranked score.
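Illustrative sketch (not from the patent record): a toy Python version of the prefix-aware re-ranking step, where hypotheses sharing a prefix with the previous partial transcription receive a score boost so streaming partials stay stable; the bonus value and score scale are assumptions.

```python
from typing import List, Tuple


def rerank_partial_hypotheses(
    hypotheses: List[Tuple[str, float]],   # (partial hypothesis, speech recognition score)
    previous_partial: str,
    prefix_bonus: float = 2.0,
) -> str:
    """Return the hypothesis with the highest re-ranked score."""
    def reranked(hyp: str, score: float) -> float:
        shares_prefix = hyp.startswith(previous_partial)
        return score + (prefix_bonus if shares_prefix else 0.0)

    best_hyp, _ = max(hypotheses, key=lambda hs: reranked(hs[0], hs[1]))
    return best_hyp


if __name__ == "__main__":
    previous = "play some"
    candidates = [("play some music", 5.1), ("lay some music", 5.3)]
    print(rerank_partial_hypotheses(candidates, previous))  # -> "play some music"
```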
-
24.
Publication No.: US20230186901A1
Publication Date: 2023-06-15
Application No.: US18167454
Application Date: 2023-02-10
Applicant: Google LLC
Inventor: Tara N. Sainath , Ruoming Pang , Ron Weiss , Yanzhang He , Chung-Cheng Chiu , Trevor Strohman
IPC: G10L15/06 , G06N3/08 , G10L15/16 , G10L15/197
CPC classification number: G10L15/063 , G06N3/08 , G10L15/16 , G10L15/197 , G10L2015/0635
Abstract: A method includes receiving a training example for a listen-attend-spell (LAS) decoder of a two-pass streaming neural network model and determining whether the training example corresponds to a supervised audio-text pair or an unpaired text sequence. When the training example corresponds to an unpaired text sequence, the method also includes determining a cross entropy loss based on a log probability associated with a context vector of the training example. The method also includes updating the LAS decoder and the context vector based on the determined cross entropy loss.
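Illustrative sketch (not from the patent record): a hedged PyTorch training step that branches on whether the example is a supervised audio-text pair or an unpaired text sequence, computes cross entropy from log probabilities, and lets gradients reach both the LAS decoder and the context vector; the decoder interface and dictionary keys are assumptions.

```python
import torch
import torch.nn.functional as F


def las_training_step(las_decoder, example: dict, context_vector: torch.Tensor) -> torch.Tensor:
    """One step for either a supervised audio-text pair or an unpaired text sequence."""
    if example.get("audio_features") is not None:
        # Supervised pair: the decoder attends over real encoder features.
        logits = las_decoder(example["audio_features"], example["input_token_ids"])
    else:
        # Unpaired text: the decoder attends over the trainable context vector instead.
        logits = las_decoder(context_vector.unsqueeze(0), example["input_token_ids"])

    # Cross entropy over target tokens (log-softmax + negative log likelihood).
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        example["target_token_ids"].reshape(-1),
    )
    loss.backward()  # gradients update both the LAS decoder and the context vector
    return loss
```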
-
25.
Publication No.: US20230130634A1
Publication Date: 2023-04-27
Application No.: US17936547
Application Date: 2022-09-29
Applicant: Google LLC
Inventor: Tara N. Sainath , Rami Botros , Anmol Gulati , Krzysztof Choromanski , Ruoming Pang , Trevor Strohman , Weiran Wang , Jiahui Yu
Abstract: A computer-implemented method includes receiving a sequence of acoustic frames as input to an automatic speech recognition (ASR) model. Here, the ASR model includes a causal encoder and a decoder. The method also includes generating, by the causal encoder, a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The method also includes generating, by the decoder, a first probability distribution over possible speech recognition hypotheses. Here, the causal encoder includes a stack of causal encoder layers each including a Recurrent Neural Network (RNN) Attention-Performer module that applies linear attention.
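Illustrative sketch (not from the patent record): linear attention in the Performer style, which replaces the quadratic softmax(QK^T)V with feature-mapped products so cost grows linearly in sequence length; the elu+1 feature map and the non-causal (full-sequence) form below are simplifications, not the patented causal RNN Attention-Performer module.

```python
import torch
import torch.nn.functional as F


def linear_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """q, k, v: (batch, time, dim). Returns (batch, time, dim) in O(time) memory per step."""
    phi_q = F.elu(q) + 1.0                              # positive feature map
    phi_k = F.elu(k) + 1.0
    kv = torch.einsum("btd,bte->bde", phi_k, v)         # sum_t phi(k_t) v_t^T
    normalizer = torch.einsum("btd,bd->bt", phi_q, phi_k.sum(dim=1))
    out = torch.einsum("btd,bde->bte", phi_q, kv)
    # A causal/streaming variant would accumulate kv and the normalizer as prefix sums.
    return out / normalizer.unsqueeze(-1).clamp(min=1e-6)


if __name__ == "__main__":
    x = torch.randn(2, 50, 64)
    print(linear_attention(x, x, x).shape)              # torch.Size([2, 50, 64])
```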
-
26.
Publication No.: US20230053341A1
Publication Date: 2023-02-23
Application No.: US17532819
Application Date: 2021-11-22
Applicant: GOOGLE LLC
Inventor: Jaclyn Konzelmann , Trevor Strohman , Jonathan Bloom , Johan Schalkwyk , Joseph Smarr
Abstract: As part of a dialog session between a user and an automated assistant, implementations can process, using a streaming ASR model, a stream of audio data that captures a portion of a spoken utterance to generate ASR output, process, using an NLU model, the ASR output to generate NLU output, and cause, based on the NLU output, a stream of fulfillment data to be generated. Further, implementations can determine, based on processing the stream of audio data, audio-based characteristics associated with the portion of the spoken utterance captured in the stream of audio data. Based on the audio-based characteristics and/or the stream of NLU output, implementations can determine whether the user has paused in providing the spoken utterance or has completed providing the spoken utterance. If the user has paused, implementations can cause natural conversation output to be provided for presentation to the user.
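Illustrative sketch (not from the patent record): one way audio-based characteristics and NLU completeness could be combined to decide between continuing to listen, emitting natural conversation output, and fulfilling the request; the signals and thresholds below are assumptions.

```python
from dataclasses import dataclass


@dataclass
class AudioCharacteristics:
    trailing_silence_ms: float      # silence observed after the captured portion
    final_pitch_falling: bool       # crude intonation cue


def classify_utterance_state(audio: AudioCharacteristics, nlu_is_complete: bool) -> str:
    if nlu_is_complete and audio.trailing_silence_ms > 700 and audio.final_pitch_falling:
        return "complete"        # trigger fulfillment of the request
    if audio.trailing_silence_ms > 400 and not nlu_is_complete:
        return "paused"          # emit natural conversation output (e.g., "mm-hmm, go on")
    return "still_speaking"      # keep streaming ASR/NLU


if __name__ == "__main__":
    print(classify_utterance_state(AudioCharacteristics(900, True), True))    # complete
    print(classify_utterance_state(AudioCharacteristics(500, False), False))  # paused
```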
-
27.
Publication No.: US11580956B2
Publication Date: 2023-02-14
Application No.: US17204852
Application Date: 2021-03-17
Applicant: Google LLC
Inventor: Tara N. Sainath , Basi Garcia , David Rybach , Trevor Strohman , Ruoming Pang
Abstract: A method includes receiving a training example that includes audio data representing a spoken utterance and a ground truth transcription. For each word in the spoken utterance, the method also includes inserting a placeholder symbol before the respective word identifying a respective ground truth alignment for a beginning and an end of the respective word, determining a beginning word piece and an ending word piece, and generating a first constrained alignment for the beginning word piece and a second constrained alignment for the ending word piece. The first constrained alignment is aligned with the ground truth alignment for the beginning of the respective word and the second constrained alignment is aligned with the ground truth alignment for the ending of the respective word. The method also includes constraining an attention head of a second pass decoder by applying the first and second constrained alignments.
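Illustrative sketch (not from the patent record): turning per-word ground-truth boundaries into constrained alignments for the first and last word piece of each word, which could then be used to constrain an attention head of a second-pass decoder; the tokenization and frame indexing here are simplified assumptions.

```python
from typing import Callable, Dict, List, Tuple


def constrained_alignments(
    words: List[str],
    boundaries: List[Tuple[int, int]],        # ground-truth (start_frame, end_frame) per word
    tokenize: Callable[[str], List[str]],     # word -> word pieces
) -> List[Dict]:
    """Pair each word's beginning/ending word piece with its ground-truth boundary frames."""
    constraints = []
    for word, (start, end) in zip(words, boundaries):
        pieces = tokenize(word)
        constraints.append({
            "word": word,
            "begin_piece": pieces[0],
            "end_piece": pieces[-1],
            "begin_alignment": start,   # first word piece constrained to the word's start
            "end_alignment": end,       # last word piece constrained to the word's end
        })
    return constraints


if __name__ == "__main__":
    toy_tokenizer = lambda w: [w[:2], w[2:]] if len(w) > 2 else [w]
    for c in constrained_alignments(["hello", "world"], [(0, 12), (13, 25)], toy_tokenizer):
        print(c)
```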
-
28.
Publication No.: US20210225362A1
Publication Date: 2021-07-22
Application No.: US17155010
Application Date: 2021-01-21
Applicant: Google LLC
Inventor: Tara N. Sainath , Ruoming Pang , Ron Weiss , Yanzhang He , Chung-Cheng Chiu , Trevor Strohman
IPC: G10L15/06 , G10L15/16 , G10L15/197 , G06N3/08
Abstract: A method includes receiving a training example for a listen-attend-spell (LAS) decoder of a two-pass streaming neural network model and determining whether the training example corresponds to a supervised audio-text pair or an unpaired text sequence. When the training example corresponds to an unpaired text sequence, the method also includes determining a cross entropy loss based on a log probability associated with a context vector of the training example. The method also includes updating the LAS decoder and the context vector based on the determined cross entropy loss.
-
29.
Publication No.: US20250140239A1
Publication Date: 2025-05-01
Application No.: US19010299
Application Date: 2025-01-06
Applicant: Google LLC
Inventor: Shuo-yiin Chang , Bo Li , Tara N. Sainath , Trevor Strohman , Chao Zhang
Abstract: A method includes receiving a sequence of acoustic frames characterizing one or more utterances. At each of a plurality of output steps, the method also includes generating, by an encoder network of a speech recognition model, a higher order feature representation for a corresponding acoustic frame of the sequence of acoustic frames, generating, by a prediction network of the speech recognition model, a hidden representation for a corresponding sequence of non-blank symbols output by a final softmax layer of the speech recognition model, and generating, by a first joint network of the speech recognition model that receives the higher order feature representation generated by the encoder network and the hidden representation generated by the prediction network, a probability distribution indicating whether the corresponding output step corresponds to a pause or an end of speech.
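Illustrative sketch (not from the patent record): a PyTorch stand-in for the first joint network, fusing the encoder's higher order feature representation with the prediction network's representation and emitting a distribution over speech, pause, and end of speech; the layer sizes and add-then-tanh fusion are assumptions.

```python
import torch
import torch.nn as nn


class PauseEosJointNetwork(nn.Module):
    def __init__(self, enc_dim: int, pred_dim: int, joint_dim: int = 256):
        super().__init__()
        self.enc_proj = nn.Linear(enc_dim, joint_dim)
        self.pred_proj = nn.Linear(pred_dim, joint_dim)
        self.out = nn.Linear(joint_dim, 3)   # speech / pause / end-of-speech

    def forward(self, enc_t: torch.Tensor, pred_t: torch.Tensor) -> torch.Tensor:
        # Fuse the two representations, then normalize into a probability distribution.
        joint = torch.tanh(self.enc_proj(enc_t) + self.pred_proj(pred_t))
        return torch.softmax(self.out(joint), dim=-1)


if __name__ == "__main__":
    net = PauseEosJointNetwork(enc_dim=512, pred_dim=640)
    probs = net(torch.randn(1, 512), torch.randn(1, 640))
    print(probs)   # e.g. tensor([[p_speech, p_pause, p_eos]])
```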
-
30.
Publication No.: US20250078830A1
Publication Date: 2025-03-06
Application No.: US18826743
Application Date: 2024-09-06
Applicant: Google LLC
Inventor: Junwen Bai , Bo Li , Qiujia Li , Tara N. Sainath , Trevor Strohman
IPC: G10L15/197 , G10L15/00 , G10L15/02 , G10L15/06 , G10L15/30
Abstract: A method includes receiving a sequence of acoustic frames characterizing a spoken utterance in a particular native language. The method also includes generating a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames by a causal encoder that includes an initial stack of multi-head attention layers. The method also includes generating a second higher order feature representation for a corresponding first higher order feature representation by a non-causal encoder that includes a final stack of multi-head attention layers. The method also includes receiving, as input at each corresponding language-dependent adapter (LDA) module, a language ID vector identifying the particular native language to activate corresponding language-dependent weights specific to the particular native language. The method also includes generating a first probability distribution over possible speech recognition hypotheses by a decoder.
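Illustrative sketch (not from the patent record): a language-dependent adapter layer in which a one-hot language ID vector activates the per-language weights applied to shared encoder features; the residual bottleneck-adapter form is a common choice and an assumption here, not necessarily the disclosed LDA design.

```python
import torch
import torch.nn as nn


class LanguageDependentAdapter(nn.Module):
    def __init__(self, num_languages: int, hidden_dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.ModuleList(
            nn.Linear(hidden_dim, bottleneck) for _ in range(num_languages))
        self.up = nn.ModuleList(
            nn.Linear(bottleneck, hidden_dim) for _ in range(num_languages))

    def forward(self, features: torch.Tensor, language_id: torch.Tensor) -> torch.Tensor:
        lang = int(torch.argmax(language_id))       # one-hot language ID vector selects weights
        adapted = self.up[lang](torch.relu(self.down[lang](features)))
        return features + adapted                   # residual adapter on shared features


if __name__ == "__main__":
    lda = LanguageDependentAdapter(num_languages=3, hidden_dim=512)
    lang_id = torch.tensor([0.0, 1.0, 0.0])         # second language active
    out = lda(torch.randn(4, 512), lang_id)
    print(out.shape)                                # torch.Size([4, 512])
```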