Ephemeral learning of machine learning model(s)

    Publication number: US12126845B2

    Publication date: 2024-10-22

    Application number: US17533779

    Application date: 2021-11-23

    Applicant: GOOGLE LLC

    CPC classification number: H04N21/233 G06F18/214 G06N20/00 H04N21/232

    Abstract: Implementations disclosed herein are directed to ephemeral learning of machine learning (“ML”) model(s) based on gradient(s) generated at a remote system (e.g., remote server(s)). Processor(s) of the remote system can receive stream(s) of audio data capturing spoken utterance(s) from a client device of a user. A fulfillment pipeline can process the stream(s) of audio data to cause certain fulfillment(s) of the spoken utterance(s) to be performed. Meanwhile, a training pipeline can process the stream(s) of audio data to generate gradient(s) using unsupervised learning techniques. Subsequent to the processing by the fulfillment pipeline and/or the training pipeline, the stream(s) of audio data are discarded by the remote system. Accordingly, the ML model(s) can be trained at the remote system without storing or logging of the stream(s) of audio data by non-transient memory thereof, thereby providing more efficient training mechanisms for training the ML model(s) and also increasing security of user data.
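
The "process, then discard" flow in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: `EphemeralTrainer`, `handle_stream`, the one-parameter model `mu`, and the squared-error objective are all hypothetical stand-ins chosen so the example stays runnable.

```python
from typing import List


class EphemeralTrainer:
    """Toy stand-in for the training pipeline: keeps model state, never the audio."""

    def __init__(self, learning_rate: float = 0.5) -> None:
        self.mu = 0.0  # hypothetical one-parameter "model"
        self.learning_rate = learning_rate

    def gradient(self, frames: List[float]) -> float:
        # Assumed unsupervised objective: 0.5 * mean((x - mu)^2).
        # Its derivative with respect to mu is mu - mean(x).
        return self.mu - sum(frames) / len(frames)

    def apply(self, grad: float) -> None:
        # Plain gradient-descent step on the single parameter.
        self.mu -= self.learning_rate * grad


def handle_stream(trainer: EphemeralTrainer, frames: List[float]) -> float:
    """Compute a gradient from the stream, update the model, then discard the audio."""
    try:
        grad = trainer.gradient(frames)
        trainer.apply(grad)
        return grad
    finally:
        # The buffer is emptied in place; only the gradient and model survive.
        frames.clear()
```

Only the gradient and the updated parameter outlive the call; the caller's buffer is empty afterwards, which mirrors the abstract's point that the audio is not stored or logged.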

    Emitting Word Timings with End-to-End Models
    Document type: Invention publication

    Publication number: US20240321263A1

    Publication date: 2024-09-26

    Application number: US18680797

    Application date: 2024-05-31

    Applicant: Google LLC

    CPC classification number: G10L15/063 G10L25/30 G10L25/78

    Abstract: A method includes receiving a training example that includes audio data representing a spoken utterance and a ground truth transcription. For each word in the spoken utterance, the method also includes inserting a placeholder symbol before the respective word identifying a respective ground truth alignment for a beginning and an end of the respective word, determining a beginning word piece and an ending word piece, and generating a first constrained alignment for the beginning word piece and a second constrained alignment for the ending word piece. The first constrained alignment is aligned with the ground truth alignment for the beginning of the respective word and the second constrained alignment is aligned with the ground truth alignment for the ending of the respective word. The method also includes constraining an attention head of a second pass decoder by applying the first and second constrained alignments.

    Streaming End-to-end Multilingual Speech Recognition with Joint Language Identification

    Publication number: US20230306958A1

    Publication date: 2023-09-28

    Application number: US18188632

    Application date: 2023-03-23

    Applicant: Google LLC

    CPC classification number: G10L15/005 G10L15/16 G10L15/063

    Abstract: A method includes receiving a sequence of acoustic frames as input to an automatic speech recognition (ASR) model. The method also includes generating, by a first encoder, a first higher order feature representation for a corresponding acoustic frame. The method also includes generating, by a second encoder, a second higher order feature representation for a corresponding first higher order feature representation. The method also includes generating, by a language identification (ID) predictor, a language prediction representation based on a concatenation of the first higher order feature representation and the second higher order feature representation. The method also includes generating, by a first decoder, a first probability distribution over possible speech recognition hypotheses based on a concatenation of the second higher order feature representation and the language prediction representation.
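
The data flow described in this abstract (two cascaded encoders, a language ID predictor fed by their concatenation, and a decoder fed by the second encoder output concatenated with the language prediction) might be wired together roughly as below. The function name `recognize_frame` and the plain-list vector type are assumptions for illustration; the four components are passed in as callables because the abstract does not specify their internals.

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def recognize_frame(
    frame: Vector,
    first_encoder: Layer,
    second_encoder: Layer,
    language_id_predictor: Layer,
    first_decoder: Layer,
) -> Vector:
    """Route one acoustic frame through the components in the order the abstract lists."""
    # First higher order feature representation, from the acoustic frame.
    h1 = first_encoder(frame)
    # Second higher order feature representation, from the first one.
    h2 = second_encoder(h1)
    # Language prediction from the concatenation of both representations.
    lang = language_id_predictor(h1 + h2)
    # Decoder input: second representation concatenated with the language prediction.
    return first_decoder(h2 + lang)
```

With list-based vectors, `+` is concatenation, so the two concatenation steps in the abstract map directly onto `h1 + h2` and `h2 + lang`.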

    Emitting Word Timings with End-to-End Models
    Document type: Invention publication

    Publication number: US20230206907A1

    Publication date: 2023-06-29

    Application number: US18167050

    Application date: 2023-02-09

    Applicant: Google LLC

    CPC classification number: G10L15/063 G10L25/30 G10L25/78

    Abstract: A method includes receiving a training example that includes audio data representing a spoken utterance and a ground truth transcription. For each word in the spoken utterance, the method also includes inserting a placeholder symbol before the respective word identifying a respective ground truth alignment for a beginning and an end of the respective word, determining a beginning word piece and an ending word piece, and generating a first constrained alignment for the beginning word piece and a second constrained alignment for the ending word piece. The first constrained alignment is aligned with the ground truth alignment for the beginning of the respective word and the second constrained alignment is aligned with the ground truth alignment for the ending of the respective word. The method also includes constraining an attention head of a second pass decoder by applying the first and second constrained alignments.

    Lookup-Table Recurrent Language Model

    Publication number: US20220310067A1

    Publication date: 2022-09-29

    Application number: US17650566

    Application date: 2022-02-10

    Applicant: Google LLC

    Abstract: A computer-implemented method includes receiving audio data that corresponds to an utterance spoken by a user and captured by a user device. The method also includes processing the audio data to determine a candidate transcription that includes a sequence of tokens for the spoken utterance. For each token in the sequence of tokens, the method includes determining a token embedding for the corresponding token, determining an n-gram token embedding for a previous sequence of n-gram tokens, and concatenating the token embedding and the n-gram token embedding to generate a concatenated output for the corresponding token. The method also includes rescoring the candidate transcription for the spoken utterance by processing the concatenated output generated for each corresponding token in the sequence of tokens.
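
The per-token lookup and concatenation step in this abstract can be sketched as follows. Everything here is a hypothetical illustration: `concatenated_outputs`, the dictionary-backed tables, the context length `n`, and the zero-vector fallback for unseen n-grams are assumptions, and the rescoring model that would consume the outputs is left out.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def concatenated_outputs(
    tokens: Sequence[str],
    token_table: Dict[str, Vector],
    ngram_table: Dict[Tuple[str, ...], Vector],
    n: int = 2,
    ngram_dim: int = 2,
) -> List[Vector]:
    """For each token, concatenate its embedding with the embedding of the preceding n-gram."""
    outputs: List[Vector] = []
    for i, token in enumerate(tokens):
        # Token embedding: a direct table lookup.
        token_emb = token_table[token]
        # Up to n previous tokens form the n-gram key (shorter near the start).
        context = tuple(tokens[max(0, i - n):i])
        # Assumed fallback: unseen n-grams map to a zero vector.
        ngram_emb = ngram_table.get(context, [0.0] * ngram_dim)
        outputs.append(token_emb + ngram_emb)
    return outputs
```

Because both embeddings come from table lookups, enlarging the n-gram table adds parameters without adding per-token computation, which is one plausible reading of the "lookup-table" in the title.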
