CHUNK-WISE ATTENTION FOR LONGFORM ASR
    Invention Publication

    Publication No.: US20240290321A1

    Publication Date: 2024-08-29

    Application No.: US18585168

    Filing Date: 2024-02-23

    Applicant: Google LLC

    CPC classification number: G10L15/063 G10L15/26

    Abstract: A method includes receiving training data including a corpus of multilingual unspoken textual utterances, a corpus of multilingual un-transcribed non-synthetic speech utterances, and a corpus of multilingual transcribed non-synthetic speech utterances. For each un-transcribed non-synthetic speech utterance, the method includes generating a target quantized vector token and a target token index, generating contrastive context vectors from corresponding masked audio features, and deriving a contrastive loss term. The method also includes generating an alignment output, generating a first probability distribution over possible speech recognition hypotheses for the alignment output, and determining an alignment output loss term. The method also includes generating a second probability distribution over possible speech recognition hypotheses and determining a non-synthetic speech loss term. The method also includes pre-training an audio encoder based on the contrastive loss term, the alignment output loss term, and the non-synthetic speech loss term.
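
    The pre-training objective described above combines three terms: a contrastive loss computed from masked audio features against target quantized vector tokens, an alignment output loss derived from unspoken textual utterances, and a supervised loss on transcribed non-synthetic speech. The sketch below is a minimal, hypothetical PyTorch-style illustration of how such a weighted combination might be assembled; the helper names, tensor shapes, and loss weights are assumptions for illustration, not details taken from the patent.

import torch
import torch.nn.functional as F

def contrastive_loss(context_vectors, quantized_targets, target_indices, temperature=0.1):
    # Contrastive term: each masked frame's context vector should score its target
    # quantized vector token (selected by target_indices) above the other codebook
    # entries. Assumed shapes: context_vectors (num_masked, dim),
    # quantized_targets (codebook_size, dim), target_indices (num_masked,).
    logits = context_vectors @ quantized_targets.t() / temperature
    return F.cross_entropy(logits, target_indices)

def pretraining_loss(contrastive_term, alignment_term, supervised_term,
                     w_contrastive=1.0, w_alignment=1.0, w_supervised=1.0):
    # Weighted sum of the three pre-training terms named in the abstract; the
    # weights are hypothetical hyperparameters, not values from the patent.
    return (w_contrastive * contrastive_term
            + w_alignment * alignment_term
            + w_supervised * supervised_term)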

    Streaming End-to-end Multilingual Speech Recognition with Joint Language Identification

    Publication No.: US20230306958A1

    Publication Date: 2023-09-28

    Application No.: US18188632

    Filing Date: 2023-03-23

    Applicant: Google LLC

    CPC classification number: G10L15/005 G10L15/16 G10L15/063

    Abstract: A method includes receiving a sequence of acoustic frames as input to an automatic speech recognition (ASR) model. The method also includes generating, by a first encoder, a first higher order feature representation for a corresponding acoustic frame. The method also includes generating, by a second encoder, a second higher order feature representation for a corresponding first higher order feature representation. The method also includes generating, by a language identification (ID) predictor, a language prediction representation based on a concatenation of the first higher order feature representation and the second higher order feature representation. The method also includes generating, by a first decoder, a first probability distribution over possible speech recognition hypotheses based on a concatenation of the second higher order feature representation and the language prediction representation.
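
    The abstract above outlines a cascaded-encoder streaming model in which a language ID predictor consumes the concatenation of the first and second higher order feature representations, and the decoder consumes the concatenation of the second higher order feature representation and the language prediction. A minimal PyTorch-style sketch of that wiring is given below; the module choices, layer sizes, and streaming details are illustrative assumptions rather than the patented implementation.

import torch
import torch.nn as nn

class CascadedAsrWithLangId(nn.Module):
    def __init__(self, feat_dim=80, enc_dim=256, num_langs=12, vocab_size=1024):
        super().__init__()
        # First and second encoders stacked in a cascade (LSTMs stand in for the
        # actual encoder architecture, which the abstract does not specify).
        self.first_encoder = nn.LSTM(feat_dim, enc_dim, batch_first=True)
        self.second_encoder = nn.LSTM(enc_dim, enc_dim, batch_first=True)
        # Language ID predictor over the concatenated encoder outputs.
        self.lang_id_predictor = nn.Linear(2 * enc_dim, num_langs)
        # First decoder over the concatenation of the second encoder output and
        # the language prediction representation (a simple projection here).
        self.decoder = nn.Linear(enc_dim + num_langs, vocab_size)

    def forward(self, acoustic_frames):
        # acoustic_frames: (batch, time, feat_dim)
        first_hof, _ = self.first_encoder(acoustic_frames)
        second_hof, _ = self.second_encoder(first_hof)
        lang_pred = self.lang_id_predictor(torch.cat([first_hof, second_hof], dim=-1))
        logits = self.decoder(torch.cat([second_hof, lang_pred], dim=-1))
        return logits.log_softmax(dim=-1), lang_pred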

    Self-Adaptive Distillation
    Invention Application

    Publication No.: US20220309340A1

    Publication Date: 2022-09-29

    Application No.: US17544570

    Filing Date: 2021-12-07

    Applicant: Google LLC

    Abstract: A method for distilling one or more trained teacher automatic speech recognition (ASR) models into a multilingual student model includes receiving a plurality of teacher training examples and a plurality of student training examples. The method also includes training the one or more teacher ASR models using the plurality of teacher training examples. Each teacher ASR model is configured to output a respective textual representation of a respective audio input. The method further includes generating a multilingual student ASR model by training the multilingual student ASR model using the plurality of student training examples and distilling the trained one or more teacher ASR models into the multilingual student ASR model using a tunable distillation loss weight. The student ASR model is configured to receive an audio input and output a corresponding textual representation of the received audio input.
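
    The distillation step above blends a supervised loss on the student's own training examples with a term that pulls the student's outputs toward the teacher's, controlled by a tunable distillation loss weight. The sketch below shows one common way such a tunable-weight distillation loss could look in PyTorch; the KL-divergence form, temperature, and default weight are assumptions for illustration, not details from the patent.

import torch.nn.functional as F

def student_distillation_loss(student_logits, teacher_logits, labels,
                              distill_weight=0.5, temperature=1.0):
    # Supervised term on ground-truth labels plus a KL term toward the teacher
    # distribution, mixed by the tunable distillation loss weight.
    supervised = F.cross_entropy(student_logits, labels)
    distill = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    return (1.0 - distill_weight) * supervised + distill_weight * distill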
