CHUNK-WISE ATTENTION FOR LONGFORM ASR
    Invention Publication

    Publication (Announcement) No.: US20240290321A1

    Publication (Announcement) Date: 2024-08-29

    Application No.: US18585168

    Filing Date: 2024-02-23

    Applicant: Google LLC

    CPC classification number: G10L15/063 G10L15/26

    Abstract: A method includes receiving training data including a corpus of multilingual unspoken textual utterances, a corpus of multilingual un-transcribed non-synthetic speech utterances, and a corpus of multilingual transcribed non-synthetic speech utterances. For each un-transcribed non-synthetic speech utterance, the method includes generating a target quantized vector token and a target token index, generating contrastive context vectors from corresponding masked audio features, and deriving a contrastive loss term. The method also includes generating an alignment output, generating a first probability distribution over possible speech recognition hypotheses for the alignment output, and determining an alignment output loss term. The method also includes generating a second probability distribution over possible speech recognition hypotheses and determining a non-synthetic speech loss term. The method also includes pre-training an audio encoder based on the contrastive loss term, the alignment output loss term, and the non-synthetic speech loss term.
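The abstract names a contrastive loss term and a pre-training objective built from three loss terms, but it gives no formulas. The sketch below is only an illustration under stated assumptions: it uses an InfoNCE-style contrastive loss with cosine similarity and a temperature (the form popularized by wav2vec 2.0), and it combines the three terms as a weighted sum. The function names, the `temperature` default, and the `weights` parameter are all hypothetical and are not taken from the patent.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(context, target, distractors, temperature=0.1):
    """InfoNCE-style loss for one masked frame (assumed form).

    context     -- contrastive context vector for the masked position
    target      -- target quantized vector for that position
    distractors -- quantized vectors sampled from other positions
    """
    candidates = [target] + list(distractors)
    logits = [cosine(context, q) / temperature for q in candidates]
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    # Negative log-probability of picking the true target.
    return log_z - logits[0]


def pretraining_loss(contrastive, alignment, speech, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three loss terms (the weighting is an assumption)."""
    w_c, w_a, w_s = weights
    return w_c * contrastive + w_a * alignment + w_s * speech
```

In this form the loss is small when the context vector is closer to its own target than to any distractor, and it grows when a distractor is the better match.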
