Self-Adaptive Distillation
    Invention application

    Publication number: US20220309340A1

    Publication date: 2022-09-29

    Application number: US17544570

    Filing date: 2021-12-07

    Applicant: Google LLC

    Abstract: A method for distilling one or more trained teacher automatic speech recognition (ASR) models into a multilingual student model includes receiving a plurality of teacher training examples and a plurality of student training examples. The method also includes training one or more teacher ASR models using the plurality of teacher training examples. Each teacher ASR model is configured to output a respective textual representation of a respective audio input. The method further includes generating a multilingual student ASR model by training the multilingual student ASR model using the plurality of student training examples and distilling the trained one or more teacher ASR models into the multilingual student ASR model using a tunable distillation loss weight. The multilingual student ASR model is configured to receive an audio input and output a corresponding textual representation of the received audio input.
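The role of the tunable distillation loss weight can be illustrated with a minimal sketch. The abstract does not state how the weight enters the objective, so the form used here (task loss plus weight times a KL-divergence term between teacher and student output distributions) is an assumption, and every function and parameter name is hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target_index):
    """Negative log-probability the student assigns to the reference token."""
    return -math.log(probs[target_index])


def kl_divergence(p, q):
    """KL(p || q) for two distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def student_loss(student_logits, teacher_logits, target_index, distill_weight):
    """Assumed objective: task loss + distill_weight * distillation loss.

    distill_weight stands in for the "tunable distillation loss weight";
    the additive form is an illustration, not the patented formulation.
    """
    student_probs = softmax(student_logits)
    teacher_probs = softmax(teacher_logits)
    task = cross_entropy(student_probs, target_index)
    distill = kl_divergence(teacher_probs, student_probs)
    return task + distill_weight * distill
```

In this sketch, setting `distill_weight` to 0 trains the student on its own examples only, while larger values pull the student's output distribution toward the teacher's.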

    Multilingual re-scoring models for automatic speech recognition

    Publication number: US12080283B2

    Publication date: 2024-09-03

    Application number: US17701635

    Filing date: 2022-03-22

    Applicant: Google LLC

    CPC classification number: G10L15/197 G10L15/005 G10L15/16 G10L15/22

    Abstract: A method includes receiving a sequence of acoustic frames extracted from audio data corresponding to an utterance. During a first pass, the method includes processing the sequence of acoustic frames to generate N candidate hypotheses for the utterance. During a second pass, and for each candidate hypothesis, the method includes: generating a respective un-normalized likelihood score; generating a respective external language model score; generating a standalone score that models prior statistics of the corresponding candidate hypothesis; and generating a respective overall score for the candidate hypothesis based on the un-normalized likelihood score, the external language model score, and the standalone score. The method also includes selecting the candidate hypothesis having the highest respective overall score from among the N candidate hypotheses as a final transcription of the utterance.
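The second-pass selection described in the abstract can be sketched as follows. The abstract says only that the overall score is "based on" the three component scores; the weighted sum below, the weights, and all names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    likelihood_score: float   # un-normalized likelihood score
    lm_score: float           # external language model score
    standalone_score: float   # score modeling prior statistics of the hypothesis


def overall_score(hyp, lm_weight=1.0, standalone_weight=1.0):
    """Assumed combination: a weighted sum of the three component scores."""
    return (hyp.likelihood_score
            + lm_weight * hyp.lm_score
            + standalone_weight * hyp.standalone_score)


def rescore(hypotheses, lm_weight=1.0, standalone_weight=1.0):
    """Return the candidate with the highest overall score."""
    return max(
        hypotheses,
        key=lambda h: overall_score(h, lm_weight, standalone_weight),
    )


# Hypothetical first-pass output with N = 2 candidates (log-domain scores).
candidates = [
    Hypothesis("recognize speech", -4.0, -2.0, -1.0),
    Hypothesis("wreck a nice beach", -3.5, -5.0, -1.5),
]
best = rescore(candidates)
```

With these made-up scores, the second candidate has the better likelihood score alone, but the first wins once the language model and standalone scores are added in.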

    Multilingual Re-Scoring Models for Automatic Speech Recognition

    Publication number: US20240203409A1

    Publication date: 2024-06-20

    Application number: US18589220

    Filing date: 2024-02-27

    Applicant: Google LLC

    CPC classification number: G10L15/197 G10L15/005 G10L15/16 G10L15/22

    Abstract: A method includes receiving a sequence of acoustic frames extracted from audio data corresponding to an utterance. During a first pass, the method includes processing the sequence of acoustic frames to generate N candidate hypotheses for the utterance. During a second pass, and for each candidate hypothesis, the method includes: generating a respective un-normalized likelihood score; generating a respective external language model score; generating a standalone score that models prior statistics of the corresponding candidate hypothesis; and generating a respective overall score for the candidate hypothesis based on the un-normalized likelihood score, the external language model score, and the standalone score. The method also includes selecting the candidate hypothesis having the highest respective overall score from among the N candidate hypotheses as a final transcription of the utterance.
