Modular Training for Flexible Attention Based End-to-End ASR

    Publication number: US20240185839A1

    Publication date: 2024-06-06

    Application number: US18526148

    Filing date: 2023-12-01

    Applicant: Google LLC

    CPC classification number: G10L15/063 G10L2015/0635

    Abstract: A method for training a modular neural network model includes training only a backbone model to provide a first model configuration of the modular neural network model. The first model configuration includes only the trained backbone model. The method also includes adding an intrinsic sub-model to the trained backbone model. During a fine-tuning training stage, the method includes freezing parameters of the trained backbone model and fine-tuning parameters of the intrinsic sub-model added to the trained backbone model while the parameters of the trained backbone model are frozen to provide a second model configuration that includes the backbone model initially trained during the initial training stage and the intrinsic sub-model having the parameters fine-tuned during the fine-tuning stage.
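The two training stages in this abstract can be illustrated with a deliberately tiny stand-in. The sketch below is not the patented ASR architecture: it assumes a scalar linear "backbone" and a hypothetical quadratic "intrinsic sub-model" added as a residual, and every function and parameter name is invented for illustration. It only shows the mechanics of training the backbone alone, then freezing it while fitting the added sub-model.

```python
# Toy sketch of two-stage modular training. All names are hypothetical;
# a scalar regression model stands in for the ASR network.

def backbone(x, p):
    return p["w"] * x + p["b"]

def sub_model(x, p):
    # Stand-in "intrinsic sub-model": a residual term added to the backbone output.
    return p["a"] * x * x

def predict(x, backbone_p, sub_p=None):
    y = backbone(x, backbone_p)
    if sub_p is not None:
        y += sub_model(x, sub_p)
    return y

def mse(data, backbone_p, sub_p=None):
    return sum((predict(x, backbone_p, sub_p) - t) ** 2 for x, t in data) / len(data)

def train_backbone(data, steps=2000, lr=0.05):
    # Initial stage: only backbone parameters exist and are updated.
    p = {"w": 0.0, "b": 0.0}
    n = len(data)
    for _ in range(steps):
        gw = sum(2 * (backbone(x, p) - t) * x for x, t in data) / n
        gb = sum(2 * (backbone(x, p) - t) for x, t in data) / n
        p["w"] -= lr * gw
        p["b"] -= lr * gb
    return p

def fine_tune_sub_model(data, backbone_p, steps=2000, lr=0.05):
    # Fine-tuning stage: backbone_p is read but never written (frozen);
    # gradients are computed and applied only for the sub-model parameter.
    sub_p = {"a": 0.0}
    n = len(data)
    for _ in range(steps):
        ga = sum(2 * (predict(x, backbone_p, sub_p) - t) * x * x for x, t in data) / n
        sub_p["a"] -= lr * ga
    return sub_p

# Synthetic data with a quadratic component the linear backbone cannot fit.
xs = [k / 10.0 for k in range(-10, 11)]
data = [(x, 1.5 * x + 0.5 + 0.8 * x * x) for x in xs]

backbone_p = train_backbone(data)            # first configuration: backbone only
snapshot = dict(backbone_p)
sub_p = fine_tune_sub_model(data, backbone_p)

config_1_loss = mse(data, backbone_p)        # backbone only
config_2_loss = mse(data, backbone_p, sub_p) # frozen backbone + fine-tuned sub-model
```

Because the backbone parameters are untouched in the second stage, both configurations remain usable from the same set of weights: the first by calling `predict` without `sub_p`, the second by passing it.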

Self-Adaptive Distillation

    Type: Invention patent application

    Publication number: US20220309340A1

    Publication date: 2022-09-29

    Application number: US17544570

    Filing date: 2021-12-07

    Applicant: Google LLC

    Abstract: A method for distilling one or more trained teacher automatic speech recognition (ASR) models into a multilingual student model includes receiving a plurality of teacher training examples and a plurality of student training examples. The method also includes training the one or more teacher ASR models using the plurality of teacher training examples. Each teacher ASR model is configured to output a respective textual representation of a respective audio input. The method further includes generating a multilingual student ASR model by training the multilingual student ASR model using the plurality of student training examples and distilling the trained one or more teacher ASR models into the multilingual student ASR model using a tunable distillation loss weight. The multilingual student ASR model is configured to receive an audio input and output a corresponding textual representation of the received audio input.
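One common way to combine a supervised objective with a distillation term under a tunable weight is sketched below. The abstract does not state the loss formula or how the weight is chosen, so the form used here (cross-entropy plus weight times KL divergence from teacher to student) is an assumption, and the names `student_loss` and `distill_weight` are hypothetical. A single classification step over three output tokens stands in for a full ASR output sequence.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    # Supervised loss against the ground-truth label index.
    return -math.log(probs[label])

def kl_divergence(p, q):
    # KL(p || q); terms with p_i == 0 contribute nothing.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def student_loss(student_logits, teacher_logits, label, distill_weight):
    # Assumed form: supervised term + distill_weight * distillation term.
    s = softmax(student_logits)
    t = softmax(teacher_logits)
    return cross_entropy(s, label) + distill_weight * kl_divergence(t, s)

# Hypothetical logits for one output step.
student_logits = [2.0, 0.5, -1.0]
teacher_logits = [3.0, 0.0, -2.0]
label = 0

supervised_only = student_loss(student_logits, teacher_logits, label, 0.0)
with_distillation = student_loss(student_logits, teacher_logits, label, 1.0)
```

Setting `distill_weight` to zero reduces the objective to ordinary supervised training, while larger values pull the student's output distribution toward the teacher's.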
