-
Publication number: US20240185839A1
Publication date: 2024-06-06
Application number: US18526148
Application date: 2023-12-01
Applicant: Google LLC
Inventor: Kartik Audhkhasi, Bhuvana Ramabhadran, Brian Farris
IPC: G10L15/06
CPC classification number: G10L15/063, G10L2015/0635
Abstract: A method for training a modular neural network model includes, during an initial training stage, training only a backbone model to provide a first model configuration of the modular neural network model. The first model configuration includes only the trained backbone model. The method also includes adding an intrinsic sub-model to the trained backbone model. During a fine-tuning training stage, the method includes freezing parameters of the trained backbone model and fine-tuning parameters of the intrinsic sub-model while the backbone parameters remain frozen. This provides a second model configuration that includes the backbone model trained during the initial training stage and the intrinsic sub-model whose parameters were fine-tuned during the fine-tuning training stage.
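The two-stage procedure in this abstract can be illustrated with a toy sketch. It is not the patented implementation: the linear backbone, the quadratic residual sub-model, the function names, and the data are all hypothetical and chosen only to show that the backbone parameters stay fixed while the added sub-model is fitted.

```python
# Toy two-stage training; every name and model shape here is an assumption.

def backbone(params, x):
    # Backbone model: a single linear unit.
    return params["w"] * x + params["b"]

def modular(backbone_params, sub_params, x):
    # Second configuration: backbone output plus a residual sub-model term.
    return backbone(backbone_params, x) + sub_params["a"] * x * x

def mse(predict, data):
    # Mean squared error of a one-argument predictor over (x, y) pairs.
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)

def train_backbone(data, steps=2000, lr=0.05):
    # Initial stage: gradient descent on the backbone parameters only.
    params = {"w": 0.0, "b": 0.0}
    n = len(data)
    for _ in range(steps):
        gw = sum(2 * (backbone(params, x) - y) * x for x, y in data) / n
        gb = sum(2 * (backbone(params, x) - y) for x, y in data) / n
        params["w"] -= lr * gw
        params["b"] -= lr * gb
    return params

def fine_tune_sub_model(backbone_params, data, steps=2000, lr=0.05):
    # Fine-tuning stage: the backbone copy is never updated (frozen);
    # only the sub-model parameter "a" receives gradient steps.
    frozen = dict(backbone_params)
    sub = {"a": 0.0}
    n = len(data)
    for _ in range(steps):
        ga = sum(2 * (modular(frozen, sub, x) - y) * x * x for x, y in data) / n
        sub["a"] -= lr * ga
    return frozen, sub

# Synthetic data with a linear part and a small quadratic part.
xs = [i / 4.0 for i in range(-4, 5)]
data = [(x, 2.0 * x + 1.0 + 0.5 * x * x) for x in xs]

stage1_params = train_backbone(data)
frozen, sub = fine_tune_sub_model(stage1_params, data)

loss_stage1 = mse(lambda x: backbone(stage1_params, x), data)
loss_stage2 = mse(lambda x: modular(frozen, sub, x), data)
```

Both configurations remain usable after training: the first evaluates `backbone` alone, and the second evaluates `modular` with the same, unchanged backbone parameters.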
-
Publication number: US20220309340A1
Publication date: 2022-09-29
Application number: US17544570
Application date: 2021-12-07
Applicant: Google LLC
Inventor: Isabel Leal, Neeraj Gaur, Parisa Haghani, Brian Farris, Bhuvana Ramabhadran, Manasa Prasad, Pedro J. Moreno Mengibar, Yun Zhu
Abstract: A method for distilling one or more trained teacher automatic speech recognition (ASR) models into a multilingual student model includes receiving a plurality of teacher training examples and a plurality of student training examples. The method also includes training one or more teacher ASR models using the plurality of teacher training examples. Each teacher ASR model is configured to output a respective textual representation of a respective audio input. The method further includes generating a multilingual student ASR model by training the multilingual student ASR model using the plurality of student training examples and distilling the trained one or more teacher ASR models into the multilingual student ASR model using a tunable distillation loss weight. Each student ASR model is configured to receive an audio input and output a corresponding textual representation of the received audio input.
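The abstract does not state how the tunable distillation loss weight enters the objective. One common formulation, assumed here purely for illustration, interpolates between a supervised cross-entropy term and a KL-divergence term that pulls the student's output distribution toward the teacher's. The function names and the toy logits below are hypothetical.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    # Negative log-probability assigned to the reference label index.
    return -math.log(probs[label])

def kl_divergence(p, q):
    # KL(p || q) for two discrete distributions over the same support.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def distillation_loss(student_logits, teacher_logits, label, weight):
    # Assumed combination: (1 - weight) * supervised + weight * distillation.
    # weight = 0.0 ignores the teacher; weight = 1.0 ignores the label.
    student = softmax(student_logits)
    teacher = softmax(teacher_logits)
    supervised = cross_entropy(student, label)
    distill = kl_divergence(teacher, student)
    return (1.0 - weight) * supervised + weight * distill

# Toy logits over a three-token vocabulary; the reference token is index 2.
student_logits = [0.2, 0.1, 0.9]
teacher_logits = [0.0, 0.5, 2.0]

loss_supervised_only = distillation_loss(student_logits, teacher_logits, 2, 0.0)
loss_balanced = distillation_loss(student_logits, teacher_logits, 2, 0.5)
loss_teacher_only = distillation_loss(student_logits, teacher_logits, 2, 1.0)
```

In this sketch the weight would be treated as a hyperparameter and tuned on held-out data. With several teachers, one option is to apply the same term per teacher, which is again an assumption rather than something the abstract specifies.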
-