-
Publication No.: US20250078812A1
Publication date: 2025-03-06
Application No.: US18794773
Application date: 2024-08-05
Applicant: GOOGLE LLC
Inventor: Yonghui Xiao , Françoise Beaufays , Yuxin Ding
IPC: G10L15/06 , G06N3/098 , G10L15/183 , G10L15/30
Abstract: Implementations described herein are directed to a framework for decentralized learning of large global machine learning (ML) model(s). In various implementations, remote processor(s) of a remote system can identify a global ML model, select client devices to participate in a given round of decentralized learning of the global ML model, and transmit, to each of the client devices, a processed version of the global ML model that is of a reduced transferrable size. Further, client device processor(s) of a client device can receive the processed version of the global ML model, obtain corresponding client data, perform partial model training, based on processing the corresponding client data, for the processed version of the global ML model to generate a corresponding update, and transmit the corresponding update back to the remote system. Moreover, the remote processor(s) can update, based on at least the corresponding update, the global ML model.
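The round described in this abstract (select clients, send a reduced model, train part of it on-device, aggregate) can be illustrated with a minimal sketch. The abstract does not say how the model is reduced or how updates are combined, so this sketch assumes the processed version is simply the subset of layers to be trained, and that the server averages the client deltas. All function names, the toy loss, and the data are hypothetical.

```python
import random

def select_clients(client_ids, k, seed=0):
    # Randomly pick k clients for this round (selection policy is an assumption).
    return random.Random(seed).sample(client_ids, k)

def make_processed_model(global_model, trainable_layers):
    # Assumption: "reduced transferable size" is modeled as sending only
    # the layers that the client will actually train.
    return {name: list(global_model[name]) for name in trainable_layers}

def client_update(processed_model, client_data, lr=0.1):
    # Toy partial training: one gradient step on 0.5 * (w - mean(data))^2.
    target = sum(client_data) / len(client_data)
    return {name: [-lr * (w - target) for w in weights]
            for name, weights in processed_model.items()}

def apply_updates(global_model, updates):
    # Average the client deltas and add them to the matching global layers.
    new_model = {name: list(w) for name, w in global_model.items()}
    for name in updates[0]:
        for i in range(len(new_model[name])):
            new_model[name][i] += sum(u[name][i] for u in updates) / len(updates)
    return new_model

global_model = {"encoder": [0.0, 0.0], "decoder": [1.0, 1.0]}
client_data = {"a": [1.0, 3.0], "b": [2.0, 2.0], "c": [0.0, 4.0]}

chosen = select_clients(sorted(client_data), k=2)
processed = make_processed_model(global_model, ["decoder"])
updates = [client_update(processed, client_data[c]) for c in chosen]
new_global = apply_updates(global_model, updates)
```

Only the `decoder` layer is transmitted and changed here; the `encoder` layer never leaves the server and stays as it was.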
-
Publication No.: US20240233707A9
Publication date: 2024-07-11
Application No.: US18488578
Application date: 2023-10-17
Applicant: Google LLC
Inventor: Tien-Ju Yang , You-Chi Cheng , Shankar Kumar , Jared Lichtarge , Ehsan Amid , Yuxin Ding , Rajiv Mathews , Mingqing Chen
IPC: G10L15/06 , G10L15/197 , G10L15/30
CPC classification number: G10L15/063 , G10L15/197 , G10L15/30 , G10L2015/0635
Abstract: A method includes receiving distillation data including a plurality of out-of-domain training utterances. For each particular out-of-domain training utterance of the distillation data, the method includes generating a corresponding augmented out-of-domain training utterance, and generating, using a teacher ASR model trained on training data corresponding to a target domain, a pseudo-label corresponding to the corresponding augmented out-of-domain training utterance. The method also includes distilling a student ASR model from the teacher ASR model by training the student ASR model using the corresponding augmented out-of-domain training utterances paired with the corresponding pseudo-labels generated by the teacher ASR model.
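The data flow in this abstract (augment each out-of-domain utterance, label the augmented version with the teacher, pair the two for student training) might look like the following sketch. The teacher here is a hypothetical stand-in function rather than a real ASR model, utterances are toy feature lists, and the additive-noise augmentation is an assumption; the abstract does not specify any of these.

```python
import random

def augment(features, rng, noise=0.05):
    # Assumed augmentation: small additive noise on each feature value.
    return [x + rng.uniform(-noise, noise) for x in features]

def teacher_transcribe(features):
    # Hypothetical stand-in for a teacher ASR model trained on the target domain.
    return "high" if sum(features) / len(features) > 0.5 else "low"

def build_distillation_pairs(utterances, seed=0):
    # For each utterance: augment first, then pseudo-label the augmented version.
    rng = random.Random(seed)
    pairs = []
    for features in utterances:
        augmented = augment(features, rng)
        pairs.append((augmented, teacher_transcribe(augmented)))
    return pairs

out_of_domain = [[0.9, 0.8, 1.0], [0.1, 0.0, 0.2]]
pairs = build_distillation_pairs(out_of_domain)
```

Each `(augmented, pseudo_label)` pair would then serve as a training example for the student model; the student training step itself is omitted here.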
-
Publication No.: US20240135918A1
Publication date: 2024-04-25
Application No.: US18488578
Application date: 2023-10-16
Applicant: Google LLC
Inventor: Tien-Ju Yang , You-Chi Cheng , Shankar Kumar , Jared Lichtarge , Ehsan Amid , Yuxin Ding , Rajiv Mathews , Mingqing Chen
IPC: G10L15/06 , G10L15/197 , G10L15/30
CPC classification number: G10L15/063 , G10L15/197 , G10L15/30 , G10L2015/0635
Abstract: A method includes receiving distillation data including a plurality of out-of-domain training utterances. For each particular out-of-domain training utterance of the distillation data, the method includes generating a corresponding augmented out-of-domain training utterance, and generating, using a teacher ASR model trained on training data corresponding to a target domain, a pseudo-label corresponding to the corresponding augmented out-of-domain training utterance. The method also includes distilling a student ASR model from the teacher ASR model by training the student ASR model using the corresponding augmented out-of-domain training utterances paired with the corresponding pseudo-labels generated by the teacher ASR model.
-
Publication No.: US20240386318A1
Publication date: 2024-11-21
Application No.: US18386431
Application date: 2023-11-02
Applicant: GOOGLE LLC
Inventor: Yuxin Ding , Lillian Zhou , Mingqing Chen , Rajiv Mathews , Andrew Hard , Sean Augenstein
IPC: G06N20/00
Abstract: Implementations described herein are directed to techniques for mitigating and/or eliminating catastrophic forgetting of a global machine learning (ML) model during decentralized learning thereof. Remote processor(s) of a remote system can initially train a global ML model based on server data that is accessible by the remote system. In subsequent decentralized learning of the global ML model, the remote processor(s) can utilize various checkpoint averaging techniques. As described herein, these various checkpoint averaging techniques can include, but are not limited to, a static checkpoint averaging technique, a dynamic checkpoint averaging technique, and/or a mixed centralized and decentralized training technique.
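As a rough illustration of checkpoint averaging, the sketch below blends a server-trained checkpoint with a checkpoint produced by decentralized learning. The abstract does not define "static" or "dynamic", so this sketch assumes that static means a fixed blending weight and dynamic means a weight that changes with the round index. The decay schedule and all names are hypothetical.

```python
def average_checkpoints(ckpt_a, ckpt_b, weight_a):
    # Element-wise weighted average of two checkpoints with identical layout.
    return {
        name: [weight_a * a + (1.0 - weight_a) * b
               for a, b in zip(ckpt_a[name], ckpt_b[name])]
        for name in ckpt_a
    }

def decayed_weight(round_index, initial=0.5, decay=0.5):
    # Assumed dynamic schedule: the server checkpoint's weight shrinks each round.
    return initial * (decay ** round_index)

server_ckpt = {"w": [1.0, 2.0]}      # trained on server data
federated_ckpt = {"w": [3.0, 6.0]}   # produced by decentralized learning

# Static variant: the same weight in every round.
static_avg = average_checkpoints(server_ckpt, federated_ckpt, 0.5)

# Dynamic variant: the weight depends on the round (0.25 at round 1).
dynamic_avg = average_checkpoints(server_ckpt, federated_ckpt, decayed_weight(1))
```

Keeping some weight on the server-trained checkpoint is one simple way to pull the model back toward what it learned from server data, which is the forgetting problem the abstract targets.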
-
Publication No.: US20240194192A1
Publication date: 2024-06-13
Application No.: US18078782
Application date: 2022-12-09
Applicant: GOOGLE LLC
Inventor: Ehsan Amid , Rajiv Mathews , Shankar Kumar , Jared Lichtarge , Mingqing Chen , Tien-Ju Yang , Yuxin Ding
CPC classification number: G10L15/16 , G10L15/063
Abstract: Information can be distilled from a global automatic speech recognition (ASR) model to a client ASR model. Many implementations include using an RNN-T model as the ASR model, where the global ASR model includes a global encoder, a joint network, and a prediction network, and where the client ASR model includes a client encoder, the joint network, and the prediction network. Various implementations include using principal component analysis (PCA) while training the global ASR model to learn a mean vector and a set of principal components corresponding to the global ASR model. Additional or alternative implementations include training the client ASR model to generate one or more predicted coefficients of the global ASR model.
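The PCA quantities named in this abstract (a mean vector, principal components, and per-example coefficients) can be illustrated with a small pure-Python sketch. It assumes PCA is applied to vectors such as global encoder outputs, which the abstract does not state, and it finds only the top component by power iteration. The toy vectors and function names are hypothetical.

```python
def mean_vector(rows):
    # Per-dimension mean of the input vectors.
    return [sum(col) / len(rows) for col in zip(*rows)]

def top_component(rows, iterations=100):
    # Power iteration on the covariance matrix to find the top principal component.
    mu = mean_vector(rows)
    centered = [[x - m for x, m in zip(row, mu)] for row in rows]
    dim = len(mu)
    cov = [[sum(r[i] * r[j] for r in centered) / len(rows) for j in range(dim)]
           for i in range(dim)]
    v = [1.0] * dim
    for _ in range(iterations):
        v = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    return mu, v

def coefficient(row, mu, component):
    # Projection of the centered vector onto the component.
    return sum((x - m) * c for x, m, c in zip(row, mu, component))

def reconstruct(coef, mu, component):
    # Rebuild the vector from the mean, the component, and one coefficient.
    return [m + coef * c for m, c in zip(mu, component)]

embeddings = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]  # toy stand-ins for encoder outputs
mu, pc = top_component(embeddings)
coef = coefficient(embeddings[2], mu, pc)
approx = reconstruct(coef, mu, pc)
```

Under this reading, a client model that predicts `coef` only needs the shared `mu` and `pc` to recover an approximation of the global model's vector, so the prediction target is a few numbers rather than the full vector.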
-