-
Publication No.: US20220253680A1
Publication Date: 2022-08-11
Application No.: US17665279
Filing Date: 2022-02-04
Applicant: Google LLC
Inventor: Zhe Zhao, Maheswaran Sathiamoorthy, Lichan Hong, Yihua Chen, Ed Huai-hsin Chi, Aakanksha Chowdhery, Hussein Hazimeh
Abstract: A system including a main neural network for performing one or more machine learning tasks on a network input to generate one or more network outputs. The main neural network includes a Mixture of Experts (MoE) subnetwork that includes a plurality of expert neural networks and a gating subsystem. The gating subsystem is configured to: apply a softmax function to a set of gating parameters having learned values to generate a respective softmax score for each of one or more of the plurality of expert neural networks; determine a respective weight for each of the one or more of the plurality of expert neural networks; select a proper subset of the plurality of expert neural networks; and combine the respective expert outputs generated by the one or more expert neural networks in the proper subset to generate one or more MoE outputs.
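The abstract describes a gating subsystem that scores experts with a softmax, picks a proper subset, and combines the selected experts' outputs. Below is a minimal illustrative sketch of that pattern in PyTorch; it is not the claimed system. All module names, dimensions, and the choice to use the softmax scores directly as combination weights and top-k as the subset-selection rule are assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Illustrative Mixture-of-Experts layer: softmax gating scores,
    selection of a proper subset (top-k) of experts, and a weighted
    combination of the selected experts' outputs."""

    def __init__(self, d_model: int, num_experts: int, k: int = 2):
        super().__init__()
        # Gating parameters with learned values: one score per expert.
        self.gate = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                           nn.Linear(d_model, d_model))
             for _ in range(num_experts)]
        )
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Apply a softmax to the gating outputs to get a score per expert.
        scores = F.softmax(self.gate(x), dim=-1)            # (batch, num_experts)
        # Select a proper subset of experts; use their scores as weights.
        weights, indices = torch.topk(scores, self.k, dim=-1)  # (batch, k)
        output = torch.zeros_like(x)
        for slot in range(self.k):
            for expert_id in range(len(self.experts)):
                mask = indices[:, slot] == expert_id
                if mask.any():
                    # Run only the inputs routed to this expert and
                    # accumulate the weighted expert output.
                    expert_out = self.experts[expert_id](x[mask])
                    output[mask] += weights[mask, slot].unsqueeze(-1) * expert_out
        return output
```

In practice the per-expert loop above would be replaced by a batched dispatch, but it keeps the correspondence to the abstract's steps explicit: score, select a subset, combine.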
-
Publication No.: US20240386280A1
Publication Date: 2024-11-21
Application No.: US18667973
Filing Date: 2024-05-17
Applicant: Google LLC
Inventor: Zhe Zhao, Huan Gui, Qingyun Liu, Ed Huai-hsin Chi, Lichan Hong, Bang An
IPC: G06N3/096, G06N3/0455
Abstract: A computer-implemented method to generate a second machine learning model based on a first machine learning model, wherein the second machine learning model is structured for more efficient computation, is provided. The method includes processing an input with a hidden layer of a student machine-learned model to obtain an intermediate output. The method includes providing an encoded message descriptive of the input and the intermediate output for processing with a teacher machine-learned model. The method includes, responsive to providing the encoded message, obtaining a second encoded message descriptive of a second intermediate output of one or more hidden layers of the teacher machine-learned model. The method includes performing a knowledge distillation training process to train the student machine-learned model based on a difference between the intermediate output and the second intermediate output.
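The core of the described method is training a student on the difference between its intermediate (hidden-layer) output and the teacher's intermediate output. A minimal sketch of that distillation step is shown below, assuming PyTorch, a frozen teacher, and a hypothetical projection layer to align hidden dimensions; the encoded-message exchange between the two models described in the abstract is omitted, and all layer sizes and names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical teacher (larger) and student (smaller, more efficient) models.
teacher = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 10))
student = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
# Assumed projection so the student's hidden output can be compared
# with the teacher's wider hidden output.
projection = nn.Linear(64, 256)

optimizer = torch.optim.Adam(
    list(student.parameters()) + list(projection.parameters()), lr=1e-3
)

def distillation_step(x: torch.Tensor) -> torch.Tensor:
    # Intermediate output of the student's hidden layer.
    student_hidden = student[1](student[0](x))
    # Intermediate output of the teacher's hidden layer (teacher is frozen).
    with torch.no_grad():
        teacher_hidden = teacher[1](teacher[0](x))
    # Knowledge distillation loss: difference between the two intermediate outputs.
    loss = F.mse_loss(projection(student_hidden), teacher_hidden)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss
```

The MSE on intermediate activations stands in for whatever difference measure the claims cover; the point of the sketch is only the student/teacher intermediate-output comparison that drives the training signal.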
-