-
Publication Number: US12242948B2
Publication Date: 2025-03-04
Application Number: US17159437
Filing Date: 2021-01-27
Applicant: Google LLC
Inventor: Yanping Huang, Dmitry Lepikhin, Maxim Krikun, Orhan Firat, Ankur Bapna, Thang Luong, Sneha Kudugunta
Abstract: Systems and methods for routing in mixture-of-expert models. In some aspects of the technology, a transformer may have at least one Mixture-of-Experts (“MoE”) layer in each of its encoder and decoder, with the at least one MoE layer of the encoder having a learned gating function configured to route each token of a task to two or more selected expert feed-forward networks, and the at least one MoE layer of the decoder having a learned gating function configured to route each task to two or more selected expert feed-forward networks.
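The routing scheme the abstract describes can be sketched in a few lines. The following is a minimal illustration, not the patented implementation: names such as `W_gate`, `expert_ffn`, and `top2_route` are hypothetical, the parameters are random rather than learned, and the mean-pooled task vector merely stands in for whatever task signal the decoder's gating function actually conditions on. It contrasts top-2 per-token routing (as in the encoder MoE layer) with a single per-task routing decision shared by all of a task's tokens (as in the decoder MoE layer).

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff, num_experts = 16, 32, 4

# Hypothetical parameters: one gating matrix plus a bank of expert FFNs.
W_gate = rng.normal(size=(d_model, num_experts)) * 0.02
experts = [
    (rng.normal(size=(d_model, d_ff)) * 0.02,
     rng.normal(size=(d_ff, d_model)) * 0.02)
    for _ in range(num_experts)
]

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def expert_ffn(x, w_in, w_out):
    # Standard two-layer feed-forward expert with ReLU.
    return np.maximum(x @ w_in, 0.0) @ w_out

def top2_route(x):
    # Learned gating: softmax over experts, keep the two largest weights,
    # renormalize over the selected pair, and mix their outputs.
    probs = softmax(x @ W_gate)
    top2 = np.argsort(probs)[-2:]
    w = probs[top2] / probs[top2].sum()
    return sum(wi * expert_ffn(x, *experts[e]) for wi, e in zip(w, top2))

# Encoder-style token routing: each token picks its own two experts.
tokens = rng.normal(size=(5, d_model))
encoder_out = np.stack([top2_route(t) for t in tokens])

# Decoder-style task routing: one gating decision per task, shared by
# all of that task's tokens (keyed here on a mean-pooled task vector).
task_top2 = np.argsort(softmax(tokens.mean(axis=0) @ W_gate))[-2:]
print("encoder output shape:", encoder_out.shape)
print("experts selected for the whole task:", task_top2)
```

The practical appeal of the per-task decision on the decoder side is that every expert choice is known before decoding begins, so only the selected experts need to be loaded for that task at inference time.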
-
Publication Number: US20220237435A1
Publication Date: 2022-07-28
Application Number: US17159437
Filing Date: 2021-01-27
Applicant: Google LLC
Inventor: Yanping Huang, Dmitry Lepikhin, Maxim Krikun, Orhan Firat, Ankur Bapna, Thang Luong, Sneha Kudugunta
Abstract: Systems and methods for routing in mixture-of-expert models. In some aspects of the technology, a transformer may have at least one Mixture-of-Experts (“MoE”) layer in each of its encoder and decoder, with the at least one MoE layer of the encoder having a learned gating function configured to route each token of a task to two or more selected expert feed-forward networks, and the at least one MoE layer of the decoder having a learned gating function configured to route each task to two or more selected expert feed-forward networks.
-