-
公开(公告)号:US20230222318A1
公开(公告)日:2023-07-13
申请号:US18009841
申请日:2021-06-30
Applicant: Google LLC
Inventor: Dmitry Lepikhin , Yanping Huang , Orhan Firat , Maxim Krikun , Dehao Chen , Noam M. Shazeer , HyoukJoong Lee , Yuanzhong Xu , Zhifeng Chen
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing machine learning task on a network input to generate a network output. In one aspect, one of the systems includes an attention neural network configured to perform the machine learning task, the attention neural network including one or more attention layers, each attention layer comprising an attention sub-layer and a feed-forward sub-layer. Some or all of the attention layers have a feed-forward sub-layer that applies conditional computation to the inputs to the sub-layer.
-
公开(公告)号:US12242948B2
公开(公告)日:2025-03-04
申请号:US17159437
申请日:2021-01-27
Applicant: Google LLC
Inventor: Yanping Huang , Dmitry Lepikhin , Maxim Krikun , Orhan Firat , Ankur Bapna , Thang Luong , Sneha Kudugunta
Abstract: Systems and methods for routing in mixture-of-expert models. In some aspects of the technology, a transformer may have at least one Mixture-of-Experts (“MoE”) layer in each of its encoder and decoder, with the at least one MoE layer of the encoder having a learned gating function configured to route each token of a task to two or more selected expert feed-forward networks, and the at least one MoE layer of the decoder having a learned gating function configured to route each task to two or more selected expert feed-forward networks.
-
公开(公告)号:US20240378441A1
公开(公告)日:2024-11-14
申请号:US18661447
申请日:2024-05-10
Applicant: Google LLC
Inventor: Slav Petrov , Yonghui Wu , Andrew M. Dai , David Richard So , Dmitry Lepikhin , Erica Ann Moreira , Gaurav Mishra , Jonathan Hudson Clark , Maxim Krikun , Melvin Jose Johnson Premkumar , Nan Du , Orhan Firat , Rohan Anil , Siamak Shakeri , Xavier Garcia , Yanping Huang , Yong Cheng , Yuanzhong Xu , Yujing Zhang , Zachary Alexander Nado , Eric Jun Jie Ni , Kefan Xiao , Vladimir Feinberg , Jin Young Sohn , Aurko Roy
IPC: G06N3/08
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network to perform any one or more of a variety of machine learning tasks. For example, the neural network can be configured as a generative neural network, e.g., an autoregressive generative neural network.
-
公开(公告)号:US20240378427A1
公开(公告)日:2024-11-14
申请号:US18661499
申请日:2024-05-10
Applicant: Google LLC
Inventor: Slav Petrov , Yonghui Wu , Andrew M. Dai , David Richard So , Dmitry Lepikhin , Erica Ann Moreira , Gaurav Mishra , Jonathan Hudson Clark , Maxim Krikun , Melvin Jose Johnson Premkumar , Nan Du , Orhan Firat , Rohan Anil , Siamak Shakeri , Xavier Garcia , Yanping Huang , Yong Cheng , Yuanzhong Xu , Yujing Zhang , Zachary Alexander Nado , Eric Jun Jie Ni , Kefan Xiao , Vladimir Feinberg , Jin Young Sohn , Aurko Roy
IPC: G06N3/0475 , G06F40/284
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network to perform any one or more of a variety of machine learning tasks. For example, the neural network can be configured as a generative neural network, e.g., an autoregressive generative neural network.
-
公开(公告)号:US20220237435A1
公开(公告)日:2022-07-28
申请号:US17159437
申请日:2021-01-27
Applicant: Google LLC
Inventor: Yanping Huang , Dmitry Lepikhin , Maxim Krikun , Orhan Firat , Ankur Bapna , Thang Luong , Sneha Kudugunta
Abstract: Systems and methods for routing in mixture-of-expert models. In some aspects of the technology, a transformer may have at least one Mixture-of-Experts (“MoE”) layer in each of its encoder and decoder, with the at least one MoE layer of the encoder having a learned gating function configured to route each token of a task to two or more selected expert feed-forward networks, and the at least one MoE layer of the decoder having a learned gating function configured to route each task to two or more selected expert feed-forward networks.
-
-
-
-