-
Publication No.: US20230022151A1
Publication Date: 2023-01-26
Application No.: US17860691
Filing Date: 2022-07-08
Applicant: Google LLC
Inventor: Hanjun Dai , Bo Dai , Hongyu Ren , Dale Eric Schuurmans , Zihang Dai , Mengjiao Yang
Abstract: The present disclosure is directed to machine learning model architectures which provide full attention capability in each attention head while maintaining low computation and memory complexity. Specifically, according to one aspect of the present disclosure, example attention models provided herein can treat the self-attention mechanism as a conditional expectation over embeddings at each location and approximate the conditional distribution with a structured factorization. Each location can attend to all other locations, either via direct attention, or through indirect attention to group representations, which are again conditional expectations of embeddings from corresponding local regions.
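The "conditional expectation" view mentioned in the abstract can be illustrated with ordinary full softmax attention. The sketch below is not the structured factorization the disclosure describes. It only shows the starting point: for one query, the attention weights form a probability distribution over locations, and the output is the expected value embedding under that distribution. The function names, the scaled dot-product scoring, and the toy inputs are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_as_expectation(query, keys, values):
    """Return (E[v | query], p), where p(j | query) = softmax_j(q . k_j / sqrt(d)).

    Assumption: plain scaled dot-product scoring. The grouped, indirect
    attention described in the abstract is NOT implemented here.
    """
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys
    ]
    probs = softmax(scores)  # conditional distribution over locations
    dim = len(values[0])
    # Expectation of the value embeddings under that distribution.
    expected = [sum(p * v[i] for p, v in zip(probs, values)) for i in range(dim)]
    return expected, probs


# Toy usage: two locations with identical keys get equal probability,
# so the output is the plain average of the two value vectors.
out, probs = attention_as_expectation(
    [1.0, 0.0],
    [[0.5, 0.5], [0.5, 0.5]],
    [[2.0, 0.0], [0.0, 4.0]],
)
```

Computing this distribution exactly for every query costs time quadratic in sequence length, which is the cost the abstract says the structured factorization is meant to reduce.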
-
Publication No.: US20220383119A1
Publication Date: 2022-12-01
Application No.: US17827362
Filing Date: 2022-05-27
Applicant: Google LLC
Inventor: David Richard So , Quoc V. Le, Jr. , Hanxiao Liu , Wojciech Andrzej Manke , Zihang Dai , Noam M. Shazeer
IPC: G06N3/08
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing a machine learning task on a network input to generate a network output. One of the systems includes an attention neural network configured to perform the machine learning task. The attention neural network includes one or more attention layers that each include a squared ReLU activation layer, a depth-wise convolution layer, or both.
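The two components named in the abstract can be sketched in a few lines. This is a minimal illustration, not the patented layer: the abstract does not say where these pieces sit inside an attention layer, so the sketch defines them in isolation. The function names, the causal (left-padded) convolution, and the toy inputs are assumptions.

```python
def squared_relu(x):
    """Squared ReLU: max(x, 0) ** 2, applied to a single float."""
    return max(x, 0.0) ** 2


def depthwise_conv1d(seq, kernels):
    """Depth-wise 1D convolution over a sequence.

    seq:     list of T positions, each a list of C channel values.
    kernels: list of C kernels, each a list of K weights.

    Each channel is filtered only by its own kernel (no mixing across
    channels). Assumption: causal padding, so position t sees positions
    t-K+1 .. t and missing earlier positions count as zero.
    """
    T, C = len(seq), len(seq[0])
    K = len(kernels[0])
    out = []
    for t in range(T):
        row = []
        for c in range(C):
            acc = 0.0
            for k in range(K):
                src = t - (K - 1) + k  # source position for kernel tap k
                if src >= 0:
                    acc += kernels[c][k] * seq[src][c]
            row.append(acc)
        out.append(row)
    return out


# Toy usage: 3 positions, 2 channels, kernel width 2.
seq = [[1.0, -1.0], [2.0, -2.0], [3.0, -3.0]]
kernels = [[1.0, 1.0], [0.0, 1.0]]
conv_out = depthwise_conv1d(seq, kernels)
activated = [[squared_relu(v) for v in row] for row in conv_out]
```

Because each channel has its own kernel, a depth-wise convolution needs C × K weights, whereas a standard convolution that mixes channels needs C × C × K.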
-
Publication No.: US20220188636A1
Publication Date: 2022-06-16
Application No.: US17551065
Filing Date: 2021-12-14
Applicant: Google LLC
Inventor: Hieu Hy Pham , Zihang Dai , Qizhe Xie , Quoc V. Le
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network using meta pseudo-labels. One of the methods includes training a student neural network using pseudo-labels generated by a teacher neural network that is being trained jointly with the student neural network.
-