Publication No.: US20230112862A1
Publication Date: 2023-04-13
Application No.: US17960380
Filing Date: 2022-10-05
Applicant: Google LLC
Inventor: Venkata S. Bhojanapalli , Andreas Veit , Ayan Chakrabarti , Frederick Liu , Himanshu Jain , Michal Lukasik , Sanjiv Kumar , Yin-Wen Chang
IPC: G06N3/04
Abstract: Provided are systems and methods that improve the computational efficiency of Transformers or other attention-based neural networks or machine learning models by reusing a number of attention scores across layers and/or heads of the model. To reduce the computational cost of self-attention-based models while achieving comparable or even superior results, example aspects of the present disclosure propose a novel architecture that reuses attention scores computed at one layer in one or more subsequent layers.
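The idea in the abstract can be illustrated with a minimal sketch: a later layer skips the query/key projections and the softmax entirely, reusing the attention-score matrix computed by an earlier layer and applying only its own value projection. All function and variable names below are illustrative assumptions, not taken from the patent, and this single-head NumPy version omits masking, multi-head splitting, and output projections.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_scores(x, w_q, w_k):
    # Standard scaled dot-product attention scores: softmax(Q K^T / sqrt(d)).
    q, k = x @ w_q, x @ w_k
    return softmax(q @ k.T / np.sqrt(q.shape[-1]))

def attention_layer(x, w_v, scores=None, w_q=None, w_k=None):
    # If `scores` is given (reused from an earlier layer), the Q/K
    # projections and softmax are skipped; only V is computed fresh.
    if scores is None:
        scores = attention_scores(x, w_q, w_k)
    return scores @ (x @ w_v), scores

rng = np.random.default_rng(0)
seq_len, d = 4, 8
x = rng.standard_normal((seq_len, d))
w_q, w_k, w_v1, w_v2 = (rng.standard_normal((d, d)) for _ in range(4))

# Layer 1: compute attention scores as usual.
out1, scores = attention_layer(x, w_v1, w_q=w_q, w_k=w_k)
# Layer 2: reuse layer 1's scores, saving the Q/K projections and softmax.
out2, _ = attention_layer(out1, w_v2, scores=scores)
```

The saving comes from dropping two of the three input projections and the quadratic-in-sequence-length score computation in every layer that reuses scores.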