-
Publication No.: US20230017505A1
Publication date: 2023-01-19
Application No.: US17375960
Filing date: 2021-07-14
Applicant: Google LLC
Inventor: Aditya Krishna Menon, Sanjiv Kumar, Himanshu Jain, Andreas Veit, Ankit Singh Rawat, Gayan Sadeep Jayasumana Hirimbura Matara Kankanamge
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for accounting for long-tail training data.
-
Publication No.: US20230112862A1
Publication date: 2023-04-13
Application No.: US17960380
Filing date: 2022-10-05
Applicant: Google LLC
Inventor: Venkata S. Bhojanapalli, Andreas Veit, Ayan Chakrabarti, Frederick Liu, Himanshu Jain, Michal Lukasik, Sanjiv Kumar, Yin-Wen Chang
IPC: G06N3/04
Abstract: Provided are systems and methods that improve the computational efficiency of Transformers or other attention-based neural networks or machine learning models by re-using a number of attention scores between layers and/or heads of the model. To reduce the computational cost of self-attention-based models while achieving comparable or even superior results, example aspects of the present disclosure propose a novel architecture that reuses attention scores computed in one layer in one or multiple subsequent layers.
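Illustrative sketch (not taken from the application): the abstract describes reusing attention scores computed in one layer in one or more subsequent layers. The NumPy example below shows that idea in its simplest form, assuming single-head attention with no residual connections or normalization. The function names, weight matrices, and dimensions are hypothetical and chosen only for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_scores(x, w_q, w_k):
    # Scaled dot-product attention weights, shape (seq_len, seq_len).
    q = x @ w_q
    k = x @ w_k
    return softmax(q @ k.T / np.sqrt(q.shape[-1]))

def attention_layer(x, w_v, scores):
    # Apply precomputed attention weights to this layer's value projection.
    return scores @ (x @ w_v)

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8  # hypothetical sizes
x = rng.normal(size=(seq_len, d_model))
w_q, w_k = rng.normal(size=(2, d_model, d_model))
w_v1, w_v2 = rng.normal(size=(2, d_model, d_model))

scores = attention_scores(x, w_q, w_k)   # computed once, in layer 1
h1 = attention_layer(x, w_v1, scores)    # layer 1 uses its own scores
h2 = attention_layer(h1, w_v2, scores)   # layer 2 reuses layer 1's scores
```

In this sketch the second layer skips its own query and key projections and the softmax, which is where the saving would come from. How many scores are reused, and between which layers or heads, is left open by the abstract.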
-
Publication No.: US20240005131A1
Publication date: 2024-01-04
Application No.: US18343723
Filing date: 2023-06-28
Applicant: Google LLC
IPC: G06N3/0455
CPC classification number: G06N3/0455
Abstract: Systems and methods for processing inputs using attention neural networks with tree attention layers. Each tree attention layer includes one or more tree attention sub-layers that are each configured to: process query vectors using a decision tree model for the tree attention sub-layer to determine a respective tree path for each query vector; process key vectors using the decision tree model to determine a respective tree path for each key vector; and generate an attended input sequence comprising a respective attended input at each of the plurality of input positions, comprising: generating, for each particular input position, the respective attended input at the particular input position based on (i) the tree path for the query vector at the particular input position, (ii) the respective tree paths for the key vectors at each of the plurality of input positions, and (iii) the value vectors at a subset of the input positions.
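Illustrative sketch (not taken from the application): the abstract does not specify the decision tree model or how tree paths are combined. The NumPy example below makes several loud assumptions: the tree is a fixed-depth binary tree with one hyperplane split per level, a tree path is the resulting bit vector, and each query attends uniformly to the values of the keys whose path matches its own exactly. All names (`tree_paths`, `tree_attention`, `hyperplanes`) are hypothetical.

```python
import numpy as np

def tree_paths(vectors, hyperplanes):
    # Assumption: one split per tree level, shared by all nodes at that level.
    # hyperplanes has shape (depth, d); bit i of a path is 1 when the vector
    # lies on the positive side of hyperplane i.
    return (vectors @ hyperplanes.T > 0).astype(int)

def tree_attention(q, k, v, hyperplanes):
    q_paths = tree_paths(q, hyperplanes)
    k_paths = tree_paths(k, hyperplanes)
    out = np.zeros((q.shape[0], v.shape[1]))
    for i, path in enumerate(q_paths):
        # Subset of input positions: keys that reach the same leaf as the query.
        match = np.all(k_paths == path, axis=1)
        if match.any():
            # Assumption: uniform average over the matching value vectors.
            out[i] = v[match].mean(axis=0)
    return out

# Tiny hand-built example with a depth-1 tree splitting on the first coordinate.
hyperplanes = np.array([[1.0, 0.0]])
q = np.array([[1.0, 0.0], [-1.0, 0.0]])
k = np.array([[2.0, 0.0], [3.0, 0.0], [-1.0, 0.0]])
v = np.array([[1.0], [3.0], [10.0]])
out = tree_attention(q, k, v, hyperplanes)
```

Here the first query shares a leaf with the first two keys and the second query with the third key, so each query only touches the value vectors in its own leaf rather than all input positions.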
-