-
Publication Number: US20250111210A1
Publication Date: 2025-04-03
Application Number: US18900531
Filing Date: 2024-09-27
Applicant: Google LLC
Inventor: Chong You, Guru Guruganesh, Joshua Timothy Ainslie, Manzil Zaheer, Sanjiv Kumar, Santiago Ontañón, Shanda Li, Venkata Sesha Pavana Srinadh Bhojanapalli, Sumit Sanghai
IPC: G06N3/0475
Abstract: Systems and methods for processing inputs using attention neural networks. In particular, one or more of the attention layers within the attention neural network compute relative position biases using functional interpolation.
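The functional-interpolation idea lends itself to a compact illustration. Below is a minimal, hypothetical PyTorch sketch, assuming the bias is produced by a small MLP over log-normalized relative distances; the class name, the normalization, and the hidden size are illustrative assumptions, not taken from the patent.

```python
import torch
import torch.nn as nn

class FunctionalRelativeBias(nn.Module):
    """Hypothetical sketch: a small MLP maps normalized relative
    positions to per-head attention biases, so the learned function
    interpolates to sequence lengths unseen during training."""
    def __init__(self, num_heads: int, hidden: int = 32):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, num_heads)
        )

    def forward(self, seq_len: int) -> torch.Tensor:
        pos = torch.arange(seq_len)
        # Relative distance i - j for query i, key j (causal: clamp at 0).
        rel = (pos[:, None] - pos[None, :]).clamp(min=0).float()
        # Log transform plus normalization by query position gives a
        # bounded input to the MLP (an assumed interpolation scheme).
        denom = torch.log1p(pos.float()[:, None]).clamp(min=1.0)
        x = (torch.log1p(rel) / denom).unsqueeze(-1)   # (L, L, 1)
        return self.mlp(x).permute(2, 0, 1)            # (heads, L, L)

# Added to attention logits before the softmax, e.g.:
# logits = q @ k.transpose(-2, -1) / d**0.5 + bias[None]
```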
-
Publication Number: US20240386256A1
Publication Date: 2024-11-21
Application Number: US18318049
Filing Date: 2023-05-16
Applicant: Google LLC
Inventor: James Lee Thorp, Joshua Timothy Ainslie
IPC: G06N3/0499
Abstract: Improved multi-layer machine learning model architectures are provided that exhibit increased accuracy, decreased training time, decreased inference compute cost, and/or increased stability while training. These improved models include a plurality of sequential layers, each layer comprising a mixing layer that feeds into a feedforward layer. These improved models achieve these benefits by ‘enhancing’ a subset of the feedforward layers with mixture-of-experts or other sparse multi-network architectures while ‘degrading’ a subset of the mixing layers to be simple linear mixing layers (e.g., that multiply inputs by one or more mixing matrices) rather than more complicated attentional mixing mechanisms (e.g., including a number of matrix multiplications, dot products, and nonlinear operations). Such a combination of mixing layer modifications and feedforward layer modifications in a single multi-layer model exhibits synergistic improvements with respect to training time, inference computational cost, and training stability for a given level of model accuracy.
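A minimal sketch of one such layer pair may help. The following hypothetical PyTorch block pairs a 'degraded' linear mixing layer (a single learned matrix applied along the sequence dimension) with an 'enhanced' top-1 mixture-of-experts feedforward layer; all names, sizes, and the routing scheme are illustrative assumptions, not taken from the patent.

```python
import torch
import torch.nn as nn

class LinearMixMoEBlock(nn.Module):
    """Hypothetical sketch: linear token mixing (no attention, no dot
    products) followed by a sparse mixture-of-experts feedforward."""
    def __init__(self, seq_len: int, d_model: int, num_experts: int = 4):
        super().__init__()
        # Linear mixing: one learned matrix mixes information across
        # token positions, replacing an attentional mixing mechanism.
        self.mix = nn.Parameter(torch.eye(seq_len) + 0.01 * torch.randn(seq_len, seq_len))
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(num_experts)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # (B, L, D)
        x = self.norm1(x + torch.einsum("ls,bsd->bld", self.mix, x))
        # Top-1 routing: each token is processed by a single expert,
        # keeping per-token compute constant as experts are added.
        idx = self.router(x).argmax(dim=-1)               # (B, L)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e
            if mask.any():
                out[mask] = expert(x[mask])
        return self.norm2(x + out)
```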
-
Publication Number: US11238332B2
Publication Date: 2022-02-01
Application Number: US17341193
Filing Date: 2021-06-07
Applicant: Google LLC
Inventor: Joshua Timothy Ainslie, Santiago Ontañón, Philip Pham, Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Amr Ahmed
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing network inputs using an attention neural network that has one or more sparse attention sub-layers. Each sparse attention sub-layer is configured to apply a sparse attention mechanism that attends differently for input positions that are in a first proper subset of the input positions in the input to the sub-layer than for positions that are not in the first proper subset.
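The mechanism can be pictured as a mask over the attention logits. The sketch below is a hypothetical PyTorch construction, assuming the 'first proper subset' consists of a few global positions with full connectivity while the remaining positions attend only within a local window; the parameter names and the global/local split are illustrative, not taken from the patent.

```python
import torch

def sparse_attention_mask(seq_len: int, num_global: int, window: int) -> torch.Tensor:
    """Hypothetical sketch of a sparse attention mask in which a
    proper subset of positions attends differently from the rest."""
    i = torch.arange(seq_len)
    # Local band: position i may attend to j whenever |i - j| <= window.
    local = (i[:, None] - i[None, :]).abs() <= window
    # The first `num_global` positions form the proper subset with
    # full row and column connectivity (they attend to, and are
    # attended by, every position).
    glob = i < num_global
    mask = local | glob[:, None] | glob[None, :]
    return mask  # True where attention is allowed.

# Usage: disallow masked-out pairs before the softmax.
# logits.masked_fill_(~sparse_attention_mask(L, 2, 3), float("-inf"))
```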
-
Publication Number: US20230077928A1
Publication Date: 2023-03-16
Application Number: US17474928
Filing Date: 2021-09-14
Applicant: Google LLC
Inventor: James Patrick Lee-Thorp, Joshua Timothy Ainslie, Ilya Eckstein, Santiago Ontañón
Abstract: Transformer systems and methods of using such transformer systems including computer programs encoded on a computer storage medium, for performing a deep learning task on an input sequence to generate an encoded output. In one aspect, one of the transformer systems includes an encoder architecture block, comprising: a spectral transform mixing layer that receives input embeddings of input tokens and generates, as output, a spectral transform output along a sequence dimension of the input embeddings; and a feed forward layer that receives an input based on the input embeddings of input tokens and the spectral transform output and generates an output for a subsequent processing block.
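A short sketch may clarify the block structure. The following hypothetical PyTorch module, modeled on an FNet-style encoder block, uses an unparameterized Fourier transform as the spectral mixing operation, followed by a feed-forward layer; the residual and normalization placement is an illustrative assumption, not a claim about the patented design.

```python
import torch
import torch.nn as nn

class SpectralMixingBlock(nn.Module):
    """Hypothetical sketch: a spectral-transform mixing layer feeding
    a feed-forward layer, as in an FNet-style encoder block."""
    def __init__(self, d_model: int, d_ff: int = 2048):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # (B, L, D)
        # Fourier transform along the hidden then the sequence
        # dimension; keeping the real part returns to the input dtype.
        mixed = torch.fft.fft(torch.fft.fft(x, dim=-1), dim=-2).real
        x = self.norm1(x + mixed)
        # Feed-forward layer consumes the mixed representation and
        # produces the output for the subsequent processing block.
        return self.norm2(x + self.ff(x))
```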
-
Publication Number: US20220156553A1
Publication Date: 2022-05-19
Application Number: US17589542
Filing Date: 2022-01-31
Applicant: Google LLC
Inventor: Joshua Timothy Ainslie, Santiago Ontañón, Philip Pham, Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Amr Ahmed
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing network inputs using an attention neural network that has one or more sparse attention sub-layers. Each sparse attention sub-layer is configured to apply a sparse attention mechanism that attends differently for input positions that are in a first proper subset of the input positions in the input to the sub-layer than for positions that are not in the first proper subset.
-
Publication Number: US20210383191A1
Publication Date: 2021-12-09
Application Number: US17341193
Filing Date: 2021-06-07
Applicant: Google LLC
Inventor: Joshua Timothy Ainslie, Santiago Ontañón, Philip Pham, Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Amr Ahmed
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing network inputs using an attention neural network that has one or more sparse attention sub-layers. Each sparse attention sub-layer is configured to apply a sparse attention mechanism that attends differently for input positions that are in a first proper subset of the input positions in the input to the sub-layer than for positions that are not in the first proper subset.