Sparse Mixer Architecture
    Patent application

    Publication No.: US20240386256A1

    Publication Date: 2024-11-21

    Application No.: US18318049

    Filing Date: 2023-05-16

    Applicant: Google LLC

    Abstract: Improved multi-layer machine learning model architectures are provided that exhibit increased accuracy, decreased training time, decreased inference compute cost, and/or increased stability while training. These improved models include a plurality of sequential layers, each layer comprising a mixing layer that feeds into a feedforward layer. These improved models achieve these benefits by ‘enhancing’ a subset of the feedforward layers with mixture-of-experts or other sparse multi-network architectures while ‘degrading’ a subset of the mixing layers to be simple linear mixing layers (e.g., that multiply inputs by one or more mixing matrices) rather than more complicated attentional mixing mechanisms (e.g., including a number of matrix multiplications, dot products, and nonlinear operations). Such a combination of mixing layer modifications and feedforward layer modifications in a single multi-layer model exhibits synergistic improvements with respect to training time, inference computational cost, and training stability for a given level of model accuracy.
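The layer structure described in this abstract can be illustrated with a minimal sketch: a linear mixing layer (one mixing matrix applied across tokens) feeding a mixture-of-experts feedforward layer. The top-1 routing rule, the ReLU experts, the random weights, and every function and variable name below are illustrative assumptions; none of them is taken from the patent text.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def linear_mix(x, mix):
    """Linear mixing layer: combine tokens with a (seq_len x seq_len) mixing matrix."""
    return matmul(mix, x)

def expert_ffn(token, w_in, w_out):
    """One expert (assumed form): linear -> ReLU -> linear on a single token vector."""
    hidden = [max(0.0, h) for h in matmul([token], w_in)[0]]
    return matmul([hidden], w_out)[0]

def moe_feedforward(x, router, experts):
    """Sparse feedforward: each token runs through only its top-scoring expert."""
    outputs, chosen = [], []
    for token in x:
        scores = matmul([token], router)[0]  # one routing score per expert
        k = max(range(len(scores)), key=lambda i: scores[i])
        chosen.append(k)
        w_in, w_out = experts[k]
        outputs.append(expert_ffn(token, w_in, w_out))
    return outputs, chosen

def sparse_mixer_layer(x, mix, router, experts):
    """Mixing layer feeding a mixture-of-experts feedforward layer."""
    return moe_feedforward(linear_mix(x, mix), router, experts)

def rand_matrix(rows, cols, rng):
    """Hypothetical helper: random weights standing in for trained parameters."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

# Toy sizes chosen only for illustration.
rng = random.Random(0)
seq_len, d_model, d_hidden, num_experts = 4, 3, 5, 2
x = rand_matrix(seq_len, d_model, rng)
mix = rand_matrix(seq_len, seq_len, rng)
router = rand_matrix(d_model, num_experts, rng)
experts = [
    (rand_matrix(d_model, d_hidden, rng), rand_matrix(d_hidden, d_model, rng))
    for _ in range(num_experts)
]
out, chosen = sparse_mixer_layer(x, mix, router, experts)
print(len(out), len(out[0]), chosen)
```

In this sketch the mixing step costs a single matrix multiplication and involves no dot-product scores or nonlinearities, while only one expert's weights are used per token. That is one plausible reading of the simplified mixing and sparse feedforward combination the abstract describes.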

    MIXING TOKENS WITH SPECTRAL TRANSFORM

    Publication No.: US20230077928A1

    Publication Date: 2023-03-16

    Application No.: US17474928

    Filing Date: 2021-09-14

    Applicant: Google LLC

    Abstract: Transformer systems and methods of using such transformer systems, including computer programs encoded on a computer storage medium, for performing a deep learning task on an input sequence to generate an encoded output. In one aspect, one of the transformer systems includes an encoder architecture block, comprising: a spectral transform mixing layer that receives input embeddings of input tokens and generates, as output, a spectral transform output along a sequence dimension of the input embeddings; and a feed forward layer that receives an input based on the input embeddings of input tokens and the spectral transform output and generates an output for a subsequent processing block.
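A rough sketch of such an encoder block follows, assuming the spectral transform is a discrete Fourier transform taken over the sequence axis. Keeping only the real part, summing the embeddings with the transform output before the feedforward layer, the ReLU feedforward form, and all names are assumptions made for illustration; the abstract does not specify them.

```python
import cmath
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def dft_along_sequence(x):
    """Naive O(n^2) DFT over the sequence axis, computed separately per feature."""
    n, d = len(x), len(x[0])
    return [
        [
            sum(x[t][j] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for j in range(d)
        ]
        for k in range(n)
    ]

def spectral_mix(x):
    """Spectral mixing layer; keeping the real part is an assumption of this sketch."""
    return [[value.real for value in row] for row in dft_along_sequence(x)]

def feed_forward(x, w_in, w_out):
    """Position-wise feedforward layer (assumed form): linear -> ReLU -> linear."""
    hidden = [[max(0.0, h) for h in row] for row in matmul(x, w_in)]
    return matmul(hidden, w_out)

def encoder_block(x, w_in, w_out):
    """Feedforward input combines the embeddings and the spectral output (assumed: sum)."""
    mixed = spectral_mix(x)
    combined = [[a + b for a, b in zip(x_row, m_row)] for x_row, m_row in zip(x, mixed)]
    return feed_forward(combined, w_in, w_out)

# Toy input: 4 tokens with 2 features each; the weights are arbitrary placeholders.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
w_in = [[1.0, -1.0, 0.5], [0.5, 1.0, -1.0]]
w_out = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
encoded = encoder_block(x, w_in, w_out)
print(len(encoded), len(encoded[0]))
```

The mixing step here has no learned parameters: every output position depends on every input token through fixed Fourier coefficients. A practical implementation would use a fast Fourier transform routine rather than this quadratic loop.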
