Full Attention with Sparse Computation Cost

    Publication Number: US20230022151A1

    Publication Date: 2023-01-26

    Application Number: US17860691

    Filing Date: 2022-07-08

    Applicant: Google LLC

    Abstract: The present disclosure is directed to machine learning model architectures which provide full attention capability in each attention head while maintaining low computation and memory complexity. Specifically, according to one aspect of the present disclosure, example attention models provided herein can treat the self-attention mechanism as a conditional expectation over embeddings at each location and approximate the conditional distribution with a structured factorization. Each location can attend to all other locations, either via direct attention, or through indirect attention to group representations, which are again conditional expectations of embeddings from corresponding local regions.
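
    The factorized attention described in the abstract can be illustrated with a short sketch. The following is a minimal illustration of the general idea, not the patented implementation: queries in each block attend directly to the tokens of their own block, and indirectly to pooled summaries ("group representations") of every other block. All names (combiner_style_attention, block_size) are illustrative assumptions, mean pooling stands in for the patent's conditional-expectation group summaries, and the sequence length is assumed divisible by block_size.

        import torch
        import torch.nn.functional as F

        def combiner_style_attention(q, k, v, block_size):
            # q, k, v: (seq_len, d); seq_len assumed divisible by block_size.
            seq_len, d = q.shape
            n_blocks = seq_len // block_size
            kb = k.view(n_blocks, block_size, d)
            vb = v.view(n_blocks, block_size, d)
            # Group representations: one pooled key/value per block (mean
            # pooling here approximates the conditional-expectation summaries).
            k_group = kb.mean(dim=1)  # (n_blocks, d)
            v_group = vb.mean(dim=1)  # (n_blocks, d)
            out = torch.empty_like(q)
            for b in range(n_blocks):
                qi = q[b * block_size:(b + 1) * block_size]
                # Direct attention: keys/values of the query's own block.
                # Indirect attention: pooled summaries of all other blocks.
                others = [g for g in range(n_blocks) if g != b]
                keys = torch.cat([kb[b], k_group[others]], dim=0)
                vals = torch.cat([vb[b], v_group[others]], dim=0)
                attn = F.softmax(qi @ keys.t() / d ** 0.5, dim=-1)
                out[b * block_size:(b + 1) * block_size] = attn @ vals
            return out

    In this sketch each query scores block_size + n_blocks - 1 keys instead of seq_len, so with block_size near sqrt(seq_len) the per-query cost is roughly O(sqrt(n)), while every location still influences every other location, either directly or through its group representation.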

    META PSEUDO-LABELS
    13.
    Invention Application

    Publication Number: US20220188636A1

    Publication Date: 2022-06-16

    Application Number: US17551065

    Filing Date: 2021-12-14

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network using meta pseudo-labels. One of the methods includes training a student neural network using pseudo-labels generated by a teacher neural network that is being trained jointly with the student neural network.
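
    A minimal sketch of the joint training loop is shown below, assuming PyTorch-style teacher and student modules with their own optimizers (all names here are hypothetical) and using the change in the student's labeled loss as a simple stand-in for the teacher's feedback signal; this is an illustrative approximation of the idea, not the claimed method.

        import torch
        import torch.nn.functional as F

        def meta_pseudo_labels_step(teacher, student, t_opt, s_opt,
                                    x_unlabeled, x_labeled, y_labeled):
            # 1) Teacher generates hard pseudo-labels for the unlabeled batch.
            with torch.no_grad():
                pseudo = teacher(x_unlabeled).argmax(dim=-1)
                # Student's labeled loss before its update (feedback baseline).
                loss_before = F.cross_entropy(student(x_labeled), y_labeled)

            # 2) Student takes a gradient step on the pseudo-labeled batch.
            s_loss = F.cross_entropy(student(x_unlabeled), pseudo)
            s_opt.zero_grad()
            s_loss.backward()
            s_opt.step()

            # 3) Feedback: how much the student's labeled loss improved.
            with torch.no_grad():
                h = loss_before - F.cross_entropy(student(x_labeled), y_labeled)

            # 4) Teacher update: pseudo-labels that helped the student (h > 0)
            #    are reinforced; harmful ones (h < 0) are discouraged.
            t_loss = h * F.cross_entropy(teacher(x_unlabeled), pseudo)
            t_opt.zero_grad()
            t_loss.backward()
            t_opt.step()

    The key design point, per the abstract, is that the teacher is not fixed: it is trained jointly with the student, with the student's performance on labeled data steering which pseudo-labels the teacher learns to produce.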
