TRAINING ULTRA-LARGE-SCALE VISION TRANSFORMER NEURAL NETWORKS

    Publication number: US20240256835A1

    Publication date: 2024-08-01

    Application number: US18424420

    Application date: 2024-01-26

    Applicant: Google LLC

    CPC classification number: G06N3/0455 G06N3/088

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing an input through each of a plurality of layers of a neural network to generate an output using a plurality of hardware accelerators. The plurality of layers comprise a fully connected layer having a plurality of parameters arranged in a row dimension and a column dimension. One of the methods comprises: generating a plurality of parameter blocks by partitioning the plurality of parameters along the row dimension and the column dimension; determining a ratio of a number of parameters along the row dimension relative to a number of parameters along the column dimension; and determining whether to use row sharding or column sharding with the plurality of hardware accelerators to calculate an output for the fully connected layer and then calculating the output for the fully connected layer using either row sharding or column sharding.
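    Illustrative sketch (not taken from the patent): the abstract describes computing the ratio of the row count to the column count of a fully connected layer's parameter matrix, then choosing between row sharding and column sharding. The plain-Python example below shows one way such a choice could look. The decision rule (row sharding when rows / columns exceeds a threshold of 1.0), the names choose_sharding, sharded_fully_connected, and split_ranges, and the simulation of accelerators as loop iterations are all assumptions made for illustration; the abstract does not state the actual rule or any API.

```python
def matmul(a, b):
    """Plain-Python matrix product of a (m x k) and b (k x n)."""
    return [[sum(a_row[k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for a_row in a]


def split_ranges(size, parts):
    """Split range(size) into up to `parts` contiguous, non-empty (start, stop) slices."""
    base, extra = divmod(size, parts)
    ranges, start = [], 0
    for p in range(parts):
        stop = start + base + (1 if p < extra else 0)
        if stop > start:
            ranges.append((start, stop))
        start = stop
    return ranges


def choose_sharding(weights, threshold=1.0):
    """Hypothetical rule: pick the sharding mode from the row/column ratio."""
    rows, cols = len(weights), len(weights[0])
    return "row" if rows / cols > threshold else "column"


def sharded_fully_connected(x, weights, num_devices, threshold=1.0):
    """Compute x @ weights, with each simulated device handling one shard."""
    mode = choose_sharding(weights, threshold)
    rows, cols = len(weights), len(weights[0])
    if mode == "row":
        # Row sharding: each device holds a slice of rows of the weights and
        # the matching slice of input features; partial outputs are summed
        # (on real hardware this would be a reduction across devices).
        out = [[0.0] * cols for _ in x]
        for start, stop in split_ranges(rows, num_devices):
            partial = matmul([r[start:stop] for r in x], weights[start:stop])
            out = [[o + p for o, p in zip(o_row, p_row)]
                   for o_row, p_row in zip(out, partial)]
        return mode, out
    # Column sharding: each device holds a slice of columns and produces a
    # slice of the output features; the slices are concatenated.
    out = [[] for _ in x]
    for start, stop in split_ranges(cols, num_devices):
        block = matmul(x, [w_row[start:stop] for w_row in weights])
        for o_row, b_row in zip(out, block):
            o_row.extend(b_row)
    return mode, out
```

    Under these assumptions, a tall 4 x 1 weight matrix is routed to row sharding and a wide 2 x 4 matrix to column sharding; in both cases the sharded result matches the unsharded product matmul(x, weights).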
