NEURAL ARCHITECTURE SCALING FOR HARDWARE ACCELERATORS

    Publication No.: WO2022154829A1

    Publication Date: 2022-07-21

    Application No.: PCT/US2021/043674

    Filing Date: 2021-07-29

    Applicant: GOOGLE LLC

    Abstract: Methods, systems, and apparatus, including computer-readable media, for scaling neural network architectures on hardware accelerators. A method includes receiving training data and information specifying target computing resources, and performing, using the training data, a neural architecture search over a search space to identify an architecture for a base neural network. A plurality of scaling parameter values for scaling the base neural network can be identified, which can include repeatedly selecting a plurality of candidate scaling parameter values, and determining a measure of performance for the base neural network scaled according to the plurality of candidate scaling parameter values, in accordance with a plurality of second objectives including a latency objective. An architecture for a scaled neural network can be determined using the architecture of the base neural network scaled according to the plurality of scaling parameter values.
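
    The abstract describes a two-phase procedure: a neural architecture search first yields a base architecture, and a second search then selects scaling parameter values for that base architecture subject to a latency objective. The sketch below illustrates only the second phase under assumed details; the depth/width/resolution parameterization and every helper name (scale_architecture, measure_accuracy, measure_latency) are illustrative assumptions, not identifiers from the patent.

        # Hypothetical sketch of the scaling-parameter search; the helper names and the
        # depth/width/resolution scaling axes are assumptions, not taken from the patent.
        def scale_architecture(base_arch, depth_mult, width_mult, resolution):
            # Scale the base network along assumed depth, width, and input-resolution axes.
            return {
                "layers": int(base_arch["layers"] * depth_mult),
                "channels": [int(c * width_mult) for c in base_arch["channels"]],
                "resolution": resolution,
            }

        def search_scaling_parameters(base_arch, candidate_values, latency_budget_ms,
                                      measure_accuracy, measure_latency):
            # Repeatedly select candidate scaling values and keep the best-performing
            # scaled network that still satisfies the latency objective.
            best_arch, best_accuracy = None, float("-inf")
            for depth_mult, width_mult, resolution in candidate_values:
                arch = scale_architecture(base_arch, depth_mult, width_mult, resolution)
                if measure_latency(arch) > latency_budget_ms:   # latency objective
                    continue
                accuracy = measure_accuracy(arch)               # e.g. a short training run
                if accuracy > best_accuracy:
                    best_arch, best_accuracy = arch, accuracy
            return best_arch

        # Example with stubbed measurement functions standing in for profiling on the
        # target accelerator and evaluating the scaled network on the training data.
        base = {"layers": 12, "channels": [32, 64, 128]}
        candidates = [(1.0, 1.0, 224), (1.2, 1.1, 240), (1.4, 1.2, 260)]
        scaled = search_scaling_parameters(
            base, candidates, latency_budget_ms=5.0,
            measure_accuracy=lambda a: a["layers"] / 20.0,      # stub
            measure_latency=lambda a: a["layers"] * 0.3,        # stub
        )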

    HARDWARE-AWARE PROGRESSIVE TRAINING OF MACHINE LEARNING MODELS

    Publication No.: WO2023059439A1

    Publication Date: 2023-04-13

    Application No.: PCT/US2022/044201

    Filing Date: 2022-09-21

    Applicant: GOOGLE LLC

    Abstract: Aspects of the disclosure provide for hardware-aware progressive training of machine learning models. A training system trains a model in accordance with a training process and different values specified in a training schedule for both hardware-level and model-level performance settings. Hardware-level performance settings can cause hardware features of computing resources used to train the model to be enabled, disabled, or modified at various points during training. Model-level performance settings can take on a variety of values to adjust characteristics of the machine learning model being trained or of the training process, during different stages of training. The training system can identify and apply complementary values of hardware- and model-level performance settings to generate training schedules that improve model training speed at earlier stages of training, while improving model quality at later stages of training.
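
    As a rough illustration of the schedule described above, the sketch below pairs hardware-level and model-level performance settings per training stage, with earlier stages tuned for training speed and later stages for model quality. The specific settings shown (numeric precision, input resolution, dropout) and the hook functions are assumptions for illustration, not settings or APIs named in the patent.

        # Hypothetical training schedule; the chosen settings and hooks are
        # illustrative assumptions, not values from the patent.
        from dataclasses import dataclass

        @dataclass
        class TrainingStage:
            steps: int
            # Hardware-level setting: an accelerator feature enabled, disabled, or modified.
            precision: str            # e.g. "bfloat16" early for speed, "float32" later
            # Model-level settings: characteristics of the model or training process.
            input_resolution: int     # e.g. low resolution early, full resolution later
            dropout_rate: float       # e.g. little regularization early, more later

        # Earlier stages favor training speed; later stages favor model quality.
        schedule = [
            TrainingStage(steps=10_000, precision="bfloat16", input_resolution=128, dropout_rate=0.0),
            TrainingStage(steps=10_000, precision="bfloat16", input_resolution=192, dropout_rate=0.1),
            TrainingStage(steps=10_000, precision="float32", input_resolution=224, dropout_rate=0.3),
        ]

        def train_with_schedule(model, data, schedule, apply_hw_setting, train_steps):
            # apply_hw_setting and train_steps are hypothetical hooks into a training loop.
            for stage in schedule:
                apply_hw_setting(precision=stage.precision)          # hardware-level
                train_steps(model, data, steps=stage.steps,          # model-level
                            input_resolution=stage.input_resolution,
                            dropout_rate=stage.dropout_rate)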
