SYSTEMS AND METHODS FOR MACHINE-LEARNED MODELS HAVING CONVOLUTION AND ATTENTION

    Publication No.: WO2022251602A1

    Publication Date: 2022-12-01

    Application No.: PCT/US2022/031304

    Filing Date: 2022-05-27

    Applicant: GOOGLE LLC

    Abstract: A computer-implemented method for performing computer vision with reduced computational cost and improved accuracy can include obtaining, by a computing system including one or more computing devices, input data comprising an input tensor having one or more dimensions, providing, by the computing system, the input data to a machine-learned convolutional attention network, the machine-learned convolutional attention network including two or more network stages, and, in response to providing the input data to the machine-learned convolutional attention network, receiving, by the computing system, a machine-learning prediction from the machine-learned convolutional attention network. The convolutional attention network can include at least one attention block, wherein the attention block includes a relative attention mechanism, the relative attention mechanism including the sum of a static convolution kernel with an adaptive attention matrix. This provides for improved generalization, capacity, and efficiency of the convolutional attention network relative to some existing models.
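
The key mechanism the abstract describes is a relative attention whose pre-softmax logits sum a static, convolution-like term with an adaptive, input-dependent attention matrix. The following is a minimal NumPy sketch of that idea; the function name, shapes, and the 1-D relative-position bias are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def relative_attention(q, k, v, static_bias):
    """Attention whose pre-softmax logits are the sum of an adaptive,
    input-dependent term (q @ k.T) and a static bias that plays the
    role of a translation-invariant convolution kernel."""
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d) + static_bias   # adaptive + static sum
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Toy example: sequence of 4 positions, model dimension 2.
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(4, 2)) for _ in range(3))
# Static bias indexed only by the relative position i - j, like a 1-D kernel.
kernel = rng.normal(size=7)  # offsets -3 .. 3
bias = np.array([[kernel[i - j + 3] for j in range(4)] for i in range(4)])
out = relative_attention(q, k, v, bias)
print(out.shape)  # (4, 2)
```

Because `bias[i, j]` depends only on `i - j`, the static term behaves like a shared, position-invariant kernel, which is what gives the combined block convolution-like generalization alongside attention's adaptive capacity.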

    COMPOUND MODEL SCALING FOR NEURAL NETWORKS
    Type: Invention Application

    Publication No.: WO2020154536A1

    Publication Date: 2020-07-30

    Application No.: PCT/US2020/014839

    Filing Date: 2020-01-23

    Applicant: GOOGLE LLC

    Abstract: A method for determining a final architecture for a neural network to perform a particular machine learning task is described. The method includes receiving a baseline architecture for the neural network, wherein the baseline architecture has a network width dimension, a network depth dimension, and a resolution dimension; receiving data defining a compound coefficient that controls extra computational resources used for scaling the baseline architecture; performing a search to determine a baseline width, depth and resolution coefficient that specify how to assign the extra computational resources to the network width, depth and resolution dimensions of the baseline architecture, respectively; determining a width, depth and resolution coefficient based on the baseline width, depth, and resolution coefficient and the compound coefficient; and generating the final architecture that scales the network width, network depth, and resolution dimensions of the baseline architecture based on the corresponding width, depth, and resolution coefficients.
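
The scaling rule the abstract describes can be sketched in a few lines: baseline per-dimension coefficients are raised to the power of the compound coefficient and applied to the baseline width, depth, and resolution. The coefficient values below are illustrative, not taken from the patent.

```python
def compound_scale(base_depth, base_width, base_resolution,
                   alpha, beta, gamma, phi):
    """Scale depth/width/resolution by alpha**phi, beta**phi, gamma**phi.

    alpha, beta, gamma are the baseline coefficients found by a small
    search over how to assign extra compute to each dimension; phi is
    the compound coefficient controlling the total extra resources.
    """
    depth = round(base_depth * alpha ** phi)
    width = round(base_width * beta ** phi)
    resolution = round(base_resolution * gamma ** phi)
    return depth, width, resolution

# Illustrative baseline: 18 layers, 64 channels, 224x224 input.
print(compound_scale(18, 64, 224, alpha=1.2, beta=1.1, gamma=1.15, phi=2))
# (26, 77, 296)
```

With `phi=0` the baseline architecture is returned unchanged; each increment of `phi` compounds all three dimensions at once rather than scaling any single one.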

    HARDWARE-AWARE PROGRESSIVE TRAINING OF MACHINE LEARNING MODELS

    Publication No.: WO2023059439A1

    Publication Date: 2023-04-13

    Application No.: PCT/US2022/044201

    Filing Date: 2022-09-21

    Applicant: GOOGLE LLC

    Abstract: Aspects of the disclosure provide for hardware-aware progressive training of machine learning models. A training system trains a model in accordance with a training process and different values specified in a training schedule for both hardware-level and model-level performance settings. Hardware-level performance settings can cause hardware features of computing resources used to train the model to be enabled, disabled, or modified at various points during training. Model-level performance settings can take on a variety of values to adjust characteristics of the machine learning model being trained or of the training process, during different stages of training. The training system can identify and apply complementary values of hardware- and model-level performance settings to generate training schedules that improve model training speed at earlier stages of training, while improving model quality at later stages of training.
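
A training schedule pairing hardware-level and model-level settings per stage could be represented as a simple list of stage configurations. The field names and values below are hypothetical stand-ins for the kinds of settings the abstract mentions.

```python
from dataclasses import dataclass

@dataclass
class StageSettings:
    # Hardware-level setting: e.g. reduced numeric precision early on.
    precision: str
    # Model-level settings: e.g. smaller inputs and weaker regularization
    # early, adjusted at later stages of training.
    image_size: int
    dropout_rate: float

# Illustrative schedule: cheap, fast settings in early stages to improve
# training speed; full precision, resolution, and stronger regularization
# at later stages to improve model quality.
schedule = [
    StageSettings(precision="bfloat16", image_size=128, dropout_rate=0.1),
    StageSettings(precision="bfloat16", image_size=192, dropout_rate=0.2),
    StageSettings(precision="float32", image_size=224, dropout_rate=0.3),
]

for stage, s in enumerate(schedule):
    print(stage, s.precision, s.image_size, s.dropout_rate)
```

The point of pairing the two levels is that the values are complementary: lower precision is enabled exactly when the model-level settings are also cheap, so neither setting bottlenecks the other.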

    SYSTEMS AND METHODS FOR PROGRESSIVE LEARNING FOR MACHINE-LEARNED MODELS TO OPTIMIZE TRAINING SPEED

    Publication No.: WO2022169521A1

    Publication Date: 2022-08-11

    Application No.: PCT/US2021/065448

    Filing Date: 2021-12-29

    Applicant: GOOGLE LLC

    Abstract: Systems and methods of the present disclosure can include a computer-implemented method for efficient machine-learned model training. The method can include obtaining a plurality of training samples for a machine-learned model. The method can include, for one or more first training iterations, training, based at least in part on a first regularization magnitude configured to control a relative effect of one or more regularization techniques, the machine-learned model using one or more respective first training samples of the plurality of training samples. The method can include, for one or more second training iterations, training, based at least in part on a second regularization magnitude greater than the first regularization magnitude, the machine-learned model using one or more respective second training samples of the plurality of training samples.
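
The core idea, a regularization magnitude that is smaller in early iterations than in later ones, can be sketched with a simple linear ramp. The function name, endpoints, and linear shape are illustrative assumptions; the abstract only requires that the second magnitude exceed the first.

```python
def regularization_magnitude(step, total_steps, start=0.1, end=0.5):
    """Linearly ramp the overall regularization magnitude from `start`
    (weak, for early training iterations) to `end` (strong, for later
    iterations)."""
    frac = step / max(total_steps - 1, 1)
    return start + frac * (end - start)

# Each magnitude could scale, e.g., a dropout rate or an augmentation
# strength applied to that iteration's training samples.
mags = [regularization_magnitude(s, total_steps=5) for s in range(5)]
print(mags)
```

Weak early regularization lets the model fit quickly on easy settings; the rising magnitude then restores the regularizing effect where it matters for final quality.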

    NEURAL ARCHITECTURE SCALING FOR HARDWARE ACCELERATORS

    Publication No.: WO2022154829A1

    Publication Date: 2022-07-21

    Application No.: PCT/US2021/043674

    Filing Date: 2021-07-29

    Applicant: GOOGLE LLC

    Abstract: Methods, systems, and apparatus, including computer-readable media, for scaling neural network architectures on hardware accelerators. A method includes receiving training data and information specifying target computing resources, and performing using the training data, a neural architecture search over a search space to identify an architecture for a base neural network. A plurality of scaling parameter values for scaling the base neural network can be identified, which can include repeatedly selecting a plurality of candidate scaling parameter values, and determining a measure of performance for the base neural network scaled according to the plurality of candidate scaling parameter values, in accordance with a plurality of second objectives including a latency objective. An architecture for a scaled neural network can be determined using the architecture of the base neural network scaled according to the plurality of scaling parameter values.
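
The loop of repeatedly selecting candidate scaling parameter values and scoring them against objectives that include latency can be sketched as follows. The reward shape (a multiplicative latency discount) and the stand-in accuracy/latency models are assumptions for illustration; the real system would train and benchmark each scaled network on the target hardware.

```python
import random

def reward(accuracy, latency_ms, target_ms, w=-0.07):
    """Combined measure of performance: accuracy discounted
    multiplicatively as latency moves past the target."""
    return accuracy * (latency_ms / target_ms) ** w

random.seed(0)
best = None
# Repeatedly select candidate scaling parameter values and keep the
# candidate with the best combined performance/latency measure.
for _ in range(100):
    depth_mult = random.uniform(0.5, 2.0)
    width_mult = random.uniform(0.5, 2.0)
    # Hypothetical stand-ins for the measured quantities.
    accuracy = 0.70 + 0.05 * (depth_mult + width_mult) / 4.0
    latency_ms = 5.0 * depth_mult * width_mult ** 2
    cand = (reward(accuracy, latency_ms, target_ms=10.0),
            depth_mult, width_mult)
    if best is None or cand > best:
        best = cand

print(f"best reward {best[0]:.3f} at depth x{best[1]:.2f}, width x{best[2]:.2f}")
```

The soft latency penalty, rather than a hard cutoff, lets the search trade a small latency overrun for a large accuracy gain when that trade is favorable.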

    NEURAL ARCHITECTURE AND HARDWARE ACCELERATOR SEARCH

    Publication No.: WO2022072890A1

    Publication Date: 2022-04-07

    Application No.: PCT/US2021/053247

    Filing Date: 2021-10-01

    Applicant: GOOGLE LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for jointly determining neural network architectures and hardware accelerator architectures. In one aspect, a method includes: generating, using a controller policy, a batch of one or more output sequences, each output sequence in the batch defining a respective architecture of a child neural network and a respective architecture of a hardware accelerator; for each output sequence in the batch: training a respective instance of the child neural network having the architecture defined by the output sequence; evaluating a network performance of the trained instance of the child neural network to determine a network performance metric for the instance of the child neural network; and evaluating an accelerator performance of a respective instance of the hardware accelerator having the architecture defined by the output sequence to determine an accelerator performance metric for the instance of the hardware accelerator; and using the network performance metrics and the accelerator performance metrics to adjust the controller policy.
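
A toy version of the controller loop, sample joint (network, accelerator) choices, evaluate them, and use the resulting metrics to adjust the policy, can be sketched with a REINFORCE-style update over a two-choice search space. Everything here (the space, the evaluation stub, the learning rate) is a hypothetical stand-in; the patent's controller, search space, and evaluation are far richer.

```python
import math
import random

# Tiny joint search space: network choices x accelerator choices.
NETWORKS = ["small_net", "large_net"]
ACCELERATORS = ["few_pe", "many_pe"]

# Controller policy: independent logits over each decision.
logits = {"net": [0.0, 0.0], "acc": [0.0, 0.0]}

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [x / s for x in e]

def sample(key):
    probs = softmax(logits[key])
    r, acc = random.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(probs) - 1

def evaluate(net_i, acc_i):
    # Stand-in for training the child network and measuring the
    # accelerator: one joint choice is much better than the rest.
    return 1.0 if (net_i, acc_i) == (1, 1) else 0.1

random.seed(0)
lr = 0.5
for _ in range(200):
    ni, ai = sample("net"), sample("acc")
    r = evaluate(ni, ai)
    # Policy-gradient update: push probability mass toward the
    # decisions that produced well-rewarded output sequences.
    for key, i in (("net", ni), ("acc", ai)):
        probs = softmax(logits[key])
        for j in range(2):
            grad = (1.0 if j == i else 0.0) - probs[j]
            logits[key][j] += lr * r * grad

print(softmax(logits["net"]), softmax(logits["acc"]))
```

The essential structure matches the abstract: the policy generates joint sequences, each sequence is scored on both network and accelerator performance, and those metrics feed back into the policy.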

    CONNECTION WEIGHT LEARNING FOR GUIDED ARCHITECTURE EVOLUTION

    Publication No.: WO2020237168A1

    Publication Date: 2020-11-26

    Application No.: PCT/US2020/034267

    Filing Date: 2020-05-22

    Applicant: GOOGLE LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for determining one or more neural network architectures of a neural network for performing a video processing neural network task. In one aspect, a method comprises: at each of a plurality of iterations: selecting a parent neural network architecture from a set of neural network architectures; training a neural network having the parent neural network architecture to perform the video processing neural network task, comprising determining trained values of connection weight parameters of the parent neural network architecture; generating a new neural network architecture based at least in part on the trained values of the connection weight parameters of the parent neural network architecture; and adding the new neural network architecture to the set of neural network architectures.
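
The iteration structure, select a parent, train it to obtain connection weights, generate a child guided by those weights, and add the child to the set, can be sketched with a toy evolution loop. The "drop the weakest connection" heuristic and the stand-in training function are illustrative assumptions, not the patent's method.

```python
import random

random.seed(0)

def train(architecture):
    """Stand-in for training: returns 'trained connection weight'
    values, one per connection in the architecture."""
    return {conn: random.gauss(0.0, 1.0) for conn in architecture}

def evolve(parent, weights):
    """Generate a child guided by the trained connection weights:
    keep the strong connections, drop the weakest, add a fresh one."""
    ranked = sorted(parent, key=lambda c: abs(weights[c]), reverse=True)
    kept = ranked[:-1]  # drop the weakest connection
    new_conn = ("node%d" % random.randrange(10),
                "node%d" % random.randrange(10))
    return kept + [new_conn]

# Initial set with one seed architecture (a list of connections).
population = [[("a", "b"), ("b", "c"), ("a", "c")]]
for _ in range(3):
    parent = random.choice(population)    # select a parent architecture
    w = train(parent)                     # train -> connection weights
    child = evolve(parent, w)             # weight-guided mutation
    population.append(child)              # add to the architecture set

print(len(population))  # 4
```

Using trained weight magnitudes to guide mutation biases the search toward removing connections the trained network barely used, rather than mutating blindly.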
