-
Publication number: US11842281B2
Publication date: 2023-12-12
Application number: US17183618
Application date: 2021-02-24
Applicant: DeepMind Technologies Limited
Inventor: Volodymyr Mnih, Wojciech Czarnecki, Maxwell Elliot Jaderberg, Tom Schaul, David Silver, Koray Kavukcuoglu
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. The method includes: training an action selection policy neural network and, during that training, training one or more auxiliary control neural networks and a reward prediction neural network. Each of the auxiliary control neural networks is configured to receive a respective intermediate output generated by the action selection policy neural network and generate a policy output for a corresponding auxiliary control task. The reward prediction neural network is configured to receive one or more intermediate outputs generated by the action selection policy neural network and generate a corresponding predicted reward. Training each of the auxiliary control neural networks and the reward prediction neural network comprises adjusting the values of the respective auxiliary control parameters, reward prediction parameters, and action selection policy network parameters.
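The parameter-sharing scheme the abstract describes — auxiliary heads reading an intermediate output of the policy network, with gradients flowing back into the shared parameters — can be sketched as follows. This is an illustrative toy (NumPy, linear heads, squared losses, made-up dimensions), not the patented implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions; not taken from the patent.
OBS, HID, ACTIONS = 4, 8, 3

# Shared policy torso: its intermediate output feeds the policy head
# and, per the abstract, the auxiliary heads too.
W_torso = rng.normal(scale=0.1, size=(HID, OBS))
W_aux = rng.normal(scale=0.1, size=(ACTIONS, HID))  # auxiliary control head
W_rew = rng.normal(scale=0.1, size=(1, HID))        # reward prediction head

def aux_step(obs, aux_target, rew_target, lr=0.1):
    """One gradient step on the auxiliary losses only.

    The chain rule routes gradients through the shared torso, so
    auxiliary training also adjusts the policy network's parameters."""
    global W_torso, W_aux, W_rew
    h = np.tanh(W_torso @ obs)          # intermediate output
    e_aux = W_aux @ h - aux_target      # auxiliary-control error (squared loss)
    e_rew = W_rew @ h - rew_target      # reward-prediction error (squared loss)
    # Gradients with respect to each head's own parameters.
    g_aux = np.outer(e_aux, h)
    g_rew = np.outer(e_rew, h)
    # Gradient with respect to the shared torso parameters (through tanh).
    dh = W_aux.T @ e_aux + W_rew.T @ e_rew
    g_torso = np.outer(dh * (1 - h ** 2), obs)
    W_aux -= lr * g_aux
    W_rew -= lr * g_rew
    W_torso -= lr * g_torso
    return 0.5 * float(e_aux @ e_aux + e_rew @ e_rew)

obs = rng.normal(size=OBS)
aux_t = rng.normal(size=ACTIONS)
rew_t = np.array([1.0])
losses = [aux_step(obs, aux_t, rew_t) for _ in range(50)]
```

Because `dh` sums the errors routed back from both heads, every auxiliary step also moves `W_torso` — which is the point: the auxiliary tasks shape the representation the policy head uses.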
-
Publication number: US11715009B2
Publication date: 2023-08-01
Application number: US16303595
Application date: 2017-05-19
Applicant: DeepMind Technologies Limited
Inventor: Oriol Vinyals, Alexander Benjamin Graves, Wojciech Czarnecki, Koray Kavukcuoglu, Simon Osindero, Maxwell Elliot Jaderberg
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a neural network including a first subnetwork followed by a second subnetwork on training inputs by optimizing an objective function. In one aspect, a method includes processing a training input using the neural network to generate a training model output, including processing a subnetwork input for the training input using the first subnetwork to generate a subnetwork activation for the training input in accordance with current values of parameters of the first subnetwork, and providing the subnetwork activation as input to the second subnetwork; determining a synthetic gradient of the objective function for the first subnetwork by processing the subnetwork activation using a synthetic gradient model in accordance with current values of parameters of the synthetic gradient model; and updating the current values of the parameters of the first subnetwork using the synthetic gradient.
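The decoupling idea in the abstract — updating the first subnetwork from a *predicted* gradient of the objective, produced by a synthetic gradient model, rather than waiting for the true backward pass — can be illustrated with a minimal NumPy sketch. All sizes, learning rates, and the linear model forms are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(1)

D_IN, D_H = 3, 5  # toy sizes, not from the patent

W1 = rng.normal(scale=0.1, size=(D_H, D_IN))  # first subnetwork (linear, for clarity)
w2 = rng.normal(scale=0.1, size=D_H)          # second subnetwork: scalar output
M = np.zeros((D_H, D_H))                      # synthetic gradient model (linear)

def step(x, y, lr=0.05):
    global W1, w2, M
    h = W1 @ x                  # subnetwork activation
    err = w2 @ h - y            # squared-error objective
    true_g = err * w2           # true dL/dh from the second subnetwork
    syn_g = M @ h               # synthetic gradient: predicted dL/dh
    # Update the first subnetwork immediately from the synthetic gradient,
    # decoupling it from the second subnetwork's backward pass.
    W1 -= lr * np.outer(syn_g, x)
    # Train the synthetic gradient model toward the true gradient.
    M -= lr * np.outer(syn_g - true_g, h)
    # The second subnetwork trains normally.
    w2 -= lr * err * h
    return 0.5 * float(err ** 2)

x = rng.normal(size=D_IN)
losses = [step(x, 2.0) for _ in range(300)]
```

In a real pipeline the payoff is that `W1`'s update needs only the locally available `h`, so the two subnetworks can run on different devices without a full forward/backward lockstep.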
-
Publication number: US20210150355A1
Publication date: 2021-05-20
Application number: US17159961
Application date: 2021-01-27
Applicant: DeepMind Technologies Limited
Inventor: Marc Gendron-Bellemare, Jacob Lee Menick, Alexander Benjamin Graves, Koray Kavukcuoglu, Remi Munos
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a machine learning model. In one aspect, a method includes receiving training data for training the machine learning model on a plurality of tasks, where each task includes multiple batches of training data. A task is selected in accordance with a current task selection policy. A batch of training data is selected from the selected task. The machine learning model is trained on the selected batch of training data to determine updated values of the model parameters. A learning progress measure that represents a progress of the training of the machine learning model as a result of training the machine learning model on the selected batch of training data is determined. The current task selection policy is updated using the learning progress measure.
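The loop in the abstract — select a task under the current policy, train on a batch, measure learning progress, update the policy — resembles a multi-armed bandit over tasks. The sketch below uses a toy "learner" whose per-task losses shrink at different rates and a simple multiplicative weight update; all of that (the `Learner` class, rates, the update factor) is invented for illustration, not the patent's method:

```python
import random

random.seed(0)

N_TASKS = 3

class Learner:
    """Toy model: a scalar loss per task that shrinks when trained on."""
    def __init__(self):
        self.loss = [1.0, 1.0, 1.0]
        self.rate = [0.5, 0.05, 0.0]  # task 2 yields no learning progress

    def train_on(self, task):
        before = self.loss[task]
        self.loss[task] *= (1 - self.rate[task])
        return before - self.loss[task]  # learning progress measure

def select(weights):
    """Sample a task in proportion to its (nonnegative) weight."""
    r = random.uniform(0, sum(weights))
    for t, w in enumerate(weights):
        r -= w
        if r <= 0:
            return t
    return len(weights) - 1

learner = Learner()
weights = [1.0] * N_TASKS  # current task selection policy
counts = [0] * N_TASKS
for _ in range(500):
    task = select(weights)           # select a task under the current policy
    counts[task] += 1
    progress = learner.train_on(task)  # train on a batch, measure progress
    # Update the task selection policy: boost tasks that produced progress.
    weights[task] *= (1 + 5.0 * progress)
```

The effect is a crude automated curriculum: tasks that currently yield progress are sampled more often, while a task that teaches nothing (task 2 here) keeps its initial weight and is visited rarely.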
-
Publication number: US20190332938A1
Publication date: 2019-10-31
Application number: US16508042
Application date: 2019-07-10
Applicant: DeepMind Technologies Limited
Inventor: Marc Gendron-Bellemare, Jacob Lee Menick, Alexander Benjamin Graves, Koray Kavukcuoglu, Remi Munos
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a machine learning model. In one aspect, a method includes receiving training data for training the machine learning model on a plurality of tasks, where each task includes multiple batches of training data. A task is selected in accordance with a current task selection policy. A batch of training data is selected from the selected task. The machine learning model is trained on the selected batch of training data to determine updated values of the model parameters. A learning progress measure that represents a progress of the training of the machine learning model as a result of training the machine learning model on the selected batch of training data is determined. The current task selection policy is updated using the learning progress measure.
-
Publication number: US20180260708A1
Publication date: 2018-09-13
Application number: US15977923
Application date: 2018-05-11
Applicant: DeepMind Technologies Limited
Inventor: Volodymyr Mnih, Adrià Puigdomènech Badia, Alexander Benjamin Graves, Timothy James Alexander Harley, David Silver, Koray Kavukcuoglu
CPC classification number: G06N3/08, G06N3/04, G06N3/0454
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for asynchronous deep reinforcement learning. One of the systems includes a plurality of workers, wherein each worker is configured to operate independently of each other worker, and wherein each worker is associated with a respective actor that interacts with a respective replica of the environment during the training of the deep neural network.
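The structure in the abstract — independent workers, each acting in its own environment replica and asynchronously updating shared parameters — can be sketched with threads. The toy objective (a one-parameter bandit with optimum at 1.0) and the finite-difference gradient are illustrative assumptions, not the patented training method:

```python
import threading
import random

# Shared parameters of the network (toy: a single scalar).
shared = {"theta": 0.0, "steps": 0}
lock = threading.Lock()

class EnvReplica:
    """Each worker's own copy of the environment (a toy bandit)."""
    def __init__(self, seed):
        self.rng = random.Random(seed)

    def reward(self, action):
        # Reward peaks at action == 1.0, with a little noise.
        return -(action - 1.0) ** 2 + self.rng.gauss(0, 0.01)

def worker(seed, n_steps=200, lr=0.05):
    env = EnvReplica(seed)       # respective replica of the environment
    for _ in range(n_steps):
        theta = shared["theta"]  # read current (possibly stale) parameters
        # Finite-difference estimate of the reward gradient.
        g = (env.reward(theta + 0.1) - env.reward(theta - 0.1)) / 0.2
        with lock:               # apply the asynchronous update
            shared["theta"] += lr * g
            shared["steps"] += 1

threads = [threading.Thread(target=worker, args=(s,)) for s in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each worker operates independently of the others, as the claim requires: workers never synchronize with each other, only with the shared parameter store, and they tolerate stale reads between updates.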
-
Publication number: US20240354566A1
Publication date: 2024-10-24
Application number: US18623952
Application date: 2024-04-01
Applicant: DeepMind Technologies Limited
Inventor: Koray Kavukcuoglu, Aaron Gerard Antonius van den Oord, Oriol Vinyals
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating discrete latent representations of input data items. One of the methods includes receiving an input data item; providing the input data item as input to an encoder neural network to obtain an encoder output for the input data item; and generating a discrete latent representation of the input data item from the encoder output, comprising: for each of the latent variables, determining, from a set of latent embedding vectors in the memory, a latent embedding vector that is nearest to the encoded vector for the latent variable.
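The core step in the abstract — mapping each encoded vector to the nearest latent embedding vector in a memory — is straightforward to sketch. The dimensions, codebook size, and random stand-in for the encoder output are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

D, K = 4, 6  # latent dimension and codebook size; toy values

# The memory of latent embedding vectors (the codebook).
codebook = rng.normal(size=(K, D))

def quantize(encoder_output):
    """Map each encoded vector to the index of its nearest codebook entry.

    encoder_output: (num_latents, D) array, one row per latent variable."""
    # Squared Euclidean distance from every encoded vector to every embedding.
    d2 = ((encoder_output[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    indices = d2.argmin(axis=1)        # the discrete latent representation
    return indices, codebook[indices]  # indices and the quantized vectors

enc = rng.normal(size=(3, D))          # stand-in for the encoder's output
idx, zq = quantize(enc)
```

The returned `indices` are the discrete representation; downstream, a decoder consumes the looked-up embedding vectors `zq` rather than the raw encoder output.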
-
Publication number: US20240144015A1
Publication date: 2024-05-02
Application number: US18386954
Application date: 2023-11-03
Applicant: DeepMind Technologies Limited
Inventor: Volodymyr Mnih, Wojciech Czarnecki, Maxwell Elliot Jaderberg, Tom Schaul, David Silver, Koray Kavukcuoglu
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. The method includes: training an action selection policy neural network and, during that training, training one or more auxiliary control neural networks and a reward prediction neural network. Each of the auxiliary control neural networks is configured to receive a respective intermediate output generated by the action selection policy neural network and generate a policy output for a corresponding auxiliary control task. The reward prediction neural network is configured to receive one or more intermediate outputs generated by the action selection policy neural network and generate a corresponding predicted reward. Training each of the auxiliary control neural networks and the reward prediction neural network comprises adjusting the values of the respective auxiliary control parameters, reward prediction parameters, and action selection policy network parameters.
-
Publication number: US20240119262A1
Publication date: 2024-04-11
Application number: US18479775
Application date: 2023-10-02
Applicant: DeepMind Technologies Limited
Inventor: Neil Charles Rabinowitz, Guillaume Desjardins, Andrei-Alexandru Rusu, Koray Kavukcuoglu, Raia Thais Hadsell, Razvan Pascanu, James Kirkpatrick, Hubert Josef Soyer
Abstract: Methods and systems for performing a sequence of machine learning tasks. One system includes a sequence of deep neural networks (DNNs), including: a first DNN corresponding to a first machine learning task, wherein the first DNN comprises a first plurality of indexed layers, and each layer in the first plurality of indexed layers is configured to receive a respective layer input and process the layer input to generate a respective layer output; and one or more subsequent DNNs corresponding to one or more respective machine learning tasks, wherein each subsequent DNN comprises a respective plurality of indexed layers, and each layer in a respective plurality of indexed layers with index greater than one receives input from a preceding layer of the respective subsequent DNN, and one or more preceding layers of respective preceding DNNs, wherein a preceding layer is a layer whose index is one less than the current index.
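The wiring the abstract claims — each layer of a subsequent DNN with index greater than one receiving input both from its own preceding layer and from the preceding layer of an earlier DNN — can be sketched with two small columns of layers. The width, depth, tanh activations, and random weights are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

D = 4  # layer width; toy value, not from the patent

# First DNN ("column" 1): a stack of indexed layers, frozen after its task.
col1 = [rng.normal(scale=0.5, size=(D, D)) for _ in range(3)]
# Subsequent DNN (column 2), plus lateral connections: each of its layers
# with index > 1 also receives the output of the preceding layer of column 1.
col2 = [rng.normal(scale=0.5, size=(D, D)) for _ in range(3)]
laterals = [rng.normal(scale=0.5, size=(D, D)) for _ in range(2)]

def forward(x):
    # Column 1 activations, indexed so h1[i] is the output of layer i.
    h1 = [x]
    for w in col1:
        h1.append(np.tanh(w @ h1[-1]))
    # Column 2: layer 1 sees only the task input; layer i + 1 combines its
    # own preceding layer h2 with the lateral input h1[i] (the layer whose
    # index is one less than the current index, from the earlier column).
    h2 = np.tanh(col2[0] @ x)
    for i in range(1, 3):
        h2 = np.tanh(col2[i] @ h2 + laterals[i - 1] @ h1[i])
    return h2

out = forward(rng.normal(size=D))
```

Since `col1` is never updated while column 2 trains, features learned on the first task remain available to later tasks without being overwritten.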
-
Publication number: US11948075B2
Publication date: 2024-04-02
Application number: US16620815
Application date: 2018-06-11
Applicant: DeepMind Technologies Limited
Inventor: Koray Kavukcuoglu, Aaron Gerard Antonius van den Oord, Oriol Vinyals
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating discrete latent representations of input data items. One of the methods includes receiving an input data item; providing the input data item as input to an encoder neural network to obtain an encoder output for the input data item; and generating a discrete latent representation of the input data item from the encoder output, comprising: for each of the latent variables, determining, from a set of latent embedding vectors in the memory, a latent embedding vector that is nearest to the encoded vector for the latent variable.
-
Publication number: US11868894B2
Publication date: 2024-01-09
Application number: US18149771
Application date: 2023-01-04
Applicant: DeepMind Technologies Limited
Inventor: Hubert Josef Soyer, Lasse Espeholt, Karen Simonyan, Yotam Doron, Vlad Firoiu, Volodymyr Mnih, Koray Kavukcuoglu, Remi Munos, Thomas Ward, Timothy James Alexander Harley, Iain Robert Dunning
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection neural network used to select actions to be performed by an agent interacting with an environment. In one aspect, a system comprises a plurality of actor computing units and a plurality of learner computing units. The actor computing units generate experience tuple trajectories that are used by the learner computing units to update learner action selection neural network parameters using a reinforcement learning technique. The reinforcement learning technique may be an off-policy actor critic reinforcement learning technique.
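The division of labor in the abstract — actor computing units generating experience tuple trajectories that learner computing units consume to update the action selection parameters — can be sketched with threads and a queue. The toy trajectory contents and the learner's simple averaging update are invented for illustration; a real system would apply an off-policy correction (the abstract's off-policy actor critic technique) at the learner:

```python
import threading
import queue
import random

trajectory_queue = queue.Queue()
learner_params = {"version": 0, "theta": 0.0}

def actor(actor_id, n_traj=5):
    """Actor computing unit: roll out a (possibly stale) copy of the
    policy in its own environment and enqueue experience trajectories."""
    rng = random.Random(actor_id)
    for _ in range(n_traj):
        theta = learner_params["theta"]  # snapshot of the learner's policy
        # Toy experience tuples: (reward, behaviour-policy parameter).
        traj = [(rng.random(), theta) for _ in range(4)]
        trajectory_queue.put(traj)

def learner(n_updates):
    """Learner computing unit: consume trajectories, update parameters."""
    for _ in range(n_updates):
        traj = trajectory_queue.get()  # blocks until an actor produces one
        mean_r = sum(r for r, _ in traj) / len(traj)
        learner_params["theta"] += 0.1 * (mean_r - learner_params["theta"])
        learner_params["version"] += 1

actors = [threading.Thread(target=actor, args=(i,)) for i in range(3)]
for t in actors:
    t.start()
learner(n_updates=15)  # 3 actors x 5 trajectories each
for t in actors:
    t.join()
```

Because acting and learning are decoupled through the queue, actors can run far ahead of (or behind) the learner's parameter version, which is exactly why an off-policy correction is needed in the real technique.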
-