-
公开(公告)号:US11775804B2
公开(公告)日:2023-10-03
申请号:US17201542
申请日:2021-03-15
Applicant: DeepMind Technologies Limited
Inventor: Neil Charles Rabinowitz , Guillaume Desjardins , Andrei-Alexandru Rusu , Koray Kavukcuoglu , Raia Thais Hadsell , Razvan Pascanu , James Kirkpatrick , Hubert Josef Soyer
Abstract: Methods and systems for performing a sequence of machine learning tasks. One system includes a sequence of deep neural networks (DNNs), including: a first DNN corresponding to a first machine learning task, wherein the first DNN comprises a first plurality of indexed layers, and each layer in the first plurality of indexed layers is configured to receive a respective layer input and process the layer input to generate a respective layer output; and one or more subsequent DNNs corresponding to one or more respective machine learning tasks, wherein each subsequent DNN comprises a respective plurality of indexed layers, and each layer in a respective plurality of indexed layers with index greater than one receives input from a preceding layer of the respective subsequent DNN, and one or more preceding layers of respective preceding DNNs, wherein a preceding layer is a layer whose index is one less than the current index.
-
22.
公开(公告)号:US20230153617A1
公开(公告)日:2023-05-18
申请号:US18149771
申请日:2023-01-04
Applicant: DeepMind Technologies Limited
Inventor: Hubert Josef Soyer , Lasse Espeholt , Karen Simonyan , Yotam Doron , Vlad Firoiu , Volodymyr Mnih , Koray Kavukcuoglu , Remi Munos , Thomas Ward , Timothy James Alexander Harley , Iain Robert Dunning
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection neural network used to select actions to be performed by an agent interacting with an environment. In one aspect, a system comprises a plurality of actor computing units and a plurality of learner computing units. The actor computing units generate experience tuple trajectories that are used by the learner computing units to update learner action selection neural network parameters using a reinforcement learning technique. The reinforcement learning technique may be an off-policy actor critic reinforcement learning technique.
-
公开(公告)号:US11537887B2
公开(公告)日:2022-12-27
申请号:US16866753
申请日:2020-05-05
Applicant: DeepMind Technologies Limited
Inventor: Simon Osindero , Koray Kavukcuoglu , Alexander Vezhnevets
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for a system configured to select actions to be performed by an agent that interacts with an environment. The system comprises a manager neural network subsystem and a worker neural network subsystem. The manager subsystem is configured to, at each of the multiple time steps, generate a final goal vector for the time step. The worker subsystem is configured to, at each of multiple time steps, use the final goal vector generated by the manager subsystem to generate a respective action score for each action in a predetermined set of actions.
-
公开(公告)号:US20210374538A1
公开(公告)日:2021-12-02
申请号:US17359427
申请日:2021-06-25
Applicant: DeepMind Technologies Limited
Inventor: Volodymyr Mnih , Koray Kavukcuoglu
Abstract: We describe a method of reinforcement learning for a subject system having multiple states and actions to move from one state to the next. Training data is generated by operating on the system with a succession of actions and used to train a second neural network. Target values for training the second neural network are derived from a first neural network which is generated by copying weights of the second neural network at intervals.
-
公开(公告)号:US11074481B2
公开(公告)日:2021-07-27
申请号:US16745757
申请日:2020-01-17
Applicant: DeepMind Technologies Limited
Inventor: Fabio Viola , Piotr Wojciech Mirowski , Andrea Banino , Razvan Pascanu , Hubert Josef Soyer , Andrew James Ballard , Sudarshan Kumaran , Raia Thais Hadsell , Laurent Sifre , Rostislav Goroshin , Koray Kavukcuoglu , Misha Man Ray Denil
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. In one aspect, a method of training an action selection policy neural network for use in selecting actions to be performed by an agent navigating through an environment to accomplish one or more goals comprises: receiving an observation image characterizing a current state of the environment; processing, using the action selection policy neural network, an input comprising the observation image to generate an action selection output; processing, using a geometry-prediction neural network, an intermediate output generated by the action selection policy neural network to predict a value of a feature of a geometry of the environment when in the current state; and backpropagating a gradient of a geometry-based auxiliary loss into the action selection policy neural network to determine a geometry-based auxiliary update for current values of the network parameters.
-
公开(公告)号:US20210166127A1
公开(公告)日:2021-06-03
申请号:US17170316
申请日:2021-02-08
Applicant: DeepMind Technologies Limited
Inventor: Volodymyr Mnih , Adrià Puigdomènech Badia , Alexander Benjamin Graves , Timothy James Alexander Harley , David Silver , Koray Kavukcuoglu
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for asynchronous deep reinforcement learning. One of the systems includes a plurality of workers, wherein each worker is configured to operate independently of each other worker, and wherein each worker is associated with a respective actor that interacts with a respective replica of the environment during the training of the deep neural network.
-
27.
公开(公告)号:US20210034970A1
公开(公告)日:2021-02-04
申请号:US16767049
申请日:2019-02-05
Applicant: DeepMind Technologies Limited
Inventor: Hubert Josef Soyer , Lasse Espeholt , Karen Simonyan , Yotam Doron , Vlad Firoiu , Volodymyr Mnih , Koray Kavukcuoglu , Remi Munos , Thomas Ward , Timothy James Alexander Harley , Iain Robert Dunning
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection neural network used to select actions to be performed by an agent interacting with an environment. In one aspect, a system comprises a plurality of actor computing units and a plurality of learner computing units. The actor computing units generate experience tuple trajectories that are used by the learner computing units to update learner action selection neural network parameters using a reinforcement learning technique. The reinforcement learning technique may be an off-policy actor critic reinforcement learning technique.
-
公开(公告)号:US20200265313A1
公开(公告)日:2020-08-20
申请号:US16866753
申请日:2020-05-05
Applicant: DeepMind Technologies Limited
Inventor: Simon Osindero , Koray Kavukcuoglu , Alexander Vezhnevets
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for a system configured to select actions to be performed by an agent that interacts with an environment. The system comprises a manager neural network subsystem and a worker neural network subsystem. The manager subsystem is configured to, at each of the multiple time steps, generate a final goal vector for the time step. The worker subsystem is configured to, at each of multiple time steps, use the final goal vector generated by the manager subsystem to generate a respective action score for each action in a predetermined set of actions.
-
公开(公告)号:US10748029B2
公开(公告)日:2020-08-18
申请号:US16041567
申请日:2018-07-20
Applicant: DeepMind Technologies Limited
Inventor: Maxwell Elliot Jaderberg , Karen Simonyan , Andrew Zisserman , Koray Kavukcuoglu
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing inputs using an image processing neural network system that includes a spatial transformer module. One of the methods includes receiving an input feature map derived from the one or more input images, and applying a spatial transformation to the input feature map to generate a transformed feature map, comprising: processing the input feature map to generate spatial transformation parameters for the spatial transformation, and sampling from the input feature map in accordance with the spatial transformation parameters to generate the transformed feature map.
-
公开(公告)号:US20190340509A1
公开(公告)日:2019-11-07
申请号:US16511571
申请日:2019-07-15
Applicant: DeepMind Technologies Limited
Inventor: Simon Osindero , Koray Kavukcuoglu , Alexander Vezhnevets
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for a system configured to select actions to be performed by an agent that interacts with an environment. The system comprises a manager neural network subsystem and a worker neural network subsystem. The manager subsystem is configured to, at each of the multiple time steps, generate a final goal vector for the time step. The worker subsystem is configured to, at each of multiple time steps, use the final goal vector generated by the manager subsystem to generate a respective action score for each action in a predetermined set of actions.
-
-
-
-
-
-
-
-
-