-
Publication No.: US10572776B2
Publication Date: 2020-02-25
Application No.: US16403343
Filing Date: 2019-05-03
Applicant: DeepMind Technologies Limited
Inventor: Fabio Viola , Piotr Wojciech Mirowski , Andrea Banino , Razvan Pascanu , Hubert Josef Soyer , Andrew James Ballard , Sudarshan Kumaran , Raia Thais Hadsell , Laurent Sifre , Rostislav Goroshin , Koray Kavukcuoglu , Misha Man Ray Denil
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. In one aspect, a method of training an action selection policy neural network for use in selecting actions to be performed by an agent navigating through an environment to accomplish one or more goals comprises: receiving an observation image characterizing a current state of the environment; processing, using the action selection policy neural network, an input comprising the observation image to generate an action selection output; processing, using a geometry-prediction neural network, an intermediate output generated by the action selection policy neural network to predict a value of a feature of a geometry of the environment when in the current state; and backpropagating a gradient of a geometry-based auxiliary loss into the action selection policy neural network to determine a geometry-based auxiliary update for current values of the network parameters.
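The geometry-based auxiliary update described in this abstract can be sketched as follows. This is a minimal NumPy illustration with hypothetical layer shapes, using a single scalar depth value as the stand-in geometry feature; it is not the patented implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny linear "policy network": obs -> hidden -> action logits.
W_enc = rng.normal(0, 0.1, (8, 4))    # encoder: 8-dim observation -> 4-dim hidden
W_pol = rng.normal(0, 0.1, (4, 3))    # policy head: hidden -> 3 action logits
W_geo = rng.normal(0, 0.1, (4, 1))    # geometry head: hidden -> predicted depth

obs = rng.normal(size=8)              # stand-in for the observation image
depth_target = 2.5                    # ground-truth geometry feature (e.g. depth)

hidden = obs @ W_enc                          # intermediate output of the policy net
depth_pred = (hidden @ W_geo).item()          # geometry prediction from that output

# Geometry-based auxiliary loss: squared error on the predicted depth.
aux_loss = 0.5 * (depth_pred - depth_target) ** 2

# Backpropagate the auxiliary gradient into the shared encoder weights.
d_pred = depth_pred - depth_target            # dL/d_pred
grad_W_geo = np.outer(hidden, d_pred)         # gradient for the geometry head
grad_hidden = (W_geo * d_pred).ravel()        # gradient flowing into the trunk
grad_W_enc = np.outer(obs, grad_hidden)       # auxiliary update for shared params

lr = 0.1
W_geo -= lr * grad_W_geo
W_enc -= lr * grad_W_enc
```

The key point is that `grad_W_enc` updates parameters shared with the action selection policy, so the auxiliary task shapes the policy network's representation.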
-
Publication No.: US10445641B2
Publication Date: 2019-10-15
Application No.: US15016173
Filing Date: 2016-02-04
Applicant: DeepMind Technologies Limited
Inventor: Praveen Deepak Srinivasan , Rory Fearon , Cagdas Alcicek , Arun Sarath Nair , Samuel Blackwell , Vedavyas Panneershelvam , Alessandro De Maria , Volodymyr Mnih , Koray Kavukcuoglu , David Silver , Mustafa Suleyman
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for distributed training of reinforcement learning systems. One of the methods includes receiving, by a learner, current values of the parameters of the Q network from a parameter server, wherein each learner maintains a respective learner Q network replica and a respective target Q network replica; updating, by the learner, the parameters of the learner Q network replica maintained by the learner using the current values; selecting, by the learner, an experience tuple from a respective replay memory; computing, by the learner, a gradient from the experience tuple using the learner Q network replica maintained by the learner and the target Q network replica maintained by the learner; and providing, by the learner, the computed gradient to the parameter server.
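The learner's gradient computation can be sketched with a linear Q-function standing in for the Q network. Shapes, the experience tuple, and the learning rate are hypothetical; this is a minimal illustration of the learner-replica/target-replica split, not the patented system:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical linear Q-function: Q(s, a) = (s @ W)[a].
def q_values(state, W):
    return state @ W

# "Parameter server" holds the authoritative parameters.
server_W = rng.normal(0, 0.1, (4, 2))   # 4-dim state, 2 actions

# Learner pulls current values into its local Q-network replicas.
learner_W = server_W.copy()
target_W = server_W.copy()              # target replica, synced less often

# One experience tuple selected from the learner's replay memory.
state = rng.normal(size=4)
action, reward = 1, 1.0
next_state = rng.normal(size=4)
gamma = 0.99

# One-step Q-learning target uses the *target* replica.
td_target = reward + gamma * q_values(next_state, target_W).max()
td_error = td_target - q_values(state, learner_W)[action]

# Gradient of 0.5 * td_error**2 w.r.t. learner_W (only column `action` is hit).
grad = np.zeros_like(learner_W)
grad[:, action] = -td_error * state

# Learner ships the gradient back; the parameter server applies it.
lr = 0.05
server_W -= lr * grad
```

In the distributed setting many learners compute such gradients in parallel against their own replicas and replay memories, and only the gradients travel to the parameter server.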
-
Publication No.: US20190258938A1
Publication Date: 2019-08-22
Application No.: US16403385
Filing Date: 2019-05-03
Applicant: DeepMind Technologies Limited
Inventor: Volodymyr Mnih , Wojciech Czarnecki , Maxwell Elliot Jaderberg , Tom Schaul , David Silver , Koray Kavukcuoglu
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. The method includes: training an action selection policy neural network, and during the training of the action selection policy neural network, training one or more auxiliary control neural networks and a reward prediction neural network. Each of the auxiliary control neural networks is configured to receive a respective intermediate output generated by the action selection policy neural network and generate a policy output for a corresponding auxiliary control task. The reward prediction neural network is configured to receive one or more intermediate outputs generated by the action selection policy neural network and generate a corresponding predicted reward. Training each of the auxiliary control neural networks and the reward prediction neural network comprises adjusting values of the respective auxiliary control parameters, reward prediction parameters, and the action selection policy network parameters.
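The way the reward prediction task adjusts both its own head and the shared policy-network parameters can be sketched as follows. Tiny linear layers with hypothetical shapes stand in for the networks; this is a minimal illustration, not the patented architecture:

```python
import numpy as np

rng = np.random.default_rng(2)

# Shared trunk plus a reward-prediction head (hypothetical linear stand-ins).
trunk = rng.normal(0, 0.1, (6, 4))      # observation features -> intermediate
reward_head = rng.normal(0, 0.1, 4)     # intermediate output -> predicted reward

obs = rng.normal(size=6)
true_reward = 1.0

feats = obs @ trunk                     # intermediate output shared with the policy
pred = feats @ reward_head              # predicted reward

# The reward-prediction loss trains the head AND the shared trunk parameters.
err = pred - true_reward
grad_head = err * feats
grad_trunk = err * np.outer(obs, reward_head)

lr = 0.1
reward_head -= lr * grad_head
trunk -= lr * grad_trunk
```

Because `trunk` is shared, the auxiliary reward-prediction signal improves the representation the action selection policy also uses.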
-
Publication No.: US10346741B2
Publication Date: 2019-07-09
Application No.: US15977923
Filing Date: 2018-05-11
Applicant: DeepMind Technologies Limited
Inventor: Volodymyr Mnih , Adrià Puigdomènech Badia , Alexander Benjamin Graves , Timothy James Alexander Harley , David Silver , Koray Kavukcuoglu
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for asynchronous deep reinforcement learning. One of the systems includes a plurality of workers, wherein each worker is configured to operate independently of each other worker, and wherein each worker is associated with a respective actor that interacts with a respective replica of the environment during the training of the deep neural network.
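The worker arrangement can be sketched with threads that each draw data from their own "environment replica" and apply lock-free updates to shared parameters. The regression task, shapes, and step counts are hypothetical; this is a minimal illustration of asynchronous shared-parameter training, not the patented system:

```python
import threading
import numpy as np

# Shared parameters updated lock-free by independent workers.
shared_w = np.zeros(4)
true_w = np.array([1.0, -2.0, 0.5, 3.0])   # target the workers regress toward

def worker(seed, steps=200, lr=0.05):
    r = np.random.default_rng(seed)
    for _ in range(steps):
        # Each worker interacts with its own environment replica: here it
        # draws its own inputs and computes a local gradient step.
        x = r.normal(size=4)
        err = shared_w @ x - true_w @ x
        shared_w[:] -= lr * err * x        # asynchronous update of shared params

# Each worker operates independently of every other worker.
threads = [threading.Thread(target=worker, args=(s,)) for s in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Despite interleaved, unsynchronized updates, the shared parameters converge, which is the essential property the asynchronous scheme relies on.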
-
Publication No.: US20180330185A1
Publication Date: 2018-11-15
Application No.: US16041567
Filing Date: 2018-07-20
Applicant: DeepMind Technologies Limited
Inventor: Maxwell Elliot Jaderberg , Karen Simonyan , Andrew Zisserman , Koray Kavukcuoglu
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing inputs using an image processing neural network system that includes a spatial transformer module. One of the methods includes receiving an input feature map derived from one or more input images, and applying a spatial transformation to the input feature map to generate a transformed feature map, comprising: processing the input feature map to generate spatial transformation parameters for the spatial transformation, and sampling from the input feature map in accordance with the spatial transformation parameters to generate the transformed feature map.
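The sampling step can be sketched in NumPy for a single-channel feature map: a 2x3 affine parameter matrix defines a sampling grid, and bilinear interpolation reads the input feature map at those grid locations. The helper names and normalized-coordinate convention are assumptions for illustration, not the patented module:

```python
import numpy as np

def affine_grid(theta, H, W):
    """Build a sampling grid from 2x3 affine parameters in normalized coords."""
    ys, xs = np.meshgrid(np.linspace(-1, 1, H), np.linspace(-1, 1, W),
                         indexing="ij")
    coords = np.stack([xs, ys, np.ones_like(xs)], axis=-1)    # (H, W, 3)
    return coords @ theta.T                                   # (H, W, 2): (x, y)

def bilinear_sample(feat, grid):
    """Sample feat (H, W) at grid locations with bilinear interpolation."""
    H, W = feat.shape
    x = (grid[..., 0] + 1) * (W - 1) / 2      # back to pixel coordinates
    y = (grid[..., 1] + 1) * (H - 1) / 2
    x0 = np.clip(np.floor(x).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 2)
    wx, wy = x - x0, y - y0
    return ((1 - wy) * (1 - wx) * feat[y0, x0]
            + (1 - wy) * wx * feat[y0, x0 + 1]
            + wy * (1 - wx) * feat[y0 + 1, x0]
            + wy * wx * feat[y0 + 1, x0 + 1])

# An identity transform reproduces the input feature map.
theta_id = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
feat = np.arange(16, dtype=float).reshape(4, 4)
out = bilinear_sample(feat, affine_grid(theta_id, 4, 4))
```

In the full module, `theta` is not fixed but produced by a localization network from the input feature map itself, and the bilinear sampling keeps the whole pipeline differentiable.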
-
Publication No.: US20240249146A1
Publication Date: 2024-07-25
Application No.: US18415376
Filing Date: 2024-01-17
Applicant: DeepMind Technologies Limited
Inventor: Chrisantha Thomas Fernando , Karen Simonyan , Koray Kavukcuoglu , Hanxiao Liu , Oriol Vinyals
IPC: G06N3/086 , G06F16/901 , G06F17/15 , G06N3/045
CPC classification number: G06N3/086 , G06F16/9024 , G06N3/045 , G06F17/15
Abstract: A computer-implemented method for automatically determining a neural network architecture represents a neural network architecture as a data structure defining a hierarchical set of directed acyclic graphs in multiple levels. Each graph has an input, an output, and a plurality of nodes between the input and the output. At each level, a corresponding set of the nodes are connected pairwise by directed edges which indicate operations performed on outputs of one node to generate an input to another node. Each level is associated with a corresponding set of operations. At a lowest level, the operations associated with each edge are selected from a set of primitive operations. The method includes repeatedly generating new sample neural network architectures by modifying previously generated architectures, and evaluating their fitness. The modification is performed by selecting a level, selecting two nodes at that level, and modifying, removing or adding an edge between those nodes according to operations associated with lower levels of the hierarchy.
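The edge-mutation step can be sketched for a single level of the hierarchy: a DAG is encoded as a map from node pairs to operation indices, and a mutation picks two nodes and reassigns the operation on the edge between them. The operation names and single-level encoding are hypothetical simplifications of the multi-level scheme:

```python
import random

# Hypothetical primitive operations at the lowest level of the hierarchy.
PRIMITIVES = ["identity", "conv3x3", "maxpool", "none"]

def new_genotype(num_nodes):
    """Lower-triangular DAG: each edge (i, j), i < j, carries an op index."""
    return {(i, j): random.randrange(len(PRIMITIVES))
            for j in range(num_nodes) for i in range(j)}

def mutate(genotype, num_nodes):
    """Select two nodes and reassign the operation on the edge between them."""
    g = dict(genotype)
    j = random.randrange(1, num_nodes)
    i = random.randrange(j)
    g[(i, j)] = random.randrange(len(PRIMITIVES))
    return g

random.seed(0)
parent = new_genotype(4)
child = mutate(parent, 4)   # candidate architecture to evaluate for fitness
```

Including a `"none"` op lets the same mechanism effectively add or remove edges; in the full hierarchical scheme the ops at one level are themselves graphs assembled from the level below.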
-
Publication No.: US11941088B1
Publication Date: 2024-03-26
Application No.: US17737544
Filing Date: 2022-05-05
Applicant: DeepMind Technologies Limited
Inventor: Volodymyr Mnih , Koray Kavukcuoglu
IPC: G06V10/44 , G06F18/2431 , G06V20/80 , G06V30/194 , G06V30/413
CPC classification number: G06F18/2431 , G06V10/44 , G06V20/80 , G06V30/194 , G06V30/413
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing images using recurrent attention. One of the methods includes determining a location in the first image; extracting a glimpse from the first image using the location; generating a glimpse representation of the extracted glimpse; processing the glimpse representation using a recurrent neural network to update a current internal state of the recurrent neural network to generate a new internal state; processing the new internal state to select a location in a next image in the image sequence after the first image; and processing the new internal state to select an action from a predetermined set of possible actions.
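The glimpse-extract/state-update/next-location loop can be sketched as follows. The layer shapes, glimpse size, and location parameterization are hypothetical stand-ins for the trained networks; this is a minimal illustration of the control flow, not the patented model:

```python
import numpy as np

rng = np.random.default_rng(4)

def extract_glimpse(image, loc, size=3):
    """Crop a size x size patch centred on loc (row, col), clipped to bounds."""
    r = int(np.clip(loc[0], size // 2, image.shape[0] - size // 2 - 1))
    c = int(np.clip(loc[1], size // 2, image.shape[1] - size // 2 - 1))
    h = size // 2
    return image[r - h:r + h + 1, c - h:c + h + 1]

# Hypothetical tiny recurrent core: state = tanh(W_h @ state + W_g @ glimpse).
W_h = rng.normal(0, 0.1, (8, 8))
W_g = rng.normal(0, 0.1, (8, 9))
W_loc = rng.normal(0, 0.1, (2, 8))        # head that emits the next location
W_act = rng.normal(0, 0.1, (4, 8))        # head that scores possible actions

image = rng.normal(size=(10, 10))         # stand-in for one image in the sequence
state = np.zeros(8)
loc = (5, 5)

for _ in range(3):                        # three glimpses
    g = extract_glimpse(image, loc).ravel()        # glimpse representation
    state = np.tanh(W_h @ state + W_g @ g)         # new internal state
    next_rc = W_loc @ state                        # where to look next
    loc = (5 + 4 * np.tanh(next_rc[0]), 5 + 4 * np.tanh(next_rc[1]))

action = int(np.argmax(W_act @ state))    # action from the final internal state
```

The point of the structure is that only small glimpses are processed, and the recurrent state both accumulates evidence and drives where to look next.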
-
Publication No.: US11783182B2
Publication Date: 2023-10-10
Application No.: US17170316
Filing Date: 2021-02-08
Applicant: DeepMind Technologies Limited
Inventor: Volodymyr Mnih , Adrià Puigdomènech Badia , Alexander Benjamin Graves , Timothy James Alexander Harley , David Silver , Koray Kavukcuoglu
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for asynchronous deep reinforcement learning. One of the systems includes a plurality of workers, wherein each worker is configured to operate independently of each other worker, and wherein each worker is associated with a respective actor that interacts with a respective replica of the environment during the training of the deep neural network.
-
Publication No.: US20230090824A1
Publication Date: 2023-03-23
Application No.: US18072175
Filing Date: 2022-11-30
Applicant: DeepMind Technologies Limited
Inventor: Simon Osindero , Koray Kavukcuoglu , Alexander Vezhnevets
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for a system configured to select actions to be performed by an agent that interacts with an environment. The system comprises a manager neural network subsystem and a worker neural network subsystem. The manager subsystem is configured to, at each of the multiple time steps, generate a final goal vector for the time step. The worker subsystem is configured to, at each of multiple time steps, use the final goal vector generated by the manager subsystem to generate a respective action score for each action in a predetermined set of actions.
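The manager/worker split at a single time step can be sketched as follows. Linear layers with hypothetical shapes stand in for the two subsystems, and the goal-conditioned scoring rule is an illustrative assumption, not the patented mechanism:

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical linear stand-ins for the manager and worker subsystems.
W_manager = rng.normal(0, 0.1, (16, 8))   # observation -> goal vector
W_worker = rng.normal(0, 0.1, (16, 8))    # observation -> worker features
U_actions = rng.normal(0, 0.1, (5, 8))    # one embedding per possible action

def select_action(obs):
    # Manager subsystem: produce the final goal vector for this time step.
    goal = W_manager.T @ obs
    goal = goal / (np.linalg.norm(goal) + 1e-8)
    # Worker subsystem: score each action using the manager's goal.
    feats = W_worker.T @ obs
    scores = U_actions @ (feats * goal)   # goal modulates the worker features
    return scores, int(np.argmax(scores))

obs = rng.normal(size=16)
scores, action = select_action(obs)       # one score per action in the set
```

The manager thus sets direction at a coarser level while the worker turns that goal into concrete per-action scores.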
-
Publication No.: US20220261647A1
Publication Date: 2022-08-18
Application No.: US17733594
Filing Date: 2022-04-29
Applicant: DeepMind Technologies Limited
Inventor: Volodymyr Mnih , Adrià Puigdomènech Badia , Alexander Benjamin Graves , Timothy James Alexander Harley , David Silver , Koray Kavukcuoglu
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for asynchronous deep reinforcement learning. One of the systems includes a plurality of workers, wherein each worker is configured to operate independently of each other worker, and wherein each worker is associated with a respective actor that interacts with a respective replica of the environment during the training of the deep neural network.