Patent search ap:("DeepMind Technologies Limited") AND inv:"Fabio Viola" Page 1

1.

发明申请
TRAINING ACTION SELECTION NEURAL NETWORKS USING HINDSIGHT MODELLING 有权

公开(公告)号：US20220366245A1

公开(公告)日：2022-11-17

申请号：US17763901

申请日：2020-09-23

Applicant: DeepMind Technologies Limited

Inventor： Arthur Clement Guez , Fabio Viola , Theophane Guillaume Weber , Lars Buesing , Nicolas Manfred Otto Heess

IPC: G06N3/08

Abstract: A reinforcement learning method and system that selects actions to be performed by a reinforcement learning agent interacting with an environment. A causal model is implemented by a hindsight model neural network and trained using hindsight i.e. using future environment state trajectories. As the method and system does not have access to this future information when selecting an action, the hindsight model neural network is used to train a model neural network which is conditioned on data from current observations, which learns to predict an output of the hindsight model neural network.

2.

发明授权
Environment navigation using reinforcement learning 有权

公开(公告)号：US10572776B2

公开(公告)日：2020-02-25

申请号：US16403343

申请日：2019-05-03

Applicant: DeepMind Technologies Limited

Inventor： Fabio Viola , Piotr Wojciech Mirowski , Andrea Banino , Razvan Pascanu , Hubert Josef Soyer , Andrew James Ballard , Sudarshan Kumaran , Raia Thais Hadsell , Laurent Sifre , Rostislav Goroshin , Koray Kavukcuoglu , Misha Man Ray Denil

IPC: G06K9/00 , G06K9/62 , G06N3/04 , G06N3/08 , G06N3/00 , G06T7/50 , G06T7/70

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. In one aspect, a method of training an action selection policy neural network for use in selecting actions to be performed by an agent navigating through an environment to accomplish one or more goals comprises: receiving an observation image characterizing a current state of the environment; processing, using the action selection policy neural network, an input comprising the observation image to generate an action selection output; processing, using a geometry-prediction neural network, an intermediate output generated by the action selection policy neural network to predict a value of a feature of a geometry of the environment when in the current state; and backpropagating a gradient of a geometry-based auxiliary loss into the action selection policy neural network to determine a geometry-based auxiliary update for current values of the network parameters.

3.

发明申请
ENVIRONMENT NAVIGATION USING REINFORCEMENT LEARNING 审中-公开

公开(公告)号：US20200151515A1

公开(公告)日：2020-05-14

申请号：US16745757

申请日：2020-01-17

Applicant: DeepMind Technologies Limited

Inventor： Fabio Viola , Piotr Wojciech Mirowski , Andrea Banino , Razvan Pascanu , Hubert Josef Soyer , Andrew James Ballard , Sudarshan Kumaran , Raia Thais Hadsell , Laurent Sifre , Rostislav Goroshin , Koray Kavukcuoglu , Misha Man Ray Denil

IPC: G06K9/62 , G06N3/00 , G06N3/08 , G06N3/04 , G06K9/00

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. In one aspect, a method of training an action selection policy neural network for use in selecting actions to be performed by an agent navigating through an environment to accomplish one or more goals comprises: receiving an observation image characterizing a current state of the environment; processing, using the action selection policy neural network, an input comprising the observation image to generate an action selection output; processing, using a geometry-prediction neural network, an intermediate output generated by the action selection policy neural network to predict a value of a feature of a geometry of the environment when in the current state; and backpropagating a gradient of a geometry-based auxiliary loss into the action selection policy neural network to determine a geometry-based auxiliary update for current values of the network parameters.

4.

发明授权
Environment navigation using reinforcement learning 有权

公开(公告)号：US11074481B2

公开(公告)日：2021-07-27

申请号：US16745757

申请日：2020-01-17

Applicant: DeepMind Technologies Limited

Inventor： Fabio Viola , Piotr Wojciech Mirowski , Andrea Banino , Razvan Pascanu , Hubert Josef Soyer , Andrew James Ballard , Sudarshan Kumaran , Raia Thais Hadsell , Laurent Sifre , Rostislav Goroshin , Koray Kavukcuoglu , Misha Man Ray Denil

IPC: G06K9/00 , G06K9/62 , G06N3/04 , G06N3/08 , G06N3/00 , G06T7/50 , G06T7/70

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. In one aspect, a method of training an action selection policy neural network for use in selecting actions to be performed by an agent navigating through an environment to accomplish one or more goals comprises: receiving an observation image characterizing a current state of the environment; processing, using the action selection policy neural network, an input comprising the observation image to generate an action selection output; processing, using a geometry-prediction neural network, an intermediate output generated by the action selection policy neural network to predict a value of a feature of a geometry of the environment when in the current state; and backpropagating a gradient of a geometry-based auxiliary loss into the action selection policy neural network to determine a geometry-based auxiliary update for current values of the network parameters.

5.

发明申请
ENVIRONMENT NAVIGATION USING REINFORCEMENT LEARNING 审中-公开

公开(公告)号：US20190266449A1

公开(公告)日：2019-08-29

申请号：US16403343

申请日：2019-05-03

Applicant: DeepMind Technologies Limited

Inventor： Fabio Viola , Piotr Wojciech Mirowski , Andrea Banino , Razvan Pascanu , Hubert Josef Soyer , Andrew James Ballard , Sudarshan Kumaran , Raia Thais Hadsell , Laurent Sifre , Rostislav Goroshin , Koray Kavukcuoglu , Misha Man Ray Denil

IPC: G06K9/62 , G06N3/08 , G06N3/04 , G06K9/00

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. In one aspect, a method of training an action selection policy neural network for use in selecting actions to be performed by an agent navigating through an environment to accomplish one or more goals comprises: receiving an observation image characterizing a current state of the environment; processing, using the action selection policy neural network, an input comprising the observation image to generate an action selection output; processing, using a geometry-prediction neural network, an intermediate output generated by the action selection policy neural network to predict a value of a feature of a geometry of the environment when in the current state; and backpropagating a gradient of a geometry-based auxiliary loss into the action selection policy neural network to determine a geometry-based auxiliary update for current values of the network parameters.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification