-
Publication No.: US10860927B2
Publication Date: 2020-12-08
Application No.: US16586360
Application Date: 2019-09-27
Applicant: DeepMind Technologies Limited
Inventor: Mehdi Mirza Mohammadi , Arthur Clement Guez , Karol Gregor , Rishabh Kabra
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for controlling an agent interacting with an environment. One of the methods includes obtaining a representation of an observation; processing the representation using a convolutional long short-term memory (LSTM) neural network comprising a plurality of convolutional LSTM neural network layers; processing an action selection input comprising the final LSTM hidden state output for the time step using an action selection neural network that is configured to receive the action selection input and to process the action selection input to generate an action selection output that defines an action to be performed by the agent at the time step; selecting, from the action selection output, the action to be performed by the agent at the time step in accordance with an action selection policy; and causing the agent to perform the selected action.
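Below is a minimal sketch of the kind of architecture the abstract describes: a stack of convolutional LSTM layers processes the observation representation, and an action selection network maps the final hidden state to action logits. This is an illustrative assumption, not the patented implementation; class names, layer sizes, and the `spatial` parameter are hypothetical.

```python
# Minimal sketch (assumption, not the patented implementation): convolutional LSTM
# layers followed by an action-selection head over the final hidden state.
import torch
import torch.nn as nn


class ConvLSTMCell(nn.Module):
    """One convolutional LSTM layer: all gates are computed with a single 2-D convolution."""

    def __init__(self, in_channels, hidden_channels, kernel_size=3):
        super().__init__()
        padding = kernel_size // 2
        self.gates = nn.Conv2d(in_channels + hidden_channels,
                               4 * hidden_channels, kernel_size, padding=padding)

    def forward(self, x, state):
        h, c = state
        i, f, g, o = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c)


class ConvLSTMAgent(nn.Module):
    """Stacks ConvLSTM layers, then selects an action from the final LSTM hidden state."""

    def __init__(self, obs_channels, hidden_channels, num_layers, num_actions, spatial=7):
        super().__init__()
        channels = [obs_channels] + [hidden_channels] * num_layers
        self.layers = nn.ModuleList(
            ConvLSTMCell(channels[i], channels[i + 1]) for i in range(num_layers))
        self.policy = nn.Linear(hidden_channels * spatial * spatial, num_actions)

    def forward(self, obs, states):
        x, new_states = obs, []
        for layer, state in zip(self.layers, states):
            x, state = layer(x, state)
            new_states.append(state)
        logits = self.policy(x.flatten(start_dim=1))  # action selection output
        return logits, new_states
```

At each time step the agent would feed the observation representation and the previous per-layer (h, c) states, then sample the action from the logits, e.g. with `torch.distributions.Categorical(logits=logits).sample()`.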
-
Publication No.: US20220366245A1
Publication Date: 2022-11-17
Application No.: US17763901
Application Date: 2020-09-23
Applicant: DeepMind Technologies Limited
Inventor: Arthur Clement Guez , Fabio Viola , Theophane Guillaume Weber , Lars Buesing , Nicolas Manfred Otto Heess
IPC: G06N3/08
Abstract: A reinforcement learning method and system that selects actions to be performed by a reinforcement learning agent interacting with an environment. A causal model is implemented by a hindsight model neural network and trained using hindsight, i.e. using future environment state trajectories. Because the method and system do not have access to this future information when selecting an action, the hindsight model neural network is used to train a model neural network that is conditioned on data from current observations and learns to predict an output of the hindsight model neural network.
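A minimal sketch of the training idea described in the abstract follows: a hindsight network sees a summary of the future trajectory (privileged information available only during training), while a model network sees only the current observation and is trained to match the hindsight network's output. The network names, sizes, and the `distillation_loss` helper are illustrative assumptions, not the patented method.

```python
# Minimal sketch (assumption): distilling a hindsight network into a model network
# that is conditioned only on the current observation.
import torch
import torch.nn as nn
import torch.nn.functional as F

obs_dim, future_dim, feat_dim = 32, 64, 16   # illustrative sizes

hindsight_net = nn.Sequential(nn.Linear(obs_dim + future_dim, 64), nn.ReLU(),
                              nn.Linear(64, feat_dim))
model_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                          nn.Linear(64, feat_dim))

def distillation_loss(obs, future_summary):
    """Train model_net to predict the hindsight features from current data only."""
    # Hindsight features use privileged future-trajectory information.
    hindsight_feat = hindsight_net(torch.cat([obs, future_summary], dim=-1))
    model_feat = model_net(obs)
    # Stop gradients into the hindsight network; it is trained by its own objective
    # (e.g. improving value prediction), not by this matching loss.
    return F.mse_loss(model_feat, hindsight_feat.detach())

# Example call with random tensors standing in for a training batch.
loss = distillation_loss(torch.randn(8, obs_dim), torch.randn(8, future_dim))
loss.backward()
```

At action-selection time only `model_net` is used, since the future summary is unavailable.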
-
Publication No.: US20210073594A1
Publication Date: 2021-03-11
Application No.: US17019919
Application Date: 2020-09-14
Applicant: DeepMind Technologies Limited
Inventor: Daniel Pieter Wierstra , Yujia Li , Razvan Pascanu , Peter William Battaglia , Theophane Guillaume Weber , Lars Buesing , David Paul Reichert , Arthur Clement Guez , Danilo Jimenez Rezende , Adrià Puigdomènech Badia , Oriol Vinyals , Nicolas Manfred Otto Heess , Sebastien Henri Andre Racaniere
Abstract: A neural network system is proposed. The neural network can be trained by model-based reinforcement learning to select actions to be performed by an agent interacting with an environment, to perform a task in an attempt to achieve a specified result. The system may comprise at least one imagination core which receives a current observation characterizing a current state of the environment, and optionally historical observations, and which includes a model of the environment. The imagination core may be configured to output trajectory data in response to the current observation and/or historical observations. The trajectory data comprises a sequence of future features of the environment imagined by the imagination core. The system may also include a rollout encoder to encode the features, and an output stage to receive data derived from the rollout embedding and to output action policy data for identifying an action based on the current observation.
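The sketch below illustrates the pipeline the abstract outlines: an imagination core rolls a learned environment model forward from the current observation, a rollout encoder summarizes the imagined trajectory into an embedding, and an output stage maps that embedding (together with the current observation) to action policy logits. All class names, dimensions, and the single-rollout simplification are assumptions for illustration, not the patented system.

```python
# Minimal sketch (assumption): imagination core -> rollout encoder -> output stage.
import torch
import torch.nn as nn


class ImaginationCore(nn.Module):
    """Imagines future feature vectors with a learned one-step environment model."""

    def __init__(self, obs_dim, hidden_dim=64):
        super().__init__()
        self.step = nn.Sequential(nn.Linear(obs_dim, hidden_dim), nn.ReLU(),
                                  nn.Linear(hidden_dim, obs_dim))

    def rollout(self, obs, horizon):
        features, x = [], obs
        for _ in range(horizon):
            x = self.step(x)          # predicted next features of the environment
            features.append(x)
        return torch.stack(features)  # (horizon, batch, obs_dim)


class ImaginationAgent(nn.Module):
    def __init__(self, obs_dim, num_actions, embed_dim=64, horizon=5):
        super().__init__()
        self.core = ImaginationCore(obs_dim)
        self.rollout_encoder = nn.LSTM(obs_dim, embed_dim)   # encodes the imagined trajectory
        self.output_stage = nn.Linear(embed_dim + obs_dim, num_actions)
        self.horizon = horizon

    def forward(self, obs):
        trajectory = self.core.rollout(obs, self.horizon)
        _, (h_n, _) = self.rollout_encoder(trajectory)       # rollout embedding
        policy_input = torch.cat([h_n[-1], obs], dim=-1)
        return self.output_stage(policy_input)               # action policy logits


logits = ImaginationAgent(obs_dim=32, num_actions=4)(torch.randn(8, 32))
```

The four records that follow (US10776670B2, US20200090006A1, US11328183B2, and this publication) belong to the same patent family and share this abstract, so the sketch applies to them as well.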
-
Publication No.: US10776670B2
Publication Date: 2020-09-15
Application No.: US16689058
Application Date: 2019-11-19
Applicant: DeepMind Technologies Limited
Inventor: Daniel Pieter Wierstra , Yujia Li , Razvan Pascanu , Peter William Battaglia , Theophane Guillaume Weber , Lars Buesing , David Paul Reichert , Arthur Clement Guez , Danilo Jimenez Rezende , Adrià Puigdomènech Badia , Oriol Vinyals , Nicolas Manfred Otto Heess , Sebastien Henri Andre Racaniere
Abstract: A neural network system is proposed. The neural network can be trained by model-based reinforcement learning to select actions to be performed by an agent interacting with an environment, to perform a task in an attempt to achieve a specified result. The system may comprise at least one imagination core which receives a current observation characterizing a current state of the environment, and optionally historical observations, and which includes a model of the environment. The imagination core may be configured to output trajectory data in response to the current observation and/or historical observations. The trajectory data comprises a sequence of future features of the environment imagined by the imagination core. The system may also include a rollout encoder to encode the features, and an output stage to receive data derived from the rollout embedding and to output action policy data for identifying an action based on the current observation.
-
Publication No.: US20200090006A1
Publication Date: 2020-03-19
Application No.: US16689058
Application Date: 2019-11-19
Applicant: DeepMind Technologies Limited
Inventor: Daniel Pieter Wierstra , Yujia Li , Razvan Pascanu , Peter William Battaglia , Theophane Guillaume Weber , Lars Buesing , David Paul Reichert , Arthur Clement Guez , Danilo Jimenez Rezende , Adrià Puigdomènech Badia , Oriol Vinyals , Nicolas Manfred Otto Heess , Sebastien Henri Andre Racaniere
Abstract: A neural network system is proposed. The neural network can be trained by model-based reinforcement learning to select actions to be performed by an agent interacting with an environment, to perform a task in an attempt to achieve a specified result. The system may comprise at least one imagination core which receives a current observation characterizing a current state of the environment, and optionally historical observations, and which includes a model of the environment. The imagination core may be configured to output trajectory data in response to the current observation and/or historical observations. The trajectory data comprises a sequence of future features of the environment imagined by the imagination core. The system may also include a rollout encoder to encode the features, and an output stage to receive data derived from the rollout embedding and to output action policy data for identifying an action based on the current observation.
-
Publication No.: US11328183B2
Publication Date: 2022-05-10
Application No.: US17019919
Application Date: 2020-09-14
Applicant: DeepMind Technologies Limited
Inventor: Daniel Pieter Wierstra , Yujia Li , Razvan Pascanu , Peter William Battaglia , Theophane Guillaume Weber , Lars Buesing , David Paul Reichert , Arthur Clement Guez , Danilo Jimenez Rezende , Adrià Puigdomènech Badia , Oriol Vinyals , Nicolas Manfred Otto Heess , Sebastien Henri Andre Racaniere
Abstract: A neural network system is proposed. The neural network can be trained by model-based reinforcement learning to select actions to be performed by an agent interacting with an environment, to perform a task in an attempt to achieve a specified result. The system may comprise at least one imagination core which receives a current observation characterizing a current state of the environment, and optionally historical observations, and which includes a model of the environment. The imagination core may be configured to output trajectory data in response to the current observation and/or historical observations. The trajectory data comprises a sequence of future features of the environment imagined by the imagination core. The system may also include a rollout encoder to encode the features, and an output stage to receive data derived from the rollout embedding and to output action policy data for identifying an action based on the current observation.
-
Publication No.: US20200104709A1
Publication Date: 2020-04-02
Application No.: US16586360
Application Date: 2019-09-27
Applicant: DeepMind Technologies Limited
Inventor: Mehdi Mirza Mohammadi , Arthur Clement Guez , Karol Gregor , Rishabh Kabra
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for controlling an agent interacting with an environment. One of the methods includes obtaining a representation of an observation; processing the representation using a convolutional long short-term memory (LSTM) neural network comprising a plurality of convolutional LSTM neural network layers; processing an action selection input comprising the final LSTM hidden state output for the time step using an action selection neural network that is configured to receive the action selection input and to process the action selection input to generate an action selection output that defines an action to be performed by the agent at the time step; selecting, from the action selection output, the action to be performed by the agent at the time step in accordance with an action selection policy; and causing the agent to perform the selected action.
-