-
Publication No.: US20220237488A1
Publication Date: 2022-07-28
Application No.: US17613687
Filing Date: 2020-05-22
Applicant: DeepMind Technologies Limited
Inventor: Markus Wulfmeier , Abbas Abdolmaleki , Roland Hafner , Jost Tobias Springenberg , Nicolas Manfred Otto Heess , Martin Riedmiller
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for controlling an agent. One of the methods includes obtaining an observation characterizing a current state of the environment and data identifying a task currently being performed by the agent; processing the observation and the data identifying the task using a high-level controller to generate a high-level probability distribution that assigns a respective probability to each of a plurality of low-level controllers; processing the observation using each of the plurality of low-level controllers to generate, for each of the plurality of low-level controllers, a respective low-level probability distribution; generating a combined probability distribution; and selecting, using the combined probability distribution, an action from the space of possible actions to be performed by the agent in response to the observation.
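Conceptually, the claimed action-selection step mixes the low-level controllers' action distributions using the high-level controller's weights. A minimal sketch of that step in Python, with toy stand-ins for the networks (all names and sizes below are assumptions, not the patent's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_ACTIONS = 4       # size of the discrete action space (assumption)
NUM_CONTROLLERS = 3   # number of low-level controllers (assumption)

def high_level_controller(observation, task_id):
    """Stand-in for the high-level network: a distribution over
    low-level controllers, conditioned on observation and task."""
    logits = rng.normal(size=NUM_CONTROLLERS)  # placeholder for a forward pass
    return np.exp(logits) / np.exp(logits).sum()

def low_level_controller(observation, controller_idx):
    """Stand-in for one low-level network: a distribution over
    actions, conditioned on the observation only."""
    logits = rng.normal(size=NUM_ACTIONS)      # placeholder for a forward pass
    return np.exp(logits) / np.exp(logits).sum()

def select_action(observation, task_id):
    weights = high_level_controller(observation, task_id)        # shape [K]
    per_controller = np.stack([
        low_level_controller(observation, k) for k in range(NUM_CONTROLLERS)
    ])                                                           # shape [K, A]
    combined = weights @ per_controller   # combined distribution over actions
    return rng.choice(NUM_ACTIONS, p=combined)

action = select_action(observation={"state": 0}, task_id=1)
```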
-
Publication No.: US11328183B2
Publication Date: 2022-05-10
Application No.: US17019919
Filing Date: 2020-09-14
Applicant: DeepMind Technologies Limited
Inventor: Daniel Pieter Wierstra , Yujia Li , Razvan Pascanu , Peter William Battaglia , Theophane Guillaume Weber , Lars Buesing , David Paul Reichert , Arthur Clement Guez , Danilo Jimenez Rezende , Adrià Puigdomènech Badia , Oriol Vinyals , Nicolas Manfred Otto Heess , Sebastien Henri Andre Racaniere
Abstract: A neural network system is proposed. The neural network can be trained by model-based reinforcement learning to select actions to be performed by an agent interacting with an environment, to perform a task in an attempt to achieve a specified result. The system may comprise at least one imagination core which receives a current observation characterizing a current state of the environment, and optionally historical observations, and which includes a model of the environment. The imagination core may be configured to output trajectory data in response to the current observation and/or historical observations; the trajectory data comprises a sequence of future features of the environment imagined by the imagination core. The system may also include a rollout encoder to encode the features into a rollout embedding, and an output stage to receive data derived from the rollout embedding and to output action policy data for identifying an action based on the current observation.
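The data flow the abstract describes (an imagination core rolls a learned model forward, a rollout encoder summarizes the imagined features, and an output stage produces policy data) can be sketched roughly as follows; the toy functions below are illustrative assumptions, not the patented networks:

```python
import numpy as np

HORIZON = 5  # length of each imagined rollout (assumption)

def environment_model(obs, action):
    """Imagination core's learned model: predicts the next observation."""
    return obs + 0.1 * action  # toy dynamics in place of a learned model

def imagine_trajectory(obs, policy):
    """Roll the model forward to produce imagined future features."""
    features = []
    for _ in range(HORIZON):
        action = policy(obs)
        obs = environment_model(obs, action)
        features.append(obs)
    return features

def rollout_encoder(features):
    """Encode the imagined feature sequence into one embedding
    (a real system might run an RNN over the rollout instead)."""
    return np.mean(features, axis=0)

def output_stage(embedding, obs):
    """Map the rollout embedding plus current observation to action logits."""
    logits = np.concatenate([embedding, obs])
    return logits - logits.max()

obs = np.zeros(3)
toy_policy = lambda o: np.ones_like(o)
embedding = rollout_encoder(imagine_trajectory(obs, toy_policy))
policy_logits = output_stage(embedding, obs)
```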
-
Publication No.: US20210049467A1
Publication Date: 2021-02-18
Application No.: US17046963
Filing Date: 2019-04-12
Applicant: DeepMind Technologies Limited
Inventor: Martin Riedmiller , Raia Thais Hadsell , Peter William Battaglia , Joshua Merel , Jost Tobias Springenberg , Alvaro Sanchez , Nicolas Manfred Otto Heess
IPC: G06N3/08
Abstract: A graph neural network system implementing a learnable physics engine for understanding and controlling a physical system. The physical system is considered to be composed of bodies coupled by joints and is represented by static and dynamic graphs. A graph processing neural network processes an input graph, e.g., the static and dynamic graphs, to provide an output graph, e.g., a predicted dynamic graph. The graph processing neural network is differentiable and may be used for control and/or reinforcement learning. The trained graph neural network system can be applied to physical systems with similar but new graph structures (zero-shot learning).
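A rough sketch of the static/dynamic graph representation and one message-passing step over it; the learned edge and node update networks are replaced with fixed toy functions, and all structure here is an assumption rather than the patent's architecture:

```python
import numpy as np

# Static graph: properties that do not change step to step (e.g. masses,
# joint layout). Dynamic graph: per-step state of each body.
static_nodes = np.array([[1.0], [2.0]])             # per-body mass
edges = [(0, 1)]                                    # joint between bodies 0 and 1
dynamic_nodes = np.array([[0.0, 0.0], [1.0, 0.0]])  # [position, velocity]

def edge_fn(sender, receiver):
    """Toy stand-in for the learned edge (interaction) network."""
    return receiver - sender

def node_fn(node, agg_message, static):
    """Toy stand-in for the learned node-update network."""
    return node + 0.01 * agg_message / static

def message_passing_step(dyn, stat, edges):
    """One graph-network step: compute messages along joints,
    aggregate them per body, then update each body's state."""
    messages = np.zeros_like(dyn)
    for s, r in edges:
        m = edge_fn(dyn[s], dyn[r])
        messages[r] += m
        messages[s] -= m
    return np.stack([node_fn(dyn[i], messages[i], stat[i])
                     for i in range(len(dyn))])

predicted_dynamic_nodes = message_passing_step(dynamic_nodes, static_nodes, edges)
```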
-
Publication No.: US20200151562A1
Publication Date: 2020-05-14
Application No.: US16624245
Filing Date: 2018-06-28
Applicant: DeepMind Technologies Limited
Inventor: Olivier Pietquin , Martin Riedmiller , Wang Fumin , Bilal Piot , Mel Vecerik , Todd Andrew Hester , Thomas Rothörl , Thomas Lampe , Nicolas Manfred Otto Heess , Jonathan Karl Scholz
Abstract: An off-policy reinforcement learning actor-critic neural network system configured to select actions from a continuous action space to be performed by an agent interacting with an environment to perform a task. An observation defines environment state data and reward data. The system has an actor neural network which learns a policy function mapping the state data to action data. A critic neural network learns an action-value (Q) function. A replay buffer stores tuples of the state data, the action data, the reward data and new state data. The replay buffer also includes demonstration transition data comprising a set of the tuples from a demonstration of the task within the environment. The neural network system is configured to train the actor neural network and the critic neural network off-policy using stored tuples from the replay buffer comprising tuples both from operation of the system and from the demonstration transition data.
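The replay buffer the abstract describes holds both agent-generated tuples and a permanent set of demonstration tuples, and training batches draw from both. A minimal sketch, assuming a fixed demonstration fraction per batch (the fraction, field names, and class name are illustrative):

```python
import random
from collections import deque

class MixedReplayBuffer:
    """Replay buffer holding agent transitions (bounded) alongside
    demonstration transitions (kept permanently)."""

    def __init__(self, capacity, demo_transitions):
        self.agent = deque(maxlen=capacity)
        self.demo = list(demo_transitions)

    def add(self, state, action, reward, next_state):
        self.agent.append((state, action, reward, next_state))

    def sample(self, batch_size, demo_fraction=0.25):
        # Each update draws a mix of demonstration and agent tuples.
        n_demo = min(int(batch_size * demo_fraction), len(self.demo))
        n_agent = min(batch_size - n_demo, len(self.agent))
        return (random.sample(self.demo, n_demo)
                + random.sample(list(self.agent), n_agent))

buffer = MixedReplayBuffer(
    capacity=100_000,
    demo_transitions=[((0.0,), (0.1,), 1.0, (1.0,))])
buffer.add(state=(1.0,), action=(0.2,), reward=0.0, next_state=(2.0,))
batch = buffer.sample(batch_size=2)
```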
-
Publication No.: US20200090048A1
Publication Date: 2020-03-19
Application No.: US16689020
Filing Date: 2019-11-19
Applicant: DeepMind Technologies Limited
Inventor: Razvan Pascanu , Raia Thais Hadsell , Victor Constant Bapst , Wojciech Czarnecki , James Kirkpatrick , Yee Whye Teh , Nicolas Manfred Otto Heess
Abstract: A method is proposed for training a multitask computer system, such as a multitask neural network system. The system comprises a set of trainable workers and a shared policy network. The trainable workers and shared policy network are trained on a plurality of different tasks, such that each worker learns to perform a corresponding one of the tasks according to a respective task policy, and the shared policy network learns a multitask policy which represents common behavior for the tasks. The coordinated training is performed by optimizing an objective function comprising, for each task: a reward term indicative of an expected reward earned by a worker in performing the corresponding task according to the task policy; and at least one entropy term which regularizes the distribution of the task policy towards the distribution of the multitask policy.
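The per-task objective the abstract outlines combines an expected-reward term with a regularizer pulling each task policy toward the shared multitask policy. A small numerical sketch, using a KL divergence as the regularizing term and an assumed weighting alpha (both are illustrative choices, not the patent's exact formulation):

```python
import numpy as np

def kl(p, q):
    """KL divergence between two discrete distributions."""
    return float(np.sum(p * np.log(p / q)))

def task_objective(expected_reward, task_policy, multitask_policy, alpha=0.5):
    """Reward term minus a term regularizing the task policy
    toward the shared multitask policy."""
    return expected_reward - alpha * kl(task_policy, multitask_policy)

task_pi = np.array([0.7, 0.2, 0.1])       # one worker's action distribution
multitask_pi = np.array([0.4, 0.4, 0.2])  # shared policy's distribution
value = task_objective(1.0, task_pi, multitask_pi)
```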
-
Publication No.: US20190258918A1
Publication Date: 2019-08-22
Application No.: US16402687
Filing Date: 2019-05-03
Applicant: DeepMind Technologies Limited
Inventor: Ziyu Wang , Nicolas Manfred Otto Heess , Victor Constant Bapst
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection neural network. One of the methods includes maintaining a replay memory that stores trajectories generated as a result of interaction of an agent with an environment; and training an action selection neural network having policy parameters on the trajectories in the replay memory, wherein training the action selection neural network comprises: sampling a trajectory from the replay memory; and adjusting current values of the policy parameters by training the action selection neural network on the trajectory using an off-policy actor critic reinforcement learning technique.
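The training step the abstract describes (sample a stored trajectory, then update the policy off-policy with an actor-critic method) might be skeletonized as below. The importance-weighted update is a toy stand-in; a real off-policy actor-critic such as ACER would add a critic, importance-weight truncation, and a trust region:

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_ACTIONS = 3

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Replay memory of trajectories; each transition stores the behavior
# policy's probability so the update can importance-weight it later.
replay_memory = [[(rng.normal(size=2), int(rng.integers(NUM_ACTIONS)),
                   1.0 / NUM_ACTIONS, rng.normal()) for _ in range(10)]]

theta = np.zeros((2, NUM_ACTIONS))  # linear action-selection "network"

def train_step(theta, lr=0.01):
    trajectory = replay_memory[rng.integers(len(replay_memory))]  # sample
    for state, action, behavior_prob, reward in trajectory:
        pi = softmax(state @ theta)
        rho = pi[action] / behavior_prob          # importance weight
        grad_logpi = -np.outer(state, pi)         # grad of log pi(a|s)
        grad_logpi[:, action] += state            # ... for linear softmax
        theta = theta + lr * rho * reward * grad_logpi  # off-policy PG step
    return theta

for _ in range(100):
    theta = train_step(theta)
```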