-
Publication number: US20220237488A1
Publication date: 2022-07-28
Application number: US17613687
Filing date: 2020-05-22
Applicant: DeepMind Technologies Limited
Inventor: Markus Wulfmeier , Abbas Abdolmaleki , Roland Hafner , Jost Tobias Springenberg , Nicolas Manfred Otto Heess , Martin Riedmiller
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for controlling an agent. One of the methods includes obtaining an observation characterizing a current state of the environment and data identifying a task currently being performed by the agent; processing the observation and the data identifying the task using a high-level controller to generate a high-level probability distribution that assigns a respective probability to each of a plurality of low-level controllers; processing the observation using each of the plurality of low-level controllers to generate, for each of the plurality of low-level controllers, a respective low-level probability distribution; generating a combined probability distribution; and selecting, using the combined probability distribution, an action from the space of possible actions to be performed by the agent in response to the observation.
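The combination step described in this abstract — weighting each low-level controller's action distribution by the high-level controller's probability for it, then sampling from the mixture — can be sketched as follows. This is a minimal illustration, not the patented implementation; it assumes a small discrete action space and fixed example probabilities for clarity.

```python
import numpy as np

def combined_distribution(high_level_probs, low_level_probs):
    """Mix the low-level action distributions, weighted by the
    high-level controller's probability for each low-level controller.

    high_level_probs: shape (K,)   -- one weight per low-level controller
    low_level_probs:  shape (K, A) -- each row a distribution over A actions
    """
    return high_level_probs @ low_level_probs  # mixture, shape (A,)

rng = np.random.default_rng(0)
high = np.array([0.7, 0.3])               # two low-level controllers
low = np.array([[0.9, 0.1], [0.2, 0.8]])  # two possible actions
combined = combined_distribution(high, low)
action = rng.choice(len(combined), p=combined)
```

Because each row of `low` sums to 1 and `high` sums to 1, the mixture is itself a valid probability distribution, so the final action can be sampled from it directly.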
-
Publication number: US20210049467A1
Publication date: 2021-02-18
Application number: US17046963
Filing date: 2019-04-12
Applicant: DeepMind Technologies Limited
Inventor: Martin Riedmiller , Raia Thais Hadsell , Peter William Battaglia , Joshua Merel , Jost Tobias Springenberg , Alvaro Sanchez , Nicolas Manfred Otto Heess
IPC: G06N3/08
Abstract: A graph neural network system implementing a learnable physics engine for understanding and controlling a physical system. The physical system is considered to be composed of bodies coupled by joints and is represented by static and dynamic graphs. A graph processing neural network processes an input graph, e.g. the static and dynamic graphs, to provide an output graph, e.g. a predicted dynamic graph. The graph processing neural network is differentiable and may be used for control and/or reinforcement learning. The trained graph neural network system can be applied to physical systems with similar but new graph structures (zero-shot learning).
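The core operation of such a graph network — passing messages along edges (joints) and updating node (body) states — can be sketched generically. The `message_fn` and `update_fn` below are hypothetical linear stand-ins for the learned neural networks, shown only to make the data flow concrete.

```python
import numpy as np

def message_passing_step(node_states, edges, message_fn, update_fn):
    """One graph-network step: compute a message per edge from sender
    to receiver, aggregate the messages arriving at each node, then
    update every node's state from its aggregated messages."""
    agg = np.zeros_like(node_states)
    for sender, receiver in edges:
        agg[receiver] += message_fn(node_states[sender], node_states[receiver])
    return update_fn(node_states, agg)

# Hypothetical stand-ins: a 3-body chain with linear message/update rules.
nodes = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
edges = [(0, 1), (1, 2)]
msg = lambda s, r: 0.5 * (s - r)
upd = lambda n, a: n + a
new_nodes = message_passing_step(nodes, edges, msg, upd)
```

In the patented system the message and update functions are learned, differentiable networks, which is what allows the whole step to be trained end-to-end and reused on new graph structures.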
-
Publication number: US20200151562A1
Publication date: 2020-05-14
Application number: US16624245
Filing date: 2018-06-28
Applicant: DEEPMIND TECHNOLOGIES LIMITED
Inventor: Olivier Pietquin , Martin Riedmiller , Wang Fumin , Bilal Piot , Mel Vecerik , Todd Andrew Hester , Thomas Rothörl , Thomas Lampe , Nicolas Manfred Otto Heess , Jonathan Karl Scholz
Abstract: An off-policy reinforcement learning actor-critic neural network system configured to select actions from a continuous action space to be performed by an agent interacting with an environment to perform a task. An observation defines environment state data and reward data. The system has an actor neural network which learns a policy function mapping the state data to action data. A critic neural network learns an action-value (Q) function. A replay buffer stores tuples of the state data, the action data, the reward data and new state data. The replay buffer also includes demonstration transition data comprising a set of the tuples from a demonstration of the task within the environment. The neural network system is configured to train the actor neural network and the critic neural network off-policy using stored tuples from the replay buffer comprising tuples both from operation of the system and from the demonstration transition data.
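The replay-buffer arrangement described here — agent transitions and demonstration transitions stored side by side, with training minibatches drawn from both — can be sketched minimally. This is an illustrative data structure under simplified assumptions (demonstration tuples are kept permanently; agent tuples live in a ring buffer), not the patented training system.

```python
import random

class ReplayBuffer:
    """Replay buffer holding both agent and demonstration transitions.

    Demonstration (state, action, reward, next_state) tuples are loaded
    once and never evicted; agent tuples fill a bounded ring buffer.
    Sampling draws uniformly from the union of both sources."""

    def __init__(self, capacity, demo_transitions):
        self.capacity = capacity
        self.demos = list(demo_transitions)  # never evicted
        self.agent = []                      # ring buffer of agent tuples
        self.pos = 0

    def add(self, state, action, reward, next_state):
        t = (state, action, reward, next_state)
        if len(self.agent) < self.capacity:
            self.agent.append(t)
        else:
            self.agent[self.pos] = t
            self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, rng=random):
        return rng.sample(self.demos + self.agent, batch_size)

# Hypothetical toy transitions.
demos = [((0,), 0, 1.0, (1,)), ((1,), 1, 0.0, (2,))]
buf = ReplayBuffer(capacity=100, demo_transitions=demos)
buf.add((2,), 0, 0.5, (3,))
batch = buf.sample(3, rng=random.Random(0))
```

Keeping the demonstrations resident guarantees every minibatch can mix expert and self-generated experience, which is the mechanism the abstract describes for off-policy training from demonstrations.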
-
Publication number: US20240312657A1
Publication date: 2024-09-19
Application number: US18572914
Filing date: 2022-07-08
Applicant: DeepMind Technologies Limited
Inventor: Jonas Degrave , Federico Alberto Alfredo Felici , Jonas Buchli , Michael Peter Neunert , Brendan Daniel Tracey , Francesco Carpanese , Timo Victor Ewalds , Roland Hafner , Martin Riedmiller
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating control signals for controlling a magnetic field for confining plasma in a chamber of a magnetic confinement device. One of the methods includes, for each of a plurality of time steps, obtaining an observation characterizing a current state of the plasma in the chamber of the magnetic confinement device, processing an input including the observation using a plasma confinement neural network to generate a magnetic control output that characterizes control signals for controlling the magnetic field of the magnetic confinement device, and generating the control signals for controlling the magnetic field of the magnetic confinement device based on the magnetic control output.
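The per-time-step loop the abstract describes — observe the plasma state, run the confinement policy network, map its magnetic control output to actuator signals — has a simple generic shape. The sketch below uses hypothetical stand-ins (a fixed observation, a random linear "network", and a clipping map to bounded signals); it shows only the control-loop structure, not the patented controller.

```python
import numpy as np

def control_loop(get_observation, policy, to_control_signals, num_steps):
    """Per-time-step control loop: observe the current state, run the
    policy network, then convert its output into control signals."""
    signals = []
    for _ in range(num_steps):
        obs = get_observation()
        control_output = policy(obs)
        signals.append(to_control_signals(control_output))
    return signals

# Hypothetical stand-ins for the observation source, network, and
# output-to-signal mapping (e.g. clipping to actuator limits).
rng = np.random.default_rng(1)
W = rng.standard_normal((3, 4))
obs_fn = lambda: np.ones(4)
policy = lambda o: W @ o
to_signals = lambda u: np.clip(u, -1.0, 1.0)
signals = control_loop(obs_fn, policy, to_signals, num_steps=5)
```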
-
Publication number: US20240311617A1
Publication date: 2024-09-19
Application number: US18443285
Filing date: 2024-02-15
Applicant: DeepMind Technologies Limited
Inventor: Norman Di Palo , Arunkumar Byravan , Nicolas Manfred Otto Heess , Martin Riedmiller , Leonard Hasenclever , Markus Wulfmeier
IPC: G06N3/0455
CPC classification number: G06N3/0455
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for controlling agents using a language model neural network and a vision-language model (VLM) neural network.
-
Publication number: US20240220795A1
Publication date: 2024-07-04
Application number: US18401226
Filing date: 2023-12-29
Applicant: DeepMind Technologies Limited
Inventor: Jingwei Zhang , Arunkumar Byravan , Jost Tobias Springenberg , Martin Riedmiller , Nicolas Manfred Otto Heess , Leonard Hasenclever , Abbas Abdolmaleki , Dushyant Rao
IPC: G06N3/08
CPC classification number: G06N3/08
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for controlling agents using jumpy trajectory decoder neural networks.
-
Publication number: US11893480B1
Publication date: 2024-02-06
Application number: US16289531
Filing date: 2019-02-28
Applicant: DeepMind Technologies Limited
Inventor: Martin Riedmiller , Roland Hafner
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for reinforcement learning with scheduled auxiliary tasks. In one aspect, a method includes maintaining data specifying parameter values for a primary policy neural network and one or more auxiliary neural networks; at each of a plurality of selection time steps during a training episode comprising a plurality of time steps: receiving an observation, selecting a current task for the selection time step using a task scheduling policy, processing an input comprising the observation using the policy neural network corresponding to the selected current task to select an action to be performed by the agent in response to the observation, and causing the agent to perform the selected action.
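The selection loop in this abstract — a task scheduling policy picks the current task at each selection time step, and that task's policy network then picks the action — can be sketched as below. The task names, trivial policies, and uniform-random scheduler are hypothetical placeholders; only the loop structure follows the abstract.

```python
import random

def run_episode(observations, task_policies, schedule_task, rng):
    """At each selection time step: choose a task via the scheduling
    policy, then act with that task's policy network."""
    trace = []
    for obs in observations:
        task = schedule_task(obs, rng)
        action = task_policies[task](obs)
        trace.append((task, action))
    return trace

# Hypothetical stand-ins: a primary task and one auxiliary task with
# trivial policies, scheduled uniformly at random.
policies = {"main": lambda o: o % 2, "aux": lambda o: (o + 1) % 2}
scheduler = lambda o, rng: rng.choice(["main", "aux"])
trace = run_episode(range(4), policies, scheduler, random.Random(0))
```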
-
Publication number: US20200285909A1
Publication date: 2020-09-10
Application number: US16882373
Filing date: 2020-05-22
Applicant: DeepMind Technologies Limited
Inventor: Martin Riedmiller , Roland Hafner , Mel Vecerik , Timothy Paul Lillicrap , Thomas Lampe , Ivaylo Popov , Gabriel Barth-Maron , Nicolas Manfred Otto Heess
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for data-efficient reinforcement learning. One of the systems is a system for training an actor neural network used to select actions to be performed by an agent that interacts with an environment by receiving observations characterizing states of the environment and, in response to each observation, performing an action selected from a continuous space of possible actions, wherein the actor neural network maps observations to next actions in accordance with values of parameters of the actor neural network, and wherein the system comprises: a plurality of workers, wherein each worker is configured to operate independently of each other worker, wherein each worker is associated with a respective agent replica that interacts with a respective replica of the environment during the training of the actor neural network.
-
Publication number: US10664725B2
Publication date: 2020-05-26
Application number: US16528260
Filing date: 2019-07-31
Applicant: DeepMind Technologies Limited
Inventor: Martin Riedmiller , Roland Hafner , Mel Vecerik , Timothy Paul Lillicrap , Thomas Lampe , Ivaylo Popov , Gabriel Barth-Maron , Nicolas Manfred Otto Heess
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for data-efficient reinforcement learning. One of the systems is a system for training an actor neural network used to select actions to be performed by an agent that interacts with an environment by receiving observations characterizing states of the environment and, in response to each observation, performing an action selected from a continuous space of possible actions, wherein the actor neural network maps observations to next actions in accordance with values of parameters of the actor neural network, and wherein the system comprises: a plurality of workers, wherein each worker is configured to operate independently of each other worker, wherein each worker is associated with a respective agent replica that interacts with a respective replica of the environment during the training of the actor neural network.
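The worker arrangement shared by this record and the related publication above — each worker paired with its own agent replica and environment replica, generating experience independently — can be sketched as follows. The 1-D environment, the Gaussian exploration policy, and the shared list standing in for the experience store are all hypothetical simplifications.

```python
import random

def worker(worker_id, env_step, select_action, num_steps, shared_buffer, rng):
    """One worker: its own agent/environment replica generates
    transitions independently and appends them to a shared buffer."""
    state = 0.0
    for _ in range(num_steps):
        action = select_action(state, rng)
        next_state, reward = env_step(state, action)
        shared_buffer.append((worker_id, state, action, reward, next_state))
        state = next_state

# Hypothetical 1-D environment (reward for staying near zero) and a
# noisy policy; three workers run independently with separate RNGs.
env = lambda s, a: (s + a, -abs(s + a))
policy = lambda s, rng: rng.gauss(-s, 0.1)
buffer = []
for wid in range(3):
    worker(wid, env, policy, 5, buffer, random.Random(wid))
```

In the described system the workers would run concurrently and the actor network parameters would be updated from the pooled experience; running them sequentially here keeps the sketch deterministic and self-contained.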