-
Publication number: US20220237488A1
Publication date: 2022-07-28
Application number: US17613687
Filing date: 2020-05-22
Applicant: DeepMind Technologies Limited
Inventor: Markus Wulfmeier , Abbas Abdolmaleki , Roland Hafner , Jost Tobias Springenberg , Nicolas Manfred Otto Heess , Martin Riedmiller
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for controlling an agent. One of the methods includes obtaining an observation characterizing a current state of the environment and data identifying a task currently being performed by the agent; processing the observation and the data identifying the task using a high-level controller to generate a high-level probability distribution that assigns a respective probability to each of a plurality of low-level controllers; processing the observation using each of the plurality of low-level controllers to generate, for each of the plurality of low-level controllers, a respective low-level probability distribution; generating a combined probability distribution; and selecting, using the combined probability distribution, an action from the space of possible actions to be performed by the agent in response to the observation.
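The combination step described in this abstract — weighting each low-level controller's action distribution by the high-level controller's probability for it, then sampling from the mixture — can be sketched as follows. This is a minimal illustration, not the patented implementation; it assumes a small discrete action space and fixed example probabilities for clarity.

```python
import numpy as np

def combined_distribution(high_level_probs, low_level_probs):
    """Mix the low-level action distributions, weighted by the
    high-level controller's probability for each low-level controller.

    high_level_probs: shape (K,)   -- one weight per low-level controller
    low_level_probs:  shape (K, A) -- each row a distribution over A actions
    """
    return high_level_probs @ low_level_probs  # mixture, shape (A,)

rng = np.random.default_rng(0)
high = np.array([0.7, 0.3])               # two low-level controllers
low = np.array([[0.9, 0.1], [0.2, 0.8]])  # two possible actions
combined = combined_distribution(high, low)
action = rng.choice(len(combined), p=combined)
```

Because each row of `low` sums to 1 and `high` sums to 1, the mixture is itself a valid probability distribution, so the final action can be sampled from it directly.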
-
Publication number: US20210049467A1
Publication date: 2021-02-18
Application number: US17046963
Filing date: 2019-04-12
Applicant: DeepMind Technologies Limited
Inventor: Martin Riedmiller , Raia Thais Hadsell , Peter William Battaglia , Joshua Merel , Jost Tobias Springenberg , Alvaro Sanchez , Nicolas Manfred Otto Heess
IPC: G06N3/08
Abstract: A graph neural network system implementing a learnable physics engine for understanding and controlling a physical system. The physical system is considered to be composed of bodies coupled by joints and is represented by static and dynamic graphs. A graph processing neural network processes an input graph, e.g. the static and dynamic graphs, to provide an output graph, e.g. a predicted dynamic graph. The graph processing neural network is differentiable and may be used for control and/or reinforcement learning. The trained graph neural network system can be applied to physical systems with similar but new graph structures (zero-shot learning).
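The core operation of such a graph network — passing messages along edges (joints) and updating node (body) states — can be sketched generically. The `message_fn` and `update_fn` below are hypothetical linear stand-ins for the learned neural networks, shown only to make the data flow concrete.

```python
import numpy as np

def message_passing_step(node_states, edges, message_fn, update_fn):
    """One graph-network step: compute a message per edge from sender
    to receiver, aggregate the messages arriving at each node, then
    update every node's state from its aggregated messages."""
    agg = np.zeros_like(node_states)
    for sender, receiver in edges:
        agg[receiver] += message_fn(node_states[sender], node_states[receiver])
    return update_fn(node_states, agg)

# Hypothetical stand-ins: a 3-body chain with linear message/update rules.
nodes = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
edges = [(0, 1), (1, 2)]
msg = lambda s, r: 0.5 * (s - r)
upd = lambda n, a: n + a
new_nodes = message_passing_step(nodes, edges, msg, upd)
```

In the patented system the message and update functions are learned, differentiable networks, which is what allows the whole step to be trained end-to-end and reused on new graph structures.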
-
Publication number: US20200151562A1
Publication date: 2020-05-14
Application number: US16624245
Filing date: 2018-06-28
Applicant: DEEPMIND TECHNOLOGIES LIMITED
Inventor: Olivier Pietquin , Martin Riedmiller , Wang Fumin , Bilal Piot , Mel Vecerik , Todd Andrew Hester , Thomas Rothörl , Thomas Lampe , Nicolas Manfred Otto Heess , Jonathan Karl Scholz
Abstract: An off-policy reinforcement learning actor-critic neural network system configured to select actions from a continuous action space to be performed by an agent interacting with an environment to perform a task. An observation defines environment state data and reward data. The system has an actor neural network which learns a policy function mapping the state data to action data. A critic neural network learns an action-value (Q) function. A replay buffer stores tuples of the state data, the action data, the reward data and new state data. The replay buffer also includes demonstration transition data comprising a set of the tuples from a demonstration of the task within the environment. The neural network system is configured to train the actor neural network and the critic neural network off-policy using stored tuples from the replay buffer comprising tuples both from operation of the system and from the demonstration transition data.
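The replay-buffer arrangement described here — agent transitions and demonstration transitions stored side by side, with training minibatches drawn from both — can be sketched minimally. This is an illustrative data structure under simplified assumptions (demonstration tuples are kept permanently; agent tuples live in a ring buffer), not the patented training system.

```python
import random

class ReplayBuffer:
    """Replay buffer holding both agent and demonstration transitions.

    Demonstration (state, action, reward, next_state) tuples are loaded
    once and never evicted; agent tuples fill a bounded ring buffer.
    Sampling draws uniformly from the union of both sources."""

    def __init__(self, capacity, demo_transitions):
        self.capacity = capacity
        self.demos = list(demo_transitions)  # never evicted
        self.agent = []                      # ring buffer of agent tuples
        self.pos = 0

    def add(self, state, action, reward, next_state):
        t = (state, action, reward, next_state)
        if len(self.agent) < self.capacity:
            self.agent.append(t)
        else:
            self.agent[self.pos] = t
            self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, rng=random):
        return rng.sample(self.demos + self.agent, batch_size)

# Hypothetical toy transitions.
demos = [((0,), 0, 1.0, (1,)), ((1,), 1, 0.0, (2,))]
buf = ReplayBuffer(capacity=100, demo_transitions=demos)
buf.add((2,), 0, 0.5, (3,))
batch = buf.sample(3, rng=random.Random(0))
```

Keeping the demonstrations resident guarantees every minibatch can mix expert and self-generated experience, which is the mechanism the abstract describes for off-policy training from demonstrations.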
-
Publication number: US20240312657A1
Publication date: 2024-09-19
Application number: US18572914
Filing date: 2022-07-08
Applicant: DeepMind Technologies Limited
Inventor: Jonas Degrave , Federico Alberto Alfredo Felici , Jonas Buchli , Michael Peter Neunert , Brendan Daniel Tracey , Francesco Carpanese , Timo Victor Ewalds , Roland Hafner , Martin Riedmiller
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating control signals for controlling a magnetic field for confining plasma in a chamber of a magnetic confinement device. One of the methods includes, for each of a plurality of time steps, obtaining an observation characterizing a current state of the plasma in the chamber of the magnetic confinement device, processing an input including the observation using a plasma confinement neural network to generate a magnetic control output that characterizes control signals for controlling the magnetic field of the magnetic confinement device, and generating the control signals for controlling the magnetic field of the magnetic confinement device based on the magnetic control output.
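The per-time-step loop the abstract describes — observe the plasma state, run the confinement policy network, map its magnetic control output to actuator signals — has a simple generic shape. The sketch below uses hypothetical stand-ins (a fixed observation, a random linear "network", and a clipping map to bounded signals); it shows only the control-loop structure, not the patented controller.

```python
import numpy as np

def control_loop(get_observation, policy, to_control_signals, num_steps):
    """Per-time-step control loop: observe the current state, run the
    policy network, then convert its output into control signals."""
    signals = []
    for _ in range(num_steps):
        obs = get_observation()
        control_output = policy(obs)
        signals.append(to_control_signals(control_output))
    return signals

# Hypothetical stand-ins for the observation source, network, and
# output-to-signal mapping (e.g. clipping to actuator limits).
rng = np.random.default_rng(1)
W = rng.standard_normal((3, 4))
obs_fn = lambda: np.ones(4)
policy = lambda o: W @ o
to_signals = lambda u: np.clip(u, -1.0, 1.0)
signals = control_loop(obs_fn, policy, to_signals, num_steps=5)
```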
-
Publication number: US20240311617A1
Publication date: 2024-09-19
Application number: US18443285
Filing date: 2024-02-15
Applicant: DeepMind Technologies Limited
Inventor: Norman Di Palo , Arunkumar Byravan , Nicolas Manfred Otto Heess , Martin Riedmiller , Leonard Hasenclever , Markus Wulfmeier
IPC: G06N3/0455
CPC classification number: G06N3/0455
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for controlling agents using a language model neural network and a vision-language model (VLM) neural network.
-
Publication number: US20240220795A1
Publication date: 2024-07-04
Application number: US18401226
Filing date: 2023-12-29
Applicant: DeepMind Technologies Limited
Inventor: Jingwei Zhang , Arunkumar Byravan , Jost Tobias Springenberg , Martin Riedmiller , Nicolas Manfred Otto Heess , Leonard Hasenclever , Abbas Abdolmaleki , Dushyant Rao
IPC: G06N3/08
CPC classification number: G06N3/08
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for controlling agents using jumpy trajectory decoder neural networks.
-
Publication number: US11893480B1
Publication date: 2024-02-06
Application number: US16289531
Filing date: 2019-02-28
Applicant: DeepMind Technologies Limited
Inventor: Martin Riedmiller , Roland Hafner
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for reinforcement learning with scheduled auxiliary tasks. In one aspect, a method includes maintaining data specifying parameter values for a primary policy neural network and one or more auxiliary neural networks; at each of a plurality of selection time steps during a training episode comprising a plurality of time steps: receiving an observation, selecting a current task for the selection time step using a task scheduling policy, processing an input comprising the observation using the policy neural network corresponding to the selected current task to select an action to be performed by the agent in response to the observation, and causing the agent to perform the selected action.
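The selection loop in this abstract — a task scheduling policy picks the current task at each selection time step, and that task's policy network then picks the action — can be sketched as below. The task names, trivial policies, and uniform-random scheduler are hypothetical placeholders; only the loop structure follows the abstract.

```python
import random

def run_episode(observations, task_policies, schedule_task, rng):
    """At each selection time step: choose a task via the scheduling
    policy, then act with that task's policy network."""
    trace = []
    for obs in observations:
        task = schedule_task(obs, rng)
        action = task_policies[task](obs)
        trace.append((task, action))
    return trace

# Hypothetical stand-ins: a primary task and one auxiliary task with
# trivial policies, scheduled uniformly at random.
policies = {"main": lambda o: o % 2, "aux": lambda o: (o + 1) % 2}
scheduler = lambda o, rng: rng.choice(["main", "aux"])
trace = run_episode(range(4), policies, scheduler, random.Random(0))
```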
-
Publication number: US20200285909A1
Publication date: 2020-09-10
Application number: US16882373
Filing date: 2020-05-22
Applicant: DeepMind Technologies Limited
Inventor: Martin Riedmiller , Roland Hafner , Mel Vecerik , Timothy Paul Lillicrap , Thomas Lampe , Ivaylo Popov , Gabriel Barth-Maron , Nicolas Manfred Otto Heess
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for data-efficient reinforcement learning. One of the systems is a system for training an actor neural network used to select actions to be performed by an agent that interacts with an environment by receiving observations characterizing states of the environment and, in response to each observation, performing an action selected from a continuous space of possible actions, wherein the actor neural network maps observations to next actions in accordance with values of parameters of the actor neural network, and wherein the system comprises: a plurality of workers, wherein each worker is configured to operate independently of each other worker, wherein each worker is associated with a respective agent replica that interacts with a respective replica of the environment during the training of the actor neural network.
-
Publication number: US10664725B2
Publication date: 2020-05-26
Application number: US16528260
Filing date: 2019-07-31
Applicant: DeepMind Technologies Limited
Inventor: Martin Riedmiller , Roland Hafner , Mel Vecerik , Timothy Paul Lillicrap , Thomas Lampe , Ivaylo Popov , Gabriel Barth-Maron , Nicolas Manfred Otto Heess
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for data-efficient reinforcement learning. One of the systems is a system for training an actor neural network used to select actions to be performed by an agent that interacts with an environment by receiving observations characterizing states of the environment and, in response to each observation, performing an action selected from a continuous space of possible actions, wherein the actor neural network maps observations to next actions in accordance with values of parameters of the actor neural network, and wherein the system comprises: a plurality of workers, wherein each worker is configured to operate independently of each other worker, wherein each worker is associated with a respective agent replica that interacts with a respective replica of the environment during the training of the actor neural network.
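The worker arrangement shared by this record and the related publication above — each worker paired with its own agent replica and environment replica, generating experience independently — can be sketched as follows. The 1-D environment, the Gaussian exploration policy, and the shared list standing in for the experience store are all hypothetical simplifications.

```python
import random

def worker(worker_id, env_step, select_action, num_steps, shared_buffer, rng):
    """One worker: its own agent/environment replica generates
    transitions independently and appends them to a shared buffer."""
    state = 0.0
    for _ in range(num_steps):
        action = select_action(state, rng)
        next_state, reward = env_step(state, action)
        shared_buffer.append((worker_id, state, action, reward, next_state))
        state = next_state

# Hypothetical 1-D environment (reward for staying near zero) and a
# noisy policy; three workers run independently with separate RNGs.
env = lambda s, a: (s + a, -abs(s + a))
policy = lambda s, rng: rng.gauss(-s, 0.1)
buffer = []
for wid in range(3):
    worker(wid, env, policy, 5, buffer, random.Random(wid))
```

In the described system the workers would run concurrently and the actor network parameters would be updated from the pooled experience; running them sequentially here keeps the sketch deterministic and self-contained.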