-
Publication Number: US10776692B2
Publication Date: 2020-09-15
Application Number: US15217758
Application Date: 2016-07-22
Applicant: DeepMind Technologies Limited
Inventor: Timothy Paul Lillicrap , Jonathan James Hunt , Alexander Pritzel , Nicolas Manfred Otto Heess , Tom Erez , Yuval Tassa , David Silver , Daniel Pieter Wierstra
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training an actor neural network used to select actions to be performed by an agent interacting with an environment. One of the methods includes obtaining a minibatch of experience tuples; and updating current values of the parameters of the actor neural network, comprising: for each experience tuple in the minibatch: processing the training observation and the training action in the experience tuple using a critic neural network to determine a neural network output for the experience tuple, and determining a target neural network output for the experience tuple; updating current values of the parameters of the critic neural network using errors between the target neural network outputs and the neural network outputs; and updating the current values of the parameters of the actor neural network using the critic neural network.
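The update the abstract describes is the deep deterministic policy gradient (DDPG) scheme. Below is a minimal PyTorch sketch of one minibatch update; the network sizes, learning rates, and the soft target-update rate `tau` are illustrative assumptions, not values taken from the patent.

```python
import copy

import torch
import torch.nn as nn

OBS_DIM, ACT_DIM = 8, 2  # illustrative; the abstract does not fix dimensions

actor = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                      nn.Linear(64, ACT_DIM), nn.Tanh())
critic = nn.Sequential(nn.Linear(OBS_DIM + ACT_DIM, 64), nn.ReLU(),
                       nn.Linear(64, 1))
# Slowly-updated copies used to compute the target network outputs.
target_actor, target_critic = copy.deepcopy(actor), copy.deepcopy(critic)

actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def update(obs, act, rew, next_obs, gamma=0.99, tau=0.005):
    """One update from a minibatch of experience tuples (one row per tuple)."""
    # Critic: regress Q(obs, act) toward reward + gamma * target Q at next state.
    with torch.no_grad():
        next_act = target_actor(next_obs)
        target = rew + gamma * target_critic(torch.cat([next_obs, next_act], -1))
    q = critic(torch.cat([obs, act], -1))
    critic_loss = (q - target).pow(2).mean()
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    # Actor: ascend the critic's value of the actor's own action.
    actor_loss = -critic(torch.cat([obs, actor(obs)], -1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
    # Soft-update the target networks toward the online networks.
    for net, tgt in ((actor, target_actor), (critic, target_critic)):
        for p, tp in zip(net.parameters(), tgt.parameters()):
            tp.data.mul_(1 - tau).add_(tau * p.data)

update(torch.randn(32, OBS_DIM), torch.randn(32, ACT_DIM),
       torch.randn(32, 1), torch.randn(32, OBS_DIM))
```

Calling `update` on successive minibatches sampled from a replay buffer reproduces the loop in the abstract: critic regression toward target network outputs, then an actor step taken through the critic.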
-
Publication Number: US20200090042A1
Publication Date: 2020-03-19
Application Number: US16688934
Application Date: 2019-11-19
Applicant: DeepMind Technologies Limited
Inventor: Gregory Duncan Wayne , Joshua Merel , Ziyu Wang , Nicolas Manfred Otto Heess , Joao Ferdinando Gomes de Freitas , Scott Ellison Reed
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network used to select actions to be performed by an agent interacting with an environment. One of the methods includes: obtaining data identifying a set of trajectories, each trajectory comprising a set of observations characterizing a set of states of the environment and corresponding actions performed by another agent in response to the states; obtaining data identifying an encoder that maps the observations onto embeddings for use in determining a set of imitation trajectories; determining, for each trajectory, a corresponding embedding by applying the encoder to the trajectory; determining a set of imitation trajectories by applying a policy defined by the neural network to the embedding for each trajectory; and adjusting parameters of the neural network based on the set of trajectories, the set of imitation trajectories and the embeddings.
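A schematic reading of this pipeline in PyTorch, with the trajectory comparison collapsed to a cloning loss for brevity. The GRU encoder, all dimensions, and the squared-error loss are assumptions; the abstract itself compares rolled-out imitation trajectories against the demonstrations. Note the encoder is "obtained", not trained, so only the policy is optimized here.

```python
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, EMB_DIM = 8, 2, 16  # illustrative sizes

encoder = nn.GRU(OBS_DIM, EMB_DIM, batch_first=True)
policy = nn.Sequential(nn.Linear(OBS_DIM + EMB_DIM, 64), nn.ReLU(),
                       nn.Linear(64, ACT_DIM))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def imitation_step(demo_obs, demo_act):
    """demo_obs: [B, T, OBS_DIM]; demo_act: [B, T, ACT_DIM]."""
    with torch.no_grad():
        _, h = encoder(demo_obs)       # map each trajectory onto an embedding
    emb = h[-1]                        # [B, EMB_DIM], final GRU state
    # Condition the policy on the trajectory embedding at every timestep.
    emb_seq = emb.unsqueeze(1).expand(-1, demo_obs.shape[1], -1)
    pred_act = policy(torch.cat([demo_obs, emb_seq], dim=-1))
    # Cloning surrogate: match the demonstrator's actions.
    loss = (pred_act - demo_act).pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

imitation_step(torch.randn(4, 20, OBS_DIM), torch.randn(4, 20, ACT_DIM))
```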
-
Publication Number: US20190354813A1
Publication Date: 2019-11-21
Application Number: US16528260
Application Date: 2019-07-31
Applicant: DeepMind Technologies Limited
Inventor: Martin Riedmiller , Roland Hafner , Mel Vecerik , Timothy Paul Lillicrap , Thomas Lampe , Ivaylo Popov , Gabriel Barth-Maron , Nicolas Manfred Otto Heess
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for data-efficient reinforcement learning. One of the systems is a system for training an actor neural network used to select actions to be performed by an agent that interacts with an environment by receiving observations characterizing states of the environment and, in response to each observation, performing an action selected from a continuous space of possible actions, wherein the actor neural network maps observations to next actions in accordance with values of parameters of the actor neural network, and wherein the system comprises: a plurality of workers, wherein each worker is configured to operate independently of each other worker, wherein each worker is associated with a respective agent replica that interacts with a respective replica of the environment during the training of the actor neural network.
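The distributed data-collection structure can be pictured as below: each worker owns its own environment and agent replica and feeds a shared experience buffer. Everything in this sketch (the toy environment, the random continuous-action agent, thread-based workers) is an illustrative stand-in, not the patented system.

```python
import queue
import random
import threading

replay = queue.Queue()  # shared experience buffer fed by all workers

class EnvReplica:
    """Toy stand-in for an environment replica (illustrative)."""
    def reset(self):
        self.state = 0.0
        return self.state
    def step(self, action):
        self.state += action
        reward = -abs(self.state)              # drive the state toward zero
        return self.state, reward, abs(self.state) > 10

class AgentReplica:
    """Toy stand-in for an actor-network replica (illustrative)."""
    def act(self, obs):
        return random.uniform(-1.0, 1.0)       # continuous action space

def worker(worker_id):
    env, agent = EnvReplica(), AgentReplica()  # each worker gets its own replicas
    obs = env.reset()
    for _ in range(100):
        action = agent.act(obs)
        next_obs, reward, done = env.step(action)
        replay.put((worker_id, obs, action, reward, next_obs))
        obs = env.reset() if done else next_obs

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(f"collected {replay.qsize()} transitions from 4 independent workers")
```

Because the workers never synchronize with one another, experience accumulates in parallel, which is the source of the data efficiency the abstract claims.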
-
Publication Number: US20240412072A1
Publication Date: 2024-12-12
Application Number: US18422620
Application Date: 2024-01-25
Applicant: DeepMind Technologies Limited
Inventor: Siqi Liu , Luke Christopher Marris , Nicolas Manfred Otto Heess , Marc Lanctot
IPC: G06N3/092
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for controlling an agent interacting with an environment using a population of action selection policies that are jointly represented by a population action selection neural network. In one aspect, a method comprises, at each of a plurality of time steps: obtaining an observation characterizing a current state of the environment at the time step; selecting a target action selection policy from the population of action selection policies; processing a network input comprising: (i) the observation, and (ii) a strategy embedding representing the target action selection policy, using the population action selection neural network to generate an action selection output; and selecting an action to be performed by the agent at the time step using the action selection output.
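A minimal sketch of the joint representation: an embedding table holds one strategy embedding per policy in the population, and a single shared network consumes the observation together with the embedding. All sizes and the categorical action head are assumptions made for the sketch.

```python
import torch
import torch.nn as nn

OBS_DIM, EMB_DIM, N_ACTIONS, POP_SIZE = 8, 4, 3, 5  # illustrative sizes

# One learned strategy embedding per action selection policy in the population.
strategy_embeddings = nn.Embedding(POP_SIZE, EMB_DIM)
# A single network jointly represents the whole population of policies.
population_net = nn.Sequential(nn.Linear(OBS_DIM + EMB_DIM, 64), nn.ReLU(),
                               nn.Linear(64, N_ACTIONS))

def select_action(observation, policy_index):
    emb = strategy_embeddings(torch.tensor(policy_index))
    # Network input: (i) the observation and (ii) the strategy embedding.
    logits = population_net(torch.cat([observation, emb], dim=-1))
    return torch.distributions.Categorical(logits=logits).sample()

action = select_action(torch.randn(OBS_DIM), policy_index=2)  # target policy 2
```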
-
Publication Number: US20240311617A1
Publication Date: 2024-09-19
Application Number: US18443285
Application Date: 2024-02-15
Applicant: DeepMind Technologies Limited
Inventor: Norman Di Palo , Arunkumar Byravan , Nicolas Manfred Otto Heess , Martin Riedmiller , Leonard Hasenclever , Markus Wulfmeier
IPC: G06N3/0455
CPC classification number: G06N3/0455
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for controlling agents using a language model neural network and a vision-language model (VLM) neural network.
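The abstract is terse, but the usual division of labor between the two models can be sketched as follows. Both model calls here are hypothetical stand-ins that return canned strings; the patent's actual interfaces are not given in this record.

```python
def vlm_describe(image, question):
    """Stand-in for the VLM call (hypothetical interface, canned output)."""
    return "a red block sits to the left of a blue bowl"

def llm_plan(prompt):
    """Stand-in for the language model call (hypothetical interface)."""
    return "pick up the red block"

def control_step(execute_action, image, goal):
    # The VLM grounds the visual observation in text ...
    scene = vlm_describe(image, "Describe the objects and their positions.")
    # ... and the language model chooses the agent's next action from it.
    action = llm_plan(f"Goal: {goal}\nScene: {scene}\nNext action:")
    execute_action(action)

control_step(print, image=None, goal="place the red block in the bowl")
```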
-
Publication Number: US20240220795A1
Publication Date: 2024-07-04
Application Number: US18401226
Application Date: 2023-12-29
Applicant: DeepMind Technologies Limited
Inventor: Jingwei Zhang , Arunkumar Byravan , Jost Tobias Springenberg , Martin Riedmiller , Nicolas Manfred Otto Heess , Leonard Hasenclever , Abbas Abdolmaleki , Dushyant Rao
IPC: G06N3/08
CPC classification number: G06N3/08
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for controlling agents using jumpy trajectory decoder neural networks.
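"Jumpy" decoding typically means emitting actions at a coarser temporal resolution than the control rate. A hedged sketch under that reading, with all sizes and the segment length `JUMP` assumed:

```python
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, JUMP = 8, 2, 4  # illustrative; JUMP = steps per decode

# A "jumpy" decoder emits a short segment of actions covering JUMP timesteps
# from one observation, instead of re-planning one action at every step.
decoder = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                        nn.Linear(64, ACT_DIM * JUMP))

def decode_segment(observation):
    flat = decoder(observation)        # one forward pass ...
    return flat.view(JUMP, ACT_DIM)    # ... yields JUMP consecutive actions

segment = decode_segment(torch.randn(OBS_DIM))  # run these, then decode again
```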
-
Publication Number: US11875258B1
Publication Date: 2024-01-16
Application Number: US17541186
Application Date: 2021-12-02
Applicant: DeepMind Technologies Limited
Abstract: Methods, systems, and apparatus for selecting actions to be performed by an agent interacting with an environment. One system includes a high-level controller neural network, a low-level controller neural network, and a subsystem. The high-level controller neural network receives an input observation and processes the input observation to generate a high-level output defining a control signal for the low-level controller. The low-level controller neural network receives a designated component of an input observation and processes the designated component and an input control signal to generate a low-level output that defines an action to be performed by the agent in response to the input observation. The subsystem receives a current observation characterizing a current state of the environment, determines whether criteria are satisfied for generating a new control signal, and based on the determination, provides appropriate inputs to the high-level and low-level controllers for selecting an action to be performed by the agent.
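One way to picture the three parts in code. The refresh-every-HORIZON criterion, all sizes, and the slice used as the designated component are assumptions made for the sketch; the patent leaves the criteria abstract.

```python
import torch
import torch.nn as nn

OBS_DIM, CTRL_DIM, COMP_DIM, ACT_DIM = 8, 4, 3, 2  # illustrative sizes
HORIZON = 10  # assumed criterion: refresh the control signal every HORIZON steps

high_level = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                           nn.Linear(64, CTRL_DIM))
low_level = nn.Sequential(nn.Linear(COMP_DIM + CTRL_DIM, 64), nn.ReLU(),
                          nn.Linear(64, ACT_DIM))

control_signal, steps_since_refresh = None, 0

def select_action(observation):
    """Subsystem: route the observation to the two controllers."""
    global control_signal, steps_since_refresh
    # Criterion check: generate a new control signal only when one is due.
    if control_signal is None or steps_since_refresh >= HORIZON:
        control_signal = high_level(observation)
        steps_since_refresh = 0
    steps_since_refresh += 1
    # The low-level controller sees only a designated component of the observation.
    designated = observation[:COMP_DIM]
    return low_level(torch.cat([designated, control_signal], dim=-1))

action = select_action(torch.randn(OBS_DIM))
```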
-
Publication Number: US20220083869A1
Publication Date: 2022-03-17
Application Number: US17486842
Application Date: 2021-09-27
Applicant: DeepMind Technologies Limited
Inventor: Razvan Pascanu , Raia Thais Hadsell , Victor Constant Bapst , Wojciech Czarnecki , James Kirkpatrick , Yee Whye Teh , Nicolas Manfred Otto Heess
Abstract: A method is proposed for training a multitask computer system, such as a multitask neural network system. The system comprises a set of trainable workers and a shared module. The trainable workers and shared module are trained on a plurality of different tasks, such that each worker learns to perform a corresponding one of the tasks according to a respective task policy, and the shared module learns a multitask policy which represents common behavior for the tasks. The coordinated training is performed by optimizing an objective function comprising, for each task: a reward term indicative of an expected reward earned by a worker in performing the corresponding task according to the task policy; and at least one entropy term which regularizes the distribution of the task policy towards the distribution of the multitask policy.
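This is a distill-and-transfer style objective. A minimal sketch of the per-task term follows, with linear policy heads, a REINFORCE surrogate for the reward term, and an assumed weight `ALPHA` on the KL regularizer that pulls each task policy toward the multitask policy; none of these specifics come from the record itself.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

OBS_DIM, N_ACTIONS, N_TASKS = 8, 3, 2  # illustrative sizes
ALPHA = 0.5                            # assumed regularization weight

workers = [nn.Linear(OBS_DIM, N_ACTIONS) for _ in range(N_TASKS)]  # task policies
shared = nn.Linear(OBS_DIM, N_ACTIONS)                             # multitask policy

def task_objective(task_id, obs, actions, returns):
    """Per-task term: expected reward plus a KL pull toward the shared policy."""
    logp_task = F.log_softmax(workers[task_id](obs), dim=-1)
    logp_shared = F.log_softmax(shared(obs), dim=-1)
    # Reward term: REINFORCE-style surrogate for the expected return.
    reward_term = (logp_task.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
                   * returns).mean()
    # Entropy/KL term: regularize the task policy toward the multitask policy.
    kl = (logp_task.exp() * (logp_task - logp_shared)).sum(-1).mean()
    return reward_term - ALPHA * kl  # maximized jointly over worker and shared nets

obs = torch.randn(16, OBS_DIM)
acts = torch.randint(0, N_ACTIONS, (16,))
rets = torch.randn(16)
loss = -task_objective(0, obs, acts, rets)  # minimize the negative objective
loss.backward()
```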
-
Publication Number: US11132609B2
Publication Date: 2021-09-28
Application Number: US16689020
Application Date: 2019-11-19
Applicant: DeepMind Technologies Limited
Inventor: Razvan Pascanu , Raia Thais Hadsell , Victor Constant Bapst , Wojciech Czarnecki , James Kirkpatrick , Yee Whye Teh , Nicolas Manfred Otto Heess
Abstract: A method is proposed for training a multitask computer system, such as a multitask neural network system. The system comprises a set of trainable workers and a shared module. The trainable workers and shared module are trained on a plurality of different tasks, such that each worker learns to perform a corresponding one of the tasks according to a respective task policy, and the shared module learns a multitask policy which represents common behavior for the tasks. The coordinated training is performed by optimizing an objective function comprising, for each task: a reward term indicative of an expected reward earned by a worker in performing the corresponding task according to the task policy; and at least one entropy term which regularizes the distribution of the task policy towards the distribution of the multitask policy.
-
Publication Number: US20200285909A1
Publication Date: 2020-09-10
Application Number: US16882373
Application Date: 2020-05-22
Applicant: DeepMind Technologies Limited
Inventor: Martin Riedmiller , Roland Hafner , Mel Vecerik , Timothy Paul Lillicrap , Thomas Lampe , Ivaylo Popov , Gabriel Barth-Maron , Nicolas Manfred Otto Heess
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for data-efficient reinforcement learning. One of the systems is a system for training an actor neural network used to select actions to be performed by an agent that interacts with an environment by receiving observations characterizing states of the environment and, in response to each observation, performing an action selected from a continuous space of possible actions, wherein the actor neural network maps observations to next actions in accordance with values of parameters of the actor neural network, and wherein the system comprises: a plurality of workers, wherein each worker is configured to operate independently of each other worker, wherein each worker is associated with a respective agent replica that interacts with a respective replica of the environment during the training of the actor neural network.