-
Publication Number: US10776692B2
Publication Date: 2020-09-15
Application Number: US15217758
Application Date: 2016-07-22
Applicant: DeepMind Technologies Limited
Inventor: Timothy Paul Lillicrap , Jonathan James Hunt , Alexander Pritzel , Nicolas Manfred Otto Heess , Tom Erez , Yuval Tassa , David Silver , Daniel Pieter Wierstra
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training an actor neural network used to select actions to be performed by an agent interacting with an environment. One of the methods includes obtaining a minibatch of experience tuples; and updating current values of the parameters of the actor neural network, comprising: for each experience tuple in the minibatch: processing the training observation and the training action in the experience tuple using a critic neural network to determine a neural network output for the experience tuple, and determining a target neural network output for the experience tuple; updating current values of the parameters of the critic neural network using errors between the target neural network outputs and the neural network outputs; and updating the current values of the parameters of the actor neural network using the critic neural network.
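The update the abstract describes is the deep deterministic policy gradient (DDPG) scheme. Below is a minimal PyTorch sketch of one minibatch update; the network sizes, learning rates, and the soft target-update rate `tau` are illustrative assumptions, not values taken from the patent.

```python
import copy

import torch
import torch.nn as nn

OBS_DIM, ACT_DIM = 8, 2  # illustrative; the abstract does not fix dimensions

actor = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                      nn.Linear(64, ACT_DIM), nn.Tanh())
critic = nn.Sequential(nn.Linear(OBS_DIM + ACT_DIM, 64), nn.ReLU(),
                       nn.Linear(64, 1))
# Slowly-updated copies used to compute the target network outputs.
target_actor, target_critic = copy.deepcopy(actor), copy.deepcopy(critic)

actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def update(obs, act, rew, next_obs, gamma=0.99, tau=0.005):
    """One update from a minibatch of experience tuples (one row per tuple)."""
    # Critic: regress Q(obs, act) toward reward + gamma * target Q at next state.
    with torch.no_grad():
        next_act = target_actor(next_obs)
        target = rew + gamma * target_critic(torch.cat([next_obs, next_act], -1))
    q = critic(torch.cat([obs, act], -1))
    critic_loss = (q - target).pow(2).mean()
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    # Actor: ascend the critic's value of the actor's own action.
    actor_loss = -critic(torch.cat([obs, actor(obs)], -1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
    # Soft-update the target networks toward the online networks.
    for net, tgt in ((actor, target_actor), (critic, target_critic)):
        for p, tp in zip(net.parameters(), tgt.parameters()):
            tp.data.mul_(1 - tau).add_(tau * p.data)

update(torch.randn(32, OBS_DIM), torch.randn(32, ACT_DIM),
       torch.randn(32, 1), torch.randn(32, OBS_DIM))
```

Calling `update` on successive minibatches sampled from a replay buffer reproduces the loop in the abstract: critic regression toward target network outputs, then an actor step taken through the critic.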
-
Publication Number: US20200090042A1
Publication Date: 2020-03-19
Application Number: US16688934
Application Date: 2019-11-19
Applicant: DeepMind Technologies Limited
Inventor: Gregory Duncan Wayne , Joshua Merel , Ziyu Wang , Nicolas Manfred Otto Heess , Joao Ferdinando Gomes de Freitas , Scott Ellison Reed
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network used to select actions to be performed by an agent interacting with an environment. One of the methods includes: obtaining data identifying a set of trajectories, each trajectory comprising a set of observations characterizing a set of states of the environment and corresponding actions performed by another agent in response to the states; obtaining data identifying an encoder that maps the observations onto embeddings for use in determining a set of imitation trajectories; determining, for each trajectory, a corresponding embedding by applying the encoder to the trajectory; determining a set of imitation trajectories by applying a policy defined by the neural network to the embedding for each trajectory; and adjusting parameters of the neural network based on the set of trajectories, the set of imitation trajectories and the embeddings.
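A schematic reading of this pipeline in PyTorch, with the trajectory comparison collapsed to a cloning loss for brevity. The GRU encoder, all dimensions, and the squared-error loss are assumptions; the abstract itself compares rolled-out imitation trajectories against the demonstrations. Note the encoder is "obtained", not trained, so only the policy is optimized here.

```python
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, EMB_DIM = 8, 2, 16  # illustrative sizes

encoder = nn.GRU(OBS_DIM, EMB_DIM, batch_first=True)
policy = nn.Sequential(nn.Linear(OBS_DIM + EMB_DIM, 64), nn.ReLU(),
                       nn.Linear(64, ACT_DIM))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def imitation_step(demo_obs, demo_act):
    """demo_obs: [B, T, OBS_DIM]; demo_act: [B, T, ACT_DIM]."""
    with torch.no_grad():
        _, h = encoder(demo_obs)       # map each trajectory onto an embedding
    emb = h[-1]                        # [B, EMB_DIM], final GRU state
    # Condition the policy on the trajectory embedding at every timestep.
    emb_seq = emb.unsqueeze(1).expand(-1, demo_obs.shape[1], -1)
    pred_act = policy(torch.cat([demo_obs, emb_seq], dim=-1))
    # Cloning surrogate: match the demonstrator's actions.
    loss = (pred_act - demo_act).pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

imitation_step(torch.randn(4, 20, OBS_DIM), torch.randn(4, 20, ACT_DIM))
```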
-
Publication Number: US20190354813A1
Publication Date: 2019-11-21
Application Number: US16528260
Application Date: 2019-07-31
Applicant: DeepMind Technologies Limited
Inventor: Martin Riedmiller , Roland Hafner , Mel Vecerik , Timothy Paul Lillicrap , Thomas Lampe , Ivaylo Popov , Gabriel Barth-Maron , Nicolas Manfred Otto Heess
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for data-efficient reinforcement learning. One of the systems is a system for training an actor neural network used to select actions to be performed by an agent that interacts with an environment by receiving observations characterizing states of the environment and, in response to each observation, performing an action selected from a continuous space of possible actions, wherein the actor neural network maps observations to next actions in accordance with values of parameters of the actor neural network, and wherein the system comprises: a plurality of workers, wherein each worker is configured to operate independently of each other worker, wherein each worker is associated with a respective agent replica that interacts with a respective replica of the environment during the training of the actor neural network.
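The distributed data-collection structure can be pictured as below: each worker owns its own environment and agent replica and feeds a shared experience buffer. Everything in this sketch (the toy environment, the random continuous-action agent, thread-based workers) is an illustrative stand-in, not the patented system.

```python
import queue
import random
import threading

replay = queue.Queue()  # shared experience buffer fed by all workers

class EnvReplica:
    """Toy stand-in for an environment replica (illustrative)."""
    def reset(self):
        self.state = 0.0
        return self.state
    def step(self, action):
        self.state += action
        reward = -abs(self.state)              # drive the state toward zero
        return self.state, reward, abs(self.state) > 10

class AgentReplica:
    """Toy stand-in for an actor-network replica (illustrative)."""
    def act(self, obs):
        return random.uniform(-1.0, 1.0)       # continuous action space

def worker(worker_id):
    env, agent = EnvReplica(), AgentReplica()  # each worker gets its own replicas
    obs = env.reset()
    for _ in range(100):
        action = agent.act(obs)
        next_obs, reward, done = env.step(action)
        replay.put((worker_id, obs, action, reward, next_obs))
        obs = env.reset() if done else next_obs

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(f"collected {replay.qsize()} transitions from 4 independent workers")
```

Because the workers never synchronize with one another, experience accumulates in parallel, which is the source of the data efficiency the abstract claims.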
-
Publication Number: US20240412072A1
Publication Date: 2024-12-12
Application Number: US18422620
Application Date: 2024-01-25
Applicant: DeepMind Technologies Limited
Inventor: Siqi Liu , Luke Christopher Marris , Nicolas Manfred Otto Heess , Marc Lanctot
IPC: G06N3/092
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for controlling an agent interacting with an environment using a population of action selection policies that are jointly represented by a population action selection neural network. In one aspect, a method comprises, at each of a plurality of time steps: obtaining an observation characterizing a current state of the environment at the time step; selecting a target action selection policy from the population of action selection policies; processing a network input comprising: (i) the observation, and (ii) a strategy embedding representing the target action selection policy, using the population action selection neural network to generate an action selection output; and selecting an action to be performed by the agent at the time step using the action selection output.
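A minimal sketch of the joint representation: an embedding table holds one strategy embedding per policy in the population, and a single shared network consumes the observation together with the embedding. All sizes and the categorical action head are assumptions made for the sketch.

```python
import torch
import torch.nn as nn

OBS_DIM, EMB_DIM, N_ACTIONS, POP_SIZE = 8, 4, 3, 5  # illustrative sizes

# One learned strategy embedding per action selection policy in the population.
strategy_embeddings = nn.Embedding(POP_SIZE, EMB_DIM)
# A single network jointly represents the whole population of policies.
population_net = nn.Sequential(nn.Linear(OBS_DIM + EMB_DIM, 64), nn.ReLU(),
                               nn.Linear(64, N_ACTIONS))

def select_action(observation, policy_index):
    emb = strategy_embeddings(torch.tensor(policy_index))
    # Network input: (i) the observation and (ii) the strategy embedding.
    logits = population_net(torch.cat([observation, emb], dim=-1))
    return torch.distributions.Categorical(logits=logits).sample()

action = select_action(torch.randn(OBS_DIM), policy_index=2)  # target policy 2
```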
-
Publication Number: US20240311617A1
Publication Date: 2024-09-19
Application Number: US18443285
Application Date: 2024-02-15
Applicant: DeepMind Technologies Limited
Inventor: Norman Di Palo , Arunkumar Byravan , Nicolas Manfred Otto Heess , Martin Riedmiller , Leonard Hasenclever , Markus Wulfmeier
IPC: G06N3/0455
CPC classification number: G06N3/0455
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for controlling agents using a language model neural network and a vision-language model (VLM) neural network.
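The abstract is terse, but the usual division of labor between the two models can be sketched as follows. Both model calls here are hypothetical stand-ins that return canned strings; the patent's actual interfaces are not given in this record.

```python
def vlm_describe(image, question):
    """Stand-in for the VLM call (hypothetical interface, canned output)."""
    return "a red block sits to the left of a blue bowl"

def llm_plan(prompt):
    """Stand-in for the language model call (hypothetical interface)."""
    return "pick up the red block"

def control_step(execute_action, image, goal):
    # The VLM grounds the visual observation in text ...
    scene = vlm_describe(image, "Describe the objects and their positions.")
    # ... and the language model chooses the agent's next action from it.
    action = llm_plan(f"Goal: {goal}\nScene: {scene}\nNext action:")
    execute_action(action)

control_step(print, image=None, goal="place the red block in the bowl")
```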
-
Publication Number: US20240220795A1
Publication Date: 2024-07-04
Application Number: US18401226
Application Date: 2023-12-29
Applicant: DeepMind Technologies Limited
Inventor: Jingwei Zhang , Arunkumar Byravan , Jost Tobias Springenberg , Martin Riedmiller , Nicolas Manfred Otto Heess , Leonard Hasenclever , Abbas Abdolmaleki , Dushyant Rao
IPC: G06N3/08
CPC classification number: G06N3/08
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for controlling agents using jumpy trajectory decoder neural networks.
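"Jumpy" decoding typically means emitting actions at a coarser temporal resolution than the control rate. A hedged sketch under that reading, with all sizes and the segment length `JUMP` assumed:

```python
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, JUMP = 8, 2, 4  # illustrative; JUMP = steps per decode

# A "jumpy" decoder emits a short segment of actions covering JUMP timesteps
# from one observation, instead of re-planning one action at every step.
decoder = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                        nn.Linear(64, ACT_DIM * JUMP))

def decode_segment(observation):
    flat = decoder(observation)        # one forward pass ...
    return flat.view(JUMP, ACT_DIM)    # ... yields JUMP consecutive actions

segment = decode_segment(torch.randn(OBS_DIM))  # run these, then decode again
```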
-
Publication Number: US11875258B1
Publication Date: 2024-01-16
Application Number: US17541186
Application Date: 2021-12-02
Applicant: DeepMind Technologies Limited
Abstract: Methods, systems, and apparatus for selecting actions to be performed by an agent interacting with an environment. One system includes a high-level controller neural network, a low-level controller neural network, and a subsystem. The high-level controller neural network receives an input observation and processes the input observation to generate a high-level output defining a control signal for the low-level controller. The low-level controller neural network receives a designated component of an input observation and processes the designated component and an input control signal to generate a low-level output that defines an action to be performed by the agent in response to the input observation. The subsystem receives a current observation characterizing a current state of the environment, determines whether criteria are satisfied for generating a new control signal, and based on the determination, provides appropriate inputs to the high-level and low-level controllers for selecting an action to be performed by the agent.
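One way to picture the three parts in code. The refresh-every-HORIZON criterion, all sizes, and the slice used as the designated component are assumptions made for the sketch; the patent leaves the criteria abstract.

```python
import torch
import torch.nn as nn

OBS_DIM, CTRL_DIM, COMP_DIM, ACT_DIM = 8, 4, 3, 2  # illustrative sizes
HORIZON = 10  # assumed criterion: refresh the control signal every HORIZON steps

high_level = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                           nn.Linear(64, CTRL_DIM))
low_level = nn.Sequential(nn.Linear(COMP_DIM + CTRL_DIM, 64), nn.ReLU(),
                          nn.Linear(64, ACT_DIM))

control_signal, steps_since_refresh = None, 0

def select_action(observation):
    """Subsystem: route the observation to the two controllers."""
    global control_signal, steps_since_refresh
    # Criterion check: generate a new control signal only when one is due.
    if control_signal is None or steps_since_refresh >= HORIZON:
        control_signal = high_level(observation)
        steps_since_refresh = 0
    steps_since_refresh += 1
    # The low-level controller sees only a designated component of the observation.
    designated = observation[:COMP_DIM]
    return low_level(torch.cat([designated, control_signal], dim=-1))

action = select_action(torch.randn(OBS_DIM))
```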
-
Publication Number: US20220083869A1
Publication Date: 2022-03-17
Application Number: US17486842
Application Date: 2021-09-27
Applicant: DeepMind Technologies Limited
Inventor: Razvan Pascanu , Raia Thais Hadsell , Victor Constant Bapst , Wojciech Czarnecki , James Kirkpatrick , Yee Whye Teh , Nicolas Manfred Otto Heess
Abstract: A method is proposed for training a multitask computer system, such as a multitask neural network system. The system comprises a set of trainable workers and a shared module. The trainable workers and shared module are trained on a plurality of different tasks, such that each worker learns to perform a corresponding one of the tasks according to a respective task policy, and the shared module learns a multitask policy which represents common behavior for the tasks. The coordinated training is performed by optimizing an objective function comprising, for each task: a reward term indicative of an expected reward earned by a worker in performing the corresponding task according to the task policy; and at least one entropy term which regularizes the distribution of the task policy towards the distribution of the multitask policy.
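This is a distill-and-transfer style objective. A minimal sketch of the per-task term follows, with linear policy heads, a REINFORCE surrogate for the reward term, and an assumed weight `ALPHA` on the KL regularizer that pulls each task policy toward the multitask policy; none of these specifics come from the record itself.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

OBS_DIM, N_ACTIONS, N_TASKS = 8, 3, 2  # illustrative sizes
ALPHA = 0.5                            # assumed regularization weight

workers = [nn.Linear(OBS_DIM, N_ACTIONS) for _ in range(N_TASKS)]  # task policies
shared = nn.Linear(OBS_DIM, N_ACTIONS)                             # multitask policy

def task_objective(task_id, obs, actions, returns):
    """Per-task term: expected reward plus a KL pull toward the shared policy."""
    logp_task = F.log_softmax(workers[task_id](obs), dim=-1)
    logp_shared = F.log_softmax(shared(obs), dim=-1)
    # Reward term: REINFORCE-style surrogate for the expected return.
    reward_term = (logp_task.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
                   * returns).mean()
    # Entropy/KL term: regularize the task policy toward the multitask policy.
    kl = (logp_task.exp() * (logp_task - logp_shared)).sum(-1).mean()
    return reward_term - ALPHA * kl  # maximized jointly over worker and shared nets

obs = torch.randn(16, OBS_DIM)
acts = torch.randint(0, N_ACTIONS, (16,))
rets = torch.randn(16)
loss = -task_objective(0, obs, acts, rets)  # minimize the negative objective
loss.backward()
```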
-
Publication Number: US11132609B2
Publication Date: 2021-09-28
Application Number: US16689020
Application Date: 2019-11-19
Applicant: DeepMind Technologies Limited
Inventor: Razvan Pascanu , Raia Thais Hadsell , Victor Constant Bapst , Wojciech Czarnecki , James Kirkpatrick , Yee Whye Teh , Nicolas Manfred Otto Heess
Abstract: A method is proposed for training a multitask computer system, such as a multitask neural network system. The system comprises a set of trainable workers and a shared module. The trainable workers and shared module are trained on a plurality of different tasks, such that each worker learns to perform a corresponding one of the tasks according to a respective task policy, and the shared module learns a multitask policy which represents common behavior for the tasks. The coordinated training is performed by optimizing an objective function comprising, for each task: a reward term indicative of an expected reward earned by a worker in performing the corresponding task according to the task policy; and at least one entropy term which regularizes the distribution of the task policy towards the distribution of the multitask policy.
-
Publication Number: US20200285909A1
Publication Date: 2020-09-10
Application Number: US16882373
Application Date: 2020-05-22
Applicant: DeepMind Technologies Limited
Inventor: Martin Riedmiller , Roland Hafner , Mel Vecerik , Timothy Paul Lillicrap , Thomas Lampe , Ivaylo Popov , Gabriel Barth-Maron , Nicolas Manfred Otto Heess
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for data-efficient reinforcement learning. One of the systems is a system for training an actor neural network used to select actions to be performed by an agent that interacts with an environment by receiving observations characterizing states of the environment and, in response to each observation, performing an action selected from a continuous space of possible actions, wherein the actor neural network maps observations to next actions in accordance with values of parameters of the actor neural network, and wherein the system comprises: a plurality of workers, wherein each worker is configured to operate independently of each other worker, wherein each worker is associated with a respective agent replica that interacts with a respective replica of the environment during the training of the actor neural network.