Patent search ap:("DeepMind Technologies Limited") AND inv:"David Constantine Patrick Warde-Farley" Page 1

1.

Invention publication
CONTROLLING AGENTS USING AMORTIZED Q LEARNING (pending, published)

Publication (announcement) number: US20240160901A1

Publication (announcement) date: 2024-05-16

Application number: US18406995

Application date: 2024-01-08

Applicant: DeepMind Technologies Limited

Inventor: Tom Van de Wiele, Volodymyr Mnih, Andriy Mnih, David Constantine Patrick Warde-Farley

IPC: G06N3/047, G06N3/006, G06N3/084

CPC classification number: G06N3/047, G06N3/006, G06N3/084

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network system used to control an agent interacting with an environment. One of the methods includes receiving a current observation; processing the current observation using a proposal neural network to generate a proposal output that defines a proposal probability distribution over a set of possible actions that can be performed by the agent to interact with the environment; sampling (i) one or more actions from the set of possible actions in accordance with the proposal probability distribution and (ii) one or more actions randomly from the set of possible actions; processing the current observation and each sampled action using a Q neural network to generate a Q value; and selecting an action using the Q values generated by the Q neural network.
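The action-selection steps in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: `proposal_net` and `q_net` are hypothetical callables standing in for the proposal and Q neural networks, the action set is assumed to be a small discrete set, and the final greedy pick is an assumption, since the abstract only says the action is selected "using the Q values".

```python
import math
import random
from typing import Callable, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn proposal logits into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(
    observation: Sequence[float],
    proposal_net: Callable[[Sequence[float]], Sequence[float]],  # hypothetical
    q_net: Callable[[Sequence[float], int], float],  # hypothetical
    num_actions: int,
    num_proposal_samples: int,
    num_uniform_samples: int,
    rng: random.Random,
) -> int:
    """Return the sampled candidate action with the highest Q value."""
    # The proposal output defines a distribution over the possible actions.
    probs = softmax(proposal_net(observation))
    actions = list(range(num_actions))
    # (i) Actions sampled in accordance with the proposal distribution.
    candidates = rng.choices(actions, weights=probs, k=num_proposal_samples)
    # (ii) Actions sampled uniformly at random from the action set.
    candidates += [rng.randrange(num_actions) for _ in range(num_uniform_samples)]
    # Score each (observation, action) pair with the Q network.
    q_values = [q_net(observation, a) for a in candidates]
    # Assumption: greedy choice among the sampled candidates.
    best = max(range(len(candidates)), key=lambda i: q_values[i])
    return candidates[best]
```

Evaluating Q only on a handful of sampled candidates, rather than on every possible action, is what makes this kind of scheme attractive when the action set is large.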

2.

Invention publication
CONTROLLING AGENTS USING RELATIVE VARIATIONAL INTRINSIC CONTROL (pending, published)

Publication (announcement) number: US20230325635A1

Publication (announcement) date: 2023-10-12

Application number: US18025304

Application date: 2021-09-10

Applicant: DeepMind Technologies Limited

Inventor: David Constantine Patrick Warde-Farley, Steven Stenberg Hansen, Volodymyr Mnih, Kate Alexandra Baumli

IPC: G06N3/045, G06N3/08

CPC classification number: G06N3/045, G06N3/08

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a policy neural network for use in controlling an agent using relative variational intrinsic control. In one aspect, a method includes: selecting a skill from a set of skills; generating a trajectory by controlling the agent using the policy neural network while the policy neural network is conditioned on the selected skill; processing an initial observation and a last observation using a relative discriminator neural network to generate a relative score; processing the last observation using an absolute discriminator neural network to generate an absolute score; generating a reward for the trajectory from the absolute score corresponding to the selected skill and the relative score corresponding to the selected skill; and training the policy neural network on the reward for the trajectory.
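The reward computation described in this abstract might look something like the sketch below. The abstract does not say how the two scores are combined, so the difference of log-probabilities used here is an assumption made purely for illustration. The two discriminators are hypothetical callables that return one logit per skill.

```python
import math
from typing import Callable, List, Sequence

Observation = Sequence[float]


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Convert per-skill logits into log-probabilities."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_norm for x in logits]


def trajectory_reward(
    skill: int,
    initial_obs: Observation,
    last_obs: Observation,
    relative_discriminator: Callable[[Observation, Observation], Sequence[float]],
    absolute_discriminator: Callable[[Observation], Sequence[float]],
) -> float:
    """Reward for one trajectory generated under the selected skill."""
    # Relative score: how well the skill is identified from (initial, last).
    relative_scores = log_softmax(relative_discriminator(initial_obs, last_obs))
    # Absolute score: how well the skill is identified from the last observation alone.
    absolute_scores = log_softmax(absolute_discriminator(last_obs))
    # Assumption: combine the two scores for the selected skill by subtraction.
    return relative_scores[skill] - absolute_scores[skill]
```

Under this assumed combination, a skill earns a positive reward only when the change from the initial to the last observation identifies it better than the last observation does by itself.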

3.

Invention grant
Controlling agents using amortized Q learning (in force)

Publication (announcement) number: US11868866B2

Publication (announcement) date: 2024-01-09

Application number: US17287306

Application date: 2019-11-18

Applicant: DeepMind Technologies Limited

Inventor: Tom Van de Wiele, Volodymyr Mnih, Andriy Mnih, David Constantine Patrick Warde-Farley

IPC: G06N3/04, G06N3/047, G06N3/006, G06N3/084

CPC classification number: G06N3/047, G06N3/006, G06N3/084

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network system used to control an agent interacting with an environment. One of the methods includes receiving a current observation; processing the current observation using a proposal neural network to generate a proposal output that defines a proposal probability distribution over a set of possible actions that can be performed by the agent to interact with the environment; sampling (i) one or more actions from the set of possible actions in accordance with the proposal probability distribution and (ii) one or more actions randomly from the set of possible actions; processing the current observation and each sampled action using a Q neural network to generate a Q value; and selecting an action using the Q values generated by the Q neural network.

4.

Invention application
UNSUPERVISED CONTROL USING LEARNED REWARDS (pending, published)

Publication (announcement) number: US20190354869A1

Publication (announcement) date: 2019-11-21

Application number: US16416920

Application date: 2019-05-20

Applicant: DeepMind Technologies Limited

Inventor: David Constantine Patrick Warde-Farley, Volodymyr Mnih

IPC: G06N3/08

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting actions to be performed by an agent that interacts with an environment. In one aspect, a system comprises: an action selection subsystem that selects actions to be performed by the agent using an action selection policy generated using an action selection neural network; a reward subsystem that is configured to: receive an observation characterizing a current state of the environment and an observation characterizing a goal state of the environment; generate a reward using an embedded representation of the observation characterizing the current state of the environment and an embedded representation of the observation characterizing the goal state of the environment; and a training subsystem that is configured to train the action selection neural network based on the rewards generated by the reward subsystem using reinforcement learning techniques.
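As a rough illustration of the reward subsystem described in this abstract, the sketch below embeds the current and goal observations and scores their agreement. The abstract only says the reward is generated "using" the two embedded representations. Cosine similarity is an assumption chosen for this example, and `embed` is a hypothetical stand-in for the learned embedding network.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine of the angle between two embedding vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def goal_reward(
    current_obs: Vector,
    goal_obs: Vector,
    embed: Callable[[Vector], Vector],  # hypothetical embedding network
) -> float:
    """Reward computed from embeddings of the current and goal observations."""
    # Assumption: the reward is the similarity of the two embedded representations.
    return cosine_similarity(embed(current_obs), embed(goal_obs))
```

A training subsystem would then feed rewards like this one into any standard reinforcement learning update for the action selection network.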

5.

Invention grant
Unsupervised control using learned rewards (in force)

Publication (announcement) number: US11727281B2

Publication (announcement) date: 2023-08-15

Application number: US17586350

Application date: 2022-01-27

Applicant: DeepMind Technologies Limited

Inventor: David Constantine Patrick Warde-Farley, Volodymyr Mnih

IPC: G06N3/088

CPC classification number: G06N3/088

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting actions to be performed by an agent that interacts with an environment. In one aspect, a system comprises: an action selection subsystem that selects actions to be performed by the agent using an action selection policy generated using an action selection neural network; a reward subsystem that is configured to: receive an observation characterizing a current state of the environment and an observation characterizing a goal state of the environment; generate a reward using an embedded representation of the observation characterizing the current state of the environment and an embedded representation of the observation characterizing the goal state of the environment; and a training subsystem that is configured to train the action selection neural network based on the rewards generated by the reward subsystem using reinforcement learning techniques.

6.

Invention application
UNSUPERVISED CONTROL USING LEARNED REWARDS (in force)

Publication (announcement) number: US20220164673A1

Publication (announcement) date: 2022-05-26

Application number: US17586350

Application date: 2022-01-27

Applicant: DeepMind Technologies Limited

Inventor: David Constantine Patrick Warde-Farley, Volodymyr Mnih

IPC: G06N3/08

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting actions to be performed by an agent that interacts with an environment. In one aspect, a system comprises: an action selection subsystem that selects actions to be performed by the agent using an action selection policy generated using an action selection neural network; a reward subsystem that is configured to: receive an observation characterizing a current state of the environment and an observation characterizing a goal state of the environment; generate a reward using an embedded representation of the observation characterizing the current state of the environment and an embedded representation of the observation characterizing the goal state of the environment; and a training subsystem that is configured to train the action selection neural network based on the rewards generated by the reward subsystem using reinforcement learning techniques.

7.

Invention grant
Unsupervised control using learned rewards (in force)

Publication (announcement) number: US11263531B2

Publication (announcement) date: 2022-03-01

Application number: US16416920

Application date: 2019-05-20

Applicant: DeepMind Technologies Limited

Inventor: David Constantine Patrick Warde-Farley, Volodymyr Mnih

IPC: G06N3/08

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting actions to be performed by an agent that interacts with an environment. In one aspect, a system comprises: an action selection subsystem that selects actions to be performed by the agent using an action selection policy generated using an action selection neural network; a reward subsystem that is configured to: receive an observation characterizing a current state of the environment and an observation characterizing a goal state of the environment; generate a reward using an embedded representation of the observation characterizing the current state of the environment and an embedded representation of the observation characterizing the goal state of the environment; and a training subsystem that is configured to train the action selection neural network based on the rewards generated by the reward subsystem using reinforcement learning techniques.

8.

Invention application
CONTROLLING AGENTS USING AMORTIZED Q LEARNING (in force)

Publication (announcement) number: US20210357731A1

Publication (announcement) date: 2021-11-18

Application number: US17287306

Application date: 2019-11-18

Applicant: DeepMind Technologies Limited

Inventor: Tom Van de Wiele, Volodymyr Mnih, Andriy Mnih, David Constantine Patrick Warde-Farley

IPC: G06N3/04, G06N3/08, G06N3/00

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network system used to control an agent interacting with an environment. One of the methods includes receiving a current observation; processing the current observation using a proposal neural network to generate a proposal output that defines a proposal probability distribution over a set of possible actions that can be performed by the agent to interact with the environment; sampling (i) one or more actions from the set of possible actions in accordance with the proposal probability distribution and (ii) one or more actions randomly from the set of possible actions; processing the current observation and each sampled action using a Q neural network to generate a Q value; and selecting an action using the Q values generated by the Q neural network.
