CONTROLLING AGENTS USING RELATIVE VARIATIONAL INTRINSIC CONTROL

    Publication Number: US20230325635A1

    Publication Date: 2023-10-12

    Application Number: US18025304

    Application Date: 2021-09-10

    CPC classification number: G06N3/045 G06N3/08

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a policy neural network for use in controlling an agent using relative variational intrinsic control. In one aspect, a method includes: selecting a skill from a set of skills; generating a trajectory by controlling the agent using the policy neural network while the policy neural network is conditioned on the selected skill; processing an initial observation and a last observation using a relative discriminator neural network to generate a relative score; processing the last observation using an absolute discriminator neural network to generate an absolute score; generating a reward for the trajectory from the absolute score corresponding to the selected skill and the relative score corresponding to the selected skill; and training the policy neural network on the reward for the trajectory.
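
    Below is a minimal sketch of the reward computation this abstract describes, assuming the intrinsic reward is the relative log-score minus the absolute log-score at the index of the selected skill (one common formulation of relative variational intrinsic control; the patent's exact combination may differ). All names (Discriminator, relative_disc, absolute_disc, rvic_reward) and dimensions are hypothetical:

        # Hedged sketch in PyTorch; the reward form and every name here are
        # assumptions, not the patent's definitive implementation.
        import torch
        import torch.nn as nn

        NUM_SKILLS = 8   # hypothetical size of the skill set
        OBS_DIM = 16     # hypothetical observation feature size

        class Discriminator(nn.Module):
            """Maps observation features to log-probabilities over skills."""
            def __init__(self, in_dim: int):
                super().__init__()
                self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                         nn.Linear(64, NUM_SKILLS))

            def forward(self, x: torch.Tensor) -> torch.Tensor:
                return self.net(x).log_softmax(dim=-1)

        relative_disc = Discriminator(2 * OBS_DIM)  # sees (initial, last) pair
        absolute_disc = Discriminator(OBS_DIM)      # sees only the last observation

        def rvic_reward(initial_obs, last_obs, skill: int) -> torch.Tensor:
            """Trajectory reward: relative score minus absolute score, each
            taken at the index of the selected skill."""
            rel = relative_disc(torch.cat([initial_obs, last_obs], dim=-1))[skill]
            ab = absolute_disc(last_obs)[skill]
            return rel - ab

        reward = rvic_reward(torch.randn(OBS_DIM), torch.randn(OBS_DIM), skill=3)

    The policy neural network would then be trained on this scalar reward with any standard reinforcement-learning update; that step is omitted here.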

    REINFORCEMENT LEARNING USING AN ENSEMBLE OF DISCRIMINATOR MODELS

    Publication Number: US20240311639A1

    Publication Date: 2024-09-19

    Application Number: US18281711

    Application Date: 2022-05-27

    CPC classification number: G06N3/092 G06N3/045

    Abstract: This specification describes a method performed by one or more data processing apparatus that includes: sampling a latent from a set of possible latents, selecting actions to be performed by an agent to interact with an environment over a sequence of time steps using an action selection neural network that is conditioned on the sampled latent, determining a respective reward received for each time step in the sequence of time steps using an ensemble of discriminator models, and training the action selection neural network based on the rewards using a reinforcement learning technique. Each discriminator model can process an observation to generate a respective prediction output that predicts which latent the action selection neural network was conditioned on to cause the environment to enter the state characterized by the observation.
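
    The per-timestep reward here can be sketched as an aggregation over the discriminator ensemble. The version below assumes the reward is the mean log-probability the ensemble assigns to the latent the policy was conditioned on; other aggregations (e.g. ones based on ensemble disagreement) are equally plausible, and all names (LatentPredictor, ensemble_reward) are hypothetical:

        # Hedged sketch in PyTorch; the aggregation rule and names are assumptions.
        import torch
        import torch.nn as nn

        NUM_LATENTS = 8     # hypothetical size of the set of possible latents
        OBS_DIM = 16        # hypothetical observation feature size
        ENSEMBLE_SIZE = 5   # hypothetical number of discriminator models

        class LatentPredictor(nn.Module):
            """One discriminator: predicts which latent the action selection
            network was conditioned on, from an observation."""
            def __init__(self):
                super().__init__()
                self.net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                                         nn.Linear(64, NUM_LATENTS))

            def forward(self, obs: torch.Tensor) -> torch.Tensor:
                return self.net(obs).log_softmax(dim=-1)

        ensemble = [LatentPredictor() for _ in range(ENSEMBLE_SIZE)]

        def ensemble_reward(obs: torch.Tensor, latent: int) -> torch.Tensor:
            """Reward for one time step: mean log-probability the ensemble
            assigns to the sampled latent, given the current observation."""
            log_probs = torch.stack([disc(obs)[latent] for disc in ensemble])
            return log_probs.mean()

        r_t = ensemble_reward(torch.randn(OBS_DIM), latent=2)

    Summing or averaging r_t over the sequence of time steps would give the return used by the reinforcement-learning update.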

    LEARNING DIVERSE SKILLS FOR TASKS USING SEQUENTIAL LATENT VARIABLES FOR ENVIRONMENT DYNAMICS

    Publication Number: US20240185083A1

    Publication Date: 2024-06-06

    Application Number: US18285519

    Application Date: 2022-05-27

    CPC classification number: G06N3/092

    Abstract: This specification relates to methods for controlling agents to perform actions according to a goal (or option) comprising a sequence of local goals (or local options), and to corresponding training methods. As discussed herein, environment dynamics may be modelled sequentially by sampling latent variables, each latent variable relating to a local goal and depending on the previous latent variable. These latent variables are used to condition an action-selection policy neural network to select actions according to the local goal. This allows the agent to reach more diverse states than would be possible with a fixed latent variable or goal, thereby encouraging exploratory behavior. In addition, specific methods described herein model the sequence of latent variables through a simple linear and recurrent relationship that allows the system to be trained more efficiently. This avoids the need to learn a state-dependent higher-level policy for selecting the latent variables, which can be difficult to train in practice.
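
    The "simple linear and recurrent relationship" for the latent sequence can be sketched as a first-order linear recurrence with Gaussian noise. The transition matrix, noise scale, and unit-norm projection below are assumptions chosen for illustration, as are all names:

        # Hedged sketch in PyTorch; the recurrence details are assumptions.
        import torch

        LATENT_DIM = 4        # hypothetical latent dimensionality
        NUM_LOCAL_GOALS = 5   # hypothetical number of local goals per episode

        def sample_latent_sequence(num_steps: int = NUM_LOCAL_GOALS) -> list:
            """Samples latents where each one depends linearly on its
            predecessor plus Gaussian noise (no state-dependent policy)."""
            transition = torch.eye(LATENT_DIM)  # assumed fixed linear map
            z = torch.randn(LATENT_DIM)
            z = z / z.norm()
            latents = [z]
            for _ in range(num_steps - 1):
                z = transition @ z + 0.1 * torch.randn(LATENT_DIM)
                z = z / z.norm()  # keep latents well scaled across steps
                latents.append(z)
            return latents

        # Each latent in the sequence would condition the action-selection
        # policy network for one local-goal segment of the episode.
        skill_sequence = sample_latent_sequence()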
