Patent search: ap:("DeepMind Technologies Limited") AND inv:"Mark Daniel Rowland"

1.

Invention publication
REINFORCEMENT LEARNING USING QUANTILE CREDIT ASSIGNMENT (Status: pending, published)

Publication number: US20240256883A1

Publication date: 2024-08-01

Application number: US18424561

Filing date: 2024-01-26

Applicant: DeepMind Technologies Limited

Inventors: Thomas Mesnard, Remi Munos, Alaa Saade, Yunhao Tang, Mark Daniel Rowland, Theophane Guillaume Weber, Wenqi Chen

IPC: G06N3/092

CPC classification number: G06N3/092

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network used to select actions to be performed by an agent interacting with an environment. Implementations of the system can take into account a level of luck in the environment and hence, while learning, can account both for outcomes caused by external factors and for those that depend on the actions of the agent.

2.

Invention publication
REINFORCEMENT LEARNING BY DIRECTLY LEARNING AN ADVANTAGE FUNCTION (Status: pending, published)

Publication number: US20240256882A1

Publication date: 2024-08-01

Application number: US18424520

Filing date: 2024-01-26

Applicant: DeepMind Technologies Limited

Inventors: Yunhao Tang, Remi Munos, Mark Daniel Rowland, Michal Valko

IPC: G06N3/092

CPC classification number: G06N3/092

Abstract: A system and method, implemented by one or more computers, of controlling an agent to take actions in an environment to perform a task are provided. The method comprises maintaining a value function neural network and an advantage function neural network, the latter being an estimate of a state-action advantage function that represents the relative advantage of performing one possible action compared with the other possible actions. The method further comprises using the advantage function neural network to control the agent to take actions in the environment to perform the task. The method also comprises training the value function neural network and the advantage function neural network in a way that takes into account a behavior policy defined by the distribution of actions taken by the agent in the training data.

3.

Invention application
CONTROLLING REINFORCEMENT LEARNING AGENTS USING GEOMETRIC POLICY COMPOSITION (Status: in force)

Publication number: US20250124297A1

Publication date: 2025-04-17

Application number: US18834208

Filing date: 2023-01-30

Applicant: DeepMind Technologies Limited

Inventors: Mark Daniel Rowland, Shantanu Yogeshraj Thakoor, Andre da Motta Salles Barreto, Diana Luiza Borsa, William Clinton Dabney, Remi Munos

IPC: G06N3/092, G06N3/0455

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for controlling a reinforcement learning agent in an environment. One of the methods may include maintaining data specifying a base policy set comprising a plurality of base policies for controlling the agent; receiving a current observation characterizing a current state of the environment; generating, for each of the plurality of base policies, one or more predicted future observations characterizing respective future states of the environment that are subsequent to the current state of the environment; using the predicted future observations generated for the plurality of base policies to determine a respective estimated value for each composite policy in a composite policy set with respect to the current state of the environment; and selecting an action using the respective estimated values for the composite policies.
