Patent search ap:("DeepMind Technologies Limited") AND inv:"Sandy Han Huang" Page 1

1.

Invention application
TRAINING MULTI-OBJECTIVE NEURAL NETWORK REINFORCEMENT LEARNING SYSTEMS (Status: In force)

Publication number: US20230082326A1

Publication date: 2023-03-16

Application number: US17797203

Application date: 2021-02-08

Applicant: DeepMind Technologies Limited

Inventor: Abbas Abdolmaleki, Sandy Han Huang

IPC: G06N3/08

Abstract: There is provided a method for training a neural network system by reinforcement learning, the neural network system being configured to receive an input observation characterizing a state of an environment interacted with by an agent and to select and output an action in accordance with a policy that aims to satisfy a plurality of objectives. The method comprises obtaining a set of one or more trajectories. Each trajectory comprises a state of an environment, an action applied by the agent to the environment according to a previous policy in response to the state, and a set of rewards for the action, each reward relating to a corresponding objective of the plurality of objectives. The method further comprises determining an action-value function for each of the plurality of objectives based on the set of one or more trajectories. Each action-value function determines an action value representing an estimated return according to the corresponding objective that would result from the agent performing a given action in response to a given state according to the previous policy. The method further comprises determining an updated policy based on a combination of the action-value functions for the plurality of objectives.
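
The abstract does not say how the per-objective action-value functions are combined, so the following is only a minimal tabular sketch under stated assumptions: a single state, discrete actions, one improved distribution per objective obtained by reweighting the previous policy with exp(Q / temperature), and an updated policy taken as the plain average of those distributions. The function names, the temperatures, and the averaging rule are illustrative choices, not the claimed method.

```python
import math

def improved_distribution(prior, q_values, temperature):
    """Reweight the previous policy by exp(Q / temperature), then normalize."""
    max_q = max(q_values)  # subtract the max for numerical stability
    weights = [p * math.exp((q - max_q) / temperature)
               for p, q in zip(prior, q_values)]
    total = sum(weights)
    return [w / total for w in weights]

def update_policy(prior, q_per_objective, temperatures):
    """Assumed combination rule: average the per-objective distributions."""
    dists = [improved_distribution(prior, q, t)
             for q, t in zip(q_per_objective, temperatures)]
    return [sum(d[a] for d in dists) / len(dists) for a in range(len(prior))]

# Hypothetical example: three actions, two objectives that disagree.
previous_policy = [1 / 3, 1 / 3, 1 / 3]
q_per_objective = [[1.0, 0.0, 0.5],   # objective 0 prefers action 0
                   [0.0, 1.0, 0.5]]   # objective 1 prefers action 1

balanced = update_policy(previous_policy, q_per_objective, [0.5, 0.5])
favor_first = update_policy(previous_policy, q_per_objective, [0.1, 0.5])
```

In this sketch a smaller temperature makes an objective's distribution more peaked, so it pulls the averaged policy harder toward that objective's preferred action.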

2.

Invention publication
CONSTRAINED REINFORCEMENT LEARNING NEURAL NETWORK SYSTEMS USING PARETO FRONT OPTIMIZATION (Status: Under examination, published)

Publication number: US20230368037A1

Publication date: 2023-11-16

Application number: US18029992

Application date: 2021-10-01

Applicant: DeepMind Technologies Limited

Inventor: Sandy Han Huang, Abbas Abdolmaleki

IPC: G06N3/092

CPC classification number: G06N3/092

Abstract: A system and method that controls an agent to perform a task subject to one or more constraints. The system trains a preference neural network that learns which preferences produce constraint-satisfying action selection policies. Thus the system optimizes a hierarchical policy that is a product of a preference policy and a preference-conditioned action selection policy. Thus the system learns to jointly optimize a set of objectives relating to rewards and costs received during the task whilst also learning preferences, i.e. trade-offs between the rewards and costs, that are most likely to produce policies that satisfy the constraints.
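
The "product of a preference policy and a preference-conditioned action selection policy" can be illustrated with a small tabular sketch. Everything below is hypothetical: the two named preferences, the two actions, and the probability tables are invented for illustration, and the abstract's preference neural network is replaced by a fixed lookup table.

```python
def joint_policy(preference_probs, action_probs_given_pref):
    """Hierarchical policy: p(pref, action) = p(pref) * p(action | pref)."""
    return {(pref, action): p_pref * p_act
            for pref, p_pref in preference_probs.items()
            for action, p_act in action_probs_given_pref[pref].items()}

def action_marginal(joint):
    """Sum the joint over preferences to get the overall action distribution."""
    marginal = {}
    for (_, action), prob in joint.items():
        marginal[action] = marginal.get(action, 0.0) + prob
    return marginal

# Hypothetical preference policy over two reward/cost trade-offs.
preference_probs = {"favor_reward": 0.25, "favor_safety": 0.75}

# Hypothetical preference-conditioned action policies for one state.
action_probs_given_pref = {
    "favor_reward": {"fast": 0.9, "slow": 0.1},
    "favor_safety": {"fast": 0.2, "slow": 0.8},
}

joint = joint_policy(preference_probs, action_probs_given_pref)
marginal = action_marginal(joint)
```

Shifting probability mass in `preference_probs` toward preferences whose conditioned policies satisfy the constraints changes the overall behavior without altering the conditioned policies themselves.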

3.

Invention publication
MULTI-OBJECTIVE REINFORCEMENT LEARNING USING WEIGHTED POLICY PROJECTION (Status: Under examination, published)

Publication number: US20240185084A1

Publication date: 2024-06-06

Application number: US18286504

Application date: 2022-05-27

Applicant: DeepMind Technologies Limited

Inventor: Abbas Abdolmaleki, Sandy Han Huang, Martin Riedmiller

IPC: G06N3/092

CPC classification number: G06N3/092

Abstract: Computer implemented systems and methods for training an action selection policy neural network to select actions to be performed by an agent to control the agent to perform a task. The techniques are able to optimize multiple objectives one of which may be to stay close to a behavioral policy of a teacher. The behavioral policy of the teacher may be defined by a predetermined dataset of behaviors and the systems and methods may then learn offline. The described techniques provide a mechanism for explicitly defining a trade-off between the multiple objectives.
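
One way to picture an explicit trade-off between objectives is a weighted projection in the tabular case. The abstract does not give the form of the projection, so this sketch assumes one: each objective contributes a target action distribution q_k, and the new policy minimizes the weighted sum of KL(q_k || policy). For a categorical policy that minimizer is the weight-normalized average of the targets. The distributions and weights below are hypothetical.

```python
def project_weighted(distributions, weights):
    """Minimizer of sum_k w_k * KL(q_k || pi) over categorical pi:
    the weight-normalized average of the target distributions."""
    total_weight = sum(weights)
    n_actions = len(distributions[0])
    return [sum(w * d[a] for w, d in zip(weights, distributions)) / total_weight
            for a in range(n_actions)]

# Hypothetical targets for one state with three actions.
task_improved = [0.7, 0.2, 0.1]     # distribution favoring task reward
teacher_behavior = [0.1, 0.8, 0.1]  # teacher's behavioral policy from a dataset

# Weights make the trade-off explicit: 25% task objective, 75% teacher.
policy = project_weighted([task_improved, teacher_behavior], [0.25, 0.75])

# Extreme weights recover a single objective's target.
task_only = project_weighted([task_improved, teacher_behavior], [1.0, 0.0])
```

Under this assumption, the weights act as a directly interpretable dial: a weight of zero ignores the teacher entirely, and raising it moves the policy toward the teacher's behavior.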
