TRAINING MULTI-OBJECTIVE NEURAL NETWORK REINFORCEMENT LEARNING SYSTEMS

Invention Application

US20230082326A1 TRAINING MULTI-OBJECTIVE NEURAL NETWORK REINFORCEMENT LEARNING SYSTEMS (Legal status: active)


Patent Title: TRAINING MULTI-OBJECTIVE NEURAL NETWORK REINFORCEMENT LEARNING SYSTEMS
Application No.: US17797203

Application Date: 2021-02-08
Publication No.: US20230082326A1

Publication Date: 2023-03-16
Inventors: Abbas Abdolmaleki, Sandy Han Huang
Applicant: DeepMind Technologies Limited
Applicant Address: GB London
Assignee: DeepMind Technologies Limited
Current Assignee: DeepMind Technologies Limited
Current Assignee Address: GB London
International Application: PCT/EP2021/052986 (WO), 2021-02-08
Main IPC: G06N3/08
IPC: G06N3/08

Abstract:

There is provided a method for training a neural network system by reinforcement learning, the neural network system being configured to receive an input observation characterizing a state of an environment interacted with by an agent and to select and output an action in accordance with a policy that aims to satisfy a plurality of objectives. The method comprises obtaining a set of one or more trajectories. Each trajectory comprises a state of an environment, an action applied by the agent to the environment according to a previous policy in response to the state, and a set of rewards for the action, each reward relating to a corresponding objective of the plurality of objectives. The method further comprises determining an action-value function for each of the plurality of objectives based on the set of one or more trajectories. Each action-value function determines an action value representing an estimated return according to the corresponding objective that would result from the agent performing a given action in response to a given state according to the previous policy. The method further comprises determining an updated policy based on a combination of the action-value functions for the plurality of objectives.
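The abstract leaves open how the per-objective action values are estimated and how they are combined. The Python sketch below is a minimal, hypothetical illustration of the three claimed steps on a small tabular problem, not the patented implementation. It assumes that (1) each trajectory is a list of steps, each carrying a state, an action and one reward per objective; (2) each objective's action value Q_k(s, a) under the previous policy is estimated by every-visit Monte Carlo returns (a neural-network critic would be the more typical choice); and (3) the action values are combined as pi_new(a|s) ∝ pi_old(a|s) · exp(Σ_k Q_k(s, a) / η_k), with hypothetical per-objective temperatures η_k. The names `Step`, `estimate_q`, `improve_policy` and `temperatures` are illustrative only.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

State = int
Action = int
Policy = Dict[State, Dict[Action, float]]  # pi(a|s) stored as nested dicts


@dataclass
class Step:
    """One (state, action, rewards) entry; one reward per objective."""
    state: State
    action: Action
    rewards: Tuple[float, ...]


def estimate_q(trajectories: Sequence[List[Step]], num_objectives: int,
               gamma: float = 0.9) -> Dict[Tuple[State, Action], List[float]]:
    """Every-visit Monte Carlo estimate of Q_k(s, a) for each objective k.

    Assumption: the trajectories were generated by the previous policy, so the
    averaged returns approximate that policy's per-objective action values.
    """
    sums: Dict[Tuple[State, Action], List[float]] = defaultdict(
        lambda: [0.0] * num_objectives)
    counts: Dict[Tuple[State, Action], int] = defaultdict(int)
    for trajectory in trajectories:
        returns = [0.0] * num_objectives
        # Walk backwards so each step accumulates its discounted return-to-go.
        for step in reversed(trajectory):
            returns = [r + gamma * g for r, g in zip(step.rewards, returns)]
            key = (step.state, step.action)
            sums[key] = [s + g for s, g in zip(sums[key], returns)]
            counts[key] += 1
    return {key: [s / counts[key] for s in total] for key, total in sums.items()}


def improve_policy(old_policy: Policy,
                   q: Dict[Tuple[State, Action], List[float]],
                   temperatures: Sequence[float]) -> Policy:
    """Hypothetical combination rule:
    pi_new(a|s) proportional to pi_old(a|s) * exp(sum_k Q_k(s, a) / eta_k).
    """
    new_policy: Policy = {}
    for state, probs in old_policy.items():
        scores = {}
        for action in probs:
            # Assumption: actions with no data get a combined score of 0.
            q_vals = q.get((state, action), [0.0] * len(temperatures))
            scores[action] = sum(v / eta for v, eta in zip(q_vals, temperatures))
        max_score = max(scores.values())  # subtract the max for numerical stability
        weights = {a: p * math.exp(scores[a] - max_score) for a, p in probs.items()}
        total = sum(weights.values())
        new_policy[state] = {a: w / total for a, w in weights.items()}
    return new_policy


if __name__ == "__main__":
    # Two objectives (e.g. task reward and an energy penalty), one state, two actions.
    old = {0: {0: 0.5, 1: 0.5}}
    trajs = [[Step(0, 0, (1.0, -1.0))], [Step(0, 1, (0.5, 0.0))]]
    q = estimate_q(trajs, num_objectives=2)
    print(q)
    print(improve_policy(old, q, temperatures=(1.0, 1.0)))
    print(improve_policy(old, q, temperatures=(1.0, 10.0)))
```

In this sketch the temperatures play the role of preferences between objectives: a smaller η_k lets objective k move the policy further. With equal temperatures the penalty-free action 1 gains probability; raising η for the penalty objective to 10.0 makes action 0, which has the higher task reward, the preferred one.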

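IPC Classification:

G	Physics
G06	Computing; calculating or counting
G06N	Computing arrangements based on specific computational models
G06N3/00	Computing arrangements based on biological models
G06N3/02	.Neural network models
G06N3/08	..Learning methods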