Patent search ap:("DeepMind Technologies Limited") AND inv:"Zeyu Zheng" Page 1

1.

Granted invention patent
Reinforcement learning using meta-learned intrinsic rewards [Active]

Publication No.: US12293283B2

Publication date: 2025-05-06

Application No.: US17033410

Filing date: 2020-09-25

Applicant: DeepMind Technologies Limited

Inventor: Zeyu Zheng, Junhyuk Oh, Satinder Singh Baveja

IPC: G06N3/08, G06N3/04, G06N3/044, G06N3/045, G06N3/084

Abstract: There is described methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. The reinforcement learning system comprises an agent configured to perform actions based upon a policy and an intrinsic reward system configured to generate intrinsic reward values for the agent based upon the actions taken by the agent. The method comprises training the reinforcement learning system based upon a plurality of tasks. The training comprises updating the agent's policy based upon the intrinsic reward values generated by the intrinsic reward system and updating the intrinsic reward system based upon an extrinsic reward value obtained based upon the task being performed by the agent. The training further comprises re-initializing the agent's policy when an expiration criterion associated with the agent is met.
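
The training loop summarized in the abstract can be illustrated with a deliberately small sketch. It is not the patented implementation: it assumes a tabular softmax policy on a multi-armed bandit, a per-action intrinsic reward table, and a one-step meta-gradient. The names `train`, `eta`, `theta`, `lifetime`, the step sizes, and the task family (arm 0 never pays) are all hypothetical choices made for illustration.

```python
import math
import random


def softmax(prefs):
    # Numerically stable softmax over action preferences.
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def grad_log_pi(probs, action):
    # Gradient of log softmax(prefs)[action] with respect to prefs.
    return [(1.0 if i == action else 0.0) - p for i, p in enumerate(probs)]


def train(num_steps=2000, n_actions=3, lifetime=50,
          alpha=0.5, beta=0.1, seed=0):
    rng = random.Random(seed)
    eta = [0.0] * n_actions     # intrinsic reward table, kept across lifetimes
    theta = [0.0] * n_actions   # agent policy parameters
    task = rng.randrange(1, n_actions)  # assumed task family: arm 0 never pays
    age, resets = 0, 0
    for _ in range(num_steps):
        # Agent acts, then updates its policy from the intrinsic reward only.
        probs = softmax(theta)
        a = sample(probs, rng)
        g = grad_log_pi(probs, a)
        new_theta = [t + alpha * eta[a] * gi for t, gi in zip(theta, g)]

        # The updated policy is scored with the task's extrinsic reward.
        new_probs = softmax(new_theta)
        a2 = sample(new_probs, rng)
        extrinsic = 1.0 if a2 == task else 0.0
        g2 = grad_log_pi(new_probs, a2)

        # One-step meta-gradient: chain rule through the policy update,
        # d(new_theta)/d(eta[a]) = alpha * g.
        meta_grad = extrinsic * alpha * sum(x * y for x, y in zip(g2, g))
        eta[a] += beta * meta_grad

        theta = new_theta
        age += 1
        if age >= lifetime:
            # Expiration criterion met: re-initialize the policy and draw
            # a new task; the intrinsic reward table is not reset.
            theta = [0.0] * n_actions
            task = rng.randrange(1, n_actions)
            age = 0
            resets += 1
    return eta, resets


if __name__ == "__main__":
    eta, resets = train()
    print("intrinsic reward table:", eta)
    print("policy re-initializations:", resets)
```

The point of the sketch is the separation of roles: the policy is updated only from intrinsic rewards, the intrinsic reward table is updated only from extrinsic reward, and only the policy is discarded at the end of each lifetime.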

2.

Invention patent application
REINFORCEMENT LEARNING USING META-LEARNED INTRINSIC REWARDS [Active]

Publication No.: US20210089910A1

Publication date: 2021-03-25

Application No.: US17033410

Filing date: 2020-09-25

Applicant: DeepMind Technologies Limited

Inventor: Zeyu Zheng, Junhyuk Oh, Satinder Singh Baveja

IPC: G06N3/08, G06N3/04

Abstract: There is described methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. The reinforcement learning system comprises an agent configured to perform actions based upon a policy and an intrinsic reward system configured to generate intrinsic reward values for the agent based upon the actions taken by the agent. The method comprises training the reinforcement learning system based upon a plurality of tasks. The training comprises updating the agent's policy based upon the intrinsic reward values generated by the intrinsic reward system and updating the intrinsic reward system based upon an extrinsic reward value obtained based upon the task being performed by the agent. The training further comprises re-initializing the agent's policy when an expiration criterion associated with the agent is met.
