-
Publication No.: US20210089910A1
Publication Date: 2021-03-25
Application No.: US17033410
Filing Date: 2020-09-25
Applicant: DeepMind Technologies Limited
Inventor: Zeyu Zheng, Junhyuk Oh, Satinder Singh Baveja
Abstract: There are described methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. The reinforcement learning system comprises an agent configured to perform actions based upon a policy and an intrinsic reward system configured to generate intrinsic reward values for the agent based upon the actions taken by the agent. The method comprises training the reinforcement learning system based upon a plurality of tasks. The training comprises updating the agent's policy based upon the intrinsic reward values generated by the intrinsic reward system and updating the intrinsic reward system based upon an extrinsic reward value obtained based upon the task being performed by the agent. The training further comprises re-initializing the agent's policy when an expiration criterion associated with the agent is met.
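The loop this abstract outlines can be illustrated with a toy sketch. It is not the patented method. It assumes a small bandit-style task family, a tabular intrinsic reward, a softmax policy updated with a REINFORCE-style step, and a fixed lifetime as the expiration criterion. The update that moves the intrinsic reward toward the observed extrinsic reward is a simple stand-in chosen for illustration. All names (`train`, `eta`, `theta`, `lifetime`, `task_scale`) are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(num_steps=2000, lifetime=50, n_actions=2, seed=0):
    """Toy loop: intrinsic rewards persist, the policy is periodically reset."""
    rng = random.Random(seed)
    eta = [0.0] * n_actions    # intrinsic reward table (assumed form), kept across lifetimes
    theta = [0.0] * n_actions  # policy logits, re-initialized on expiry
    age, resets = 0, 0
    # Hypothetical task family: action 1 pays `task_scale`, every other action pays 0.
    task_scale = rng.choice([0.5, 1.0, 2.0])

    for _ in range(num_steps):
        probs = softmax(theta)
        action = rng.choices(range(n_actions), weights=probs)[0]
        extrinsic = task_scale if action == 1 else 0.0
        intrinsic = eta[action]

        # Policy update driven only by the intrinsic reward (REINFORCE-style).
        for a in range(n_actions):
            grad_log_pi = (1.0 if a == action else 0.0) - probs[a]
            theta[a] += 0.1 * intrinsic * grad_log_pi

        # Intrinsic reward update driven by the extrinsic reward.
        # Assumption: a moving-average stand-in, not a meta-gradient.
        eta[action] += 0.05 * (extrinsic - eta[action])

        # Expiration criterion (assumed: fixed lifetime). Reset the policy,
        # keep the intrinsic reward, and sample a new task.
        age += 1
        if age >= lifetime:
            theta = [0.0] * n_actions
            age = 0
            resets += 1
            task_scale = rng.choice([0.5, 1.0, 2.0])

    return eta, theta, resets
```

Because `eta` survives each reset while `theta` does not, whatever the intrinsic reward has learned across tasks is the only knowledge a freshly initialized policy inherits.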
-
Publication No.: US12293283B2
Publication Date: 2025-05-06
Application No.: US17033410
Filing Date: 2020-09-25
Applicant: DeepMind Technologies Limited
Inventor: Zeyu Zheng, Junhyuk Oh, Satinder Singh Baveja
Abstract: There are described methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. The reinforcement learning system comprises an agent configured to perform actions based upon a policy and an intrinsic reward system configured to generate intrinsic reward values for the agent based upon the actions taken by the agent. The method comprises training the reinforcement learning system based upon a plurality of tasks. The training comprises updating the agent's policy based upon the intrinsic reward values generated by the intrinsic reward system and updating the intrinsic reward system based upon an extrinsic reward value obtained based upon the task being performed by the agent. The training further comprises re-initializing the agent's policy when an expiration criterion associated with the agent is met.
-
Publication No.: US20230144995A1
Publication Date: 2023-05-11
Application No.: US17918365
Filing Date: 2021-06-07
Applicant: DeepMind Technologies Limited
Inventor: Vivek Veeriah Jeya Veeraiah, Tom Ben Zion Zahavy, Matteo Hessel, Zhongwen Xu, Junhyuk Oh, Iurii Kemaev, Hado Philip van Hasselt, David Silver, Satinder Singh Baveja
Abstract: A reinforcement learning system, method, and computer program code for controlling an agent to perform a plurality of tasks while interacting with an environment. The system learns options, where an option comprises a sequence of primitive actions performed by the agent under control of an option policy neural network. In implementations the system discovers options which are useful for multiple different tasks by meta-learning rewards for training the option policy neural network whilst the agent is interacting with the environment.
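The idea of an option as a sequence of primitive actions can be illustrated with a short sketch. It covers only option execution, not the meta-learning of rewards. It assumes a hypothetical one-dimensional corridor environment, and plain Python callables stand in for the option policy neural network. The termination function and all names (`Option`, `run_option`, `step`) are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Option:
    # Stand-in for an option policy network: maps a state to a primitive action.
    policy: Callable[[int], int]
    # Assumed termination condition: returns True when the option should stop.
    terminates: Callable[[int], bool]


def step(state: int, action: int, size: int = 5) -> int:
    """Hypothetical corridor with cells 0..size-1; action is -1 (left) or +1 (right)."""
    return min(max(state + action, 0), size - 1)


def run_option(option: Option, state: int, max_steps: int = 20) -> Tuple[int, List[int]]:
    """Execute one option: a sequence of primitive actions until termination."""
    actions: List[int] = []
    for _ in range(max_steps):
        if option.terminates(state):
            break
        action = option.policy(state)
        state = step(state, action)
        actions.append(action)
    return state, actions


# A hand-written "go to the right wall" option, used here in place of a learned one.
go_right = Option(policy=lambda s: +1, terminates=lambda s: s == 4)
final_state, actions = run_option(go_right, state=1)
print(final_state, actions)  # prints: 4 [1, 1, 1]
```

From the caller's side, selecting `go_right` is a single decision even though it unfolds into three primitive steps.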
-