Publication No.: US20250068919A1
Publication Date: 2025-02-27
Application No.: US18238400
Application Date: 2023-08-25
Applicant: DeepMind Technologies Limited
Inventor: Daniel Jarrett, Corentin Tallec, Florent Altché, Thomas Mesnard, Remi Munos, Michal Valko
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network used to select actions to be performed by an agent interacting with an environment. Implementations of the method use hindsight to model unpredictable aspects of the future. They use this information to disentangle inherently unpredictable, aleatoric variation from epistemic uncertainty, which arises from a lack of knowledge of the environment. The epistemic uncertainty, which relates to aspects of the environment that are predictable in principle, is then used as a source of intrinsic reward to drive curiosity, i.e., exploration of the environment by the agent.
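
The idea of rewarding only epistemic uncertainty can be illustrated with a minimal sketch. The sketch does not use the hindsight-based modelling described in the abstract. It substitutes a different, commonly used technique: an ensemble of predictors, decomposed with the law of total variance. The function names, the `scale` parameter, and the ensemble setup are all assumptions made for illustration.

```python
import numpy as np


def decompose_uncertainty(means, variances):
    """Split ensemble predictions into aleatoric and epistemic parts.

    Assumption (not from the source): each of n ensemble members predicts
    a mean and a variance for the next observation. Both inputs have the
    ensemble members along axis 0.
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    # Aleatoric: average noise level the members attribute to the environment.
    aleatoric = variances.mean(axis=0)
    # Epistemic: disagreement between members about the predicted mean.
    epistemic = means.var(axis=0)
    return aleatoric, epistemic


def intrinsic_reward(means, variances, scale=1.0):
    """Hypothetical curiosity bonus that uses only the epistemic term."""
    _, epistemic = decompose_uncertainty(means, variances)
    return scale * epistemic


# A noisy but well-understood transition: members agree, variance is high.
noisy_reward = intrinsic_reward([0.0, 0.0, 0.0], [4.0, 4.0, 4.0])

# A novel transition: members disagree, predicted noise is low.
novel_reward = intrinsic_reward([-1.0, 0.0, 1.0], [0.1, 0.1, 0.1])
```

In this sketch the noisy transition earns no bonus because the ensemble members agree, while the novel transition earns a positive bonus. This is the behaviour the abstract attributes to curiosity driven by epistemic rather than aleatoric uncertainty.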