Publication No.: US20190279076A1
Publication Date: 2019-09-12
Application No.: US16298448
Filing Date: 2019-03-11
Applicant: DeepMind Technologies Limited
Inventors: Huiyi Hu, Ray Jiang, Timothy Arthur Mann, Sven Adrian Gowal, Balaji Lakshminarayanan, Andras Gyorgy
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for learning from delayed outcomes using neural networks. One of the methods includes receiving an input observation; generating, from the input observation, an output label distribution over possible labels for the input observation at a final time, comprising: processing the input observation using a first neural network configured to process the input observation to generate a distribution over possible values for an intermediate indicator at a first time earlier than the final time; generating, from the distribution, an input value for the intermediate indicator; and processing the input value for the intermediate indicator using a second neural network configured to process the input value for the intermediate indicator to determine the output label distribution over possible labels for the input observation at the final time; and providing an output derived from the output label distribution.
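Illustrative sketch (not part of the patent text): the two-stage inference described in the abstract could be laid out roughly as follows. Everything specific here is an assumption made for illustration only: the names `make_layer` and `predict`, the single dense layer with softmax standing in for each neural network, the untrained random weights, the choice to generate the intermediate indicator value by sampling and one-hot encoding it, and the choice to take the most probable label as the derived output. The abstract does not specify any of these details.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def make_layer(n_out, n_in, rng):
    """Hypothetical helper: random, untrained weights for illustration only."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)]
               for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def predict(observation, first_net, second_net, rng):
    # Stage 1: the first network maps the observation to a distribution
    # over possible values of the intermediate indicator (earlier time).
    indicator_dist = softmax(linear(*first_net, observation))

    # Generate an input value from that distribution. Assumption: sample
    # one value and one-hot encode it; other choices are possible.
    k = rng.choices(range(len(indicator_dist)), weights=indicator_dist)[0]
    indicator_value = [1.0 if i == k else 0.0
                       for i in range(len(indicator_dist))]

    # Stage 2: the second network maps the indicator value to the output
    # label distribution at the final time.
    label_dist = softmax(linear(*second_net, indicator_value))

    # Output derived from the label distribution. Assumption: argmax.
    label = max(range(len(label_dist)), key=label_dist.__getitem__)
    return indicator_dist, label_dist, label


rng = random.Random(0)
first_net = make_layer(3, 4, rng)   # 4 observation features, 3 indicator values
second_net = make_layer(2, 3, rng)  # 3 indicator values, 2 final labels
indicator_dist, label_dist, label = predict(
    [0.2, -1.0, 0.5, 1.3], first_net, second_net, rng)
```

Splitting the model at the intermediate indicator means the first stage could, in principle, be fitted on signals that arrive early, while the second stage relates those signals to the delayed final outcome. The training procedure is not described in the abstract and is not shown here.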