-
Publication No.: US20200167633A1
Publication Date: 2020-05-28
Application No.: US16615061
Filing Date: 2018-05-22
Applicant: DEEPMIND TECHNOLOGIES LIMITED
Inventor: Misha Man Ray Denil, Sergio Gomez Colmenarejo, Serkan Cabi, David William Saxton, Joao Ferdinando Gomes de Freitas
Abstract: A reinforcement learning system is proposed comprising a plurality of property detector neural networks. Each property detector neural network is arranged to receive data representing an object within an environment, and to generate property data associated with a property of the object. A processor is arranged to receive an instruction indicating a task associated with an object having an associated property, and process the output of the plurality of property detector neural networks based upon the instruction to generate a relevance data item. The relevance data item indicates objects within the environment associated with the task. The processor also generates a plurality of weights based upon the relevance data item, and, based on the weights, generates modified data representing the plurality of objects within the environment. A neural network is arranged to receive the modified data and to output an action associated with the task.
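The data flow in this abstract (property detectors, relevance data item, weights, modified data, action) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the patent: the detectors are plain functions rather than neural networks, the instruction is a list of property names, relevance is the product of the named detector outputs, the weights are a softmax over relevance, and `policy` is a trivial stand-in for the action-selecting neural network.

```python
import math
from typing import Callable, Dict, List, Sequence

Features = Sequence[float]
Detector = Callable[[Features], float]  # hypothetical: returns a score in [0, 1]


def relevance(objects: Sequence[Features],
              detectors: Dict[str, Detector],
              instruction: Sequence[str]) -> List[float]:
    """One relevance score per object: product of the detector outputs
    for the properties named in the instruction (an assumed combination rule)."""
    scores = []
    for obj in objects:
        score = 1.0
        for prop in instruction:
            score *= detectors[prop](obj)
        scores.append(score)
    return scores


def weights_from_relevance(rel: Sequence[float], temperature: float = 0.1) -> List[float]:
    """Softmax over relevance scores (assumed weighting scheme)."""
    scaled = [r / temperature for r in rel]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def apply_weights(objects: Sequence[Features], weights: Sequence[float]) -> List[List[float]]:
    """Modified data: each object's features scaled by its weight."""
    return [[w * x for x in obj] for obj, w in zip(objects, weights)]


def policy(modified: Sequence[Sequence[float]]) -> int:
    """Stand-in for the action network: pick the object with the largest
    total activation in the modified data."""
    totals = [sum(abs(x) for x in obj) for obj in modified]
    return totals.index(max(totals))


# Hypothetical scene: feature 0 acts as "redness", feature 1 as "smallness".
detectors = {"red": lambda o: o[0], "small": lambda o: o[1]}
objects = [[0.9, 0.8, 0.5], [0.1, 0.9, 0.5], [0.9, 0.1, 0.5]]
instruction = ["red", "small"]

rel = relevance(objects, detectors, instruction)
weights = weights_from_relevance(rel)
modified = apply_weights(objects, weights)
action = policy(modified)
```

In this toy scene only the first object scores highly on both named properties, so it receives nearly all of the weight and the stand-in policy selects it.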
-
Publication No.: US20240394504A1
Publication Date: 2024-11-28
Application No.: US18637279
Filing Date: 2024-04-16
Applicant: DeepMind Technologies Limited
Inventor: Misha Man Ray Denil, Sergio Gomez Colmenarejo, Serkan Cabi, David William Saxton, Joao Ferdinando Gomes de Freitas
Abstract: A reinforcement learning system is proposed comprising a plurality of property detector neural networks. Each property detector neural network is arranged to receive data representing an object within an environment, and to generate property data associated with a property of the object. A processor is arranged to receive an instruction indicating a task associated with an object having an associated property, and process the output of the plurality of property detector neural networks based upon the instruction to generate a relevance data item. The relevance data item indicates objects within the environment associated with the task. The processor also generates a plurality of weights based upon the relevance data item, and, based on the weights, generates modified data representing the plurality of objects within the environment. A neural network is arranged to receive the modified data and to output an action associated with the task.
-
Publication No.: US20210383222A1
Publication Date: 2021-12-09
Application No.: US17337820
Filing Date: 2021-06-03
Applicant: DeepMind Technologies Limited
Inventor: David William Saxton, Eshaan Nichani
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a neural network by estimating the objective function curvature based on current and previous gradients. In one aspect, a method comprises: sampling a batch of training data; and for each neural network parameter: determining, based on the current batch of training data, a respective current gradient of the objective function at the current iteration with respect to the current neural network parameter; estimating an objective function curvature with respect to the current neural network parameter based on (i) the current gradient of the objective function at the current iteration, and (ii) a respective previous gradient of the objective function at each of a plurality of previous iterations; and updating a current value of the neural network parameter based on the estimate of the curvature of the objective function.
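The abstract does not say how curvature is estimated from the gradients, so the following is only one possible sketch, not the patented method. It assumes a single scalar parameter, keeps a short window of (parameter value, gradient) pairs, estimates curvature as the least-squares slope of gradient against parameter value, and takes a Newton-like step, falling back to plain gradient descent when the estimate is unavailable or non-positive. The names `estimate_curvature`, `train`, `window`, `fallback_lr`, and `damping` are hypothetical, and `grad_fn` stands in for a gradient computed on a sampled batch.

```python
from collections import deque
from typing import Callable, Deque, Optional, Tuple


def estimate_curvature(history: Deque[Tuple[float, float]],
                       eps: float = 1e-12) -> Optional[float]:
    """Least-squares slope of gradient vs. parameter value over the window.
    Returns None when the parameter has barely moved (slope is ill-defined)."""
    n = len(history)
    mean_t = sum(t for t, _ in history) / n
    mean_g = sum(g for _, g in history) / n
    cov = sum((t - mean_t) * (g - mean_g) for t, g in history)
    var = sum((t - mean_t) ** 2 for t, _ in history)
    return cov / var if var > eps else None


def train(grad_fn: Callable[[float], float], theta: float, steps: int = 20,
          window: int = 5, fallback_lr: float = 0.1, damping: float = 1e-8) -> float:
    """Update one parameter using a curvature estimate built from the
    current gradient and the gradients of previous iterations."""
    history: Deque[Tuple[float, float]] = deque(maxlen=window)
    for _ in range(steps):
        g = grad_fn(theta)               # current gradient
        history.append((theta, g))       # previous gradients stay in the window
        c = estimate_curvature(history) if len(history) >= 2 else None
        if c is None or c <= 0:
            theta -= fallback_lr * g     # no usable estimate: plain gradient step
        else:
            theta -= g / (c + damping)   # Newton-like step scaled by curvature
    return theta


# Toy objective f(theta) = 1.5 * (theta - 2)^2, whose curvature is 3.
result = train(lambda t: 3.0 * (t - 2.0), theta=10.0)
```

On a quadratic objective the gradient is linear in the parameter, so the fitted slope matches the true curvature after two iterations and the Newton-like step lands close to the minimum. Noisy batch gradients or non-quadratic objectives would need more care than this sketch provides.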
-
-