-
Publication No.: US20200090006A1
Publication Date: 2020-03-19
Application No.: US16689058
Filing Date: 2019-11-19
Applicant: DeepMind Technologies Limited
Inventor: Daniel Pieter Wierstra, Yujia Li, Razvan Pascanu, Peter William Battaglia, Theophane Guillaume Weber, Lars Buesing, David Paul Reichert, Arthur Clement Guez, Danilo Jimenez Rezende, Adrià Puigdomènech Badia, Oriol Vinyals, Nicolas Manfred Otto Heess, Sebastien Henri Andre Racaniere
Abstract: A neural network system is proposed. The neural network can be trained by model-based reinforcement learning to select actions to be performed by an agent interacting with an environment, to perform a task in an attempt to achieve a specified result. The system may comprise at least one imagination core which receives a current observation characterizing a current state of the environment, and optionally historical observations, and which includes a model of the environment. The imagination core may be configured to output trajectory data in response to the current observation and/or the historical observations. The trajectory data comprises a sequence of future features of the environment imagined by the imagination core. The system may also include a rollout encoder to encode the features into a rollout embedding, and an output stage to receive data derived from the rollout embedding and to output action policy data for identifying an action based on the current observation.
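The data flow described in this abstract (imagination core, rollout encoder, output stage) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: `env_model` and `rollout_policy` are hand-written stand-ins for learned networks, `ACTIONS` is a hypothetical discrete action set, the encoder is a toy recurrence rather than a trained network, and the output stage is a plain softmax over one embedding per candidate first action.

```python
import math

ACTIONS = [-1.0, 0.0, 1.0]  # hypothetical discrete action set


def env_model(features, action):
    """Hypothetical stand-in for the learned environment model.

    Predicts the next features and a reward from the current features and an action.
    """
    next_features = [0.9 * f + 0.1 * action for f in features]
    reward = -sum(abs(f) for f in next_features)
    return next_features, reward


def rollout_policy(features):
    """Hypothetical rollout policy used inside imagination: always picks action 0.0."""
    return 0.0


def imagine_rollout(observation, first_action, depth):
    """Imagination core: roll the model forward and return trajectory data."""
    trajectory = []
    features, action = list(observation), first_action
    for _ in range(depth):
        features, reward = env_model(features, action)
        trajectory.append((features, reward))
        action = rollout_policy(features)
    return trajectory


def encode_rollout(trajectory):
    """Rollout encoder: fold the imagined steps (last to first) into one scalar embedding."""
    h = 0.0
    for features, reward in reversed(trajectory):
        mean_feature = sum(features) / len(features)
        h = math.tanh(0.5 * h + mean_feature + reward)
    return h


def action_policy(observation, depth=3):
    """Output stage: softmax over one rollout embedding per candidate first action."""
    embeddings = [
        encode_rollout(imagine_rollout(observation, a, depth)) for a in ACTIONS
    ]
    peak = max(embeddings)  # subtract the max for numerical stability
    weights = [math.exp(e - peak) for e in embeddings]
    total = sum(weights)
    return [w / total for w in weights]


policy = action_policy([0.5, -0.2])
```

Here `policy` holds one probability per entry of `ACTIONS`. In a trained system the model, the encoder and the output stage would all be learned; the sketch only shows how the three stages connect.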
-
22.
Publication No.: US20190354885A1
Publication Date: 2019-11-21
Application No.: US16417580
Filing Date: 2019-05-20
Applicant: DeepMind Technologies Limited
Inventor: Yujia Li, Victor Constant Bapst, Vinicius Zambaldi, David Nunes Raposo, Adam Anthony Santoro
Abstract: A neural network system is proposed, including an input network for extracting, from state data, respective entity data for each of a plurality of entities which are present, or at least potentially present, in the environment. The entity data describes the entity. The neural network contains a relational network for parsing this data, which includes one or more attention blocks which may be stacked to perform successive actions on the entity data. The attention blocks each include a respective transform network for each of the entities. The transform network for each entity is able to transform data which the transform network receives for the entity into modified entity data for the entity, based on data for a plurality of the other entities. An output network is arranged to receive data output by the relational network, and use the received data to select a respective action.
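The stacked attention blocks described in this abstract can be illustrated with a minimal Python sketch. The abstract does not specify the attention mechanism, so the sketch assumes scaled dot-product self-attention with identity projections in place of learned query, key and value weights, and lets each entity attend over all entities including itself. The output network is a hypothetical stand-in as well: it max-pools over entities and assumes one action per feature dimension.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def attention_block(entities):
    """Transform each entity's data based on data for the entities around it.

    Assumption: identity query/key/value projections and a residual connection.
    """
    dim = len(entities[0])
    scale = math.sqrt(dim)
    updated = []
    for query in entities:
        weights = softmax([dot(query, key) / scale for key in entities])
        mixed = [
            sum(w * entity[d] for w, entity in zip(weights, entities))
            for d in range(dim)
        ]
        updated.append([q + m for q, m in zip(query, mixed)])
    return updated


def relational_network(entities, num_blocks=2):
    """Stack attention blocks so they act successively on the entity data."""
    for _ in range(num_blocks):
        entities = attention_block(entities)
    return entities


def select_action(entities):
    """Hypothetical output network: max-pool over entities, then pick the largest feature."""
    dim = len(entities[0])
    pooled = [max(entity[d] for entity in entities) for d in range(dim)]
    return pooled.index(max(pooled))


entity_data = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
modified = relational_network(entity_data)
action = select_action(modified)
```

Each pass through `attention_block` keeps one vector per entity, which is what allows the blocks to be stacked; a trained system would replace the identity projections and the pooling rule with learned networks.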
-