-
21.
公开(公告)号:US20210034970A1
公开(公告)日:2021-02-04
申请号:US16767049
申请日:2019-02-05
Applicant: DeepMind Technologies Limited
Inventor: Hubert Josef Soyer , Lasse Espeholt , Karen Simonyan , Yotam Doron , Vlad Firoiu , Volodymyr Mnih , Koray Kavukcuoglu , Remi Munos , Thomas Ward , Timothy James Alexander Harley , Iain Robert Dunning
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection neural network used to select actions to be performed by an agent interacting with an environment. In one aspect, a system comprises a plurality of actor computing units and a plurality of learner computing units. The actor computing units generate experience tuple trajectories that are used by the learner computing units to update learner action selection neural network parameters using a reinforcement learning technique. The reinforcement learning technique may be an off-policy actor critic reinforcement learning technique.
-
公开(公告)号:US10832134B2
公开(公告)日:2020-11-10
申请号:US15374974
申请日:2016-12-09
Applicant: DeepMind Technologies Limited
Inventor: Alexander Benjamin Graves , Ivo Danihelka , Timothy James Alexander Harley , Malcolm Kevin Campbell Reynolds , Gregory Duncan Wayne
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for augmenting neural networks with an external memory. One of the systems includes a memory interface subsystem that is configured to perform operations comprising determining a respective content-based weight for each of a plurality of locations in an external memory; determining a respective allocation weight for each of the plurality of locations in the external memory; determining a respective final writing weight for each of the plurality of locations in the external memory from the respective content-based weight for the location and the respective allocation weight for the location; and writing data defined by the write vector to the external memory in accordance with the final writing weights.
-