-
Publication number: US11507827B2
Publication date: 2022-11-22
Application number: US16601455
Filing date: 2019-10-14
Applicant: DeepMind Technologies Limited
Inventor: Praveen Deepak Srinivasan, Rory Fearon, Cagdas Alcicek, Arun Sarath Nair, Samuel Blackwell, Vedavyas Panneershelvam, Alessandro De Maria, Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Mustafa Suleyman
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for distributed training of reinforcement learning systems. One of the methods includes receiving, by a learner, current values of the parameters of the Q network from a parameter server, wherein each learner maintains a respective learner Q network replica and a respective target Q network replica; updating, by the learner, the parameters of the learner Q network replica maintained by the learner using the current values; selecting, by the learner, an experience tuple from a respective replay memory; computing, by the learner, a gradient from the experience tuple using the learner Q network replica maintained by the learner and the target Q network replica maintained by the learner; and providing, by the learner, the computed gradient to the parameter server.
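The abstract above describes one iteration of a learner in a distributed Q-learning setup: fetch current parameters, refresh the learner's Q network replica, sample an experience tuple from replay memory, compute a gradient against a target Q network replica, and ship that gradient back to the parameter server. The sketch below illustrates that loop in PyTorch under stated assumptions: the ParameterServer and ReplayMemory classes, their method names, the network sizes, the plain-SGD update, and the mean-squared TD loss are all hypothetical stand-ins, since the abstract does not specify them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Class and method names below (ParameterServer, ReplayMemory, and their
# methods) are illustrative assumptions; the patent abstract does not name them.

class QNetwork(nn.Module):
    """Small Q network mapping a state to one value per action."""
    def __init__(self, state_dim=4, num_actions=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, num_actions))

    def forward(self, state):
        return self.net(state)

class ParameterServer:
    """In-process stand-in: holds the canonical parameters and applies
    gradients provided by learners (here with plain SGD)."""
    def __init__(self, q_network, lr=1e-3):
        self.q_network = q_network
        self.lr = lr

    def current_parameters(self):
        return self.q_network.state_dict()

    def apply_gradient(self, gradient):
        with torch.no_grad():
            for p, g in zip(self.q_network.parameters(), gradient):
                p -= self.lr * g

class ReplayMemory:
    """Toy replay memory returning a random batch of (s, a, r, s', done)."""
    def sample(self, batch=32, state_dim=4, num_actions=2):
        return (torch.randn(batch, state_dim),
                torch.randint(num_actions, (batch,)),
                torch.randn(batch),
                torch.randn(batch, state_dim),
                torch.zeros(batch))

def learner_step(server, replay_memory, learner_q, target_q, gamma=0.99):
    # Receive current parameter values from the parameter server and
    # update the learner's Q network replica with them.
    learner_q.load_state_dict(server.current_parameters())

    # Select an experience tuple from the learner's replay memory.
    state, action, reward, next_state, done = replay_memory.sample()

    # One-step Q-learning target, computed with the target Q network replica.
    with torch.no_grad():
        target = reward + gamma * (1.0 - done) * target_q(next_state).max(-1).values

    # Gradient of the TD loss with respect to the learner Q network replica.
    q_value = learner_q(state).gather(-1, action.unsqueeze(-1)).squeeze(-1)
    loss = F.mse_loss(q_value, target)
    learner_q.zero_grad()
    loss.backward()
    gradient = [p.grad.detach().clone() for p in learner_q.parameters()]

    # Provide the computed gradient to the parameter server.
    server.apply_gradient(gradient)

if __name__ == "__main__":
    server = ParameterServer(QNetwork())
    learner_q, target_q = QNetwork(), QNetwork()
    learner_step(server, ReplayMemory(), learner_q, target_q)
```

Keeping a separate target replica that is refreshed less frequently than the learner replica is what stabilizes the bootstrapped target in this kind of scheme; in a real deployment the parameter server and replay memory would live in separate processes, with many learners running this step concurrently.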
-
Publication number: US10445641B2
Publication date: 2019-10-15
Application number: US15016173
Filing date: 2016-02-04
Applicant: DeepMind Technologies Limited
Inventor: Praveen Deepak Srinivasan, Rory Fearon, Cagdas Alcicek, Arun Sarath Nair, Samuel Blackwell, Vedavyas Panneershelvam, Alessandro De Maria, Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Mustafa Suleyman
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for distributed training of reinforcement learning systems. One of the methods includes receiving, by a learner, current values of the parameters of the Q network from a parameter server, wherein each learner maintains a respective learner Q network replica and a respective target Q network replica; updating, by the learner, the parameters of the learner Q network replica maintained by the learner using the current values; selecting, by the learner, an experience tuple from a respective replay memory; computing, by the learner, a gradient from the experience tuple using the learner Q network replica maintained by the learner and the target Q network replica maintained by the learner; and providing, by the learner, the computed gradient to the parameter server.
-