Patent search: ap:("Google Inc.") AND inv:"Cagdas Alcicek"

1.

Invention patent application
DISTRIBUTED TRAINING OF REINFORCEMENT LEARNING SYSTEMS (Status: pending, published)

Publication number: US20160232445A1

Publication date: 2016-08-11

Application number: US15016173

Application date: 2016-02-04

Applicant: Google Inc.

Inventors: Praveen Deepak Srinivasan, Rory Fearon, Cagdas Alcicek, Arun Sarath Nair, Samuel Blackwell, Vedavyas Panneershelvam, Alessandro De Maria, Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Mustafa Suleyman

IPC: G06N3/08, G06N3/04

CPC classification number: G06N3/08, G06N3/0454, G06N3/0472

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for distributed training of reinforcement learning systems. One of the methods includes receiving, by a learner, current values of the parameters of the Q network from a parameter server, wherein each learner maintains a respective learner Q network replica and a respective target Q network replica; updating, by the learner, the parameters of the learner Q network replica maintained by the learner using the current values; selecting, by the learner, an experience tuple from a respective replay memory; computing, by the learner, a gradient from the experience tuple using the learner Q network replica maintained by the learner and the target Q network replica maintained by the learner; and providing, by the learner, the computed gradient to the parameter server.
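The learner loop described in the abstract (pull parameters, refresh the learner replica, sample a tuple from replay memory, compute a gradient with the learner and target replicas, push the gradient) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the abstract does not specify the network architecture, loss, optimizer, or target-update schedule, so the sketch assumes a linear Q-function, a DQN-style squared TD error with target `r + gamma * max_a' Q_target(s', a')`, plain SGD at the parameter server, and a single in-process learner. All class and function names (`ParameterServer`, `Learner`, `Experience`, `q_values`) are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List

Vector = List[float]


@dataclass
class Experience:
    """Hypothetical experience tuple: (state, action, reward, next state, terminal flag)."""
    state: Vector
    action: int
    reward: float
    next_state: Vector
    terminal: bool


def q_values(weights: List[Vector], state: Vector) -> Vector:
    """Assumed linear Q-function (one weight row per action), standing in for a Q network."""
    return [sum(w_i * s_i for w_i, s_i in zip(row, state)) for row in weights]


class ParameterServer:
    """Holds the shared Q-network parameters; applies gradients with plain SGD (assumed)."""

    def __init__(self, weights: List[Vector], learning_rate: float = 0.1):
        self.weights = [list(row) for row in weights]
        self.learning_rate = learning_rate

    def current_values(self) -> List[Vector]:
        return [list(row) for row in self.weights]

    def apply_gradient(self, gradient: List[Vector]) -> None:
        for row, grad_row in zip(self.weights, gradient):
            for i in range(len(row)):
                row[i] -= self.learning_rate * grad_row[i]


class Learner:
    """One learner with its own Q replica, target replica, and replay memory."""

    def __init__(self, server: ParameterServer, replay_memory: List[Experience],
                 gamma: float = 0.99, seed: int = 0):
        self.server = server
        self.replay_memory = replay_memory
        self.gamma = gamma
        self.rng = random.Random(seed)
        self.q_replica = server.current_values()
        # Assumption: the target replica is copied once here and held fixed in this sketch.
        self.target_replica = server.current_values()

    def step(self) -> float:
        # Receive current parameter values and update the learner Q replica with them.
        self.q_replica = self.server.current_values()
        # Select an experience tuple from this learner's replay memory.
        exp = self.rng.choice(self.replay_memory)
        # Compute the TD target with the target replica (assumed DQN-style target).
        target = exp.reward
        if not exp.terminal:
            target += self.gamma * max(q_values(self.target_replica, exp.next_state))
        prediction = q_values(self.q_replica, exp.state)[exp.action]
        td_error = target - prediction
        # Gradient of 0.5 * td_error**2 w.r.t. the weights; only the taken action's row is non-zero.
        gradient = [[0.0] * len(exp.state) for _ in self.q_replica]
        gradient[exp.action] = [-td_error * s for s in exp.state]
        # Provide the computed gradient to the parameter server.
        self.server.apply_gradient(gradient)
        return td_error


# Usage: a single terminal transition with reward 1.0 for action 0 in state [1, 0].
server = ParameterServer([[0.0, 0.0], [0.0, 0.0]])
memory = [Experience([1.0, 0.0], 0, 1.0, [0.0, 1.0], True)]
learner = Learner(server, memory)
for _ in range(50):
    learner.step()
# server.weights[0][0] moves toward the TD target of 1.0; the other weights stay at 0.0.
```

In the patent's distributed setting, many such learners would run in parallel against a shared parameter server, each with its own replay memory; the single-process loop above only shows the per-learner sequence of steps.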