-
Publication number: US20170278018A1
Publication date: 2017-09-28
Application number: US15619393
Filing date: 2017-06-09
Applicant: Google Inc.
Inventor: Volodymyr Mnih, Koray Kavukcuoglu
CPC classification number: G06N20/00, A63F13/67, G06N3/0454, G06N3/08
Abstract: We describe a method of reinforcement learning for a subject system having multiple states and actions to move from one state to the next. Training data is generated by operating on the system with a succession of actions and used to train a second neural network. Target values for training the second neural network are derived from a first neural network which is generated by copying weights of the second neural network at intervals.
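The interval-based weight copy described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: a small table of Q-values stands in for the neural network weights, a Q-learning-style update is assumed, and the toy environment, the function name `train`, and parameters such as `sync_every` are hypothetical.

```python
import copy
import random


def train(num_steps=200, sync_every=20, gamma=0.9, lr=0.1, seed=0):
    """Toy loop: targets come from a periodically refreshed copy of the trained model."""
    rng = random.Random(seed)
    n_states, n_actions = 4, 2

    # "Second" network: the one being trained (a Q-table is an assumption here).
    online = [[0.0] * n_actions for _ in range(n_states)]
    # "First" network: a frozen copy used only to compute target values.
    target = copy.deepcopy(online)

    syncs = 0
    state = 0
    for step in range(1, num_steps + 1):
        # Generate training data by acting on the (hypothetical) system.
        action = rng.randrange(n_actions)
        next_state = (state + 1) % n_states if action == 1 else state
        reward = 1.0 if (action == 1 and next_state == 0) else 0.0

        # The target value is derived from the frozen copy, not the trained model.
        td_target = reward + gamma * max(target[next_state])
        online[state][action] += lr * (td_target - online[state][action])

        # Copy the trained weights into the target model at intervals.
        if step % sync_every == 0:
            target = copy.deepcopy(online)
            syncs += 1

        state = next_state

    return online, target, syncs
```

Between copies the target model stays fixed, so the values the trained model regresses toward do not shift with every update.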
-
Publication number: US20170140270A1
Publication date: 2017-05-18
Application number: US15349950
Filing date: 2016-11-11
Applicant: Google Inc.
Inventor: Volodymyr Mnih, Adrià Puigdomènech Badia, Alexander Benjamin Graves, Timothy James Alexander Harley, David Silver, Koray Kavukcuoglu
CPC classification number: G06N3/08, G06N3/04, G06N3/0454
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for asynchronous deep reinforcement learning. One of the systems includes a plurality of workers, wherein each worker is configured to operate independently of each other worker, and wherein each worker is associated with a respective actor that interacts with a respective replica of the environment during the training of the deep neural network.
-
Publication number: US20160232445A1
Publication date: 2016-08-11
Application number: US15016173
Filing date: 2016-02-04
Applicant: Google Inc.
Inventor: Praveen Deepak Srinivasan, Rory Fearon, Cagdas Alcicek, Arun Sarath Nair, Samuel Blackwell, Vedavyas Panneershelvam, Alessandro De Maria, Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Mustafa Suleyman
CPC classification number: G06N3/08, G06N3/0454, G06N3/0472
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for distributed training of reinforcement learning systems. One of the methods includes receiving, by a learner, current values of the parameters of the Q network from a parameter server, wherein each learner maintains a respective learner Q network replica and a respective target Q network replica; updating, by the learner, the parameters of the learner Q network replica maintained by the learner using the current values; selecting, by the learner, an experience tuple from a respective replay memory; computing, by the learner, a gradient from the experience tuple using the learner Q network replica maintained by the learner and the target Q network replica maintained by the learner; and providing, by the learner, the computed gradient to the parameter server.
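The learner steps listed in this abstract can be sketched as follows. This is an illustrative single-process toy, not the claimed system: the classes `ParameterServer` and `Learner`, the linear Q function, the squared temporal-difference loss, and the plain gradient-descent update on the server are all assumptions. How the target replica is refreshed is not stated in the abstract, so the sketch leaves it fixed.

```python
import random


class ParameterServer:
    """Hypothetical in-memory parameter server holding linear Q weights."""

    def __init__(self, n_features, n_actions, lr=0.1):
        self.params = [[0.0] * n_features for _ in range(n_actions)]
        self.lr = lr

    def get_params(self):
        # Return a copy so learners cannot mutate the server state directly.
        return [row[:] for row in self.params]

    def apply_gradient(self, grad):
        # Assumed update rule: one step of plain gradient descent.
        for a, row in enumerate(grad):
            for i, g in enumerate(row):
                self.params[a][i] -= self.lr * g


def q_values(params, features):
    """Linear Q function: one weight vector per action."""
    return [sum(w * x for w, x in zip(row, features)) for row in params]


class Learner:
    def __init__(self, server, replay_memory, gamma=0.9, seed=0):
        self.server = server
        self.replay = replay_memory
        self.gamma = gamma
        self.rng = random.Random(seed)
        self.q_replica = server.get_params()
        self.target_replica = server.get_params()  # left fixed in this sketch

    def step(self):
        # 1-2. Receive current parameter values and update the learner replica.
        self.q_replica = self.server.get_params()
        # 3. Select an experience tuple (state, action, reward, next_state).
        s, a, r, s_next = self.rng.choice(self.replay)
        # 4. Compute a gradient using both replicas; the loss is assumed
        #    to be 0.5 * (Q(s, a) - y)^2 with y taken from the target replica.
        y = r + self.gamma * max(q_values(self.target_replica, s_next))
        td_error = q_values(self.q_replica, s)[a] - y
        grad = [[0.0] * len(s) for _ in self.q_replica]
        grad[a] = [td_error * x for x in s]
        # 5. Provide the computed gradient to the parameter server.
        self.server.apply_gradient(grad)
        return grad
```

In a distributed setting, several such learners would call `step` concurrently against one shared server; here everything runs in one process for clarity.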
-
Publication number: US20170228871A1
Publication date: 2017-08-10
Application number: US15497378
Filing date: 2017-04-26
Applicant: Google Inc.
Inventor: Volodymyr Mnih, Geoffrey E. Hinton
CPC classification number: G06T7/11, G06K9/0063, G06K9/00651, G06K9/6269, G06K9/6277, G06N3/084, G06T7/143, G06T11/60, G06T2207/10032, G06T2207/20084, G06T2207/30181
Abstract: A system and method for labelling aerial images. A neural network generates predicted map data. The parameters of the neural network are trained by optimizing an objective function which compensates for noise in the map images. The function compensates both omission noise and registration noise.
-
Publication number: US09679258B2
Publication date: 2017-06-13
Application number: US14097862
Filing date: 2013-12-05
Applicant: Google Inc.
Inventor: Volodymyr Mnih, Koray Kavukcuoglu
CPC classification number: G06N99/005, A63F13/67, G06N3/0454, G06N3/08
Abstract: We describe a method of reinforcement learning for a subject system having multiple states and actions to move from one state to the next. Training data is generated by operating on the system with a succession of actions and used to train a second neural network. Target values for training the second neural network are derived from a first neural network which is generated by copying weights of the second neural network at intervals.
-
Publication number: US20130343641A1
Publication date: 2013-12-26
Application number: US13924320
Filing date: 2013-06-21
Applicant: Google Inc.
Inventor: Volodymyr Mnih, Geoffrey E. Hinton
IPC: G06K9/62
CPC classification number: G06T7/11, G06K9/0063, G06K9/00651, G06K9/6269, G06K9/6277, G06N3/084, G06T7/143, G06T11/60, G06T2207/10032, G06T2207/20084, G06T2207/30181
Abstract: A system and method for labelling aerial images. A neural network generates predicted map data. The parameters of the neural network are trained by optimizing an objective function which compensates for noise in the map images. The function compensates both omission noise and registration noise.
-
Publication number: US09704068B2
Publication date: 2017-07-11
Application number: US13924320
Filing date: 2013-06-21
Applicant: Google Inc.
Inventor: Volodymyr Mnih, Geoffrey E. Hinton
CPC classification number: G06T7/11, G06K9/0063, G06K9/00651, G06K9/6269, G06K9/6277, G06N3/084, G06T7/143, G06T11/60, G06T2207/10032, G06T2207/20084, G06T2207/30181
Abstract: A system and method for labelling aerial images. A neural network generates predicted map data. The parameters of the neural network are trained by optimizing an objective function which compensates for noise in the map images. The function compensates both omission noise and registration noise.
-