Publication No.: US20170024643A1
Publication Date: 2017-01-26
Application No.: US15217758
Filing Date: 2016-07-22
Applicant: Google Inc.
Inventor: Timothy Paul Lillicrap, Jonathan James Hunt, Alexander Pritzel, Nicolas Manfred Otto Heess, Tom Erez, Yuval Tassa, David Silver, Daniel Pieter Wierstra
IPC: G06N3/08
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training an actor neural network used to select actions to be performed by an agent interacting with an environment. One of the methods includes obtaining a minibatch of experience tuples; and updating current values of the parameters of the actor neural network, comprising: for each experience tuple in the minibatch: processing the training observation and the training action in the experience tuple using a critic neural network to determine a neural network output for the experience tuple, and determining a target neural network output for the experience tuple; updating current values of the parameters of the critic neural network using errors between the target neural network outputs and the neural network outputs; and updating the current values of the parameters of the actor neural network using the critic neural network.
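The update described in the abstract can be illustrated with a minimal sketch. It is not the claimed implementation: it assumes linear actor and critic models in NumPy in place of neural networks, and the function names, the discount factor `gamma`, the learning rates, and the form of the target (reward plus discounted critic estimate at the next observation) are all illustrative assumptions that the abstract does not specify.

```python
import numpy as np

def critic(w, obs, act):
    """Linear stand-in for the critic: one scalar output per (observation, action)."""
    return np.concatenate([obs, act], axis=1) @ w

def actor(theta, obs):
    """Linear stand-in for the actor: maps observations to actions."""
    return obs @ theta

def update(theta, w, batch, gamma=0.99, lr_critic=1e-2, lr_actor=1e-2):
    """One minibatch update of the critic, then of the actor (hypothetical helper)."""
    obs, act, rew, next_obs = batch
    n, obs_dim = obs.shape

    # Critic output for each tuple's training observation and training action.
    q = critic(w, obs, act)

    # Target output (assumed form, not stated in the abstract): reward plus
    # the discounted critic estimate at the next observation.
    target = rew + gamma * critic(w, next_obs, actor(theta, next_obs))

    # Critic step: gradient descent on half the mean squared error between
    # targets and outputs, with the targets held fixed.
    err = target - q
    x = np.concatenate([obs, act], axis=1)
    new_w = w + lr_critic * (x.T @ err) / n

    # Actor step: gradient ascent on the critic's estimate of Q(obs, actor(obs)).
    # For a linear critic, dQ/da is the action slice of the critic weights.
    dq_da = new_w[obs_dim:]
    grad_theta = np.outer(obs.mean(axis=0), dq_da)
    new_theta = theta + lr_actor * grad_theta

    return new_theta, new_w, float(np.mean(err ** 2))
```

Here `batch` is a tuple of arrays `(obs, act, rew, next_obs)` with one row per experience tuple. The order of the two steps follows the abstract: the critic is updated from the errors first, and the updated critic then supplies the direction for the actor update.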