Continuous control with deep reinforcement learning

Invention Grant

US10776692B2 Continuous control with deep reinforcement learning (Active)

Patent Title: Continuous control with deep reinforcement learning
Application No.: US15217758

Application Date: 2016-07-22
Publication No.: US10776692B2

Publication Date: 2020-09-15
Inventor: Timothy Paul Lillicrap, Jonathan James Hunt, Alexander Pritzel, Nicolas Manfred Otto Heess, Tom Erez, Yuval Tassa, David Silver, Daniel Pieter Wierstra
Applicant: DeepMind Technologies Limited
Applicant Address: GB London
Assignee: DeepMind Technologies Limited
Current Assignee: DeepMind Technologies Limited
Current Assignee Address: GB London
Agency: Fish & Richardson P.C.
Main IPC: G06N3/08
IPC: G06N3/08; G06N3/00; G06N3/04

Abstract:

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training an actor neural network used to select actions to be performed by an agent interacting with an environment. One of the methods includes obtaining a minibatch of experience tuples; and updating current values of the parameters of the actor neural network, comprising: for each experience tuple in the minibatch: processing the training observation and the training action in the experience tuple using a critic neural network to determine a neural network output for the experience tuple, and determining a target neural network output for the experience tuple; updating current values of the parameters of the critic neural network using errors between the target neural network outputs and the neural network outputs; and updating the current values of the parameters of the actor neural network using the critic neural network.
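
The update described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the patent text: the linear `critic` and `actor` functions stand in for the neural networks, the target is assumed to be the reward plus a discounted critic value of the actor's next action, each experience tuple is assumed to hold (observation, action, reward, next observation), and the names `update`, `gamma`, `lr_critic`, and `lr_actor` are hypothetical.

```python
import numpy as np

def critic(params, obs, act):
    """Linear Q(s, a); a stand-in for the critic neural network (assumption)."""
    return obs @ params["ws"] + act @ params["wa"] + params["b"]

def actor(W, obs):
    """Linear deterministic policy mu(s); a stand-in for the actor network (assumption)."""
    return obs @ W.T

def update(actor_W, critic_p, batch, gamma=0.99, lr_critic=1e-2, lr_actor=1e-3):
    """One actor-critic update on a minibatch of experience tuples."""
    obs, act, rew, next_obs = batch
    n = len(rew)

    # Critic output for each (training observation, training action) pair.
    q = critic(critic_p, obs, act)

    # Assumed form of the target output: reward plus the discounted critic
    # value of the action the actor would take in the next observation.
    y = rew + gamma * critic(critic_p, next_obs, actor(actor_W, next_obs))

    # Critic step: gradient descent on the mean squared error between the
    # critic outputs and the targets (targets are treated as constants).
    err = q - y
    new_critic = {
        "ws": critic_p["ws"] - lr_critic * obs.T @ err / n,
        "wa": critic_p["wa"] - lr_critic * act.T @ err / n,
        "b": critic_p["b"] - lr_critic * err.mean(),
    }

    # Actor step: move the actor parameters in the direction that raises the
    # critic's value of the actor's own actions. For this linear critic,
    # dQ/da = wa, so the batch-averaged gradient w.r.t. W is outer(wa, mean(s)).
    grad_W = np.outer(new_critic["wa"], obs.mean(axis=0))
    new_actor = actor_W + lr_actor * grad_W

    return new_actor, new_critic, float(np.mean(err ** 2))

# Usage on a random minibatch of 32 tuples (3-dim observations, 2-dim actions).
rng = np.random.default_rng(0)
batch = (rng.normal(size=(32, 3)), rng.normal(size=(32, 2)),
         rng.normal(size=32), rng.normal(size=(32, 3)))
actor_W = rng.normal(size=(2, 3))
critic_p = {"ws": np.zeros(3), "wa": np.ones(2), "b": 0.0}
actor_W, critic_p, loss = update(actor_W, critic_p, batch)
```

The linear models keep the gradients explicit and short; with real neural networks the same two steps would typically be carried out by an automatic differentiation library.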

Public/Granted literature

US20170024643A1 CONTINUOUS CONTROL WITH DEEP REINFORCEMENT LEARNING Public/Granted day: 2017-01-26

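IPC Classification:

G	Physics
G06	Computing; calculating or counting
G06N	Computer systems based on specific computational models
G06N3/00	Computer systems based on biological models
G06N3/02	.Using neural network models
G06N3/08	..Learning methods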