Controlling agents over long time scales using temporal value transport

Invention Grant

US10789511B2 Controlling agents over long time scales using temporal value transport (In force)

Patent Title: Controlling agents over long time scales using temporal value transport
Application No.: US16601324

Application Date: 2019-10-14
Publication No.: US10789511B2

Publication Date: 2020-09-29
Inventor: Gregory Duncan Wayne, Timothy Paul Lillicrap, Chia-Chun Hung, Joshua Simon Abramson
Applicant: DeepMind Technologies Limited
Applicant Address: GB London
Assignee: DeepMind Technologies Limited
Current Assignee: DeepMind Technologies Limited
Current Assignee Address: GB London
Agency: Fish & Richardson P.C.
Main IPC: G06K9/62
IPC: G06K9/62; G06F11/30; G06N3/08

Abstract:

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network system used to control an agent interacting with an environment to perform a specified task. One of the methods includes causing the agent to perform a task episode, spanning a sequence of time steps, in which the agent attempts to perform the specified task; for each of one or more particular time steps in the sequence: generating a modified reward for the particular time step from (i) the actual reward at the time step and (ii) value predictions at one or more time steps that are more than a threshold number of time steps after the particular time step in the sequence; and training, through reinforcement learning, the neural network system using at least the modified rewards for the particular time steps.
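
The reward-modification step described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the abstract does not say how the later time steps are chosen or how their value predictions are combined with the actual reward, so the `links` mapping (earlier step to associated later steps), the additive combination, and the `alpha` scaling factor are all assumptions made for illustration only.

```python
from typing import Dict, List, Sequence


def modified_rewards(
    rewards: Sequence[float],
    values: Sequence[float],
    links: Dict[int, List[int]],
    threshold: int,
    alpha: float = 1.0,
) -> List[float]:
    """Return a copy of `rewards` with distant value predictions added.

    rewards[t] : actual reward at time step t
    values[t]  : value prediction at time step t
    links[t]   : ASSUMED list of later time steps associated with step t
    threshold  : a later step t2 contributes only if t2 - t > threshold
    alpha      : ASSUMED scaling factor applied to the transported value
    """
    if len(rewards) != len(values):
        raise ValueError("rewards and values must have the same length")

    out = list(rewards)
    for t, later_steps in links.items():
        for t2 in later_steps:
            # Only value predictions from more than `threshold` steps
            # after t are folded into the reward at t.
            if t2 - t > threshold:
                out[t] += alpha * values[t2]
    return out


if __name__ == "__main__":
    rewards = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
    values = [0.1, 0.1, 0.2, 0.5, 0.9, 1.0]
    # Step 0 is (hypothetically) associated with steps 1 and 4.
    print(modified_rewards(rewards, values, {0: [1, 4]}, threshold=2, alpha=0.5))
```

In this toy episode only step 4 lies more than two steps after step 0, so step 0 receives half of the value predicted at step 4 and step 1 is ignored. The resulting list would then replace the actual rewards when computing returns for whichever reinforcement learning update is in use.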

Public/Granted literature

US20200117956A1 CONTROLLING AGENTS OVER LONG TIME SCALES USING TEMPORAL VALUE TRANSPORT Public/Granted day: 2020-04-16

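IPC Classification:

G	Physics
G06	Computing; calculating or counting
G06K	Graphical data reading (image or video recognition or understanding G06V); presentation of data; record carriers; handling record carriers
G06K9/00	Methods or arrangements for recognising patterns (methods or arrangements for graph-reading or for converting the pattern of mechanical parameters, e.g. force or presence, into electrical signals G06K11/00) (image or video recognition or understanding G06V) (speech recognition G10L15/00)
G06K9/62	. Methods or arrangements for recognition using electronic means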