Invention Grant
- Patent Title: Training action selection neural networks using apprenticeship
- Application No.: US16624245
- Application Date: 2018-06-28
- Publication No.: US11468321B2
- Publication Date: 2022-10-11
- Inventor: Olivier Claude Pietquin, Martin Riedmiller, Wang Fumin, Bilal Piot, Mel Vecerik, Todd Andrew Hester, Thomas Rothoerl, Thomas Lampe, Nicolas Manfred Otto Heess, Jonathan Karl Scholz
- Applicant: DEEPMIND TECHNOLOGIES LIMITED
- Applicant Address: GB London
- Assignee: DEEPMIND TECHNOLOGIES LIMITED
- Current Assignee: DEEPMIND TECHNOLOGIES LIMITED
- Current Assignee Address: GB London
- Agency: Fish & Richardson P.C.
- International Application: PCT/EP2018/067414 (WO), filed 2018-06-28
- International Publication: WO2019/002465 (WO), published 2019-01-03
- Main IPC: G06N3/02
- IPC: G06N3/02; G06N3/08; G06N3/04

Abstract:
An off-policy reinforcement learning actor-critic neural network system configured to select actions from a continuous action space to be performed by an agent interacting with an environment to perform a task. An observation defines environment state data and reward data. The system has an actor neural network which learns a policy function mapping the state data to action data. A critic neural network learns an action-value (Q) function. A replay buffer stores tuples of the state data, the action data, the reward data and new state data. The replay buffer also includes demonstration transition data comprising a set of the tuples from a demonstration of the task within the environment. The neural network system is configured to train the actor neural network and the critic neural network off-policy using stored tuples from the replay buffer comprising tuples both from operation of the system and from the demonstration transition data.
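As a rough illustration of the architecture the abstract describes, the sketch below shows an actor network mapping state data to continuous actions, a critic estimating the action value Q(s, a), and a single replay buffer holding (state, action, reward, next-state) tuples drawn both from the agent's own operation and from a task demonstration, with off-policy minibatch updates sampled from that buffer. This is a minimal PyTorch sketch under those assumptions; the class names, network sizes, and hyperparameters are illustrative and are not taken from the patent.

```python
# Minimal sketch (PyTorch) of an off-policy actor-critic trained from a replay
# buffer seeded with demonstration transitions. All names and hyperparameters
# are illustrative assumptions, not the patent's implementation.
import random
from collections import deque

import torch
import torch.nn as nn


class Actor(nn.Module):
    """Policy network: maps environment state data to a continuous action."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, action_dim), nn.Tanh())

    def forward(self, state):
        return self.net(state)


class Critic(nn.Module):
    """Action-value network: estimates Q(state, action)."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
            nn.Linear(64, 1))

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))


class ReplayBuffer:
    """Stores (state, action, reward, next_state) tuples from both the
    agent's own experience and a demonstration of the task."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        batch = random.sample(self.buffer, batch_size)
        s, a, r, s2 = zip(*batch)
        return (torch.stack(s), torch.stack(a),
                torch.tensor(r).unsqueeze(-1), torch.stack(s2))


def update(actor, critic, actor_opt, critic_opt, buffer,
           batch_size=64, gamma=0.99):
    """One off-policy update from a minibatch that may mix agent-collected
    and demonstration transitions."""
    s, a, r, s2 = buffer.sample(batch_size)

    # Critic: regress Q(s, a) toward a one-step bootstrapped target.
    with torch.no_grad():
        target = r + gamma * critic(s2, actor(s2))
    critic_loss = nn.functional.mse_loss(critic(s, a), target)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor: adjust the policy to increase the critic's value estimate.
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
```

In such a setup the demonstration transition data would be written into the buffer with `buffer.add(...)` before (or alongside) the agent's own transitions, so that even early minibatches contain expert tuples; details such as target networks, exploration noise, or prioritized sampling are omitted here for brevity.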
Public/Granted literature
- US11868882B2: Training action selection neural networks using apprenticeship (granted 2024-01-09)