Invention Grant
- Patent Title: Training action selection neural networks using apprenticeship
- Application No.: US17962008
- Application Date: 2022-10-07
- Publication No.: US11886997B2
- Publication Date: 2024-01-30
- Inventor: Olivier Pietquin, Martin Riedmiller, Wang Fumin, Bilal Piot, Mel Vecerik, Todd Andrew Hester, Thomas Rothoerl, Thomas Lampe, Nicolas Manfred Otto Heess, Jonathan Karl Scholz
- Applicant: DeepMind Technologies Limited
- Applicant Address: GB London
- Assignee: DeepMind Technologies Limited
- Current Assignee: DeepMind Technologies Limited
- Current Assignee Address: GB London
- Agency: Fish & Richardson P.C.
- Main IPC: G06N3/08
- IPC: G06N3/08; G06N3/045; G06N3/047

Abstract:
An off-policy reinforcement learning actor-critic neural network system configured to select actions from a continuous action space to be performed by an agent interacting with an environment to perform a task. An observation defines environment state data and reward data. The system has an actor neural network which learns a policy function mapping the state data to action data. A critic neural network learns an action-value (Q) function. A replay buffer stores tuples of the state data, the action data, the reward data and new state data. The replay buffer also includes demonstration transition data comprising a set of the tuples from a demonstration of the task within the environment. The neural network system is configured to train the actor neural network and the critic neural network off-policy using stored tuples from the replay buffer comprising tuples both from operation of the system and from the demonstration transition data.
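The replay buffer described in the abstract, which holds tuples from the system's own operation alongside demonstration tuples, might be sketched as follows. This is a minimal illustration, not the patented implementation: the names `Transition` and `MixedReplayBuffer` are hypothetical, and the choices to keep demonstration tuples permanently, to evict the oldest agent tuples at a fixed capacity, and to sample uniformly with replacement are assumptions that the abstract does not state.

```python
import random
from collections import deque
from typing import List, NamedTuple, Sequence


class Transition(NamedTuple):
    """One (state, action, reward, new state) tuple; is_demo marks its origin."""
    state: Sequence[float]
    action: Sequence[float]
    reward: float
    next_state: Sequence[float]
    is_demo: bool


class MixedReplayBuffer:
    """Hypothetical buffer mixing demonstration and agent transitions."""

    def __init__(self, capacity: int, demonstrations: List[Transition]):
        # Assumption: demonstration tuples are never evicted.
        self._demos = list(demonstrations)
        # Assumption: agent tuples are evicted oldest-first once capacity is hit.
        self._agent = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state) -> None:
        """Store a tuple produced by the system's own interaction."""
        self._agent.append(Transition(state, action, reward, next_state, False))

    def sample(self, batch_size: int, rng=random) -> List[Transition]:
        """Draw a batch uniformly (with replacement) from both sources."""
        pool = self._demos + list(self._agent)
        return [rng.choice(pool) for _ in range(batch_size)]

    def __len__(self) -> int:
        return len(self._demos) + len(self._agent)
```

A batch drawn with `sample` would then feed the off-policy updates of both the actor and the critic; those update rules are outside the scope of this sketch.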
Public/Granted literature
- US20230023189A1 TRAINING ACTION SELECTION NEURAL NETWORKS USING APPRENTICESHIP Public/Granted day: 2023-01-26