Action selection neural network training using imitation learning in latent space

Invention Grant

US11663441B2 Action selection neural network training using imitation learning in latent space (Active)

Patent Title: Action selection neural network training using imitation learning in latent space
Application No.: US16586437

Application Date: 2019-09-27
Publication No.: US11663441B2

Publication Date: 2023-05-30
Inventor: Scott Ellison Reed, Yusuf Aytar, Ziyu Wang, Tom Paine, Sergio Gomez Colmenarejo, David Budden, Tobias Pfaff, Aaron Gerard Antonius van den Oord, Oriol Vinyals, Alexander Novikov
Applicant: DeepMind Technologies Limited
Applicant Address: GB London
Assignee: DeepMind Technologies Limited
Current Assignee: DeepMind Technologies Limited
Current Assignee Address: GB London
Agency: Fish & Richardson P.C.
Main IPC: G06N3/006
IPC: G06N3/006; G06F17/16; G06N3/08; G06F18/22; G06N3/045; G06N3/048; G06V10/764; G06V10/77; G06V10/82

Abstract:

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection policy neural network, wherein the action selection policy neural network is configured to process an observation characterizing a state of an environment to generate an action selection policy output, wherein the action selection policy output is used to select an action to be performed by an agent interacting with an environment. In one aspect, a method comprises: obtaining an observation characterizing a state of the environment subsequent to the agent performing a selected action; generating a latent representation of the observation; processing the latent representation of the observation using a discriminator neural network to generate an imitation score; determining a reward from the imitation score; and adjusting the current values of the action selection policy neural network parameters based on the reward using a reinforcement learning training technique.
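The steps named in the abstract (observe, encode to a latent, score with a discriminator, turn the score into a reward, update the policy) can be sketched in a few lines. This is a toy illustration under stated assumptions, not the patented implementation: the encoder is a fixed linear map, the discriminator is a fixed logistic unit that is not trained here, the reward is taken to be the imitation score itself, the policy is a state-independent logit vector, and the reinforcement learning technique is plain REINFORCE. Every function name, weight, and the two-action environment is hypothetical.

```python
import math
import random


def encode(observation, enc_weights):
    """Hypothetical encoder: linear map from an observation to a latent vector."""
    return [sum(w * x for w, x in zip(row, observation)) for row in enc_weights]


def imitation_score(latent, disc_weights, disc_bias):
    """Hypothetical discriminator: logistic unit on the latent, output in (0, 1)."""
    logit = sum(w * z for w, z in zip(disc_weights, latent)) + disc_bias
    return 1.0 / (1.0 + math.exp(-logit))


def reward_from_score(score):
    """Assumed reward shaping: use the imitation score directly as the reward."""
    return score


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_env_step(action):
    """Hypothetical environment: the observation after acting is a one-hot of the action."""
    return [1.0 if i == action else 0.0 for i in range(2)]


def train_step(policy_logits, enc_weights, disc_weights, disc_bias, lr, rng):
    """One pass through the steps in the abstract, with a REINFORCE update."""
    probs = softmax(policy_logits)
    action = rng.choices(range(len(probs)), weights=probs)[0]
    observation = toy_env_step(action)            # observation after the selected action
    latent = encode(observation, enc_weights)     # latent representation
    score = imitation_score(latent, disc_weights, disc_bias)
    reward = reward_from_score(score)
    # Gradient of log pi(action) w.r.t. the logits is one_hot(action) - probs.
    new_logits = [
        v + lr * reward * ((1.0 if i == action else 0.0) - p)
        for i, (v, p) in enumerate(zip(policy_logits, probs))
    ]
    return new_logits, reward


def train(num_steps=500, seed=0, lr=0.5):
    """Run the toy loop and return the final action probabilities."""
    rng = random.Random(seed)
    enc_weights = [[1.0, 0.0], [0.0, 1.0]]   # identity encoder (assumption)
    disc_weights = [-2.0, 2.0]               # discriminator favors action 1's latent
    disc_bias = 0.0
    logits = [0.0, 0.0]
    for _ in range(num_steps):
        logits, _ = train_step(logits, enc_weights, disc_weights, disc_bias, lr, rng)
    return softmax(logits)
```

Calling `train()` returns the policy's action probabilities after training. Because the fixed discriminator scores the latent of action 1 higher, the policy should shift toward that action. In a fuller system the discriminator would itself be trained to separate expert from agent latents, and the policy would condition on the current observation.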

Public/Granted literature

US2204021A Concrete forming machine Public/Granted day: 1940-06-11
