TRAINING ACTION SELECTION NEURAL NETWORKS USING OFF-POLICY ACTOR CRITIC REINFORCEMENT LEARNING AND STOCHASTIC DUELING NEURAL NETWORKS

Invention Application

US20250094772A1 TRAINING ACTION SELECTION NEURAL NETWORKS USING OFF-POLICY ACTOR CRITIC REINFORCEMENT LEARNING AND STOCHASTIC DUELING NEURAL NETWORKS (legal status: in force)

Patent Title: TRAINING ACTION SELECTION NEURAL NETWORKS USING OFF-POLICY ACTOR CRITIC REINFORCEMENT LEARNING AND STOCHASTIC DUELING NEURAL NETWORKS
Application No.: US18962266

Application Date: 2024-11-27
Publication No.: US20250094772A1

Publication Date: 2025-03-20
Inventor: Ziyu Wang, Nicolas Manfred Otto Heess, Victor Constant Bapst
Applicant: DeepMind Technologies Limited
Applicant Address: GB London
Assignee: DeepMind Technologies Limited
Current Assignee: DeepMind Technologies Limited
Current Assignee Address: GB London
Main IPC: G06N3/045
IPC: G06N3/045; G06N3/006; G06N3/047; G06N3/084; G06N3/088

Abstract:

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection neural network. One of the methods includes maintaining a replay memory that stores trajectories generated as a result of interaction of an agent with an environment; and training an action selection neural network having policy parameters on the trajectories in the replay memory, wherein training the action selection neural network comprises: sampling a trajectory from the replay memory; and adjusting current values of the policy parameters by training the action selection neural network on the trajectory using an off-policy actor critic reinforcement learning technique.
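
The training loop outlined in the abstract (maintain a replay memory of trajectories, sample one, update the policy off-policy) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the patent: a tabular softmax policy stands in for the action selection neural network, a tabular state-value table stands in for the critic, truncated importance weights are used as one common off-policy correction, and the stochastic dueling network named in the title is not modelled. The names `ReplayMemory`, `off_policy_actor_critic_update`, and `run_demo` are hypothetical.

```python
import math
import random
from collections import deque


class ReplayMemory:
    """Fixed-capacity store of whole trajectories (hypothetical helper)."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def add(self, trajectory):
        self.buffer.append(trajectory)

    def sample(self, rng):
        # Uniformly sample one stored trajectory.
        return rng.choice(list(self.buffer))


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def off_policy_actor_critic_update(logits, values, trajectory,
                                   lr=0.1, gamma=0.9, clip=1.0):
    """Update tabular policy logits and state values on one trajectory.

    Each step is (state, action, reward, next_state, behaviour_prob, done),
    where behaviour_prob is the probability the data-collecting policy
    assigned to the action. This layout is an assumption of the sketch.
    """
    for state, action, reward, next_state, mu, done in trajectory:
        pi = softmax(logits[state])
        # Truncated importance weight correcting for off-policy data.
        rho = min(clip, pi[action] / mu)
        bootstrap = 0.0 if done else gamma * values[next_state]
        td_error = reward + bootstrap - values[state]
        # Critic step: move V(s) toward the one-step target.
        values[state] += lr * rho * td_error
        # Actor step: d log pi(a|s) / d logit_b = 1[b == a] - pi[b].
        for b in range(len(pi)):
            grad = (1.0 if b == action else 0.0) - pi[b]
            logits[state][b] += lr * rho * td_error * grad


def run_demo(num_updates=200, seed=0):
    """Single-state toy task: action 1 pays reward 1, action 0 pays 0."""
    rng = random.Random(seed)
    memory = ReplayMemory(capacity=10)
    # Two hand-written trajectories from a uniform behaviour policy (mu = 0.5).
    memory.add([(0, 1, 1.0, 0, 0.5, False), (0, 0, 0.0, 0, 0.5, True)])
    memory.add([(0, 0, 0.0, 0, 0.5, False), (0, 1, 1.0, 0, 0.5, True)])
    logits = [[0.0, 0.0]]  # policy parameters: one row per state
    values = [0.0]         # critic estimate per state
    for _ in range(num_updates):
        off_policy_actor_critic_update(logits, values, memory.sample(rng))
    return softmax(logits[0])
```

Calling `run_demo()` returns the learned action probabilities for the single state; in this toy setting the rewarded action should end up with the larger probability. A real system would replace the tables with neural networks and gradient-based optimisation.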
