DISTRIBUTED TRAINING USING ACTOR-CRITIC REINFORCEMENT LEARNING WITH OFF-POLICY CORRECTION FACTORS

Invention Publication

US20230153617A1 DISTRIBUTED TRAINING USING ACTOR-CRITIC REINFORCEMENT LEARNING WITH OFF-POLICY CORRECTION FACTORS Pending - Published

Patent Title: DISTRIBUTED TRAINING USING ACTOR-CRITIC REINFORCEMENT LEARNING WITH OFF-POLICY CORRECTION FACTORS
Application No.: US18149771

Application Date: 2023-01-04
Publication No.: US20230153617A1

Publication Date: 2023-05-18
Inventor: Hubert Josef Soyer, Lasse Espeholt, Karen Simonyan, Yotam Doron, Vlad Firoiu, Volodymyr Mnih, Koray Kavukcuoglu, Remi Munos, Thomas Ward, Timothy James Alexander Harley, Iain Robert Dunning
Applicant: DeepMind Technologies Limited
Applicant Address: GB London
Assignee: DeepMind Technologies Limited
Current Assignee: DeepMind Technologies Limited
Current Assignee Address: GB London
Main IPC: G06N3/08
IPC: G06N3/08; G06N3/045

Abstract:

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection neural network used to select actions to be performed by an agent interacting with an environment. In one aspect, a system comprises a plurality of actor computing units and a plurality of learner computing units. The actor computing units generate experience tuple trajectories that are used by the learner computing units to update learner action selection neural network parameters using a reinforcement learning technique. The reinforcement learning technique may be an off-policy actor-critic reinforcement learning technique.
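
The abstract does not say how the off-policy correction factors named in the title are computed. As a rough illustration only, the sketch below assumes they are truncated importance-sampling ratios between the learner (target) policy and the actor (behaviour) policy, applied backward along one trajectory to produce corrected value targets. The function name, parameter names, and the truncation levels `rho_bar` and `c_bar` are hypothetical and are not taken from the patent text.

```python
import math


def corrected_value_targets(behaviour_logp, target_logp, rewards, values,
                            bootstrap_value, gamma=0.99,
                            rho_bar=1.0, c_bar=1.0):
    """Corrected value targets for one actor trajectory (illustrative sketch).

    behaviour_logp / target_logp: log-probabilities of the actions taken,
    under the actor's policy and the learner's policy respectively.
    values: learner value estimates for each visited state (a list).
    bootstrap_value: value estimate for the state after the last step.
    Assumption: correction factors are truncated importance ratios.
    """
    n = len(rewards)
    # Importance ratio pi(a|x) / mu(a|x) for each step.
    ratios = [math.exp(t - b) for t, b in zip(target_logp, behaviour_logp)]
    rhos = [min(rho_bar, r) for r in ratios]  # scales each TD error
    cs = [min(c_bar, r) for r in ratios]      # scales the backward trace
    next_values = list(values[1:]) + [bootstrap_value]

    targets = [0.0] * n
    acc = 0.0  # running value of (target - value) for the following step
    for t in reversed(range(n)):
        delta = rhos[t] * (rewards[t] + gamma * next_values[t] - values[t])
        acc = delta + gamma * cs[t] * acc
        targets[t] = values[t] + acc
    return targets
```

When the two policies agree, every ratio equals 1 and the targets reduce to ordinary n-step returns; when the learner's policy has moved away from the actor's, the truncation bounds how much any single step can influence the update.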

Public/Granted literature

US11868894B2 Distributed training using actor-critic reinforcement learning with off-policy correction factors Public/Granted day: 2024-01-09

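IPC classification:

G	Physics
G06	Computing; calculating or counting
G06N	Computing arrangements based on specific computational models
G06N3/00	Computing arrangements based on biological models
G06N3/02	.Using neural network models
G06N3/08	..Learning methods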