Patent search ap:("DeepMind Technologies Limited") AND inv:"Iain Robert Dunning" Page 1

1.

Invention publication
DEEP REINFORCEMENT LEARNING WITH FAST UPDATING RECURRENT NEURAL NETWORKS AND SLOW UPDATING RECURRENT NEURAL NETWORKS
Status: Under examination (published)

Publication (announcement) number: US20240220774A1

Publication (announcement) date: 2024-07-04

Application number: US18536065

Application date: 2023-12-11

Applicant: DeepMind Technologies Limited

Inventors: Iain Robert Dunning, Wojciech Czarnecki, Maxwell Elliot Jaderberg

IPC: G06N3/045, G06F17/18, G06N3/047, G06N3/08

CPC classification number: G06N3/045, G06F17/18, G06N3/047, G06N3/08

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for reinforcement learning. One of the methods includes selecting an action to be performed by the agent using both a slow updating recurrent neural network and a fast updating recurrent neural network that receives a fast updating input that includes the hidden state of the slow updating recurrent neural network.
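
The abstract above can be illustrated with a minimal sketch. The abstract states only that a fast updating recurrent network receives an input that includes the hidden state of a slow updating recurrent network; everything else here is an assumption made for illustration: the class name `FastSlowAgent`, the plain tanh cells, the fixed `slow_period` schedule, the random weights, and the greedy linear policy head. It is not the patented method.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weight matrix; the initialisation is illustrative only."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def rnn_step(w_in, w_rec, x, h):
    """Plain tanh recurrent cell: h' = tanh(W_in x + W_rec h)."""
    return [
        math.tanh(
            sum(w * v for w, v in zip(w_in[i], x))
            + sum(w * v for w, v in zip(w_rec[i], h))
        )
        for i in range(len(h))
    ]


class FastSlowAgent:
    """Hypothetical agent with a slow core and a fast core (names assumed)."""

    def __init__(self, obs_dim, slow_dim, fast_dim, num_actions,
                 slow_period=4, seed=0):
        rng = random.Random(seed)
        self.slow_period = slow_period  # assumed schedule, not from the abstract
        self.t = 0
        self.h_slow = [0.0] * slow_dim
        self.h_fast = [0.0] * fast_dim
        self.slow_in = make_matrix(slow_dim, obs_dim, rng)
        self.slow_rec = make_matrix(slow_dim, slow_dim, rng)
        # The fast core sees the observation plus the slow hidden state.
        self.fast_in = make_matrix(fast_dim, obs_dim + slow_dim, rng)
        self.fast_rec = make_matrix(fast_dim, fast_dim, rng)
        self.policy = make_matrix(num_actions, fast_dim, rng)

    def act(self, obs):
        # Slow core: hidden state changes only every `slow_period` steps.
        if self.t % self.slow_period == 0:
            self.h_slow = rnn_step(self.slow_in, self.slow_rec, obs, self.h_slow)
        # Fast core: updated every step; its input includes the slow hidden state.
        fast_input = list(obs) + self.h_slow
        self.h_fast = rnn_step(self.fast_in, self.fast_rec, fast_input, self.h_fast)
        self.t += 1
        # Greedy action from a linear head on the fast hidden state.
        logits = [sum(w * v for w, v in zip(row, self.h_fast)) for row in self.policy]
        return max(range(len(logits)), key=logits.__getitem__)
```

Calling `FastSlowAgent(obs_dim=3, slow_dim=4, fast_dim=5, num_actions=2).act([0.2, -0.4, 0.9])` returns an action index. Between slow updates the slow hidden state is held fixed, so the fast core conditions on a slowly changing summary.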

2.

Granted invention patent
Distributed training using actor-critic reinforcement learning with off-policy correction factors
Status: In force

Publication (announcement) number: US11868894B2

Publication (announcement) date: 2024-01-09

Application number: US18149771

Application date: 2023-01-04

Applicant: DeepMind Technologies Limited

Inventors: Hubert Josef Soyer, Lasse Espeholt, Karen Simonyan, Yotam Doron, Vlad Firoiu, Volodymyr Mnih, Koray Kavukcuoglu, Remi Munos, Thomas Ward, Timothy James Alexander Harley, Iain Robert Dunning

IPC: G06N3/08, G06N3/045

CPC classification number: G06N3/08, G06N3/045

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection neural network used to select actions to be performed by an agent interacting with an environment. In one aspect, a system comprises a plurality of actor computing units and a plurality of learner computing units. The actor computing units generate experience tuple trajectories that are used by the learner computing units to update learner action selection neural network parameters using a reinforcement learning technique. The reinforcement learning technique may be an off-policy actor critic reinforcement learning technique.

3.

Granted invention patent
Deep reinforcement learning with fast updating recurrent neural networks and slow updating recurrent neural networks
Status: In force

Publication (announcement) number: US11842261B2

Publication (announcement) date: 2023-12-12

Application number: US17121679

Application date: 2020-12-14

Applicant: DeepMind Technologies Limited

Inventors: Iain Robert Dunning, Wojciech Czarnecki, Maxwell Elliot Jaderberg

IPC: G06N3/045, G06N3/08, G06N3/047, G06F17/18

CPC classification number: G06N3/045, G06F17/18, G06N3/047, G06N3/08

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for reinforcement learning. One of the methods includes selecting an action to be performed by the agent using both a slow updating recurrent neural network and a fast updating recurrent neural network that receives a fast updating input that includes the hidden state of the slow updating recurrent neural network.

4.

Granted invention patent
Distributed training using actor-critic reinforcement learning with off-policy correction factors
Status: In force

Publication (announcement) number: US11593646B2

Publication (announcement) date: 2023-02-28

Application number: US16767049

Application date: 2019-02-05

Applicant: DeepMind Technologies Limited

Inventors: Hubert Josef Soyer, Lasse Espeholt, Karen Simonyan, Yotam Doron, Vlad Firoiu, Volodymyr Mnih, Koray Kavukcuoglu, Remi Munos, Thomas Ward, Timothy James Alexander Harley, Iain Robert Dunning

IPC: G06N3/08, G06N3/04

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection neural network used to select actions to be performed by an agent interacting with an environment. In one aspect, a system comprises a plurality of actor computing units and a plurality of learner computing units. The actor computing units generate experience tuple trajectories that are used by the learner computing units to update learner action selection neural network parameters using a reinforcement learning technique. The reinforcement learning technique may be an off-policy actor critic reinforcement learning technique.

5.

Invention application
DEEP REINFORCEMENT LEARNING WITH FAST UPDATING RECURRENT NEURAL NETWORKS AND SLOW UPDATING RECURRENT NEURAL NETWORKS
Status: In force

Publication (announcement) number: US20210097373A1

Publication (announcement) date: 2021-04-01

Application number: US17121679

Application date: 2020-12-14

Applicant: DeepMind Technologies Limited

Inventors: Iain Robert Dunning, Wojciech Czarnecki, Maxwell Elliot Jaderberg

IPC: G06N3/04, G06F17/18, G06N3/08

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for reinforcement learning. One of the methods includes selecting an action to be performed by the agent using both a slow updating recurrent neural network and a fast updating recurrent neural network that receives a fast updating input that includes the hidden state of the slow updating recurrent neural network.

6.

Granted invention patent
Distributed training using actor-critic reinforcement learning with off-policy correction factors
Status: In force

Publication (announcement) number: US12299574B2

Publication (announcement) date: 2025-05-13

Application number: US18487428

Application date: 2023-10-16

Applicant: DeepMind Technologies Limited

Inventors: Hubert Josef Soyer, Lasse Espeholt, Karen Simonyan, Yotam Doron, Vlad Firoiu, Volodymyr Mnih, Koray Kavukcuoglu, Remi Munos, Thomas Ward, Timothy James Alexander Harley, Iain Robert Dunning

IPC: G06N3/08, G06N3/045

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection neural network used to select actions to be performed by an agent interacting with an environment. In one aspect, a system comprises a plurality of actor computing units and a plurality of learner computing units. The actor computing units generate experience tuple trajectories that are used by the learner computing units to update learner action selection neural network parameters using a reinforcement learning technique. The reinforcement learning technique may be an off-policy actor critic reinforcement learning technique.

7.

Invention publication
DISTRIBUTED TRAINING USING ACTOR-CRITIC REINFORCEMENT LEARNING WITH OFF-POLICY CORRECTION FACTORS
Status: Under examination (published)

Publication (announcement) number: US20240127060A1

Publication (announcement) date: 2024-04-18

Application number: US18487428

Application date: 2023-10-16

Applicant: DeepMind Technologies Limited

Inventors: Hubert Josef Soyer, Lasse Espeholt, Karen Simonyan, Yotam Doron, Vlad Firoiu, Volodymyr Mnih, Koray Kavukcuoglu, Remi Munos, Thomas Ward, Timothy James Alexander Harley, Iain Robert Dunning

IPC: G06N3/08, G06N3/045

CPC classification number: G06N3/08, G06N3/045

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection neural network used to select actions to be performed by an agent interacting with an environment. In one aspect, a system comprises a plurality of actor computing units and a plurality of learner computing units. The actor computing units generate experience tuple trajectories that are used by the learner computing units to update learner action selection neural network parameters using a reinforcement learning technique. The reinforcement learning technique may be an off-policy actor critic reinforcement learning technique.

8.

Invention publication
DISTRIBUTED TRAINING USING ACTOR-CRITIC REINFORCEMENT LEARNING WITH OFF-POLICY CORRECTION FACTORS
Status: Under examination (published)

Publication (announcement) number: US20230153617A1

Publication (announcement) date: 2023-05-18

Application number: US18149771

Application date: 2023-01-04

Applicant: DeepMind Technologies Limited

Inventors: Hubert Josef Soyer, Lasse Espeholt, Karen Simonyan, Yotam Doron, Vlad Firoiu, Volodymyr Mnih, Koray Kavukcuoglu, Remi Munos, Thomas Ward, Timothy James Alexander Harley, Iain Robert Dunning

IPC: G06N3/08, G06N3/045

CPC classification number: G06N3/08, G06N3/045

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection neural network used to select actions to be performed by an agent interacting with an environment. In one aspect, a system comprises a plurality of actor computing units and a plurality of learner computing units. The actor computing units generate experience tuple trajectories that are used by the learner computing units to update learner action selection neural network parameters using a reinforcement learning technique. The reinforcement learning technique may be an off-policy actor critic reinforcement learning technique.

9.

Invention application
DISTRIBUTED TRAINING USING ACTOR-CRITIC REINFORCEMENT LEARNING WITH OFF-POLICY CORRECTION FACTORS
Status: In force

Publication (announcement) number: US20210034970A1

Publication (announcement) date: 2021-02-04

Application number: US16767049

Application date: 2019-02-05

Applicant: DeepMind Technologies Limited

Inventors: Hubert Josef Soyer, Lasse Espeholt, Karen Simonyan, Yotam Doron, Vlad Firoiu, Volodymyr Mnih, Koray Kavukcuoglu, Remi Munos, Thomas Ward, Timothy James Alexander Harley, Iain Robert Dunning

IPC: G06N3/08, G06N3/04

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection neural network used to select actions to be performed by an agent interacting with an environment. In one aspect, a system comprises a plurality of actor computing units and a plurality of learner computing units. The actor computing units generate experience tuple trajectories that are used by the learner computing units to update learner action selection neural network parameters using a reinforcement learning technique. The reinforcement learning technique may be an off-policy actor critic reinforcement learning technique.
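
The "off-policy correction factors" in the titles of the distributed-training entries can be illustrated with a short sketch. The abstracts do not say which form the correction takes, so this example assumes one well-known choice: truncated importance-sampling ratios between the learner (target) policy and the actor (behaviour) policy, combined into value targets in the style known as V-trace. The function name `vtrace_targets`, its parameters, and the clipping thresholds `rho_bar` and `c_bar` are illustrative assumptions, not taken from the patents.

```python
def vtrace_targets(behaviour_probs, target_probs, rewards, values,
                   bootstrap_value, gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """Value targets for one actor trajectory, corrected for policy lag.

    behaviour_probs[t]: probability the actor's policy gave the action taken.
    target_probs[t]:    probability the learner's policy gives that action.
    values[t]:          learner's value estimate for the state at step t.
    bootstrap_value:    value estimate for the state after the last step.
    """
    n = len(rewards)
    # Correction factors: importance ratios, truncated from above (assumed form).
    ratios = [t / b for t, b in zip(target_probs, behaviour_probs)]
    rhos = [min(rho_bar, r) for r in ratios]
    cs = [min(c_bar, r) for r in ratios]

    next_values = list(values[1:]) + [bootstrap_value]
    # Temporal-difference errors, scaled by the truncated ratios.
    deltas = [
        rho * (r + gamma * nv - v)
        for rho, r, nv, v in zip(rhos, rewards, next_values, values)
    ]

    # Backward recursion: (v_s - V_s) = delta_s + gamma * c_s * (v_{s+1} - V_{s+1}).
    targets = [0.0] * n
    acc = 0.0
    for s in reversed(range(n)):
        acc = deltas[s] + gamma * cs[s] * acc
        targets[s] = values[s] + acc
    return targets
```

When the actor and learner policies agree, every ratio is 1 and the targets reduce to ordinary n-step returns. When they differ, the truncation bounds how much any single stale action can influence the learner's update.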
