Patent search ap:("DeepMind Technologies Limited") AND inv:"Wojciech Czarnecki" Page 3

21.

Granted invention patent
Multi-task neural network systems with task-specific policies and a shared policy (in force)

Publication No.: US11132609B2

Publication date: 2021-09-28

Application No.: US16689020

Application date: 2019-11-19

Applicant: DeepMind Technologies Limited

Inventor: Razvan Pascanu, Raia Thais Hadsell, Victor Constant Bapst, Wojciech Czarnecki, James Kirkpatrick, Yee Whye Teh, Nicolas Manfred Otto Heess

IPC: G06N3/08, G06N3/10, G06N5/04

Abstract: A method is proposed for training a multitask computer system, such as a multitask neural network system. The system comprises a set of trainable workers and a shared module. The trainable workers and shared module are trained on a plurality of different tasks, such that each worker learns to perform a corresponding one of the tasks according to a respective task policy, and said shared policy network learns a multitask policy which represents common behavior for the tasks. The coordinated training is performed by optimizing an objective function comprising, for each task: a reward term indicative of an expected reward earned by a worker in performing the corresponding task according to the task policy; and at least one entropy term which regularizes the distribution of the task policy towards the distribution of the multitask policy.
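
The objective described in this abstract can be illustrated with a minimal sketch. It assumes a single-step setting with discrete actions, reads the "entropy term" as a KL divergence toward the shared policy plus an entropy bonus, and uses hypothetical names and coefficients (`kl_coef`, `ent_coef`) that do not come from the patent text.

```python
import math

def kl_divergence(p, q):
    # KL(p || q) for discrete distributions; assumes q > 0 wherever p > 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def entropy(p):
    # Shannon entropy of a discrete distribution (natural log).
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)

def task_objective(task_policy, shared_policy, rewards, kl_coef=0.5, ent_coef=0.1):
    # Reward term: expected reward of the worker under its task policy.
    expected_reward = sum(p * r for p, r in zip(task_policy, rewards))
    # Regularizer: pulls the task policy toward the shared (multitask) policy.
    pull_to_shared = kl_divergence(task_policy, shared_policy)
    return expected_reward - kl_coef * pull_to_shared + ent_coef * entropy(task_policy)

def multitask_objective(task_policies, shared_policy, task_rewards, **coefs):
    # Sum of per-task objectives; all tasks share one multitask policy.
    return sum(
        task_objective(policy, shared_policy, rewards, **coefs)
        for policy, rewards in zip(task_policies, task_rewards)
    )
```

In this sketch a worker that drifts away from the shared policy pays a KL penalty, so it only does so when the extra expected reward outweighs that penalty.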

22.

Invention patent application
REINFORCEMENT LEARNING WITH AUXILIARY TASKS (in force)

Publication No.: US20210182688A1

Publication date: 2021-06-17

Application No.: US17183618

Application date: 2021-02-24

Applicant: DeepMind Technologies Limited

Inventor: Volodymyr Mnih, Wojciech Czarnecki, Maxwell Elliot Jaderberg, Tom Schaul, David Silver, Koray Kavukcuoglu

IPC: G06N3/08, G06N20/00, G06N3/04, G06N3/00

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. The method includes: training an action selection policy neural network, and during the training of the action selection neural network, training one or more auxiliary control neural networks and a reward prediction neural network. Each of the auxiliary control neural networks is configured to receive a respective intermediate output generated by the action selection policy neural network and generate a policy output for a corresponding auxiliary control task. The reward prediction neural network is configured to receive one or more intermediate outputs generated by the action selection policy neural network and generate a corresponding predicted reward. Training each of the auxiliary control neural networks and the reward prediction neural network comprises adjusting values of the respective auxiliary control parameters, reward prediction parameters, and the action selection policy network parameters.
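
The data flow in this abstract can be sketched with a toy scalar model: auxiliary heads read an intermediate output of the policy network, so their losses also adjust the policy network's own parameters. Everything here is an illustrative assumption rather than the patented method: the single shared weight, the squared-error losses, and all function names are hypothetical, and the main reinforcement learning loss is omitted.

```python
import math

def intermediate_output(shared_w, observation):
    # Hidden feature produced inside the (toy) action selection policy network.
    return math.tanh(shared_w * observation)

def auxiliary_loss(shared_w, reward_w, aux_w, observation, reward, aux_target):
    # Both heads consume the same intermediate output h.
    h = intermediate_output(shared_w, observation)
    reward_error = reward_w * h - reward   # reward prediction head
    aux_error = aux_w * h - aux_target     # auxiliary control head (simplified)
    return reward_error ** 2 + aux_error ** 2

def auxiliary_update(shared_w, reward_w, aux_w, observation, reward, aux_target, lr=0.1):
    # One gradient step on the auxiliary losses using analytic gradients.
    h = intermediate_output(shared_w, observation)
    reward_error = reward_w * h - reward
    aux_error = aux_w * h - aux_target
    grad_reward_w = 2 * reward_error * h
    grad_aux_w = 2 * aux_error * h
    # The gradient flows through h into the shared policy-network weight.
    dh_dshared = (1 - h * h) * observation
    grad_shared_w = 2 * (reward_error * reward_w + aux_error * aux_w) * dh_dshared
    return (
        shared_w - lr * grad_shared_w,
        reward_w - lr * grad_reward_w,
        aux_w - lr * grad_aux_w,
    )
```

A single call to `auxiliary_update` changes the head weights and the shared weight together, which is the coupling the last sentence of the abstract describes.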

23.

Invention patent application
NEURAL NETWORKS FOR SCALABLE CONTINUAL LEARNING IN DOMAINS WITH SEQUENTIALLY LEARNED TASKS (in force)

Publication No.: US20210117786A1

Publication date: 2021-04-22

Application No.: US17048023

Application date: 2019-04-18

Applicant: DEEPMIND TECHNOLOGIES LIMITED

Inventor: Jonathan Schwarz, Razvan Pascanu, Raia Thais Hadsell, Wojciech Czarnecki, Yee Whye Teh, Jelena Luketina

IPC: G06N3/08, G06N5/02, G06N20/20, G06K9/62

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for scalable continual learning using neural networks. One of the methods includes receiving new training data for a new machine learning task; training an active subnetwork on the new training data to determine trained values of the active network parameters from initial values of the active network parameters while holding current values of the knowledge parameters fixed; and training a knowledge subnetwork on the new training data to determine updated values of the knowledge parameters from the current values of the knowledge parameters by training the knowledge subnetwork to generate knowledge outputs for the new training inputs that match active outputs generated by the trained active subnetwork for the new training inputs.
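
The two training phases in this abstract can be sketched with scalar linear models. This is a toy illustration under stated assumptions, not the patented architecture: the additive connection between the subnetworks, the squared-error losses, and all names and hyperparameters are hypothetical.

```python
def active_output(active_w, knowledge_w, x):
    # Toy active subnetwork; reuses the knowledge weight via an additive connection.
    return (active_w + knowledge_w) * x

def knowledge_output(knowledge_w, x):
    # Toy knowledge subnetwork.
    return knowledge_w * x

def train_active(active_w, knowledge_w, data, lr=0.1, steps=200):
    # Phase 1: fit the new task; knowledge_w is read but never updated.
    for _ in range(steps):
        grad = sum(
            2 * (active_output(active_w, knowledge_w, x) - y) * x for x, y in data
        ) / len(data)
        active_w -= lr * grad
    return active_w

def train_knowledge(knowledge_w, inputs, targets, lr=0.1, steps=200):
    # Phase 2: make knowledge outputs match the trained active outputs.
    for _ in range(steps):
        grad = sum(
            2 * (knowledge_output(knowledge_w, x) - t) * x
            for x, t in zip(inputs, targets)
        ) / len(inputs)
        knowledge_w -= lr * grad
    return knowledge_w

def learn_new_task(active_w, knowledge_w, data):
    active_w = train_active(active_w, knowledge_w, data)
    inputs = [x for x, _ in data]
    # Targets are computed once from the trained active subnetwork.
    targets = [active_output(active_w, knowledge_w, x) for x in inputs]
    knowledge_w = train_knowledge(knowledge_w, inputs, targets)
    return active_w, knowledge_w
```

Calling `learn_new_task(0.0, 0.5, data)` on samples of `y = 3x` first moves only the active weight, then transfers the learned mapping into the knowledge weight by output matching.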

24.

Granted invention patent
Reinforcement learning with auxiliary tasks (in force)

Publication No.: US10956820B2

Publication date: 2021-03-23

Application No.: US16403385

Application date: 2019-05-03

Applicant: DeepMind Technologies Limited

Inventor: Volodymyr Mnih, Wojciech Czarnecki, Maxwell Elliot Jaderberg, Tom Schaul, David Silver, Koray Kavukcuoglu

IPC: G06N3/08, G06N20/00, G06N3/04, G06N3/00

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. The method includes: training an action selection policy neural network, and during the training of the action selection neural network, training one or more auxiliary control neural networks and a reward prediction neural network. Each of the auxiliary control neural networks is configured to receive a respective intermediate output generated by the action selection policy neural network and generate a policy output for a corresponding auxiliary control task. The reward prediction neural network is configured to receive one or more intermediate outputs generated by the action selection policy neural network and generate a corresponding predicted reward. Training each of the auxiliary control neural networks and the reward prediction neural network comprises adjusting values of the respective auxiliary control parameters, reward prediction parameters, and the action selection policy network parameters.
