Patent search ap:("DEEPMIND TECHNOLOGIES LIMITED") AND inv:"Koray Kavukcuoglu" Page 2

11.

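Invention application
REINFORCEMENT LEARNING WITH AUXILIARY TASKS (In force)

Publication (announcement) No.: US20210182688A1

Publication (announcement) date: 2021-06-17

Application No.: US17183618

Application date: 2021-02-24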

Applicant: DeepMind Technologies Limited

Inventor: Volodymyr Mnih, Wojciech Czarnecki, Maxwell Elliot Jaderberg, Tom Schaul, David Silver, Koray Kavukcuoglu

IPC: G06N3/08, G06N20/00, G06N3/04, G06N3/00

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. The method includes: training an action selection policy neural network, and during the training of the action selection neural network, training one or more auxiliary control neural networks and a reward prediction neural network. Each of the auxiliary control neural networks is configured to receive a respective intermediate output generated by the action selection policy neural network and generate a policy output for a corresponding auxiliary control task. The reward prediction neural network is configured to receive one or more intermediate outputs generated by the action selection policy neural network and generate a corresponding predicted reward. Training each of the auxiliary control neural networks and the reward prediction neural network comprises adjusting values of the respective auxiliary control parameters, reward prediction parameters, and the action selection policy network parameters.
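As a rough illustration of the abstract's key point (auxiliary heads read an intermediate output of the policy network, and training them also adjusts the shared policy-network parameters), here is a minimal sketch under loudly stated assumptions. It uses a hypothetical toy model in which every parameter is a scalar: a shared "torso" produces the intermediate output, and a policy head, one auxiliary control head and a reward prediction head all read it. The loss forms, the weights `beta_aux` and `beta_rp`, and the use of finite differences instead of backpropagation are assumptions for illustration, not the patented method.

```python
import math

# Hypothetical toy model: scalar parameters keep the shared-torso structure visible.
# A real system would use deep networks and backpropagation.

def torso(params, obs):
    """Shared torso; its output h is the intermediate output every head reads."""
    return math.tanh(params["torso"] * obs)

def log_softmax2(z):
    """Log-probabilities of a two-action softmax over logits [z, -z]."""
    m = abs(z)
    lse = m + math.log(math.exp(z - m) + math.exp(-z - m))
    return [z - lse, -z - lse]

def total_loss(params, obs, action, advantage, aux_action, aux_advantage,
               reward, beta_aux=1.0, beta_rp=1.0):
    h = torso(params, obs)
    # Main policy head: policy-gradient style loss -log pi(a) * advantage.
    policy_loss = -log_softmax2(params["policy"] * h)[action] * advantage
    # Auxiliary control head: same form, for a hypothetical auxiliary task.
    aux_loss = -log_softmax2(params["aux"] * h)[aux_action] * aux_advantage
    # Reward prediction head: squared error against the observed reward.
    rp_loss = (params["reward"] * h - reward) ** 2
    return policy_loss + beta_aux * aux_loss + beta_rp * rp_loss

def numerical_grad(loss_fn, params, eps=1e-6):
    """Central finite differences (a stand-in for backpropagation)."""
    grads = {}
    for name in params:
        up, down = dict(params), dict(params)
        up[name] += eps
        down[name] -= eps
        grads[name] = (loss_fn(up) - loss_fn(down)) / (2 * eps)
    return grads

def sgd_step(params, loss_fn, lr=0.1):
    grads = numerical_grad(loss_fn, params)
    # Every parameter, including the shared torso, moves along the combined gradient.
    return {name: value - lr * grads[name] for name, value in params.items()}

params = {"torso": 0.5, "policy": 0.1, "aux": -0.2, "reward": 0.3}
loss_fn = lambda p: total_loss(p, obs=1.0, action=0, advantage=1.0,
                               aux_action=1, aux_advantage=0.5, reward=1.0)
updated = sgd_step(params, loss_fn)
```

Setting `beta_aux` and `beta_rp` to zero leaves only the policy loss, so comparing the torso gradient with and without them shows how the auxiliary and reward prediction losses reach the shared parameters.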

12.

Invention grant
Reinforcement learning with auxiliary tasks (In force)

Publication (announcement) No.: US10956820B2

Publication (announcement) date: 2021-03-23

Application No.: US16403385

Application date: 2019-05-03

Applicant: DeepMind Technologies Limited

Inventor: Volodymyr Mnih, Wojciech Czarnecki, Maxwell Elliot Jaderberg, Tom Schaul, David Silver, Koray Kavukcuoglu

IPC: G06N3/08, G06N20/00, G06N3/04, G06N3/00

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. The method includes: training an action selection policy neural network, and during the training of the action selection neural network, training one or more auxiliary control neural networks and a reward prediction neural network. Each of the auxiliary control neural networks is configured to receive a respective intermediate output generated by the action selection policy neural network and generate a policy output for a corresponding auxiliary control task. The reward prediction neural network is configured to receive one or more intermediate outputs generated by the action selection policy neural network and generate a corresponding predicted reward. Training each of the auxiliary control neural networks and the reward prediction neural network comprises adjusting values of the respective auxiliary control parameters, reward prediction parameters, and the action selection policy network parameters.

13.

Invention grant
Training machine learning models using task selection policies to increase learning progress (In force)

Publication (announcement) No.: US10936949B2

Publication (announcement) date: 2021-03-02

Application No.: US16508042

Application date: 2019-07-10

Applicant: DeepMind Technologies Limited

Inventor: Marc Gendron-Bellemare, Jacob Lee Menick, Alexander Benjamin Graves, Koray Kavukcuoglu, Remi Munos

IPC: G06N3/08, G06N3/04

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a machine learning model. In one aspect, a method includes receiving training data for training the machine learning model on a plurality of tasks, where each task includes multiple batches of training data. A task is selected in accordance with a current task selection policy. A batch of training data is selected from the selected task. The machine learning model is trained on the selected batch of training data to determine updated values of the model parameters. A learning progress measure that represents a progress of the training of the machine learning model as a result of training the machine learning model on the selected batch of training data is determined. The current task selection policy is updated using the learning progress measure.
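The select / train / measure progress / update loop in this abstract can be sketched as a multi-armed bandit over tasks. The abstract does not name a bandit algorithm or a progress measure, so this sketch swaps in a plain Exp3-style adversarial bandit and uses the relative loss decrease on the trained batch as the learning progress measure; both are assumptions. The toy "training" (each task's loss shrinks by a fixed hypothetical rate) and every name here are illustrative only, and selecting a batch within the chosen task is omitted.

```python
import math
import random

class Exp3TaskSelector:
    """Task selection policy as an Exp3-style adversarial bandit (one arm per task)."""

    def __init__(self, num_tasks, gamma=0.1, eta=0.1, seed=0):
        self.num_tasks = num_tasks
        self.gamma = gamma          # exploration mixing weight
        self.eta = eta              # step size on the log-weights
        self.log_weights = [0.0] * num_tasks
        self.rng = random.Random(seed)

    def probabilities(self):
        m = max(self.log_weights)
        w = [math.exp(lw - m) for lw in self.log_weights]
        total = sum(w)
        return [(1 - self.gamma) * wi / total + self.gamma / self.num_tasks for wi in w]

    def select(self):
        probs = self.probabilities()
        task = self.rng.choices(range(self.num_tasks), weights=probs)[0]
        return task, probs

    def update(self, task, reward, probs):
        # Importance-weighted estimate: only the selected task's weight changes.
        self.log_weights[task] += self.eta * reward / probs[task]


def train_with_curriculum(rates, steps=500, seed=0):
    """Hypothetical toy: training on task k multiplies its loss by (1 - rates[k])."""
    losses = [1.0] * len(rates)
    selector = Exp3TaskSelector(len(rates), seed=seed)
    counts = [0] * len(rates)
    for _ in range(steps):
        task, probs = selector.select()               # select a task with the current policy
        before = losses[task]
        losses[task] *= 1.0 - rates[task]             # stand-in for training on one batch
        progress = (before - losses[task]) / before   # relative loss decrease in [0, 1]
        selector.update(task, progress, probs)        # update the policy with the progress
        counts[task] += 1
    return selector.probabilities(), counts
```

With `train_with_curriculum([0.05, 0.0])`, the task that yields progress gains probability mass while the task that yields none keeps only its exploration share. Keeping the progress signal in [0, 1] matters here because the update divides by the selection probability.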

14.

Invention grant
Asynchronous deep reinforcement learning (In force)

Publication (announcement) No.: US10936946B2

Publication (announcement) date: 2021-03-02

Application No.: US15349950

Application date: 2016-11-11

Applicant: DeepMind Technologies Limited

Inventor: Volodymyr Mnih, Adrià Puigdomènech Badia, Alexander Benjamin Graves, Timothy James Alexander Harley, David Silver, Koray Kavukcuoglu

IPC: G06N3/08, G06N3/04

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for asynchronous deep reinforcement learning. One of the systems includes a plurality of workers, wherein each worker is configured to operate independently of each other worker, and wherein each worker is associated with a respective actor that interacts with a respective replica of the environment during the training of the deep neural network.
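The worker structure described in this abstract (independent workers, each pairing an actor with its own environment replica, all improving one shared model) can be sketched with Python threads. Everything concrete is an assumption: the environment is a hypothetical two-armed bandit, the "network" is just a shared list of two policy logits, learning uses REINFORCE with a running-average baseline, and a lock guards the shared parameters for simplicity, since the abstract does not say how updates are synchronised.

```python
import math
import random
import threading


class BanditEnv:
    """Hypothetical two-armed bandit; each worker owns an independent replica."""

    def __init__(self, seed, means=(0.2, 0.8)):
        self.rng = random.Random(seed)
        self.means = means

    def step(self, action):
        # Bernoulli reward with the chosen arm's success probability.
        return 1.0 if self.rng.random() < self.means[action] else 0.0


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def worker(worker_id, shared_logits, lock, steps, lr):
    env = BanditEnv(seed=worker_id)            # this worker's environment replica
    rng = random.Random(1000 + worker_id)      # this worker's actor randomness
    baseline = 0.0
    for _ in range(steps):
        with lock:
            local = list(shared_logits)        # snapshot of the shared parameters
        probs = softmax(local)
        action = rng.choices(range(len(probs)), weights=probs)[0]
        reward = env.step(action)
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)  # running-average baseline
        # Gradient of log pi(action) with respect to the logits: one_hot - probs.
        grad = [(1.0 if i == action else 0.0) - p for i, p in enumerate(probs)]
        with lock:
            for i, g in enumerate(grad):
                shared_logits[i] += lr * advantage * g   # gradient ascent on shared params


def train(num_workers=4, steps=500, lr=0.1):
    shared_logits = [0.0, 0.0]
    lock = threading.Lock()
    threads = [threading.Thread(target=worker, args=(w, shared_logits, lock, steps, lr))
               for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return softmax(shared_logits)
```

Calling `train()` should return a policy that strongly prefers the second arm. Each worker computes its update from a possibly stale snapshot, which is the essence of asynchronous training: no worker waits for the others to finish an episode.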

15.

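Invention application
Using Hierarchical Representations for Neural Network Architecture Searching (Pending (published))

Publication (announcement) No.: US20200293899A1

Publication (announcement) date: 2020-09-17

Application No.: US16759567

Application date: 2018-10-26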

Applicant: DeepMind Technologies Limited

Inventor: Chrisantha Thomas Fernando, Karen Simonyan, Koray Kavukcuoglu, Hanxiao Liu, Oriol Vinyals

IPC: G06N3/08, G06N3/04, G06F16/901

Abstract: A computer-implemented method for automatically determining a neural network architecture represents a neural network architecture as a data structure defining a hierarchical set of directed acyclic graphs in multiple levels. Each graph has an input, an output, and a plurality of nodes between the input and the output. At each level, a corresponding set of the nodes are connected pairwise by directed edges which indicate operations performed on outputs of one node to generate an input to another node. Each level is associated with a corresponding set of operations. At a lowest level, the operations associated with each edge are selected from a set of primitive operations. The method includes repeatedly generating new sample neural network architectures, and evaluating their fitness. The modification is performed by selecting a level, selecting two nodes at that level, and modifying, removing or adding an edge between those nodes according to operations associated with lower levels of the hierarchy.
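The hierarchical representation and the mutation step in this abstract can be sketched as nested lists of small DAGs ("motifs"). This is a minimal sketch under assumptions: the primitive names, the number of motifs per level, the label `"none"` for an absent edge, tournament selection as the search loop, and the toy fitness (a hypothetical stand-in for validated accuracy) are all illustrative choices, not the patent's. Edges only run from lower- to higher-numbered nodes, which keeps every graph acyclic.

```python
import copy
import random

PRIMITIVES = ["none", "identity", "conv1x1", "conv3x3", "maxpool3x3"]  # hypothetical level-1 ops

def ops_for_level(arch, level):
    """Ops allowed on edges of a level-`level` motif: primitives, or the motifs one level down."""
    if level == 1:
        return PRIMITIVES
    return ["none"] + [f"L{level - 1}_motif{k}" for k in range(len(arch[level - 2]))]

def random_motif(num_nodes, ops, rng):
    # Edge (i, j) with i < j keeps the graph acyclic; "none" means no edge.
    return {(i, j): rng.choice(ops) for i in range(num_nodes) for j in range(i + 1, num_nodes)}

def random_arch(motifs_per_level, num_nodes, rng):
    arch = []
    for level, count in enumerate(motifs_per_level, start=1):
        ops = ops_for_level(arch, level)
        arch.append([random_motif(num_nodes, ops, rng) for _ in range(count)])
    return arch

def mutate(arch, num_nodes, rng):
    """Select a level, a motif and two nodes, then add, remove or change that edge."""
    child = copy.deepcopy(arch)
    level = rng.randrange(1, len(child) + 1)
    motif = rng.choice(child[level - 1])
    i, j = sorted(rng.sample(range(num_nodes), 2))
    motif[(i, j)] = rng.choice(ops_for_level(child, level))
    return child

def toy_fitness(arch):
    # Hypothetical stand-in for training and evaluating the decoded network.
    return sum(op == "conv3x3" for motif in arch[0] for op in motif.values())

def evolve(steps=200, seed=0, motifs_per_level=(6, 1), num_nodes=4, tournament=5):
    rng = random.Random(seed)
    population = [random_arch(motifs_per_level, num_nodes, rng) for _ in range(10)]
    scores = [toy_fitness(a) for a in population]
    for _ in range(steps):
        picks = rng.sample(range(len(population)), tournament)
        parent = population[max(picks, key=lambda k: scores[k])]
        child = mutate(parent, num_nodes, rng)       # generate a new sample architecture
        population.append(child)
        scores.append(toy_fitness(child))            # evaluate its fitness
    return max(scores), scores[:10]
```

Because the population only grows in this sketch, the best fitness found never decreases; a real search would replace `toy_fitness` with a costly training run, which is why the cheap, local mutation matters.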

16.

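Invention application
ASYNCHRONOUS DEEP REINFORCEMENT LEARNING (Pending (published))

Publication (announcement) No.: US20190258929A1

Publication (announcement) date: 2019-08-22

Application No.: US16403388

Application date: 2019-05-03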

Applicant: DeepMind Technologies Limited

Inventor: Volodymyr Mnih, Adria Puigdomenech Badia, Alexander Benjamin Graves, Timothy James Alexander Harley, David Silver, Koray Kavukcuoglu

IPC: G06N3/08, G06N3/04

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for asynchronous deep reinforcement learning. One of the systems includes a plurality of workers, wherein each worker is configured to operate independently of each other worker, and wherein each worker is associated with a respective actor that interacts with a respective replica of the environment during the training of the deep neural network.

17.

Invention grant
Spatial transformer modules (In force)

Publication (announcement) No.: US10032089B2

Publication (announcement) date: 2018-07-24

Application No.: US15174133

Application date: 2016-06-06

Applicant: DeepMind Technologies Limited

Inventor: Maxwell Elliot Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu

IPC: G06K9/46, G06K9/36, G06N3/02, G06K9/52, G06N3/04, G06K9/03, G06N3/08

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing inputs using an image processing neural network system that includes a spatial transformer module. One of the methods includes receiving an input feature map derived from the one or more input images, and applying a spatial transformation to the input feature map to generate a transformed feature map, comprising: processing the input feature map to generate spatial transformation parameters for the spatial transformation, and sampling from the input feature map in accordance with the spatial transformation parameters to generate the transformed feature map.
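The two steps named in this abstract (generate transformation parameters from the feature map, then sample the feature map according to them) can be sketched for a single-channel map stored as nested lists. The sketch assumes an affine transformation given by a 2x3 matrix `theta`, normalised coordinates in [-1, 1], and bilinear sampling with zero padding outside the input. The `localisation` argument is a hypothetical stand-in for the network that processes the input feature map; here it is any function returning `theta`.

```python
import math

def affine_grid(theta, height, width):
    """Map each target pixel (normalised to [-1, 1]) to source coordinates via a 2x3 theta."""
    grid = []
    for r in range(height):
        y_t = -1.0 + 2.0 * r / (height - 1) if height > 1 else 0.0
        row = []
        for c in range(width):
            x_t = -1.0 + 2.0 * c / (width - 1) if width > 1 else 0.0
            x_s = theta[0][0] * x_t + theta[0][1] * y_t + theta[0][2]
            y_s = theta[1][0] * x_t + theta[1][1] * y_t + theta[1][2]
            row.append((x_s, y_s))
        grid.append(row)
    return grid

def bilinear_sample(feature_map, grid):
    height, width = len(feature_map), len(feature_map[0])

    def pixel(r, c):
        # Zero padding outside the input feature map.
        return feature_map[r][c] if 0 <= r < height and 0 <= c < width else 0.0

    out = []
    for row in grid:
        out_row = []
        for x_s, y_s in row:
            # Convert normalised source coordinates back to pixel units.
            x = (x_s + 1.0) * (width - 1) / 2.0
            y = (y_s + 1.0) * (height - 1) / 2.0
            x0, y0 = math.floor(x), math.floor(y)
            dx, dy = x - x0, y - y0
            value = (pixel(y0, x0) * (1 - dx) * (1 - dy) + pixel(y0, x0 + 1) * dx * (1 - dy)
                     + pixel(y0 + 1, x0) * (1 - dx) * dy + pixel(y0 + 1, x0 + 1) * dx * dy)
            out_row.append(value)
        out.append(out_row)
    return out

def spatial_transformer(feature_map, localisation):
    theta = localisation(feature_map)          # spatial transformation parameters
    grid = affine_grid(theta, len(feature_map), len(feature_map[0]))
    return bilinear_sample(feature_map, grid)  # transformed feature map
```

For example, `spatial_transformer([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], lambda f: [[-1, 0, 0], [0, 1, 0]])` mirrors each row horizontally. Bilinear sampling is used because it is differentiable in both the input values and the sampling coordinates, so the parameter-generating network can in principle be trained end to end.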

18.

Invention grant
Distributed training using actor-critic reinforcement learning with off-policy correction factors (In force)

Publication (announcement) No.: US12299574B2

Publication (announcement) date: 2025-05-13

Application No.: US18487428

Application date: 2023-10-16

Applicant: DeepMind Technologies Limited

Inventor: Hubert Josef Soyer, Lasse Espeholt, Karen Simonyan, Yotam Doron, Vlad Firoiu, Volodymyr Mnih, Koray Kavukcuoglu, Remi Munos, Thomas Ward, Timothy James Alexander Harley, Iain Robert Dunning

IPC: G06N3/08, G06N3/045

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection neural network used to select actions to be performed by an agent interacting with an environment. In one aspect, a system comprises a plurality of actor computing units and a plurality of learner computing units. The actor computing units generate experience tuple trajectories that are used by the learner computing units to update learner action selection neural network parameters using a reinforcement learning technique. The reinforcement learning technique may be an off-policy actor critic reinforcement learning technique.
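The abstract does not specify which off-policy correction factors the learners apply to the actors' trajectories. One well-known correction of this kind is V-trace from DeepMind's IMPALA work; the sketch below assumes it, which the abstract itself does not confirm. It computes value targets `vs` and policy-gradient advantages for one trajectory from log importance ratios log(pi/mu) between the learner policy and the actor policy, with truncation thresholds `rho_bar` and `c_bar`. The actor/learner queueing is not shown, and all names are illustrative.

```python
import math

def vtrace_targets(rewards, values, bootstrap_value, log_rhos,
                   gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """V-trace targets for one trajectory (assumed correction; the abstract does not name one)."""
    rhos = [math.exp(lr) for lr in log_rhos]         # pi(a|x) / mu(a|x)
    clipped_rhos = [min(rho_bar, r) for r in rhos]   # truncated importance weights
    cs = [min(c_bar, r) for r in rhos]               # truncated trace coefficients
    next_values = values[1:] + [bootstrap_value]
    deltas = [cr * (r + gamma * nv - v)
              for cr, r, nv, v in zip(clipped_rhos, rewards, next_values, values)]
    # Backward recursion: vs_t - V_t = delta_t + gamma * c_t * (vs_{t+1} - V_{t+1}).
    acc = 0.0
    vs_minus_v = [0.0] * len(values)
    for t in reversed(range(len(values))):
        acc = deltas[t] + gamma * cs[t] * acc
        vs_minus_v[t] = acc
    vs = [v + d for v, d in zip(values, vs_minus_v)]
    # Policy-gradient advantages bootstrap from the next V-trace target.
    next_vs = vs[1:] + [bootstrap_value]
    pg_advantages = [cr * (r + gamma * nvs - v)
                     for cr, r, nvs, v in zip(clipped_rhos, rewards, next_vs, values)]
    return vs, pg_advantages
```

When the actor and learner policies agree (all `log_rhos` are 0), the targets reduce to ordinary bootstrapped n-step returns; as the actor's behaviour diverges, the truncated ratios shrink the correction toward the current value estimates, which keeps the variance bounded.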

19.

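Invention publication
REINFORCEMENT LEARNING USING BASELINE AND POLICY NEURAL NETWORKS (Pending (published))

Publication (announcement) No.: US20240362481A1

Publication (announcement) date: 2024-10-31

Application No.: US18662481

Application date: 2024-05-13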

Applicant: DeepMind Technologies Limited

Inventor: Volodymyr Mnih, Adrià Puigdomènech Badia, Alexander Benjamin Graves, Timothy James Alexander Harley, David Silver, Koray Kavukcuoglu

IPC: G06N3/08, G06N3/04, G06N3/045

CPC classification number: G06N3/08, G06N3/04, G06N3/045

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for asynchronous deep reinforcement learning. One of the systems includes a plurality of workers, wherein each worker is configured to operate independently of each other worker, and wherein each worker is associated with a respective actor that interacts with a respective replica of the environment during the training of the deep neural network.

20.

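Invention publication
DISTRIBUTED TRAINING USING ACTOR-CRITIC REINFORCEMENT LEARNING WITH OFF-POLICY CORRECTION FACTORS (Pending (published))

Publication (announcement) No.: US20240127060A1

Publication (announcement) date: 2024-04-18

Application No.: US18487428

Application date: 2023-10-16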

Applicant: DeepMind Technologies Limited

Inventor: Hubert Josef Soyer, Lasse Espeholt, Karen Simonyan, Yotam Doron, Vlad Firoiu, Volodymyr Mnih, Koray Kavukcuoglu, Remi Munos, Thomas Ward, Timothy James Alexander Harley, Iain Robert Dunning

IPC: G06N3/08, G06N3/045

CPC classification number: G06N3/08, G06N3/045

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection neural network used to select actions to be performed by an agent interacting with an environment. In one aspect, a system comprises a plurality of actor computing units and a plurality of learner computing units. The actor computing units generate experience tuple trajectories that are used by the learner computing units to update learner action selection neural network parameters using a reinforcement learning technique. The reinforcement learning technique may be an off-policy actor critic reinforcement learning technique.
