Patent search ap:("DeepMind Technologies Limited") AND inv:"Volodymyr Mnih" Page 3

21.

发明授权
Asynchronous deep reinforcement learning 有权

公开(公告)号：US10346741B2

公开(公告)日：2019-07-09

申请号：US15977923

申请日：2018-05-11

Applicant: DeepMind Technologies Limited

Inventor： Volodymyr Mnih , Adrià Puigdomènech Badia , Alexander Benjamin Graves , Timothy James Alexander Harley , David Silver , Koray Kavukcuoglu

IPC: G06N3/08 , G06N3/04

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for asynchronous deep reinforcement learning. One of the systems includes a plurality of workers, wherein each worker is configured to operate independently of each other worker, and wherein each worker is associated with a respective actor that interacts with a respective replica of the environment during the training of the deep neural network.

22.

发明授权
Reinforcement learning for active sequence processing 有权

公开(公告)号：US12175737B2

公开(公告)日：2024-12-24

申请号：US17773789

申请日：2020-11-13

Applicant: DEEPMIND TECHNOLOGIES LIMITED

Inventor： Viorica Patraucean , Bilal Piot , Joao Carreira , Volodymyr Mnih , Simon Osindero

IPC: G06V10/82 , G06N3/045 , G06N3/048

Abstract: A system that is configured to receive a sequence of task inputs and to perform a machine learning task is described. The system includes a reinforcement learning (RL) neural network and a task neural network. The RL neural network is configured to: generate, for each task input of the sequence of task inputs, a respective decision that determines whether to encode the task input or to skip the task input, and provide the respective decision of each task input to the task neural network. The task neural network is configured to: receive the sequence of task inputs, receive, from the RL neural network, for each task input of the sequence of task inputs, a respective decision that determines whether to encode the task input or to skip the task input, process each of the un-skipped task inputs in the sequence of task inputs to generate a respective accumulated feature for the un-skipped task input, wherein the respective accumulated feature characterizes features of the un-skipped task input and of previous un-skipped task inputs in the sequence, and generate a machine learning task output for the machine learning task based on the last accumulated feature generated for the last un-skipped task input in the sequence.

23.

发明公开
NEURAL NETWORK REINFORCEMENT LEARNING WITH DIVERSE POLICIES 审中-公开

公开(公告)号：US20240104389A1

公开(公告)日：2024-03-28

申请号：US18275511

申请日：2022-02-04

Applicant: DeepMind Technologies Limited

Inventor： Tom Ben Zion Zahavy , Brendan Timothy O'Donoghue , Andre da Motta Salles Barreto , Johan Sebastian Flennerhag , Volodymyr Mnih , Satinder Singh Baveja

IPC: G06N3/092

CPC classification number: G06N3/092

Abstract: In one aspect there is provided a method for training a neural network system by reinforcement learning. The neural network system may be configured to receive an input observation characterizing a state of an environment interacted with by an agent and to select and output an action in accordance with a policy aiming to satisfy an objective. The method may comprise obtaining a policy set comprising one or more policies for satisfying the objective and determining a new policy based on the one or more policies. The determining may include one or more optimization steps that aim to maximize a diversity of the new policy relative to the policy set under the condition that the new policy satisfies a minimum performance criterion based on an expected return that would be obtained by following the new policy.

24.

发明授权
Image processing of an environment to select an action to be performed by an agent interacting with the environment 有权

公开(公告)号：US11941088B1

公开(公告)日：2024-03-26

申请号：US17737544

申请日：2022-05-05

Applicant: DeepMind Technologies Limited

Inventor： Volodymyr Mnih , Koray Kavukcuoglu

IPC: G06V10/44 , G06F18/2431 , G06V20/80 , G06V30/194 , G06V30/413

CPC classification number: G06F18/2431 , G06V10/44 , G06V20/80 , G06V30/194 , G06V30/413

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing images using recurrent attention. One of the methods includes determining a location in the first image; extracting a glimpse from the first image using the location; generating a glimpse representation of the extracted glimpse; processing the glimpse representation using a recurrent neural network to update a current internal state of the recurrent neural network to generate a new internal state; processing the new internal state to select a location in a next image in the image sequence after the first image; and processing the new internal state to select an action from a predetermined set of possible actions.

25.

发明授权
Controlling agents using amortized Q learning 有权

公开(公告)号：US11868866B2

公开(公告)日：2024-01-09

申请号：US17287306

申请日：2019-11-18

Applicant: DeepMind Technologies Limited

Inventor： Tom Van de Wiele , Volodymyr Mnih , Andriy Mnih , David Constantine Patrick Warde-Farley

IPC: G06N3/04 , G06N3/047 , G06N3/006 , G06N3/084

CPC classification number: G06N3/047 , G06N3/006 , G06N3/084

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network system used to control an agent interacting with an environment. One of the methods includes receiving a current observation; processing the current observation using a proposal neural network to generate a proposal output that defines a proposal probability distribution over a set of possible actions that can be performed by the agent to interact with the environment; sampling (i) one or more actions from the set of possible actions in accordance with the proposal probability distribution and (ii) one or more actions randomly from the set of possible actions; processing the current observation and each sampled action using a Q neural network to generate a Q value; and selecting an action using the Q values generated by the Q neural network.

26.

发明授权
Asynchronous deep reinforcement learning 有权

公开(公告)号：US11783182B2

公开(公告)日：2023-10-10

申请号：US17170316

申请日：2021-02-08

Applicant: DeepMind Technologies Limited

Inventor： Volodymyr Mnih , Adrià Puigdomènech Badia , Alexander Benjamin Graves , Timothy James Alexander Harley , David Silver , Koray Kavukcuoglu

IPC: G06N3/08 , G06N3/045 , G06N3/04

CPC classification number: G06N3/08 , G06N3/04 , G06N3/045

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for asynchronous deep reinforcement learning. One of the systems includes a plurality of workers, wherein each worker is configured to operate independently of each other worker, and wherein each worker is associated with a respective actor that interacts with a respective replica of the environment during the training of the deep neural network.

27.

发明申请
REINFORCEMENT LEARNING USING BASELINE AND POLICY NEURAL NETWORKS 有权

公开(公告)号：US20220261647A1

公开(公告)日：2022-08-18

申请号：US17733594

申请日：2022-04-29

Applicant: DeepMind Technologies Limited

Inventor： Volodymyr Mnih , Adrià Puigdomènech Badia , Alexander Benjamin Graves , Timothy James Alexander Harley , David Silver , Koray Kavukcuoglu

IPC: G06N3/08 , G06N3/04

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for asynchronous deep reinforcement learning. One of the systems includes a plurality of workers, wherein each worker is configured to operate independently of each other worker, and wherein each worker is associated with a respective actor that interacts with a respective replica of the environment during the training of the deep neural network.

28.

发明授权
Image processing with recurrent attention 有权

公开(公告)号：US11354548B1

公开(公告)日：2022-06-07

申请号：US16927159

申请日：2020-07-13

Applicant: DeepMind Technologies Limited

Inventor： Volodymyr Mnih , Koray Kavukcuoglu

IPC: G06K9/62 , G06V10/44 , G06V20/80 , G06V30/194 , G06V30/413

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing images using recurrent attention. One of the methods includes determining a location in the first image; extracting a glimpse from the first image using the location; generating a glimpse representation of the extracted glimpse; processing the glimpse representation using a recurrent neural network to update a current internal state of the recurrent neural network to generate a new internal state; processing the new internal state to select a location in a next image in the image sequence after the first image; and processing the new internal state to select an action from a predetermined set of possible actions.

29.

发明授权
Asynchronous deep reinforcement learning 有权

公开(公告)号：US11334792B2

公开(公告)日：2022-05-17

申请号：US16403388

申请日：2019-05-03

Applicant: DeepMind Technologies Limited

Inventor： Volodymyr Mnih , Adria Puigdomenech Badia , Alexander Benjamin Graves , Timothy James Alexander Harley , David Silver , Koray Kavukcuoglu

IPC: G06N3/08 , G06N3/04

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for asynchronous deep reinforcement learning. One of the systems includes a plurality of workers, wherein each worker is configured to operate independently of each other worker, and wherein each worker is associated with a respective actor that interacts with a respective replica of the environment during the training of the deep neural network.

30.

发明申请
UNSUPERVISED CONTROL USING LEARNED REWARDS 审中-公开

公开(公告)号：US20190354869A1

公开(公告)日：2019-11-21

申请号：US16416920

申请日：2019-05-20

Applicant: DeepMind Technologies Limited

Inventor： David Constantine Patrick Warde-Farley , Volodymyr Mnih

IPC: G06N3/08

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting actions to be performed by an agent that interacts with an environment. In one aspect, a system comprises: an action selection subsystem that selects actions to be performed by the agent using an action selection policy generated using an action selection neural network; a reward subsystem that is configured to: receive an observation characterizing a current state of the environment and an observation characterizing a goal state of the environment; generate a reward using an embedded representation of the observation characterizing the current state of the environment and an embedded representation of the observation characterizing the goal state of the environment; and a training subsystem that is configured to train the action selection neural network based on the rewards generated by the reward subsystem using reinforcement learning techniques.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification