Patent search ap:("DeepMind Technologies Limited") AND inv:"Simon Osindero" Page 2

11.

Invention application
ACTION SELECTION FOR REINFORCEMENT LEARNING USING A MANAGER NEURAL NETWORK THAT GENERATES GOAL VECTORS DEFINING AGENT OBJECTIVES

Legal status: In force

Publication (announcement) No.: US20230090824A1

Publication (announcement) date: 2023-03-23

Application No.: US18072175

Application date: 2022-11-30

Applicant: DeepMind Technologies Limited

Inventor: Simon Osindero, Koray Kavukcuoglu, Alexander Vezhnevets

IPC: G06N3/08, G06N3/04

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for a system configured to select actions to be performed by an agent that interacts with an environment. The system comprises a manager neural network subsystem and a worker neural network subsystem. The manager subsystem is configured to, at each of the multiple time steps, generate a final goal vector for the time step. The worker subsystem is configured to, at each of multiple time steps, use the final goal vector generated by the manager subsystem to generate a respective action score for each action in a predetermined set of actions.
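
The manager/worker split described in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the linear manager, the unit-length normalisation of the goal vector, the per-action embeddings, the softmax, and all names and dimensions. For brevity the worker looks only at the goal vector.

```python
import math
import random


def manager_goal(observation, manager_weights):
    """Hypothetical manager: linear map from the observation to a unit-length goal vector."""
    raw = [sum(w * x for w, x in zip(row, observation)) for row in manager_weights]
    norm = math.sqrt(sum(v * v for v in raw)) or 1.0  # avoid dividing by zero
    return [v / norm for v in raw]


def worker_action_scores(goal, action_embeddings):
    """Hypothetical worker: one score per action, from embedding-goal dot products."""
    logits = [sum(e * g for e, g in zip(emb, goal)) for emb in action_embeddings]
    peak = max(logits)
    exps = [math.exp(logit - peak) for logit in logits]  # numerically stable softmax
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
OBS_DIM, GOAL_DIM, NUM_ACTIONS = 4, 3, 5  # illustrative sizes only
manager_weights = [[rng.uniform(-1, 1) for _ in range(OBS_DIM)] for _ in range(GOAL_DIM)]
action_embeddings = [[rng.uniform(-1, 1) for _ in range(GOAL_DIM)] for _ in range(NUM_ACTIONS)]

for step in range(3):
    observation = [rng.uniform(-1, 1) for _ in range(OBS_DIM)]
    goal = manager_goal(observation, manager_weights)       # manager: goal vector for this step
    scores = worker_action_scores(goal, action_embeddings)  # worker: score for every action
    action = max(range(NUM_ACTIONS), key=scores.__getitem__)
    print(step, action, [round(s, 3) for s in scores])
```

The loop mirrors the per-time-step structure of the abstract: the manager produces a goal vector first, and the worker then scores every action in the fixed action set.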

12.

Granted invention patent
Action selection for reinforcement learning using neural networks

Legal status: In force

Publication (announcement) No.: US10679126B2

Publication (announcement) date: 2020-06-09

Application No.: US16511571

Application date: 2019-07-15

Applicant: DeepMind Technologies Limited

Inventor: Simon Osindero, Koray Kavukcuoglu, Alexander Vezhnevets

IPC: G06N3/08, G06N3/04

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for a system configured to select actions to be performed by an agent that interacts with an environment. The system comprises a manager neural network subsystem and a worker neural network subsystem. The manager subsystem is configured to, at each of the multiple time steps, generate a final goal vector for the time step. The worker subsystem is configured to, at each of multiple time steps, use the final goal vector generated by the manager subsystem to generate a respective action score for each action in a predetermined set of actions.

13.

Invention application
REINFORCEMENT LEARNING FOR ACTIVE SEQUENCE PROCESSING

Legal status: In force

Publication (announcement) No.: US20250148774A1

Publication (announcement) date: 2025-05-08

Application No.: US18953004

Application date: 2024-11-19

Applicant: DeepMind Technologies Limited

Inventor: Viorica Patraucean, Bilal Piot, Joao Carreira, Volodymyr Mnih, Simon Osindero

IPC: G06V10/82, G06N3/045, G06N3/048

Abstract: A system that is configured to receive a sequence of task inputs and to perform a machine learning task is described. An RL neural network is configured to: generate, for each task input of the sequence, a respective decision that determines whether to encode the task input or to skip the task input, and provide the respective decision of each task input to the task neural network. The task neural network is configured to: receive the sequence of task inputs, receive, from the RL neural network, for each task input of the sequence, a respective decision, process each of the un-skipped task inputs in the sequence of task inputs to generate a respective accumulated feature for the un-skipped task input, and generate a machine learning task output for the machine learning task based on the last accumulated feature generated for the last un-skipped task input in the sequence.
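
The encode-or-skip flow in this abstract can be sketched as follows. The sketch is only illustrative and makes several assumptions that do not come from the patent: the RL neural network is replaced by a fixed magnitude threshold, the task inputs are plain numbers, the accumulated feature is a decayed running sum, and the task output is a sign label. All function and parameter names are hypothetical.

```python
def rl_decisions(task_inputs, threshold=0.5):
    """Stand-in for the RL neural network: True means encode, False means skip."""
    return [abs(value) >= threshold for value in task_inputs]


def task_network(task_inputs, decisions, decay=0.5):
    """Stand-in for the task neural network.

    Processes only the un-skipped inputs, keeps an accumulated feature for each
    of them, and bases the output on the last accumulated feature.
    """
    accumulated = None
    for value, encode in zip(task_inputs, decisions):
        if not encode:
            continue  # skipped inputs are never processed
        previous = 0.0 if accumulated is None else accumulated
        accumulated = decay * previous + value
    if accumulated is None:
        return None, None  # every input was skipped
    label = "positive" if accumulated > 0 else "non-positive"
    return accumulated, label


task_inputs = [0.1, 2.0, -0.2, 1.0, 0.05]
decisions = rl_decisions(task_inputs)
feature, output = task_network(task_inputs, decisions)
print(decisions, feature, output)
```

In this toy run only the second and fourth inputs are encoded, so the task output depends on two of the five inputs.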

14.

Granted invention patent
Modulating agent behavior to optimize learning progress

Legal status: In force

Publication (announcement) No.: US12061964B2

Publication (announcement) date: 2024-08-13

Application No.: US17032562

Application date: 2020-09-25

Applicant: DeepMind Technologies Limited

Inventor: Tom Schaul, Diana Luiza Borsa, Fengning Ding, David Szepesvari, Georg Ostrovski, Simon Osindero, William Clinton Dabney

IPC: G06N3/006, G06F18/214, G06F18/2415, G06N3/08, G06V10/764, G06V10/82, G06V40/20

CPC classification number: G06N3/006, G06F18/2148, G06F18/2415, G06N3/08, G06V10/764, G06V10/82, G06V40/20

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for controlling an agent. One of the methods includes sampling a behavior modulation in accordance with a current probability distribution; for each of one or more time steps: processing an input comprising an observation characterizing a current state of the environment at the time step using an action selection neural network to generate a respective action score for each action in a set of possible actions that can be performed by the agent; modifying the action scores using the sampled behavior modulation; and selecting the action to be performed by the agent at the time step based on the modified action scores; determining a fitness measure corresponding to the sampled behavior modulation; and updating the current probability distribution over the set of possible behavior modulations using the fitness measure corresponding to the behavior modulation.
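
The sample, act, score, update loop in this abstract can be sketched as a small bandit over behavior modulations. The sketch rests on assumptions that are not in the patent: the modulation is a softmax temperature, the action selection neural network is replaced by fixed scores, the fitness measure is the mean reward of a toy episode, and the distribution is updated with a standard gradient-bandit rule. All names and constants are hypothetical.

```python
import math
import random

MODULATIONS = [0.1, 1.0, 10.0]  # hypothetical set of softmax temperatures


def softmax(values):
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def action_scores(observation):
    """Stand-in for the action selection neural network (ignores the observation)."""
    return [1.0, 2.0, 0.5]


def run_episode(temperature, rng, steps=20):
    """Act with modulated scores in a toy environment; return mean reward as fitness."""
    rewards = [0.0, 1.0, 0.2]  # toy reward for each action
    total = 0.0
    for _ in range(steps):
        scores = action_scores(observation=None)
        modified = [s / temperature for s in scores]  # apply the behavior modulation
        probs = softmax(modified)
        action = rng.choices(range(len(probs)), weights=probs)[0]
        total += rewards[action]
    return total / steps


def update_distribution(rounds=200, step_size=0.5, seed=0):
    rng = random.Random(seed)
    preferences = [0.0] * len(MODULATIONS)
    baseline = 0.0
    for round_index in range(1, rounds + 1):
        distribution = softmax(preferences)  # current probability distribution
        chosen = rng.choices(range(len(MODULATIONS)), weights=distribution)[0]
        fitness = run_episode(MODULATIONS[chosen], rng)
        baseline += (fitness - baseline) / round_index  # running mean of fitness
        for i in range(len(preferences)):
            indicator = 1.0 if i == chosen else 0.0
            preferences[i] += step_size * (fitness - baseline) * (indicator - distribution[i])
    return softmax(preferences)


final_distribution = update_distribution()
print([round(p, 3) for p in final_distribution])
```

The patent title refers to learning progress as the quantity being optimised; mean episode reward is used here only because it is easy to compute in a toy setting.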

15.

Invention application
SYSTEM AND METHOD FOR TRAINING A SPARSE NEURAL NETWORK WHILST MAINTAINING SPARSITY

Legal status: In force

Publication (announcement) No.: US20230124177A1

Publication (announcement) date: 2023-04-20

Application No.: US17914035

Application date: 2021-06-04

Applicant: DeepMind Technologies Limited

Inventor: Siddhant Madhu Jayakumar, Razvan Pascanu, Jack William Rae, Simon Osindero, Erich Konrad Elsen

IPC: G06N3/08, G06F18/211

Abstract: A computer-implemented method of training a neural network. The method comprises repeatedly determining a forward-pass set of network parameters by selecting a first sub-set of parameters of the neural network and setting all other parameters of the forward-pass set of network parameters to zero. The method then processes a training data item using the neural network in accordance with the forward-pass set of network parameters to generate a neural network output, determines a value of an objective function from the neural network output and the training data item, selects a second sub-set of network parameters, determines a backward-pass set of network parameters comprising the first and second sub-sets of parameters, and updates parameters corresponding to the backward-pass set of network parameters using a gradient estimate determined from the value of the objective function.
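
One training step of the forward-pass / backward-pass scheme in this abstract can be sketched on a toy linear model. The abstract does not say how the two sub-sets are selected, so the magnitude-based choice below (largest weights for the first sub-set, next-largest for the second) is an assumption, as are the squared-error objective, plain gradient descent, and all names.

```python
def train_step(params, x, target, forward_k, backward_k, learning_rate):
    """One sparse training step for a linear model y = sum(w_i * x_i)."""
    # Rank parameters by magnitude (an assumed selection rule).
    order = sorted(range(len(params)), key=lambda i: abs(params[i]), reverse=True)
    first = set(order[:forward_k])             # first sub-set: used in the forward pass
    second = set(order[forward_k:backward_k])  # second sub-set: extra parameters to update

    # Forward-pass parameters: everything outside the first sub-set is zero.
    forward_params = [p if i in first else 0.0 for i, p in enumerate(params)]
    prediction = sum(p * xi for p, xi in zip(forward_params, x))
    loss = (prediction - target) ** 2          # objective value for this item

    # Backward-pass set: union of both sub-sets; only these parameters change.
    backward = first | second
    error = prediction - target
    new_params = list(params)
    for i in backward:
        gradient = 2.0 * error * x[i]          # gradient estimate from the sparse forward pass
        new_params[i] -= learning_rate * gradient
    return new_params, loss, first, second


params = [0.9, -0.5, 0.1, 0.05]
x = [1.0, 1.0, 1.0, 1.0]
new_params, loss, first, second = train_step(
    params, x, target=1.0, forward_k=1, backward_k=2, learning_rate=0.1
)
print(new_params, loss, first, second)
```

Parameters outside the backward-pass set are left untouched, so the update is sparse as well as the forward pass.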

16.

Granted invention patent
Action selection for reinforcement learning using a manager neural network that generates goal vectors defining agent objectives

Legal status: In force

Publication (announcement) No.: US11537887B2

Publication (announcement) date: 2022-12-27

Application No.: US16866753

Application date: 2020-05-05

Applicant: DeepMind Technologies Limited

Inventor: Simon Osindero, Koray Kavukcuoglu, Alexander Vezhnevets

IPC: G06N3/08, G06N3/04

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for a system configured to select actions to be performed by an agent that interacts with an environment. The system comprises a manager neural network subsystem and a worker neural network subsystem. The manager subsystem is configured to, at each of the multiple time steps, generate a final goal vector for the time step. The worker subsystem is configured to, at each of multiple time steps, use the final goal vector generated by the manager subsystem to generate a respective action score for each action in a predetermined set of actions.

17.

Invention application
ACTION SELECTION FOR REINFORCEMENT LEARNING USING NEURAL NETWORKS

Legal status: Under examination (published)

Publication (announcement) No.: US20200265313A1

Publication (announcement) date: 2020-08-20

Application No.: US16866753

Application date: 2020-05-05

Applicant: DeepMind Technologies Limited

Inventor: Simon Osindero, Koray Kavukcuoglu, Alexander Vezhnevets

IPC: G06N3/08, G06N3/04

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for a system configured to select actions to be performed by an agent that interacts with an environment. The system comprises a manager neural network subsystem and a worker neural network subsystem. The manager subsystem is configured to, at each of the multiple time steps, generate a final goal vector for the time step. The worker subsystem is configured to, at each of multiple time steps, use the final goal vector generated by the manager subsystem to generate a respective action score for each action in a predetermined set of actions.

18.

Invention application
ACTION SELECTION FOR REINFORCEMENT LEARNING USING NEURAL NETWORKS

Legal status: Under examination (published)

Publication (announcement) No.: US20190340509A1

Publication (announcement) date: 2019-11-07

Application No.: US16511571

Application date: 2019-07-15

Applicant: DeepMind Technologies Limited

Inventor: Simon Osindero, Koray Kavukcuoglu, Alexander Vezhnevets

IPC: G06N3/08, G06N3/04

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for a system configured to select actions to be performed by an agent that interacts with an environment. The system comprises a manager neural network subsystem and a worker neural network subsystem. The manager subsystem is configured to, at each of the multiple time steps, generate a final goal vector for the time step. The worker subsystem is configured to, at each of multiple time steps, use the final goal vector generated by the manager subsystem to generate a respective action score for each action in a predetermined set of actions.
