-
Publication No.: US11627165B2
Publication Date: 2023-04-11
Application No.: US16752496
Filing Date: 2020-01-24
Applicant: DeepMind Technologies Limited
Inventor: David Silver , Oriol Vinyals , Maxwell Elliot Jaderberg
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a policy neural network having a plurality of policy parameters and used to select actions to be performed by an agent to control the agent to perform a particular task while interacting with one or more other agents in an environment. In one aspect, the method includes: maintaining data specifying a pool of candidate action selection policies; maintaining data specifying a respective matchmaking policy; and training the policy neural network using a reinforcement learning technique to update the policy parameters. The policy parameters define policies to be used in controlling the agent to perform the particular task.
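The training scheme in this abstract can be sketched as a toy self-play loop. Everything concrete below is an illustrative assumption, not the patent's method: the rock-paper-scissors payoff stands in for the task, each policy is a bare probability vector rather than a neural network, and the matchmaking policy is reduced to a fixed weighting over the opponent pool.

```python
import random

def train_policy(pool, match_weights, policy, steps=100, lr=0.1, seed=0):
    """Toy self-play loop: sample opponents from a pool of candidate
    policies via a matchmaking distribution, play a matrix game, and
    nudge the agent's policy toward actions that earned reward."""
    rng = random.Random(seed)
    # payoff[a][b] = reward for playing action a against action b
    # (rock-paper-scissors, used here only as a stand-in task)
    payoff = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
    for _ in range(steps):
        opponent = rng.choices(pool, weights=match_weights, k=1)[0]
        a = rng.choices(range(3), weights=policy, k=1)[0]
        b = rng.choices(range(3), weights=opponent, k=1)[0]
        reward = payoff[a][b]
        # crude policy-gradient-style step: reinforce the sampled action
        policy[a] = max(1e-3, policy[a] + lr * reward)
        total = sum(policy)
        policy = [p / total for p in policy]  # renormalise
    return policy
```

A real implementation would update network parameters with a reinforcement learning algorithm and adapt the matchmaking distribution as the pool evolves; this sketch only shows how the pool, matchmaking weights, and policy update interact.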
-
Publication No.: US20220366218A1
Publication Date: 2022-11-17
Application No.: US17763984
Filing Date: 2020-09-07
Applicant: DeepMind Technologies Limited
Inventor: Emilio Parisotto , Hasuk Song , Jack William Rae , Siddhant Madhu Jayakumar , Maxwell Elliot Jaderberg , Razvan Pascanu , Caglar Gulcehre
Abstract: A system including an attention neural network that is configured to receive an input sequence and to process the input sequence to generate an output is described. The attention neural network includes: an attention block configured to receive a query input, a key input, and a value input that are derived from an attention block input. The attention block includes an attention neural network layer configured to: receive an attention layer input derived from the query input, the key input, and the value input, and apply an attention mechanism to the query input, the key input, and the value input to generate an attention layer output for the attention neural network layer; and a gating neural network layer configured to apply a gating mechanism to the attention block input and the attention layer output of the attention neural network layer to generate a gated attention output.
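The attention-then-gate structure described here can be sketched in a few lines of NumPy. The weight matrices, the single-head attention, and the specific sigmoid gate below are assumptions for illustration; the patent covers gating mechanisms generally, not this particular form.

```python
import numpy as np

def gated_attention_block(x, wq, wk, wv, wg):
    """Sketch of a gated attention block: derive query/key/value from
    the block input, apply scaled dot-product attention, then gate the
    attention layer output against the block input."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)      # softmax over keys
    y = attn @ v                                  # attention layer output
    gate = 1 / (1 + np.exp(-(x @ wg)))            # gate from block input
    return gate * x + (1 - gate) * y              # gated attention output
```

The gate lets the block interpolate between passing the input through unchanged and using the attention output, which is the role the abstract assigns to the gating neural network layer.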
-
Publication No.: US20200244707A1
Publication Date: 2020-07-30
Application No.: US16752496
Filing Date: 2020-01-24
Applicant: DeepMind Technologies Limited
Inventor: David Silver , Oriol Vinyals , Maxwell Elliot Jaderberg
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a policy neural network having a plurality of policy parameters and used to select actions to be performed by an agent to control the agent to perform a particular task while interacting with one or more other agents in an environment. In one aspect, the method includes: maintaining data specifying a pool of candidate action selection policies; maintaining data specifying a respective matchmaking policy; and training the policy neural network using a reinforcement learning technique to update the policy parameters. The policy parameters define policies to be used in controlling the agent to perform the particular task.
-
Publication No.: US20240320469A1
Publication Date: 2024-09-26
Application No.: US18679200
Filing Date: 2024-05-30
Applicant: DeepMind Technologies Limited
Inventor: Emilio Parisotto , Hasuk Song , Jack William Rae , Siddhant Madhu Jayakumar , Maxwell Elliot Jaderberg , Razvan Pascanu , Caglar Gulcehre
Abstract: A system including an attention neural network that is configured to receive an input sequence and to process the input sequence to generate an output is described. The attention neural network includes: an attention block configured to receive a query input, a key input, and a value input that are derived from an attention block input. The attention block includes an attention neural network layer configured to: receive an attention layer input derived from the query input, the key input, and the value input, and apply an attention mechanism to the query input, the key input, and the value input to generate an attention layer output for the attention neural network layer; and a gating neural network layer configured to apply a gating mechanism to the attention block input and the attention layer output of the attention neural network layer to generate a gated attention output.
-
Publication No.: US20240242091A1
Publication Date: 2024-07-18
Application No.: US18562180
Filing Date: 2022-05-30
Applicant: DeepMind Technologies Limited
Inventor: Valentin Clement Dalibard , Maxwell Elliot Jaderberg
IPC: G06N3/0985 , G06N3/045
CPC classification number: G06N3/0985 , G06N3/045
Abstract: Methods, computer systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network for performing a task. The system maintains data specifying (i) a plurality of candidate neural networks and (ii) a partitioning of the plurality of candidate neural networks into a plurality of partitions. The system repeatedly performs operations, including: training each of the candidate neural networks; evaluating each candidate neural network using a respective fitness function for the partition; and for each partition, updating the respective values of the one or more hyperparameters for at least one of the candidate neural networks in the partition based on the respective fitness metrics of the candidate neural networks in the partition. After repeatedly performing the operations, the system selects, from the maintained data, the respective values of the network parameters of one of the candidate neural networks.
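One step of the partitioned population-based search in this abstract can be sketched as follows. The candidate representation (a dict of hyperparameters plus a score), the per-partition fitness functions, and the copy-and-perturb update are illustrative assumptions; the claimed method operates on trained neural networks, not these toy records.

```python
import random

def pbt_step(partitions, fitness_fns, seed=0):
    """Sketch of one partitioned population-based-training step: each
    partition evaluates its candidates with its own fitness function,
    then overwrites the worst candidate's hyperparameters with a
    perturbed copy of the best candidate's."""
    rng = random.Random(seed)
    for part, fitness in zip(partitions, fitness_fns):
        scored = sorted(part, key=fitness)
        worst, best = scored[0], scored[-1]
        # exploit-and-explore: copy the best hyperparameters, jittered
        worst["hparams"] = {k: v * rng.uniform(0.8, 1.2)
                            for k, v in best["hparams"].items()}
    return partitions
```

Because each partition has its own fitness function, different partitions can optimise the same architecture for different objectives within one population, which is the distinctive element of the abstract.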
-
Publication No.: US20240220774A1
Publication Date: 2024-07-04
Application No.: US18536065
Filing Date: 2023-12-11
Applicant: DeepMind Technologies Limited
Inventor: Iain Robert Dunning , Wojciech Czarnecki , Maxwell Elliot Jaderberg
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for reinforcement learning. One of the methods includes selecting an action to be performed by the agent using both a slow updating recurrent neural network and a fast updating recurrent neural network that receives a fast updating input that includes the hidden state of the slow updating recurrent neural network.
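The fast/slow pairing in this abstract can be sketched with two tiny recurrent updates. The tanh cells, the fixed update period for the slow network, and all weight shapes are assumptions for illustration; the key point carried over from the abstract is that the fast network's input includes the slow network's hidden state.

```python
import numpy as np

def fast_slow_step(obs, h_fast, h_slow, params, t, slow_period=4):
    """Sketch of a fast/slow recurrent pair: the slow RNN updates its
    hidden state only every `slow_period` steps, while the fast RNN
    updates every step and receives the slow hidden state as input."""
    Wf, Ws, Wa = params  # fast, slow, and action-head weights (assumed)
    if t % slow_period == 0:
        h_slow = np.tanh(Ws @ np.concatenate([obs, h_slow]))
    # fast updating input includes the slow network's hidden state
    fast_in = np.concatenate([obs, h_slow, h_fast])
    h_fast = np.tanh(Wf @ fast_in)
    logits = Wa @ h_fast
    action = int(np.argmax(logits))  # greedy action selection
    return action, h_fast, h_slow
```

The slow state acts as a temporally coarse context that the fast network can condition on at every step.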
-
Publication No.: US20240144015A1
Publication Date: 2024-05-02
Application No.: US18386954
Filing Date: 2023-11-03
Applicant: DeepMind Technologies Limited
Inventor: Volodymyr Mnih , Wojciech Czarnecki , Maxwell Elliot Jaderberg , Tom Schaul , David Silver , Koray Kavukcuoglu
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. The method includes: training an action selection policy neural network, and during the training of the action selection neural network, training one or more auxiliary control neural networks and a reward prediction neural network. Each of the auxiliary control neural networks is configured to receive a respective intermediate output generated by the action selection policy neural network and generate a policy output for a corresponding auxiliary control task. The reward prediction neural network is configured to receive one or more intermediate outputs generated by the action selection policy neural network and generate a corresponding predicted reward. Training each of the auxiliary control neural networks and the reward prediction neural network comprises adjusting values of the respective auxiliary control parameters, reward prediction parameters, and the action selection policy network parameters.
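The head structure in this abstract can be sketched as a shared trunk feeding three heads. The linear heads, squared-error losses, and unweighted sum below are illustrative assumptions; what the sketch preserves is that the auxiliary control head and the reward prediction head both consume an intermediate output of the policy network, so their gradients also train the shared parameters.

```python
import numpy as np

def combined_loss(obs, W_trunk, W_policy, W_aux, W_rew,
                  policy_target, aux_target, reward_target):
    """Sketch of auxiliary-task training: a shared trunk produces an
    intermediate output consumed by the policy head, an auxiliary
    control head, and a reward prediction head; summing the losses
    lets every head shape the shared trunk."""
    h = np.tanh(W_trunk @ obs)  # intermediate output of the trunk
    policy_loss = np.sum((W_policy @ h - policy_target) ** 2)
    aux_loss = np.sum((W_aux @ h - aux_target) ** 2)
    reward_loss = (float(W_rew @ h) - reward_target) ** 2
    return policy_loss + aux_loss + reward_loss
```

In the claimed method the policy loss would come from a reinforcement learning objective rather than a regression target; the squared errors here are placeholders to keep the sketch self-contained.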
-
Publication No.: US20210182688A1
Publication Date: 2021-06-17
Application No.: US17183618
Filing Date: 2021-02-24
Applicant: DeepMind Technologies Limited
Inventor: Volodymyr Mnih , Wojciech Czarnecki , Maxwell Elliot Jaderberg , Tom Schaul , David Silver , Koray Kavukcuoglu
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. The method includes: training an action selection policy neural network, and during the training of the action selection neural network, training one or more auxiliary control neural networks and a reward prediction neural network. Each of the auxiliary control neural networks is configured to receive a respective intermediate output generated by the action selection policy neural network and generate a policy output for a corresponding auxiliary control task. The reward prediction neural network is configured to receive one or more intermediate outputs generated by the action selection policy neural network and generate a corresponding predicted reward. Training each of the auxiliary control neural networks and the reward prediction neural network comprises adjusting values of the respective auxiliary control parameters, reward prediction parameters, and the action selection policy network parameters.
-
Publication No.: US20210097443A1
Publication Date: 2021-04-01
Application No.: US16586236
Filing Date: 2019-09-27
Applicant: DeepMind Technologies Limited
Inventor: Ang Li , Valentin Clement Dalibard , David Budden , Ola Spyra , Maxwell Elliot Jaderberg , Timothy James Alexander Harley , Sagi Perel , Chenjie Gu , Pramod Gupta
IPC: G06N20/20 , G06N5/04 , G06F16/901
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a machine learning model. A method includes: maintaining a plurality of training sessions; assigning, to each worker of one or more workers, a respective training session of the plurality of training sessions; repeatedly performing operations until meeting one or more termination criteria, the operations comprising: receiving an updated training session from a respective worker of the one or more workers, selecting a second training session, selecting, based on comparing the updated training session and the second training session using a fitness evaluation function, either the updated training session or the second training session as a parent training session, generating a child training session from the selected parent training session, and assigning the child training session to an available worker, and selecting a candidate model to be a trained model for the machine learning model.
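The select-compare-spawn loop in this abstract can be sketched in a few lines. Representing a training session as a dict with a learning rate, using accuracy as the fitness evaluation, and perturbing only the learning rate are all assumptions made to keep the example small.

```python
import random

def evolve(sessions, fitness, seed=0):
    """Sketch of one iteration of the evolutionary loop: take an
    updated session (as if returned by a worker), sample a second
    session, keep the fitter one as parent, and spawn a perturbed
    child session for an available worker."""
    rng = random.Random(seed)
    updated = rng.choice(sessions)   # stands in for a worker's result
    second = rng.choice(sessions)    # comparison session
    parent = updated if fitness(updated) >= fitness(second) else second
    child = dict(parent)
    child["lr"] = parent["lr"] * rng.uniform(0.8, 1.2)  # explore step
    sessions.append(child)           # assign to an available worker
    return child
```

Repeating this until a termination criterion is met, then picking the best session's model, matches the shape of the claimed method.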
-
Publication No.: US10956820B2
Publication Date: 2021-03-23
Application No.: US16403385
Filing Date: 2019-05-03
Applicant: DeepMind Technologies Limited
Inventor: Volodymyr Mnih , Wojciech Czarnecki , Maxwell Elliot Jaderberg , Tom Schaul , David Silver , Koray Kavukcuoglu
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. The method includes: training an action selection policy neural network, and during the training of the action selection neural network, training one or more auxiliary control neural networks and a reward prediction neural network. Each of the auxiliary control neural networks is configured to receive a respective intermediate output generated by the action selection policy neural network and generate a policy output for a corresponding auxiliary control task. The reward prediction neural network is configured to receive one or more intermediate outputs generated by the action selection policy neural network and generate a corresponding predicted reward. Training each of the auxiliary control neural networks and the reward prediction neural network comprises adjusting values of the respective auxiliary control parameters, reward prediction parameters, and the action selection policy network parameters.
-