-
1.
Publication No.: US20210064970A1
Publication Date: 2021-03-04
Application No.: US17098870
Filing Date: 2020-11-16
Applicant: DeepMind Technologies Limited
Inventor: Marc Gendron-Bellemare , William Clinton Dabney
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting an action to be performed by a reinforcement learning agent interacting with an environment. A current observation characterizing a current state of the environment is received. For each action in a set of multiple actions that can be performed by the agent to interact with the environment, a probability distribution is determined over possible Q returns for the action-current observation pair. For each action, a measure of central tendency of the possible Q returns with respect to the probability distributions for the action-current observation pair is determined. An action to be performed by the agent in response to the current observation is selected using the measures of central tendency.
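The abstract describes a distributional approach to Q-learning: for each action, the agent maintains a full probability distribution over possible Q returns and acts on a measure of central tendency of that distribution rather than a single point estimate. Below is a minimal sketch of the action-selection step, assuming a C51-style categorical parameterization (a fixed support of return "atoms") and the mean as the central-tendency measure; the network stub is a hypothetical placeholder, not the patented model.

```python
import numpy as np

NUM_ACTIONS = 4
NUM_ATOMS = 51
ATOMS = np.linspace(-10.0, 10.0, NUM_ATOMS)  # support of possible Q returns

def distribution_network(observation: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a learned network: returns one probability
    distribution over ATOMS per action, shape (NUM_ACTIONS, NUM_ATOMS)."""
    rng = np.random.default_rng(abs(hash(observation.tobytes())) % 2**32)
    logits = rng.normal(size=(NUM_ACTIONS, NUM_ATOMS))
    exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)

def select_action(observation: np.ndarray) -> int:
    probs = distribution_network(observation)  # distributions over Q returns
    means = (probs * ATOMS).sum(axis=-1)       # measure of central tendency
    return int(np.argmax(means))               # act greedily w.r.t. the means

observation = np.array([0.1, -0.3, 0.7])
print(select_action(observation))
```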
-
2.
Publication No.: US10936949B2
Publication Date: 2021-03-02
Application No.: US16508042
Filing Date: 2019-07-10
Applicant: DeepMind Technologies Limited
Inventor: Marc Gendron-Bellemare , Jacob Lee Menick , Alexander Benjamin Graves , Koray Kavukcuoglu , Remi Munos
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a machine learning model. In one aspect, a method includes receiving training data for training the machine learning model on a plurality of tasks, where each task includes multiple batches of training data. A task is selected in accordance with a current task selection policy. A batch of training data is selected from the selected task. The machine learning model is trained on the selected batch of training data to determine updated values of the model parameters. A learning progress measure that represents a progress of the training of the machine learning model as a result of training the machine learning model on the selected batch of training data is determined. The current task selection policy is updated using the learning progress measure.
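The abstract outlines an automated curriculum: a task selection policy picks which task the next training batch is drawn from, and the policy is itself updated from a learning progress signal. A minimal sketch follows, assuming an Exp3-style bandit as the task selection policy and the drop in loss on the selected batch as the learning progress measure; the linear-regression tasks and all hyperparameters are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_TASKS, DIM, EPS, LR_POLICY, LR_MODEL = 3, 5, 0.05, 0.1, 0.01

# Hypothetical tasks: linear-regression problems with different true weights.
true_w = rng.normal(size=(NUM_TASKS, DIM))

def sample_batch(task: int, n: int = 32):
    x = rng.normal(size=(n, DIM))
    return x, x @ true_w[task]

def loss(w: np.ndarray, batch) -> float:
    x, y = batch
    return float(np.mean((x @ w - y) ** 2))

weights = np.zeros(NUM_TASKS)  # bandit weights: the task selection policy
w = np.zeros(DIM)              # model parameters

for step in range(500):
    exp = np.exp(weights - weights.max())
    probs = (1 - EPS) * exp / exp.sum() + EPS / NUM_TASKS
    task = rng.choice(NUM_TASKS, p=probs)             # select task per policy
    batch = sample_batch(task)                        # batch from that task
    before = loss(w, batch)
    x, y = batch
    w -= LR_MODEL * 2 * x.T @ (x @ w - y) / len(y)    # train on the batch
    progress = before - loss(w, batch)                # learning progress measure
    weights[task] += LR_POLICY * progress / probs[task]  # update the policy

print("final task selection probabilities:", np.round(probs, 3))
```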
-
3.
Publication No.: US10445653B1
Publication Date: 2019-10-15
Application No.: US14821549
Filing Date: 2015-08-07
Applicant: DeepMind Technologies Limited
Inventor: Joel William Veness , Marc Gendron-Bellemare
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for evaluating reinforcement learning policies. One of the methods includes receiving a plurality of training histories for a reinforcement learning agent; determining a total reward for each training observation in the training histories; partitioning the training observations into a plurality of partitions; determining, for each partition and from the partitioned training observations, a probability that the reinforcement learning agent will receive the total reward for the partition if the reinforcement learning agent performs the action for the partition in response to receiving the current observation; determining, from the probabilities and for each total reward, a respective estimated value of performing each action in response to receiving the current observation; and selecting an action from the pre-determined set of actions from the estimated values in accordance with an action selection policy.
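The abstract describes estimating action values by partitioning training observations and weighting each partition's total reward by the probability that the partition applies to the current observation. A minimal sketch is given below, assuming partitions keyed by (action, total reward) and a Gaussian kernel over observation distance as the probability estimate; both choices are assumptions, not the patented construction.

```python
import numpy as np
from collections import defaultdict

def estimate_action_values(histories, current_obs, bandwidth=1.0):
    """histories: iterable of (observation, action, total_reward) tuples."""
    partitions = defaultdict(list)  # (action, total_reward) -> observations
    for obs, action, total_reward in histories:
        partitions[(action, total_reward)].append(obs)

    values, norms = defaultdict(float), defaultdict(float)
    for (action, total_reward), obs_list in partitions.items():
        obs_arr = np.asarray(obs_list)
        # Kernel-weighted probability that this partition's (action, reward)
        # outcome applies given the current observation.
        dists = np.linalg.norm(obs_arr - current_obs, axis=-1)
        weight = float(np.sum(np.exp(-0.5 * (dists / bandwidth) ** 2)))
        values[action] += total_reward * weight
        norms[action] += weight
    return {a: values[a] / norms[a] for a in values if norms[a] > 0}

histories = [
    (np.array([0.0, 0.0]), 0, 1.0),
    (np.array([0.1, 0.2]), 1, 3.0),
    (np.array([2.0, 2.0]), 1, -1.0),
]
est = estimate_action_values(histories, np.array([0.0, 0.1]))
best_action = max(est, key=est.get)  # greedy action selection policy
print(est, best_action)
```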
-
4.
Publication No.: US20240370707A1
Publication Date: 2024-11-07
Application No.: US18754726
Filing Date: 2024-06-26
Applicant: DeepMind Technologies Limited
Inventor: Marc Gendron-Bellemare , William Clinton Dabney
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting an action to be performed by a reinforcement learning agent interacting with an environment. A current observation characterizing a current state of the environment is received. For each action in a set of multiple actions that can be performed by the agent to interact with the environment, a probability distribution is determined over possible Q returns for the action-current observation pair. For each action, a measure of central tendency of the possible Q returns with respect to the probability distributions for the action-current observation pair is determined. An action to be performed by the agent in response to the current observation is selected using the measures of central tendency.
-
5.
Publication No.: US10860920B2
Publication Date: 2020-12-08
Application No.: US16508046
Filing Date: 2019-07-10
Applicant: DeepMind Technologies Limited
Inventor: Marc Gendron-Bellemare , William Clinton Dabney
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting an action to be performed by a reinforcement learning agent interacting with an environment. A current observation characterizing a current state of the environment is received. For each action in a set of multiple actions that can be performed by the agent to interact with the environment, a probability distribution is determined over possible Q returns for the action-current observation pair. For each action, a measure of central tendency of the possible Q returns with respect to the probability distributions for the action-current observation pair is determined. An action to be performed by the agent in response to the current observation is selected using the measures of central tendency.
-
6.
Publication No.: US20200327405A1
Publication Date: 2020-10-15
Application No.: US16303501
Filing Date: 2017-05-18
Applicant: DEEPMIND TECHNOLOGIES LIMITED
Inventor: Marc Gendron-Bellemare , Remi Munos , Srinivasan Sriram
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network used to select actions to be performed by an agent interacting with an environment. One of the methods includes obtaining data identifying (i) a first observation characterizing a first state of the environment, (ii) an action performed by the agent in response to the first observation, and (iii) an actual reward received resulting from the agent performing the action in response to the first observation; determining a pseudo-count for the first observation; determining an exploration reward bonus that incentivizes the agent to explore the environment from the pseudo-count for the first observation; generating a combined reward from the actual reward and the exploration reward bonus; and adjusting current values of the parameters of the neural network using the combined reward.
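The abstract describes augmenting the actual reward with an exploration bonus derived from a pseudo-count for the observation. In the patent the pseudo-count is derived from a learned density model; in the sketch below a plain visit count over coarsely discretized observations stands in for it, which is an assumption made for brevity, along with the bonus scale.

```python
import numpy as np
from collections import defaultdict

BETA = 0.1                 # exploration bonus scale (assumed)
counts = defaultdict(int)  # stand-in for density-model pseudo-counts

def pseudo_count(observation: np.ndarray) -> float:
    key = tuple(np.round(observation, 1))  # coarse discretization (assumed)
    counts[key] += 1
    return float(counts[key])

def combined_reward(observation: np.ndarray, actual_reward: float) -> float:
    n = pseudo_count(observation)
    bonus = BETA / np.sqrt(n)     # incentivizes visiting rarely seen states
    return actual_reward + bonus  # used in place of the raw reward when
                                  # adjusting the network parameters

print(combined_reward(np.array([0.12, 0.34]), actual_reward=1.0))
print(combined_reward(np.array([0.12, 0.34]), actual_reward=1.0))  # bonus shrinks
```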
-
7.
Publication No.: US11727264B2
Publication Date: 2023-08-15
Application No.: US16303501
Filing Date: 2017-05-18
Applicant: DEEPMIND TECHNOLOGIES LIMITED
Inventor: Marc Gendron-Bellemare , Remi Munos , Srinivasan Sriram
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network used to select actions to be performed by an agent interacting with an environment. One of the methods includes obtaining data identifying (i) a first observation characterizing a first state of the environment, (ii) an action performed by the agent in response to the first observation, and (iii) an actual reward received resulting from the agent performing the action in response to the first observation; determining a pseudo-count for the first observation; determining an exploration reward bonus that incentivizes the agent to explore the environment from the pseudo-count for the first observation; generating a combined reward from the actual reward and the exploration reward bonus; and adjusting current values of the parameters of the neural network using the combined reward.
-
8.
Publication No.: US20210150355A1
Publication Date: 2021-05-20
Application No.: US17159961
Filing Date: 2021-01-27
Applicant: DeepMind Technologies Limited
Inventor: Marc Gendron-Bellemare , Jacob Lee Menick , Alexander Benjamin Graves , Koray Kavukcuoglu , Remi Munos
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a machine learning model. In one aspect, a method includes receiving training data for training the machine learning model on a plurality of tasks, where each task includes multiple batches of training data. A task is selected in accordance with a current task selection policy. A batch of training data is selected from the selected task. The machine learning model is trained on the selected batch of training data to determine updated values of the model parameters. A learning progress measure that represents a progress of the training of the machine learning model as a result of training the machine learning model on the selected batch of training data is determined. The current task selection policy is updated using the learning progress measure.
-
9.
Publication No.: US20190332938A1
Publication Date: 2019-10-31
Application No.: US16508042
Filing Date: 2019-07-10
Applicant: DeepMind Technologies Limited
Inventor: Marc Gendron-Bellemare , Jacob Lee Menick , Alexander Benjamin Graves , Koray Kavukcuoglu , Remi Munos
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a machine learning model. In one aspect, a method includes receiving training data for training the machine learning model on a plurality of tasks, where each task includes multiple batches of training data. A task is selected in accordance with a current task selection policy. A batch of training data is selected from the selected task. The machine learning model is trained on the selected batch of training data to determine updated values of the model parameters. A learning progress measure that represents a progress of the training of the machine learning model as a result of training the machine learning model on the selected batch of training data is determined. The current task selection policy is updated using the learning progress measure.
-
10.
Publication No.: US12056593B2
Publication Date: 2024-08-06
Application No.: US17098870
Filing Date: 2020-11-16
Applicant: DeepMind Technologies Limited
Inventor: Marc Gendron-Bellemare , William Clinton Dabney
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting an action to be performed by a reinforcement learning agent interacting with an environment. A current observation characterizing a current state of the environment is received. For each action in a set of multiple actions that can be performed by the agent to interact with the environment, a probability distribution is determined over possible Q returns for the action-current observation pair. For each action, a measure of central tendency of the possible Q returns with respect to the probability distributions for the action-current observation pair is determined. An action to be performed by the agent in response to the current observation is selected using the measures of central tendency.