-
Publication number: US20220343157A1
Publication date: 2022-10-27
Application number: US17620164
Filing date: 2020-06-17
Applicant: DEEPMIND TECHNOLOGIES LIMITED
Inventor: Daniel J. Mankowitz, Nir Levine, Rae Chan Jeong, Abbas Abdolmaleki, Jost Tobias Springenberg, Todd Andrew Hester, Timothy Arthur Mann, Martin Riedmiller
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a policy neural network having policy parameters. One of the methods includes sampling a mini-batch comprising one or more observation-action-reward tuples generated as a result of interactions of a first agent with a first environment; determining an update to current values of the Q network parameters by minimizing a robust entropy-regularized temporal difference (TD) error that accounts for possible perturbations of the states of the first environment represented by the observations in the observation-action-reward tuples; and determining, using the Q-value neural network, an update to the policy network parameters using the sampled mini-batch of observation-action-reward tuples.
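Below is a minimal sketch of the robust entropy-regularized TD loss this abstract describes, written in PyTorch. The network interfaces (`q_net`, `policy.sample`), the perturbation model (Gaussian noise of radius `EPSILON`), the worst-case aggregation (minimum over sampled perturbations), and all hyperparameters are illustrative assumptions, not the patent's actual construction.

```python
import torch
import torch.nn.functional as F

ALPHA = 0.2      # entropy temperature (assumed)
EPSILON = 0.05   # scale of state perturbations (assumed)
GAMMA = 0.99     # discount factor (assumed)
N_PERTURB = 8    # number of sampled perturbations (assumed)

def robust_td_loss(q_net, policy, batch):
    """Entropy-regularized TD error that is robust to perturbed next states."""
    obs, action, reward, next_obs = batch  # observation-action-reward tuples
    q_pred = q_net(obs, action)

    # Account for possible perturbations of the environment state: form the
    # entropy-regularized target under several perturbed next states and
    # keep the worst case (here: the minimum, an assumed choice).
    targets = []
    for _ in range(N_PERTURB):
        perturbed = next_obs + EPSILON * torch.randn_like(next_obs)
        next_action, log_prob = policy.sample(perturbed)  # assumed interface
        target = reward + GAMMA * (q_net(perturbed, next_action) - ALPHA * log_prob)
        targets.append(target)
    robust_target = torch.stack(targets).min(dim=0).values

    # Minimizing this loss updates the Q network parameters.
    return F.mse_loss(q_pred, robust_target.detach())
```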
-
Publication number: US20230214649A1
Publication date: 2023-07-06
Application number: US18008838
Filing date: 2021-07-27
Applicant: DeepMind Technologies Limited
Inventor: Rae Chan Jeong, Jost Tobias Springenberg, Jacqueline Ok-chan Kay, Daniel Hai Huan Zheng, Alexandre Galashov, Nicolas Manfred Otto Heess, Francesco Nori
IPC: G06N3/08
CPC classification number: G06N3/08
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection system using reinforcement learning techniques. In one aspect, a method comprises at each of multiple iterations: obtaining a batch of experience, each experience tuple comprising: a first observation, an action, a second observation, and a reward; for each experience tuple, determining a state value for the second observation, comprising: processing the first observation using a policy neural network to generate an action score for each action in a set of possible actions; sampling multiple actions from the set of possible actions in accordance with the action scores; processing the second observation using a Q neural network to generate a Q value for each sampled action; and determining the state value for the second observation; and determining an update to current values of the Q neural network parameters using the state values.
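The following is a minimal sketch of the state-value estimate the abstract walks through: score actions with the policy network on the first observation, sample several actions, evaluate each with the Q network on the second observation, and aggregate. The function names and the mean aggregation are assumptions for illustration.

```python
import torch

def state_value(policy_net, q_net, first_obs, second_obs, n_samples=10):
    """Estimate the value of the second observation in an experience tuple."""
    # Action scores come from processing the *first* observation with the
    # policy network, as the abstract specifies.
    scores = policy_net(first_obs)                       # logits over actions
    dist = torch.distributions.Categorical(logits=scores)
    actions = dist.sample((n_samples,))                  # sample multiple actions

    # The Q network maps the *second* observation to one Q value per action.
    q_values = q_net(second_obs)
    sampled_q = q_values[actions]

    # One plausible estimator: average the Q values of the sampled actions.
    # These state values then drive the update to the Q network parameters.
    return sampled_q.mean()
```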
-
Publication number: US20240042600A1
Publication date: 2024-02-08
Application number: US18331632
Filing date: 2023-06-08
Applicant: DeepMind Technologies Limited
Inventor: Serkan Cabi, Ziyu Wang, Alexander Novikov, Ksenia Konyushkova, Sergio Gomez Colmenarejo, Scott Ellison Reed, Misha Man Ray Denil, Jonathan Karl Scholz, Oleg O. Sushkov, Rae Chan Jeong, David Barker, David Budden, Mel Vecerik, Yusuf Aytar, Joao Ferdinando Gomes de Freitas
IPC: B25J9/16
CPC classification number: B25J9/161, B25J9/163, B25J9/1661
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for data-driven robotic control. One of the methods includes maintaining robot experience data; obtaining annotation data; training, on the annotation data, a reward model; generating task-specific training data for the particular task, comprising, for each experience in a second subset of the experiences in the robot experience data: processing the observation in the experience using the trained reward model to generate a reward prediction, and associating the reward prediction with the experience; and training a policy neural network on the task-specific training data for the particular task, wherein the policy neural network is configured to receive a network input comprising an observation and to generate a policy output that defines a control policy for a robot performing the particular task.
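Below is a minimal sketch of the relabeling pipeline this abstract describes: fit a reward model on annotated experience, use it to attach predicted rewards to stored robot experience, and hand the result to policy training. All names, the data layout, and the regression loss are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def train_reward_model(reward_model, annotations, epochs=10, lr=1e-4):
    """Fit the reward model on (observation, annotated reward) pairs."""
    opt = torch.optim.Adam(reward_model.parameters(), lr=lr)
    for _ in range(epochs):
        for obs, annotated_reward in annotations:
            loss = F.mse_loss(reward_model(obs), annotated_reward)
            opt.zero_grad()
            loss.backward()
            opt.step()

def build_task_training_data(reward_model, experiences):
    """Relabel a subset of robot experience with predicted rewards."""
    task_data = []
    for exp in experiences:
        # Process the stored observation with the trained reward model and
        # associate the resulting reward prediction with the experience.
        reward = reward_model(exp["observation"]).item()
        task_data.append({**exp, "reward": reward})
    return task_data
```

The relabeled `task_data` would then feed an off-policy policy-network training loop for the particular task; that step is standard RL training and is omitted here.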
-
Publication number: US11712799B2
Publication date: 2023-08-01
Application number: US17020294
Filing date: 2020-09-14
Applicant: DeepMind Technologies Limited
Inventor: Serkan Cabi, Ziyu Wang, Alexander Novikov, Ksenia Konyushkova, Sergio Gomez Colmenarejo, Scott Ellison Reed, Misha Man Ray Denil, Jonathan Karl Scholz, Oleg O. Sushkov, Rae Chan Jeong, David Barker, David Budden, Mel Vecerik, Yusuf Aytar, Joao Ferdinando Gomes de Freitas
IPC: B25J9/16
CPC classification number: B25J9/161, B25J9/163, B25J9/1661
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for data-driven robotic control. One of the methods includes maintaining robot experience data; obtaining annotation data; training, on the annotation data, a reward model; generating task-specific training data for the particular task, comprising, for each experience in a second subset of the experiences in the robot experience data: processing the observation in the experience using the trained reward model to generate a reward prediction, and associating the reward prediction with the experience; and training a policy neural network on the task-specific training data for the particular task, wherein the policy neural network is configured to receive a network input comprising an observation and to generate a policy output that defines a control policy for a robot performing the particular task.
-
Publication number: US20210103815A1
Publication date: 2021-04-08
Application number: US17065489
Filing date: 2020-10-07
Applicant: DeepMind Technologies Limited
Inventor: Rae Chan Jeong, Yusuf Aytar, David Khosid, Yuxiang Zhou, Jacqueline Ok-chan Kay, Thomas Lampe, Konstantinos Bousmalis, Francesco Nori
IPC: G06N3/08, G05B19/4155
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a policy neural network for use in controlling a real-world agent in a real-world environment. One of the methods includes training the policy neural network by optimizing a first task-specific objective that measures a performance of the policy neural network in controlling a simulated version of the real-world agent; and then training the policy neural network by jointly optimizing (i) a self-supervised objective that measures at least a performance of internal representations generated by the policy neural network on a self-supervised task performed on real-world data and (ii) a second task-specific objective that measures the performance of the policy neural network in controlling the simulated version of the real-world agent.
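Below is a minimal sketch of the two-phase training the abstract describes: first optimize a task-specific objective on the simulated agent, then jointly optimize a self-supervised objective on real-world data together with the task objective in simulation. The loss functions, data interfaces, and weighting are assumptions for illustration.

```python
import torch

def train(policy_net, sim_env, real_data, task_loss_fn, self_sup_loss_fn,
          pretrain_steps=10_000, joint_steps=10_000, ssl_weight=1.0):
    opt = torch.optim.Adam(policy_net.parameters(), lr=3e-4)

    # Phase 1: train on the first task-specific objective, which measures the
    # policy's performance controlling the simulated agent.
    for _ in range(pretrain_steps):
        loss = task_loss_fn(policy_net, sim_env)
        opt.zero_grad(); loss.backward(); opt.step()

    # Phase 2: jointly optimize (i) a self-supervised loss on real-world data,
    # which shapes the network's internal representations, and (ii) the
    # task-specific loss on the simulated agent.
    for _ in range(joint_steps):
        real_batch = next(real_data)  # assumed: an iterator over real-world data
        loss = (task_loss_fn(policy_net, sim_env)
                + ssl_weight * self_sup_loss_fn(policy_net, real_batch))
        opt.zero_grad(); loss.backward(); opt.step()
```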
-
Publication number: US20210078169A1
Publication date: 2021-03-18
Application number: US17020294
Filing date: 2020-09-14
Applicant: DeepMind Technologies Limited
Inventor: Serkan Cabi, Ziyu Wang, Alexander Novikov, Ksenia Konyushkova, Sergio Gomez Colmenarejo, Scott Ellison Reed, Misha Man Ray Denil, Jonathan Karl Scholz, Oleg O. Sushkov, Rae Chan Jeong, David Barker, David Budden, Mel Vecerik, Yusuf Aytar, Joao Ferdinando Gomes de Freitas
IPC: B25J9/16
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for data-driven robotic control. One of the methods includes maintaining robot experience data; obtaining annotation data; training, on the annotation data, a reward model; generating task-specific training data for the particular task, comprising, for each experience in a second subset of the experiences in the robot experience data: processing the observation in the experience using the trained reward model to generate a reward prediction, and associating the reward prediction with the experience; and training a policy neural network on the task-specific training data for the particular task, wherein the policy neural network is configured to receive a network input comprising an observation and to generate a policy output that defines a control policy for a robot performing the particular task.