-
Publication No.: US20210103815A1
Publication Date: 2021-04-08
Application No.: US17065489
Filing Date: 2020-10-07
Applicant: DeepMind Technologies Limited
Inventor: Rae Chan Jeong , Yusuf Aytar , David Khosid , Yuxiang Zhou , Jacqueline Ok-chan Kay , Thomas Lampe , Konstantinos Bousmalis , Francesco Nori
IPC: G06N3/08 , G05B19/4155
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a policy neural network for use in controlling a real-world agent in a real-world environment. One of the methods includes training the policy neural network by optimizing a first task-specific objective that measures a performance of the policy neural network in controlling a simulated version of the real-world agent; and then training the policy neural network by jointly optimizing (i) a self-supervised objective that measures at least a performance of internal representations generated by the policy neural network on a self-supervised task performed on real-world data and (ii) a second task-specific objective that measures the performance of the policy neural network in controlling the simulated version of the real-world agent.
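A minimal sketch of the two-phase training procedure described in this abstract, written in PyTorch. The network shapes, the behavioural-cloning-style task loss, the augmentation-agreement self-supervised loss, and the random stand-ins for simulated and real-world data are illustrative assumptions, not details taken from the patent.

```python
# Hedged sketch of the abstract's two-phase training scheme.
# Phase 1: optimize a task-specific objective in simulation.
# Phase 2: jointly optimize a self-supervised objective on real-world data
#          and a second task-specific objective in simulation.
# Shapes, losses, and data sources below are illustrative assumptions.
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Policy network that also exposes its internal representation."""
    def __init__(self, obs_dim=32, act_dim=8, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.head = nn.Linear(hidden, act_dim)

    def forward(self, obs):
        z = self.encoder(obs)           # internal representation
        return self.head(z), z

def task_specific_loss(policy, sim_obs, sim_target_actions):
    """Assumed task objective in simulation (behavioural-cloning style)."""
    pred, _ = policy(sim_obs)
    return nn.functional.mse_loss(pred, sim_target_actions)

def self_supervised_loss(policy, real_obs, real_obs_augmented):
    """Assumed self-supervised objective on real-world data: the internal
    representations of two views of the same observation should agree."""
    _, z1 = policy(real_obs)
    _, z2 = policy(real_obs_augmented)
    return nn.functional.mse_loss(z1, z2)

policy = PolicyNet()
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)

# Phase 1: simulation-only training on the first task-specific objective.
for _ in range(1000):
    sim_obs, sim_act = torch.randn(64, 32), torch.randn(64, 8)    # stand-in for sim rollouts
    loss = task_specific_loss(policy, sim_obs, sim_act)
    opt.zero_grad(); loss.backward(); opt.step()

# Phase 2: joint optimization of the self-supervised objective on real-world
# data and a second task-specific objective in simulation.
for _ in range(1000):
    sim_obs, sim_act = torch.randn(64, 32), torch.randn(64, 8)
    real_obs = torch.randn(64, 32)                                 # stand-in for real-world data
    real_obs_aug = real_obs + 0.01 * torch.randn_like(real_obs)    # assumed augmentation
    loss = task_specific_loss(policy, sim_obs, sim_act) \
         + 0.1 * self_supervised_loss(policy, real_obs, real_obs_aug)
    opt.zero_grad(); loss.backward(); opt.step()
```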
-
Publication No.: US20230214649A1
Publication Date: 2023-07-06
Application No.: US18008838
Filing Date: 2021-07-27
Applicant: DeepMind Technologies Limited
Inventor: Rae Chan Jeong , Jost Tobias Springenberg , Jacqueline Ok-chan Kay , Daniel Hai Huan Zheng , Alexandre Galashov , Nicolas Manfred Otto Heess , Francesco Nori
IPC: G06N3/08
CPC classification number: G06N3/08
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection system using reinforcement learning techniques. In one aspect, a method comprises at each of multiple iterations: obtaining a batch of experience, each experience tuple comprising: a first observation, an action, a second observation, and a reward; for each experience tuple, determining a state value for the second observation, comprising: processing the first observation using a policy neural network to generate an action score for each action in a set of possible actions; sampling multiple actions from the set of possible actions in accordance with the action scores; processing the second observation using a Q neural network to generate a Q value for each sampled action; and determining the state value for the second observation; and determining an update to current values of the Q neural network parameters using the state values.
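A minimal sketch, in the same PyTorch style, of one iteration of the Q-network update this abstract describes: generate action scores with the policy network, sample multiple actions in accordance with those scores, score the sampled actions with the Q network at the second observation, reduce them to a state value, and update the Q-network parameters. The discount factor, the mean over sampled actions, and the squared-error regression target are assumptions for illustration, not specifics from the patent.

```python
# Hedged sketch of one training iteration per the abstract.
# Discount factor, mean-over-samples state value, and MSE target are assumptions.
import torch
import torch.nn as nn

NUM_ACTIONS, OBS_DIM, NUM_SAMPLES, GAMMA = 8, 32, 16, 0.99

policy_net = nn.Sequential(nn.Linear(OBS_DIM, 128), nn.ReLU(),
                           nn.Linear(128, NUM_ACTIONS))            # action scores
q_net = nn.Sequential(nn.Linear(OBS_DIM + NUM_ACTIONS, 128), nn.ReLU(),
                      nn.Linear(128, 1))                            # Q(obs, action)
q_opt = torch.optim.Adam(q_net.parameters(), lr=1e-4)

# One batch of experience tuples (first_obs, action, second_obs, reward);
# random stand-ins here in place of a replay buffer.
batch = 64
first_obs = torch.randn(batch, OBS_DIM)
actions = torch.randint(NUM_ACTIONS, (batch,))
second_obs = torch.randn(batch, OBS_DIM)
rewards = torch.randn(batch)

with torch.no_grad():
    # Action scores from the policy network (conditioned, as in the abstract,
    # on the first observation of each tuple).
    scores = policy_net(first_obs)                                  # [batch, NUM_ACTIONS]
    probs = torch.softmax(scores, dim=-1)
    # Sample multiple actions in accordance with the action scores.
    sampled = torch.multinomial(probs, NUM_SAMPLES, replacement=True)
    sampled_onehot = nn.functional.one_hot(sampled, NUM_ACTIONS).float()
    # Q value for each sampled action at the second observation.
    obs_rep = second_obs.unsqueeze(1).expand(-1, NUM_SAMPLES, -1)
    q_sampled = q_net(torch.cat([obs_rep, sampled_onehot], dim=-1)).squeeze(-1)
    # State value for the second observation (assumed: mean over sampled actions).
    state_values = q_sampled.mean(dim=-1)
    # Bootstrapped target for Q(first_obs, action).
    targets = rewards + GAMMA * state_values

# Update the Q-network parameters using the state values.
taken_onehot = nn.functional.one_hot(actions, NUM_ACTIONS).float()
q_pred = q_net(torch.cat([first_obs, taken_onehot], dim=-1)).squeeze(-1)
loss = nn.functional.mse_loss(q_pred, targets)
q_opt.zero_grad(); loss.backward(); q_opt.step()
```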
-