-
Publication No.: US20240403652A1
Publication Date: 2024-12-05
Application No.: US18699012
Application Date: 2022-10-05
Applicant: DeepMind Technologies Limited
Inventor: Dushyant Rao , Fereshteh Sadeghi , Leonard Hasenclever , Markus Wulfmeier , Martina Zambelli , Giulia Vezzani , Dhruva Tirumala Bukkapatnam , Yusuf Aytar , Joshua Merel , Nicolas Manfred Otto Heess , Raia Thais Hadsell
IPC: G06N3/092
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for controlling agents. In particular, an agent can be controlled using a hierarchical controller that includes a high-level controller neural network, a mid-level controller neural network, and a low-level controller neural network.
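The abstract describes a three-level control hierarchy. Below is a minimal sketch of how such a controller could be wired together (PyTorch assumed); the module sizes, interfaces, and names are illustrative assumptions, not the patented architecture.

```python
import torch
import torch.nn as nn

class HierarchicalController(nn.Module):
    def __init__(self, obs_dim: int, goal_dim: int, skill_dim: int, action_dim: int):
        super().__init__()
        # High-level controller: maps an observation to an abstract goal.
        self.high = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, goal_dim))
        # Mid-level controller: maps observation + goal to a skill/latent command.
        self.mid = nn.Sequential(nn.Linear(obs_dim + goal_dim, 128), nn.ReLU(), nn.Linear(128, skill_dim))
        # Low-level controller: maps observation + skill to a primitive action.
        self.low = nn.Sequential(nn.Linear(obs_dim + skill_dim, 128), nn.ReLU(), nn.Linear(128, action_dim))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        goal = self.high(obs)
        skill = self.mid(torch.cat([obs, goal], dim=-1))
        return self.low(torch.cat([obs, skill], dim=-1))

controller = HierarchicalController(obs_dim=32, goal_dim=8, skill_dim=16, action_dim=7)
action = controller(torch.randn(1, 32))  # one action for a batch of one observation
```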
-
Publication No.: US20230330846A1
Publication Date: 2023-10-19
Application No.: US18028966
Application Date: 2021-10-01
Applicant: DeepMind Technologies Limited
Inventor: Yuxiang Zhou , Yusuf Aytar , Konstantinos Bousmalis
IPC: B25J9/16
CPC classification number: B25J9/163
Abstract: A system, implemented as computer programs on one or more computers in one or more locations, trains a policy neural network that is used to control a robot, i.e., to select actions to be performed by the robot while the robot is interacting with an environment, through imitation learning in order to cause the robot to perform particular tasks in the environment.
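The abstract refers to training the policy network through imitation learning. The sketch below shows one common form, behavioural cloning, purely as an illustration (PyTorch assumed); the demonstration tensors, network shape, and loss are placeholders and the patent's specific procedure may differ.

```python
import torch
import torch.nn as nn

# Simple policy network regressing demonstrated actions from observations.
policy = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 7))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

demo_obs = torch.randn(256, 32)      # placeholder demonstration observations
demo_actions = torch.randn(256, 7)   # placeholder demonstrated actions

for _ in range(100):
    pred = policy(demo_obs)
    loss = nn.functional.mse_loss(pred, demo_actions)  # imitate the demos
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```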
-
Publication No.: US20220004883A1
Publication Date: 2022-01-06
Application No.: US17295286
Application Date: 2019-11-21
Applicant: DeepMind Technologies Limited
Inventor: Yusuf Aytar , Debidatta Dwibedi , Andrew Zisserman , Jonathan Tompson , Pierre Sermanet
Abstract: An encoder neural network is described which can encode a data item, such as a frame of a video, to form a respective encoded data item. Data items of a first data sequence are associated with respective data items of a second sequence, by determining which of the encoded data items of the second sequence is closest to the encoded data item produced from each data item of the first sequence. Thus, the two data sequences are aligned. The encoder neural network is trained automatically using a training set of data sequences, by an iterative process of successively increasing cycle consistency between pairs of the data sequences.
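The alignment step can be illustrated with a short sketch (PyTorch assumed): encode both sequences, match each frame of the first sequence to its nearest encoded neighbour in the second, and check cycle consistency by mapping back. The encoder and data below are illustrative stand-ins for the trained network.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 32))

seq_a = torch.randn(20, 64)  # 20 frames of sequence A (placeholder features)
seq_b = torch.randn(25, 64)  # 25 frames of sequence B

emb_a, emb_b = encoder(seq_a), encoder(seq_b)
dists = torch.cdist(emb_a, emb_b)   # pairwise distances between encoded frames
a_to_b = dists.argmin(dim=1)        # nearest B frame for each A frame
b_to_a = dists.argmin(dim=0)        # nearest A frame for each B frame

# Cycle consistency: frame i of A should come back to i after A -> B -> A.
cycle_back = b_to_a[a_to_b]
fraction_consistent = (cycle_back == torch.arange(len(seq_a))).float().mean().item()
print(f"fraction of cycle-consistent A frames: {fraction_consistent:.2f}")
```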
-
Publication No.: US20240303897A1
Publication Date: 2024-09-12
Application No.: US18600552
Application Date: 2024-03-08
Applicant: DeepMind Technologies Limited
Inventor: Carl Doersch , Yi Yang , Mel Vecerik , Dilara Gokay , Ankush Gupta , Yusuf Aytar , Joao Carreira , Andrew Zisserman
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for animating images using point trajectories.
-
Publication No.: US20240042600A1
Publication Date: 2024-02-08
Application No.: US18331632
Application Date: 2023-06-08
Applicant: DeepMind Technologies Limited
Inventor: Serkan Cabi , Ziyu Wang , Alexander Novikov , Ksenia Konyushkova , Sergio Gomez Colmenarejo , Scott Ellison Reed , Misha Man Ray Denil , Jonathan Karl Scholz , Oleg O. Sushkov , Rae Chan Jeong , David Barker , David Budden , Mel Vecerik , Yusuf Aytar , Joao Ferdinando Gomes de Freitas
IPC: B25J9/16
CPC classification number: B25J9/161 , B25J9/163 , B25J9/1661
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for data-driven robotic control. One of the methods includes maintaining robot experience data; obtaining annotation data; training, on the annotation data, a reward model; generating task-specific training data for the particular task, comprising, for each experience in a second subset of the experiences in the robot experience data: processing the observation in the experience using the trained reward model to generate a reward prediction, and associating the reward prediction with the experience; and training a policy neural network on the task-specific training data for the particular task, wherein the policy neural network is configured to receive a network input comprising an observation and to generate a policy output that defines a control policy for a robot performing the particular task.
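The abstract outlines a three-stage pipeline: train a reward model on annotated experience, relabel further experience with predicted rewards, and train a policy on the relabelled data. A hedged sketch of that structure follows (PyTorch assumed); the losses, shapes, and the simplified policy update are illustrative, not the claimed method.

```python
import torch
import torch.nn as nn

reward_model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
policy = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 7))

# (1) Train the reward model on the annotation data.
annotated_obs = torch.randn(128, 32)     # observations from annotated experiences
annotated_rewards = torch.rand(128, 1)   # placeholder annotation-derived rewards
opt_r = torch.optim.Adam(reward_model.parameters(), lr=1e-3)
for _ in range(200):
    loss = nn.functional.mse_loss(reward_model(annotated_obs), annotated_rewards)
    opt_r.zero_grad()
    loss.backward()
    opt_r.step()

# (2) Relabel a second subset of experience with predicted rewards.
unlabelled_obs = torch.randn(512, 32)
stored_actions = torch.randn(512, 7)     # actions stored with those experiences
with torch.no_grad():
    predicted_rewards = reward_model(unlabelled_obs)

# (3) Train the policy on the task-specific data; a real system would use an
# off-policy RL algorithm here rather than this reward-weighted regression stub.
opt_p = torch.optim.Adam(policy.parameters(), lr=1e-3)
for _ in range(200):
    weights = predicted_rewards.clamp(min=0.0)
    loss = (weights * (policy(unlabelled_obs) - stored_actions).pow(2)).mean()
    opt_p.zero_grad()
    loss.backward()
    opt_p.step()
```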
-
Publication No.: US11712799B2
Publication Date: 2023-08-01
Application No.: US17020294
Application Date: 2020-09-14
Applicant: DeepMind Technologies Limited
Inventor: Serkan Cabi , Ziyu Wang , Alexander Novikov , Ksenia Konyushkova , Sergio Gomez Colmenarejo , Scott Ellison Reed , Misha Man Ray Denil , Jonathan Karl Scholz , Oleg O. Sushkov , Rae Chan Jeong , David Barker , David Budden , Mel Vecerik , Yusuf Aytar , Joao Ferdinando Gomes de Freitas
IPC: B25J9/16
CPC classification number: B25J9/161 , B25J9/163 , B25J9/1661
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for data-driven robotic control. One of the methods includes maintaining robot experience data; obtaining annotation data; training, on the annotation data, a reward model; generating task-specific training data for the particular task, comprising, for each experience in a second subset of the experiences in the robot experience data: processing the observation in the experience using the trained reward model to generate a reward prediction, and associating the reward prediction with the experience; and training a policy neural network on the task-specific training data for the particular task, wherein the policy neural network is configured to receive a network input comprising an observation and to generate a policy output that defines a control policy for a robot performing the particular task.
-
Publication No.: US11663441B2
Publication Date: 2023-05-30
Application No.: US16586437
Application Date: 2019-09-27
Applicant: DeepMind Technologies Limited
Inventor: Scott Ellison Reed , Yusuf Aytar , Ziyu Wang , Tom Paine , Sergio Gomez Colmenarejo , David Budden , Tobias Pfaff , Aaron Gerard Antonius van den Oord , Oriol Vinyals , Alexander Novikov
IPC: G06N3/006 , G06F17/16 , G06N3/08 , G06F18/22 , G06N3/045 , G06N3/048 , G06V10/764 , G06V10/77 , G06V10/82
CPC classification number: G06N3/006 , G06F17/16 , G06F18/22 , G06N3/045 , G06N3/048 , G06N3/08 , G06V10/764 , G06V10/7715 , G06V10/82
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection policy neural network, wherein the action selection policy neural network is configured to process an observation characterizing a state of an environment to generate an action selection policy output, wherein the action selection policy output is used to select an action to be performed by an agent interacting with an environment. In one aspect, a method comprises: obtaining an observation characterizing a state of the environment subsequent to the agent performing a selected action; generating a latent representation of the observation; processing the latent representation of the observation using a discriminator neural network to generate an imitation score; determining a reward from the imitation score; and adjusting the current values of the action selection policy neural network parameters based on the reward using a reinforcement learning training technique.
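The reward computation in the abstract can be sketched as follows (PyTorch assumed): encode an observation, score the latent with a discriminator, and map the imitation score to a reward that a reinforcement-learning update could then optimise. The encoder, discriminator, and GAIL-style reward transform are assumptions for illustration.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16))
discriminator = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))

def imitation_reward(observation: torch.Tensor) -> torch.Tensor:
    latent = encoder(observation)          # latent representation of the observation
    score = discriminator(latent)          # imitation score (logit)
    # GAIL-style transform: reward is high when the discriminator believes the
    # behaviour resembles the demonstrations; equals log(sigmoid(score)).
    return -nn.functional.softplus(-score)

obs = torch.randn(1, 32)
reward = imitation_reward(obs)  # fed to the RL update of the policy parameters
```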
-
Publication No.: US20210103815A1
Publication Date: 2021-04-08
Application No.: US17065489
Application Date: 2020-10-07
Applicant: DeepMind Technologies Limited
Inventor: Rae Chan Jeong , Yusuf Aytar , David Khosid , Yuxiang Zhou , Jacqueline Ok-chan Kay , Thomas Lampe , Konstantinos Bousmalis , Francesco Nori
IPC: G06N3/08 , G05B19/4155
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a policy neural network for use in controlling a real-world agent in a real-world environment. One of the methods includes training the policy neural network by optimizing a first task-specific objective that measures a performance of the policy neural network in controlling a simulated version of the real-world agent; and then training the policy neural network by jointly optimizing (i) a self-supervised objective that measures at least a performance of internal representations generated by the policy neural network on a self-supervised task performed on real-world data and (ii) a second task-specific objective that measures the performance of the policy neural network in controlling the simulated version of the real-world agent.
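The abstract describes two training phases: first a task-specific objective against the simulated agent, then joint optimisation of a self-supervised objective on real-world data and the task objective. A minimal sketch of that schedule is below (PyTorch assumed); the particular losses used, an MSE task loss and an observation-reconstruction self-supervised loss, are illustrative assumptions.

```python
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Linear(32, 64), nn.ReLU())
policy_head = nn.Linear(64, 7)
recon_head = nn.Linear(64, 32)   # self-supervised head over internal representations
params = list(backbone.parameters()) + list(policy_head.parameters()) + list(recon_head.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)

sim_obs, sim_actions = torch.randn(256, 32), torch.randn(256, 7)   # simulated rollout data
real_obs = torch.randn(256, 32)                                    # real-world observations

# Phase 1: task-specific objective on the simulated version of the agent.
for _ in range(100):
    task_loss = nn.functional.mse_loss(policy_head(backbone(sim_obs)), sim_actions)
    optimizer.zero_grad()
    task_loss.backward()
    optimizer.step()

# Phase 2: jointly optimise the self-supervised and task-specific objectives.
for _ in range(100):
    features = backbone(real_obs)
    self_sup_loss = nn.functional.mse_loss(recon_head(features), real_obs)
    task_loss = nn.functional.mse_loss(policy_head(backbone(sim_obs)), sim_actions)
    loss = self_sup_loss + task_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```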
-
Publication No.: US20210078169A1
Publication Date: 2021-03-18
Application No.: US17020294
Application Date: 2020-09-14
Applicant: DeepMind Technologies Limited
Inventor: Serkan Cabi , Ziyu Wang , Alexander Novikov , Ksenia Konyushkova , Sergio Gomez Colmenarejo , Scott Ellison Reed , Misha Man Ray Denil , Jonathan Karl Scholz , Oleg O. Sushkov , Rae Chan Jeong , David Barker , David Budden , Mel Vecerik , Yusuf Aytar , Joao Ferdinando Gomes de Freitas
IPC: B25J9/16
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for data-driven robotic control. One of the methods includes maintaining robot experience data; obtaining annotation data; training, on the annotation data, a reward model; generating task-specific training data for the particular task, comprising, for each experience in a second subset of the experiences in the robot experience data: processing the observation in the experience using the trained reward model to generate a reward prediction, and associating the reward prediction with the experience; and training a policy neural network on the task-specific training data for the particular task, wherein the policy neural network is configured to receive a network input comprising an observation and to generate a policy output that defines a control policy for a robot performing the particular task.