-
Publication No.: US20240042600A1
Publication Date: 2024-02-08
Application No.: US18331632
Filing Date: 2023-06-08
Applicant: DeepMind Technologies Limited
Inventor: Serkan Cabi , Ziyu Wang , Alexander Novikov , Ksenia Konyushkova , Sergio Gomez Colmenarejo , Scott Ellison Reed , Misha Man Ray Denil , Jonathan Karl Scholz , Oleg O. Sushkov , Rae Chan Jeong , David Barker , David Budden , Mel Vecerik , Yusuf Aytar , Joao Ferdinando Gomes de Freitas
IPC: B25J9/16
CPC classification number: B25J9/161 , B25J9/163 , B25J9/1661
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for data-driven robotic control. One of the methods includes maintaining robot experience data; obtaining annotation data; training, on the annotation data, a reward model; generating task-specific training data for the particular task, comprising, for each experience in a second subset of the experiences in the robot experience data: processing the observation in the experience using the trained reward model to generate a reward prediction, and associating the reward prediction with the experience; and training a policy neural network on the task-specific training data for the particular task, wherein the policy neural network is configured to receive a network input comprising an observation and to generate a policy output that defines a control policy for a robot performing the particular task.
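The pipeline in the abstract (train a reward model on the annotated subset, then use it to attach predicted rewards to a second, unannotated subset) can be sketched as a toy in Python. The linear model, scalar observations, and function names below are illustrative assumptions, not the patent's implementation:

```python
def train_reward_model(annotations, lr=0.1, epochs=200):
    """Fit a toy linear reward model r(obs) = w*obs + b on annotation
    data given as (observation, annotated_reward) pairs, via per-sample SGD."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for obs, reward in annotations:
            err = (w * obs + b) - reward
            w -= lr * err * obs
            b -= lr * err
    return lambda obs: w * obs + b


def make_task_specific_data(experiences, reward_model):
    """Associate each unannotated experience (observation, action) with
    a reward predicted by the trained reward model."""
    return [(obs, act, reward_model(obs)) for obs, act in experiences]
```

The resulting (observation, action, predicted-reward) tuples would then feed a policy-training step (for instance, reward-weighted behavior cloning) for the particular task.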
-
Publication No.: US11712799B2
Publication Date: 2023-08-01
Application No.: US17020294
Filing Date: 2020-09-14
Applicant: DeepMind Technologies Limited
Inventor: Serkan Cabi , Ziyu Wang , Alexander Novikov , Ksenia Konyushkova , Sergio Gomez Colmenarejo , Scott Ellison Reed , Misha Man Ray Denil , Jonathan Karl Scholz , Oleg O. Sushkov , Rae Chan Jeong , David Barker , David Budden , Mel Vecerik , Yusuf Aytar , Joao Ferdinando Gomes de Freitas
IPC: B25J9/16
CPC classification number: B25J9/161 , B25J9/163 , B25J9/1661
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for data-driven robotic control. One of the methods includes maintaining robot experience data; obtaining annotation data; training, on the annotation data, a reward model; generating task-specific training data for the particular task, comprising, for each experience in a second subset of the experiences in the robot experience data: processing the observation in the experience using the trained reward model to generate a reward prediction, and associating the reward prediction with the experience; and training a policy neural network on the task-specific training data for the particular task, wherein the policy neural network is configured to receive a network input comprising an observation and to generate a policy output that defines a control policy for a robot performing the particular task.
-
Publication No.: US20230061411A1
Publication Date: 2023-03-02
Application No.: US17410689
Filing Date: 2021-08-24
Applicant: DeepMind Technologies Limited
Inventor: Tom Erez , Alexander Novikov , Emilio Parisotto , Jack William Rae , Konrad Zolna , Misha Man Ray Denil , Joao Ferdinando Gomes de Freitas , Oriol Vinyals , Scott Ellison Reed , Sergio Gomez , Ashley Deloris Edwards , Jacob Bruce , Gabriel Barth-Maron
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting actions to be performed by an agent to interact with an environment using an action selection neural network. In one aspect, a method comprises, at each time step in a sequence of time steps: generating a current representation of a state of a task being performed by the agent in the environment as of the current time step as a sequence of data elements; autoregressively generating a sequence of data elements representing a current action to be performed by the agent at the current time step; and after autoregressively generating the sequence of data elements representing the current action, causing the agent to perform the current action at the current time step.
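The per-time-step loop in this abstract (represent the task state as a sequence of data elements, then autoregressively emit the elements of the current action) can be sketched generically. The token representation and `next_token_fn` below are assumptions for illustration, not the patent's network:

```python
def generate_action(state_tokens, next_token_fn, action_len):
    """Autoregressively generate an action as a sequence of data elements:
    each new element is predicted from the state representation plus all
    action elements emitted so far."""
    context = list(state_tokens)  # current representation of the task state
    action = []
    for _ in range(action_len):
        token = next_token_fn(context)  # stand-in for the network's sampler
        action.append(token)
        context.append(token)           # condition on what was just emitted
    return action
```

After the full action sequence is generated, the agent would be caused to perform that action at the current time step.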
-
Publication No.: US20210078169A1
Publication Date: 2021-03-18
Application No.: US17020294
Filing Date: 2020-09-14
Applicant: DeepMind Technologies Limited
Inventor: Serkan Cabi , Ziyu Wang , Alexander Novikov , Ksenia Konyushkova , Sergio Gomez Colmenarejo , Scott Ellison Reed , Misha Man Ray Denil , Jonathan Karl Scholz , Oleg O. Sushkov , Rae Chan Jeong , David Barker , David Budden , Mel Vecerik , Yusuf Aytar , Joao Ferdinando Gomes de Freitas
IPC: B25J9/16
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for data-driven robotic control. One of the methods includes maintaining robot experience data; obtaining annotation data; training, on the annotation data, a reward model; generating task-specific training data for the particular task, comprising, for each experience in a second subset of the experiences in the robot experience data: processing the observation in the experience using the trained reward model to generate a reward prediction, and associating the reward prediction with the experience; and training a policy neural network on the task-specific training data for the particular task, wherein the policy neural network is configured to receive a network input comprising an observation and to generate a policy output that defines a control policy for a robot performing the particular task.
-
Publication No.: US10572776B2
Publication Date: 2020-02-25
Application No.: US16403343
Filing Date: 2019-05-03
Applicant: DeepMind Technologies Limited
Inventor: Fabio Viola , Piotr Wojciech Mirowski , Andrea Banino , Razvan Pascanu , Hubert Josef Soyer , Andrew James Ballard , Sudarshan Kumaran , Raia Thais Hadsell , Laurent Sifre , Rostislav Goroshin , Koray Kavukcuoglu , Misha Man Ray Denil
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. In one aspect, a method of training an action selection policy neural network for use in selecting actions to be performed by an agent navigating through an environment to accomplish one or more goals comprises: receiving an observation image characterizing a current state of the environment; processing, using the action selection policy neural network, an input comprising the observation image to generate an action selection output; processing, using a geometry-prediction neural network, an intermediate output generated by the action selection policy neural network to predict a value of a feature of a geometry of the environment when in the current state; and backpropagating a gradient of a geometry-based auxiliary loss into the action selection policy neural network to determine a geometry-based auxiliary update for current values of the network parameters.
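The auxiliary update in this abstract (a geometry-prediction head reads an intermediate output of the action selection policy network, and the gradient of a geometry-based loss is backpropagated into the shared parameters) can be sketched with a toy linear trunk and head. The shapes, the depth-like target, and the class name are illustrative assumptions:

```python
import random


class PolicyWithGeometryAux:
    """Toy linear network: shared trunk -> (policy head, geometry head).
    Only the auxiliary geometry update is shown; the policy head and
    main RL loss are omitted for brevity."""

    def __init__(self, obs_dim=3, hid=4, seed=0):
        rnd = random.Random(seed)
        self.W = [[rnd.uniform(-0.1, 0.1) for _ in range(obs_dim)]
                  for _ in range(hid)]          # shared trunk weights
        self.v = [rnd.uniform(-0.1, 0.1) for _ in range(hid)]  # geometry head

    def _trunk(self, obs):
        # Intermediate output of the policy network.
        return [sum(w * o for w, o in zip(row, obs)) for row in self.W]

    def aux_geometry_step(self, obs, depth_target, lr=0.05):
        """One gradient step on the auxiliary loss 0.5*err^2, where err is
        the error in predicting a geometry feature (e.g. depth).  The
        gradient flows through the geometry head into the shared trunk."""
        h = self._trunk(obs)
        err = sum(v * x for v, x in zip(self.v, h)) - depth_target
        # Compute both gradients before applying either update.
        grad_v = [err * x for x in h]
        grad_W = [[err * v * o for o in obs] for v in self.v]
        self.v = [v - lr * g for v, g in zip(self.v, grad_v)]
        self.W = [[w - lr * g for w, g in zip(row, grow)]
                  for row, grow in zip(self.W, grad_W)]
        return 0.5 * err * err
```

Repeated auxiliary steps shape the shared trunk so that its intermediate output carries geometric information, which is the role of the geometry-based auxiliary update in the abstract.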
-