-
1.
Publication No.: US20240127045A1
Publication Date: 2024-04-18
Application No.: US17959210
Application Date: 2022-10-03
Applicant: DeepMind Technologies Limited
Inventor: Thomas Keisuke Hubert , Shih-Chieh Huang , Alexander Novikov , Alhussein Fawzi , Bernardino Romera-Paredes , David Silver , Demis Hassabis , Grzegorz Michal Swirszcz , Julian Schrittwieser , Pushmeet Kohli , Mohammadamin Barekatain , Matej Balog , Francisco Jesus Rodriguez Ruiz
Abstract: A method performed by one or more computers for obtaining an optimized algorithm that (i) is functionally equivalent to a target algorithm and (ii) optimizes one or more target properties when executed on a target set of one or more hardware devices. The method includes: initializing a target tensor representing the target algorithm; generating, using a neural network having a plurality of network parameters, a tensor decomposition of the target tensor that parametrizes a candidate algorithm; generating target property values for each of the target properties when executing the candidate algorithm on the target set of hardware devices; determining a benchmarking score for the tensor decomposition based on the target property values of the candidate algorithm; generating a training example from the tensor decomposition and the benchmarking score; and storing, in a training data store, the training example for use in updating the network parameters of the neural network.
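To make the benchmarking loop concrete, here is a minimal Python sketch of two of the abstract's ingredients: reconstructing a tensor from a candidate rank-R decomposition (to check functional equivalence with the target algorithm) and scoring a candidate by timing it on the target hardware. All function names and the timing-based score are illustrative assumptions, not the patented method.

```python
import time
import numpy as np

def decomposition_to_tensor(factors):
    """Reconstruct a 3D tensor from a rank-R decomposition.

    `factors` is a list of R triples (u, v, w); the reconstruction
    is sum_r outer(u_r, v_r, w_r).
    """
    u, v, w = factors[0]
    tensor = np.einsum("i,j,k->ijk", u, v, w)
    for u, v, w in factors[1:]:
        tensor += np.einsum("i,j,k->ijk", u, v, w)
    return tensor

def benchmarking_score(factors, target_tensor, run_candidate):
    """Score a candidate decomposition: -inf if it is not functionally
    equivalent to the target algorithm, otherwise negative wall-clock
    time of the candidate on the target device (higher is better)."""
    if not np.array_equal(decomposition_to_tensor(factors), target_tensor):
        return float("-inf")  # candidate does not implement the target algorithm
    start = time.perf_counter()
    run_candidate()  # execute the candidate on the target hardware
    return -(time.perf_counter() - start)
```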
-
2.
Publication No.: US20220261639A1
Publication Date: 2022-08-18
Application No.: US17625361
Application Date: 2020-07-16
Applicant: DeepMind Technologies Limited
Inventor: Konrad Zolna , Scott Ellison Reed , Ziyu Wang , Alexander Novikov , Sergio Gomez Colmenarejo , Joao Ferdinando Gomes de Freitas , David Budden , Serkan Cabi
IPC: G06N3/08
Abstract: A method is proposed for training a neural network to generate action data for controlling an agent to perform a task in an environment. The method includes obtaining, for each of a plurality of performances of the task, one or more first tuple datasets, each first tuple dataset comprising state data characterizing a state of the environment at a corresponding time during the performance of the task; and a concurrent process of training the neural network and a discriminator network. The training process comprises a plurality of neural network update steps and a plurality of discriminator network update steps. Each neural network update step comprises: receiving state data characterizing a current state of the environment; using the neural network and the state data to generate action data indicative of an action to be performed by the agent; forming a second tuple dataset comprising the state data; using the second tuple dataset to generate a reward value, wherein the reward value comprises an imitation value generated by the discriminator network based on the second tuple dataset; and updating one or more parameters of the neural network based on the reward value. Each discriminator network update step comprises updating the discriminator network based on a plurality of the first tuple datasets and a plurality of the second tuple datasets, the update being to increase the imitation values which the discriminator network generates upon receiving any of the plurality of the first tuple datasets relative to the imitation values it generates upon receiving any of the plurality of the second tuple datasets. The update is performed subject to the constraint that the updated discriminator network, upon receiving any of at least a certain proportion of a first subset of the first tuple datasets and/or any of at least a certain proportion of a second subset of the second tuple datasets, does not generate imitation values which correctly indicate whether those tuple datasets are first or second tuple datasets.
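The discriminator update resembles adversarial imitation learning: imitation values are pushed up on demonstration ("first") tuples and down on agent ("second") tuples, but the update is blocked from making the discriminator too accurate. Below is a minimal PyTorch-style sketch under one plausible reading of that constraint (capping discriminator accuracy); all interfaces and the threshold are hypothetical.

```python
import torch
import torch.nn.functional as F

def discriminator_update(disc, opt, demo_tuples, agent_tuples, max_accuracy=0.9):
    """One discriminator step: raise imitation values on demonstration
    (first) tuples and lower them on agent (second) tuples, skipping the
    update when the discriminator already separates the two sets too well
    -- one plausible reading of the constraint in the abstract."""
    demo_scores = disc(demo_tuples)    # imitation values for first tuples
    agent_scores = disc(agent_tuples)  # imitation values for second tuples
    with torch.no_grad():
        acc = 0.5 * ((demo_scores > 0).float().mean()
                     + (agent_scores <= 0).float().mean())
    if acc >= max_accuracy:
        return  # constraint: keep a fraction of tuples misclassified
    loss = (F.binary_cross_entropy_with_logits(
                demo_scores, torch.ones_like(demo_scores))
            + F.binary_cross_entropy_with_logits(
                agent_scores, torch.zeros_like(agent_scores)))
    opt.zero_grad()
    loss.backward()
    opt.step()
```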
-
3.
Publication No.: US20240042600A1
Publication Date: 2024-02-08
Application No.: US18331632
Application Date: 2023-06-08
Applicant: DeepMind Technologies Limited
Inventor: Serkan Cabi , Ziyu Wang , Alexander Novikov , Ksenia Konyushkova , Sergio Gomez Colmenarejo , Scott Ellison Reed , Misha Man Ray Denil , Jonathan Karl Scholz , Oleg O. Sushkov , Rae Chan Jeong , David Barker , David Budden , Mel Vecerik , Yusuf Aytar , Joao Ferdinando Gomes de Freitas
IPC: B25J9/16
CPC classification number: B25J9/161 , B25J9/163 , B25J9/1661
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for data-driven robotic control. One of the methods includes maintaining robot experience data; obtaining annotation data; training, on the annotation data, a reward model; generating task-specific training data for the particular task, comprising, for each experience in a second subset of the experiences in the robot experience data: processing the observation in the experience using the trained reward model to generate a reward prediction, and associating the reward prediction with the experience; and training a policy neural network on the task-specific training data for the particular task, wherein the policy neural network is configured to receive a network input comprising an observation and to generate a policy output that defines a control policy for a robot performing the particular task.
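A minimal sketch of the relabeling step at the core of this pipeline: a trained reward model annotates each logged experience with a reward prediction to form task-specific training data. The dictionary-based experience format and the names are assumptions for illustration.

```python
def build_task_specific_data(experiences, reward_model):
    """Relabel logged robot experience with predicted rewards.

    `experiences` is an iterable of dicts with an 'observation' key;
    `reward_model` maps an observation to a scalar reward prediction.
    Both are hypothetical stand-ins for the components in the abstract."""
    task_data = []
    for exp in experiences:
        reward = reward_model(exp["observation"])   # reward prediction
        task_data.append({**exp, "reward": reward})  # associate with experience
    return task_data
```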
-
4.
Publication No.: US11712799B2
Publication Date: 2023-08-01
Application No.: US17020294
Application Date: 2020-09-14
Applicant: DeepMind Technologies Limited
Inventor: Serkan Cabi , Ziyu Wang , Alexander Novikov , Ksenia Konyushkova , Sergio Gomez Colmenarejo , Scott Ellison Reed , Misha Man Ray Denil , Jonathan Karl Scholz , Oleg O. Sushkov , Rae Chan Jeong , David Barker , David Budden , Mel Vecerik , Yusuf Aytar , Joao Ferdinando Gomes de Freitas
IPC: B25J9/16
CPC classification number: B25J9/161 , B25J9/163 , B25J9/1661
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for data-driven robotic control. One of the methods includes maintaining robot experience data; obtaining annotation data; training, on the annotation data, a reward model; generating task-specific training data for the particular task, comprising, for each experience in a second subset of the experiences in the robot experience data: processing the observation in the experience using the trained reward model to generate a reward prediction, and associating the reward prediction with the experience; and training a policy neural network on the task-specific training data for the particular task, wherein the policy neural network is configured to receive a network input comprising an observation and to generate a policy output that defines a control policy for a robot performing the particular task.
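The abstract here matches US20240042600A1 above; complementing the relabeling sketch there, the following shows one simple way a policy neural network could be trained on the reward-annotated data, using reward-weighted behavioral cloning as a stand-in for the unspecified training procedure. The distribution-returning `policy` interface is an assumption.

```python
import torch

def policy_update(policy, opt, batch):
    """One reward-weighted behavioral-cloning step on task-specific data,
    a simple stand-in for the (unspecified) policy training procedure:
    actions from higher-reward experiences get larger imitation weight."""
    obs, actions, rewards = batch["obs"], batch["actions"], batch["rewards"]
    # Assumed interface: policy(obs) returns a torch.distributions object.
    log_probs = policy(obs).log_prob(actions)
    weights = torch.softmax(rewards, dim=0)  # favor high-reward experiences
    loss = -(weights * log_probs).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
```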
-
5.
Publication No.: US11663441B2
Publication Date: 2023-05-30
Application No.: US16586437
Application Date: 2019-09-27
Applicant: DeepMind Technologies Limited
Inventor: Scott Ellison Reed , Yusuf Aytar , Ziyu Wang , Tom Paine , Sergio Gomez Colmenarejo , David Budden , Tobias Pfaff , Aaron Gerard Antonius van den Oord , Oriol Vinyals , Alexander Novikov
IPC: G06N3/006 , G06F17/16 , G06N3/08 , G06F18/22 , G06N3/045 , G06N3/048 , G06V10/764 , G06V10/77 , G06V10/82
CPC classification number: G06N3/006 , G06F17/16 , G06F18/22 , G06N3/045 , G06N3/048 , G06N3/08 , G06V10/764 , G06V10/7715 , G06V10/82
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection policy neural network, wherein the action selection policy neural network is configured to process an observation characterizing a state of an environment to generate an action selection policy output, wherein the action selection policy output is used to select an action to be performed by an agent interacting with an environment. In one aspect, a method comprises: obtaining an observation characterizing a state of the environment subsequent to the agent performing a selected action; generating a latent representation of the observation; processing the latent representation of the observation using a discriminator neural network to generate an imitation score; determining a reward from the imitation score; and adjusting the current values of the action selection policy neural network parameters based on the reward using a reinforcement learning training technique.
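The per-step reward computation described here can be sketched in a few lines: embed the post-action observation, score the latent with the discriminator, and squash the imitation score into a reward. The sigmoid squashing is a common choice in adversarial imitation, not necessarily the patented one, and all names are hypothetical.

```python
import torch

def imitation_reward(encoder, discriminator, observation):
    """Compute the reward for one step as in the abstract: embed the
    observation that follows the selected action, score the latent with a
    discriminator network, and map the imitation score to a reward."""
    with torch.no_grad():
        latent = encoder(observation)  # latent representation of the observation
        score = discriminator(latent)  # imitation score (logit)
        reward = torch.sigmoid(score)  # one common score-to-reward mapping
    return reward
```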
-
6.
Publication No.: US20230061411A1
Publication Date: 2023-03-02
Application No.: US17410689
Application Date: 2021-08-24
Applicant: DeepMind Technologies Limited
Inventor: Tom Erez , Alexander Novikov , Emilio Parisotto , Jack William Rae , Konrad Zolna , Misha Man Ray Denil , Joao Ferdinando Gomes de Freitas , Oriol Vinyals , Scott Ellison Reed , Sergio Gomez , Ashley Deloris Edwards , Jacob Bruce , Gabriel Barth-Maron
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting actions to be performed by an agent to interact with an environment using an action selection neural network. In one aspect, a method comprises, at each time step in a sequence of time steps: generating a current representation of a state of a task being performed by the agent in the environment as of the current time step as a sequence of data elements; autoregressively generating a sequence of data elements representing a current action to be performed by the agent at the current time step; and after autoregressively generating the sequence of data elements representing the current action, causing the agent to perform the current action at the current time step.
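A minimal sketch of the autoregressive action-generation loop: the tokenized task state seeds the sequence, and the data elements of the action are generated one at a time, each conditioned on everything generated so far. `model` and the token interfaces are hypothetical.

```python
def generate_action_tokens(model, state_tokens, num_action_tokens):
    """Autoregressively generate the data elements (tokens) of the current
    action, conditioned on the tokenized representation of the task state.
    `model` maps a token sequence to the next token -- an assumed interface."""
    sequence = list(state_tokens)  # current representation of the task state
    action = []
    for _ in range(num_action_tokens):
        next_token = model(sequence)  # predict the next data element
        sequence.append(next_token)   # condition on everything so far
        action.append(next_token)
    return action  # decoded and performed by the agent at this time step
```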
-
7.
Publication No.: US20210078169A1
Publication Date: 2021-03-18
Application No.: US17020294
Application Date: 2020-09-14
Applicant: DeepMind Technologies Limited
Inventor: Serkan Cabi , Ziyu Wang , Alexander Novikov , Ksenia Konyushkova , Sergio Gomez Colmenarejo , Scott Ellison Reed , Misha Man Ray Denil , Jonathan Karl Scholz , Oleg O. Sushkov , Rae Chan Jeong , David Barker , David Budden , Mel Vecerik , Yusuf Aytar , Joao Ferdinando Gomes de Freitas
IPC: B25J9/16
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for data-driven robotic control. One of the methods includes maintaining robot experience data; obtaining annotation data; training, on the annotation data, a reward model; generating task-specific training data for the particular task, comprising, for each experience in a second subset of the experiences in the robot experience data: processing the observation in the experience using the trained reward model to generate a reward prediction, and associating the reward prediction with the experience; and training a policy neural network on the task-specific training data for the particular task, wherein the policy neural network is configured to receive a network input comprising an observation and to generate a policy output that defines a control policy for a robot performing the particular task.
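This is the pre-grant publication of application US17020294, granted as US11712799B2 above. Rounding out the sketches for this family, here is one plausible form of the remaining step, fitting the reward model to annotation data by regression; the MSE objective and batch format are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def train_reward_model(reward_model, opt, annotated_batch):
    """Fit the reward model to annotation data by simple regression: the
    predicted reward for an observation should match its annotated reward.
    The regression target and loss are illustrative, not the patented choice."""
    obs, annotated_reward = annotated_batch
    pred = reward_model(obs)
    loss = F.mse_loss(pred, annotated_reward)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```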
-
8.
Publication No.: US20250147810A1
Publication Date: 2025-05-08
Application No.: US18936711
Application Date: 2024-11-04
Applicant: DeepMind Technologies Limited
Inventor: Bernardino Romera-Paredes , Alexander Novikov , Mohammadamin Barekatain , Matej Balog , Pawan Kumar Mudigonda , Emilien Dupont , Francisco Jesus Rodriguez Ruiz , Alhussein Fawzi
IPC: G06F9/50
Abstract: Methods, systems, and apparatuses, including computer programs encoded on computer storage media, for scheduling jobs across a plurality of computational resources. Scheduling jobs (e.g., compute jobs) on a plurality of computational resources (e.g., a cluster that includes physical machines, virtual machines, or both) can include assigning jobs to computational resources using respective scores for the computational resources that take into account several attributes, including central processing unit (CPU) requirements, memory requirements, and availability. Because the score more accurately reflects the likelihood that a given computational resource is the optimal placement for a given job, the resulting job schedule significantly reduces idle time across the set of computational resources and enhances the throughput of completed jobs.
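A minimal sketch of score-based placement: each feasible resource receives a score combining CPU headroom, memory headroom, and availability, and the job is assigned to the highest-scoring resource. The linear scoring function, weights, and attribute names are illustrative assumptions, not the patented score.

```python
def schedule_job(job, resources, weights=(1.0, 1.0, 1.0)):
    """Assign a job to the highest-scoring feasible computational resource.
    The score takes into account CPU requirements, memory requirements, and
    availability, per the abstract; the linear form is an assumption."""
    w_cpu, w_mem, w_avail = weights
    best, best_score = None, float("-inf")
    for res in resources:
        if res.free_cpu < job.cpu or res.free_mem < job.mem:
            continue  # infeasible placement: not enough CPU or memory
        score = (w_cpu * (res.free_cpu - job.cpu)
                 + w_mem * (res.free_mem - job.mem)
                 + w_avail * res.availability)
        if score > best_score:
            best, best_score = res, score
    return best  # None if no resource can currently host the job
```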
-
9.
Publication No.: US20240281654A1
Publication Date: 2024-08-22
Application No.: US18292165
Application Date: 2022-08-12
Applicant: DeepMind Technologies Limited
Inventor: Scott Ellison Reed , Konrad Zolna , Emilio Parisotto , Tom Erez , Alexander Novikov , Jack William Rae , Misha Man Ray Denil , Joao Ferdinando Gomes de Freitas , Oriol Vinyals , Sergio Gomez , Ashley Deloris Edwards , Jacob Bruce , Gabriel Barth-Maron
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting actions to be performed by an agent to interact with an environment using an action selection neural network. In one aspect, a method comprises, at each time step in a sequence of time steps: generating a current representation of a state of a task being performed by the agent in the environment as of the current time step as a sequence of data elements; autoregressively generating a sequence of data elements representing a current action to be performed by the agent at the current time step; and after autoregressively generating the sequence of data elements representing the current action, causing the agent to perform the current action at the current time step.
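The abstract here is identical to that of US20230061411A1 above; complementing the decoding sketch there, the following shows one hypothetical way the current task state could be flattened into the sequence of data elements that conditions action generation. The patent does not prescribe a tokenization; the field ordering and `tokenize_value` helper are assumptions.

```python
def state_to_tokens(observation, tokenize_value):
    """Flatten a structured observation into a sequence of data elements.
    `observation` is a dict of named fields; `tokenize_value` maps one field
    (scalar or array) to a list of discrete tokens -- both assumed interfaces."""
    tokens = []
    for key in sorted(observation):  # fixed field order for determinism
        tokens.extend(tokenize_value(observation[key]))
    return tokens
```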
-