Patent search ap:("DeepMind Technologies Limited") AND inv:"Tobias Pfaff" Page 1

1.

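Invention publication
SIMULATING PHYSICAL ENVIRONMENTS USING GRAPH NEURAL NETWORKS
Status: Under examination (published)

Publication (announcement) number: US20230359788A1

Publication (announcement) date: 2023-11-09

Application number: US18027174

Application date: 2021-10-01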

Applicant: DeepMind Technologies Limited

Inventor: Alvaro Sanchez, Jonathan William Godwin, Rex Ying, Tobias Pfaff, Meire Fortunato, Peter William Battaglia

IPC: G06F30/27

CPC classification number: G06F30/27, G06F2113/08

Abstract: This specification describes a simulation system that performs simulations of physical environments using a graph neural network. At each of one or more time steps in a sequence of time steps, the system can process a representation of a current state of the physical environment at the current time step using the graph neural network to generate a prediction of a next state of the physical environment at the next time step. Some implementations of the system are adapted for hardware acceleration. As well as performing simulations, the system can be used to predict physical quantities based on measured real-world data. Implementations of the system are differentiable and can also be used for design optimization, and for optimal control tasks.
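The step-by-step prediction described in this abstract can be pictured as an autoregressive rollout. The sketch below is illustrative only and is not taken from the patent: `gnn_step`, `rollout`, the spring-like message, and the `dt` and `stiffness` parameters are all assumptions, and a hand-written message-passing update stands in for a trained graph neural network.

```python
# Illustrative sketch only: a hand-written message-passing update stands in
# for the trained graph neural network described in the abstract.

def gnn_step(positions, velocities, edges, dt=0.1, stiffness=1.0):
    """Predict the next state from the current state with one message-passing round."""
    # Each directed edge sends a spring-like message from sender to receiver.
    messages = [0.0] * len(positions)
    for sender, receiver in edges:
        messages[receiver] += stiffness * (positions[sender] - positions[receiver])
    # Node update: the aggregated messages are treated as accelerations.
    next_velocities = [v + dt * m for v, m in zip(velocities, messages)]
    next_positions = [p + dt * v for p, v in zip(positions, next_velocities)]
    return next_positions, next_velocities


def rollout(positions, velocities, edges, num_steps):
    """Feed each predicted state back in as the input for the next time step."""
    trajectory = [(positions, velocities)]
    for _ in range(num_steps):
        positions, velocities = gnn_step(positions, velocities, edges)
        trajectory.append((positions, velocities))
    return trajectory


# Two nodes connected in both directions, starting at rest.
edges = [(0, 1), (1, 0)]
trajectory = rollout([0.0, 1.0], [0.0, 0.0], edges, num_steps=3)
```

Here `trajectory` holds the initial state plus one predicted state per time step. In a learned simulator the body of `gnn_step` would be replaced by the network's forward pass.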

2.

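Invention application
OPTIMIZING DESIGN PARAMETERS USING A SIMULATION NEURAL NETWORK
Status: In force

Publication (announcement) number: US20250103776A1

Publication (announcement) date: 2025-03-27

Application number: US18832787

Application date: 2023-01-30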

Applicant: DeepMind Technologies Limited

Inventor: Kelsey Rebecca Allen, Tatiana Lopez Guevara, Kimberly Stachenfeld, Jessica Blake Chandler Hamrick, Alvaro Sanchez, Peter William Battaglia, Tobias Pfaff

IPC: G06F30/27, G06F30/15, G06N3/042, G06N3/084

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for optimizing a set of design parameters. In one aspect, a method includes: obtaining a respective initial value for each design parameter, and iteratively optimizing current values of the design parameters over a sequence of optimization iterations. The method further includes, at each optimization iteration: generating a representation of an initial state of an environment using the current values of the design parameters, processing an input including the representation of the initial state of the environment using a simulation neural network to generate an output that defines a simulation of the state of the environment over a sequence of one or more time steps, determining a reward, determining gradients of the reward with respect to the current values of the design parameters, and updating the current values of the design parameters using the gradients.
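The optimization loop in this abstract can be sketched as gradient ascent on a reward computed from a simulated rollout. Everything named below is hypothetical: `simulate` is a damped-particle toy standing in for the simulation neural network, the single design parameter is a launch velocity, and the gradient is estimated by central finite differences rather than by backpropagating through the simulator as a differentiable implementation would.

```python
# Illustrative sketch only: a toy simulator and finite-difference gradients
# stand in for a simulation neural network and backpropagation.

def simulate(design, num_steps=20, dt=0.1):
    """Build the initial state from the design parameter and roll it forward."""
    position, velocity = 0.0, design  # initial state depends on the design
    for _ in range(num_steps):
        velocity *= 0.9               # simple damping
        position += dt * velocity
    return position


def reward(design, target=1.0):
    """Higher reward when the final position is closer to the target."""
    return -(simulate(design) - target) ** 2


def reward_gradient(design, eps=1e-5):
    """Central finite-difference estimate of d(reward)/d(design)."""
    return (reward(design + eps) - reward(design - eps)) / (2 * eps)


def optimize(design=0.0, learning_rate=0.5, num_iterations=200):
    """Iteratively update the design parameter along the reward gradient."""
    for _ in range(num_iterations):
        design += learning_rate * reward_gradient(design)
    return design


best = optimize()
```

With these assumed settings the final simulated position for `best` lands close to the target of 1.0. A real design task would have many parameters and a learned simulator, but the loop structure would be the same.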

3.

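Granted invention patent
Action selection neural network training using imitation learning in latent space
Status: In force

Publication (announcement) number: US11663441B2

Publication (announcement) date: 2023-05-30

Application number: US16586437

Application date: 2019-09-27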

Applicant: DeepMind Technologies Limited

Inventor: Scott Ellison Reed, Yusuf Aytar, Ziyu Wang, Tom Paine, Sergio Gomez Colmenarejo, David Budden, Tobias Pfaff, Aaron Gerard Antonius van den Oord, Oriol Vinyals, Alexander Novikov

IPC: G06N3/006, G06F17/16, G06N3/08, G06F18/22, G06N3/045, G06N3/048, G06V10/764, G06V10/77, G06V10/82

CPC classification number: G06N3/006, G06F17/16, G06F18/22, G06N3/045, G06N3/048, G06N3/08, G06V10/764, G06V10/7715, G06V10/82

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection policy neural network, wherein the action selection policy neural network is configured to process an observation characterizing a state of an environment to generate an action selection policy output, wherein the action selection policy output is used to select an action to be performed by an agent interacting with an environment. In one aspect, a method comprises: obtaining an observation characterizing a state of the environment subsequent to the agent performing a selected action; generating a latent representation of the observation; processing the latent representation of the observation using a discriminator neural network to generate an imitation score; determining a reward from the imitation score; and adjusting the current values of the action selection policy neural network parameters based on the reward using a reinforcement learning training technique.
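The pipeline in this abstract (observation, latent representation, imitation score, reward, policy update) can be sketched as follows. All names and numbers are assumptions made for illustration: `encode` is a hand-written feature map rather than a learned encoder, the discriminator is a fixed logistic model with made-up weights, the reward is taken to be the score itself (the abstract only says the reward is determined from the score), and an exact policy-gradient step on a two-action softmax policy stands in for the reinforcement learning technique.

```python
import math

# Illustrative sketch only: every function here is a hypothetical stand-in.

def encode(observation):
    """Map an observation to a small latent vector: (mean, range)."""
    mean = sum(observation) / len(observation)
    return [mean, max(observation) - min(observation)]


def discriminator(latent, weights=(2.0, -1.0), bias=0.0):
    """Logistic discriminator producing an imitation score in (0, 1)."""
    logit = sum(w * z for w, z in zip(weights, latent)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def imitation_reward(observation):
    """Assumption: the reward is the imitation score itself."""
    return discriminator(encode(observation))


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_gradient_step(logits, rewards, learning_rate=1.0):
    """Exact gradient of expected reward for a softmax policy over actions."""
    probs = softmax(logits)
    baseline = sum(p * r for p, r in zip(probs, rewards))
    return [l + learning_rate * p * (r - baseline)
            for l, p, r in zip(logits, probs, rewards)]


# Each action leads to one observation; action 0 gives the steadier one.
observations = [[1.0, 1.0, 1.0], [0.0, 3.0, 0.0]]
rewards = [imitation_reward(obs) for obs in observations]
logits = [0.0, 0.0]
for _ in range(50):
    logits = policy_gradient_step(logits, rewards)
probs = softmax(logits)
```

Because the assumed discriminator scores the first observation higher, the policy shifts probability toward action 0. In an adversarial setup the discriminator would be trained alongside the policy, which this sketch leaves out.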

4.

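Invention application
TRAINING ACTION SELECTION NEURAL NETWORKS USING Q-LEARNING COMBINED WITH LOOK AHEAD SEARCH
Status: In force

Publication (announcement) number: US20220366247A1

Publication (announcement) date: 2022-11-17

Application number: US17763920

Application date: 2020-09-23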

Applicant: DeepMind Technologies Limited

Inventor: Jessica Blake Chandler Hamrick, Victor Constant Bapst, Alvaro Sanchez, Tobias Pfaff, Theophane Guillaume Weber, Lars Buesing, Peter William Battaglia

IPC: G06N3/08, G06N3/04

Abstract: A reinforcement learning system and method that selects actions to be performed by an agent interacting with an environment. The system uses a combination of reinforcement learning and a look ahead search: Reinforcement learning Q-values are used to guide the look ahead search and the search is used in turn to improve the Q-values. The system learns from a combination of real experience and simulated, model-based experience.
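The interplay between Q-values and look-ahead search described in this abstract can be illustrated on a toy problem. This sketch rests on several assumptions: a five-state chain with a known `model`, tabular Q-values in place of a neural network, and an exhaustive depth-limited search in place of whatever search procedure the patent uses. Q-values evaluate the leaves of the search, and the search results are used as targets to improve the Q-values. Only model-based experience is used here, whereas the abstract also mentions real experience.

```python
# Illustrative sketch only: a toy chain environment with tabular Q-values.

GOAL, ACTIONS, GAMMA = 4, (-1, +1), 0.9


def model(state, action):
    """Deterministic chain: move left or right; reward 1 on reaching GOAL."""
    next_state = min(max(state + action, 0), GOAL)
    return next_state, (1.0 if next_state == GOAL else 0.0)


def search_value(q, state, action, depth):
    """Depth-limited look-ahead whose leaves are evaluated with Q-values."""
    next_state, reward = model(state, action)
    if next_state == GOAL:  # terminal state: nothing further to add
        return reward
    if depth == 1:          # leaf: bootstrap from the current Q-values
        return reward + GAMMA * max(q[(next_state, a)] for a in ACTIONS)
    return reward + GAMMA * max(
        search_value(q, next_state, a, depth - 1) for a in ACTIONS)


def train(num_sweeps=20, depth=2, learning_rate=0.5):
    """Move each Q-value toward the value found by the search."""
    q = {(s, a): 0.0 for s in range(GOAL) for a in ACTIONS}
    for _ in range(num_sweeps):
        for state in range(GOAL):
            for action in ACTIONS:
                target = search_value(q, state, action, depth)
                q[(state, action)] += learning_rate * (target - q[(state, action)])
    return q


q = train()
greedy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]
```

After training, the greedy action in every non-terminal state is to move right toward the goal.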
