-
Publication number: US20230359788A1
Publication date: 2023-11-09
Application number: US18027174
Filing date: 2021-10-01
Applicant: DeepMind Technologies Limited
Inventor: Alvaro Sanchez , Jonathan William Godwin , Rex Ying , Tobias Pfaff , Meire Fortunato , Peter William Battaglia
IPC: G06F30/27
CPC classification number: G06F30/27 , G06F2113/08
Abstract: This specification describes a simulation system that performs simulations of physical environments using a graph neural network. At each of one or more time steps in a sequence of time steps, the system can process a representation of a current state of the physical environment at the current time step using the graph neural network to generate a prediction of a next state of the physical environment at the next time step. Some implementations of the system are adapted for hardware acceleration. As well as performing simulations, the system can be used to predict physical quantities based on measured real-world data. Implementations of the system are differentiable and can also be used for design optimization, and for optimal control tasks.
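The step-by-step prediction described in this abstract can be illustrated with a minimal Python sketch of an autoregressive rollout. This is not the patented system: `rollout` and `toy_step` are hypothetical names, and `toy_step` is a hand-written averaging rule on a ring of nodes that merely stands in for a trained graph neural network.

```python
from typing import Callable, List

State = List[float]  # one scalar feature per node (an assumption for this sketch)


def rollout(step_fn: Callable[[State], State],
            initial_state: State,
            num_steps: int) -> List[State]:
    """Feed each predicted state back in as the input for the next time step."""
    trajectory = [initial_state]
    for _ in range(num_steps):
        trajectory.append(step_fn(trajectory[-1]))
    return trajectory


def toy_step(state: State) -> State:
    # Hypothetical stand-in for the graph neural network: every node keeps half
    # of its own value and receives a quarter from each of its two ring
    # neighbours, i.e. one round of message passing on a ring graph.
    n = len(state)
    return [
        0.5 * state[i] + 0.25 * (state[(i - 1) % n] + state[(i + 1) % n])
        for i in range(n)
    ]


trajectory = rollout(toy_step, [1.0, 0.0, 0.0, 0.0], num_steps=3)
```

Because the rollout only needs a function from state to next state, a learned model could in principle be swapped in for `toy_step` without changing the loop.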
-
Publication number: US20250103776A1
Publication date: 2025-03-27
Application number: US18832787
Filing date: 2023-01-30
Applicant: DeepMind Technologies Limited
Inventor: Kelsey Rebecca Allen , Tatiana Lopez Guevara , Kimberly Stachenfeld , Jessica Blake Chandler Hamrick , Alvaro Sanchez , Peter William Battaglia , Tobias Pfaff
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for optimizing a set of design parameters. In one aspect, a method includes: obtaining a respective initial value for each design parameter, and iteratively optimizing current values of the design parameters over a sequence of optimization iterations. The method further includes, at each optimization iteration: generating a representation of an initial state of an environment using the current values of the design parameters, processing an input including the representation of the initial state of the environment using a simulation neural network to generate an output that defines a simulation of the state of the environment over a sequence of one or more time steps, determining a reward, determining gradients of the reward with respect to the current values of the design parameters, and updating the current values of the design parameters using the gradients.
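The optimization loop in this abstract (simulate, score, differentiate, update) can be sketched in a few lines of Python. Everything here is an assumption made for illustration: `simulate` is a hand-written constant-velocity model rather than a simulation neural network, the reward is a squared distance to a hypothetical target, and the gradient is estimated with central finite differences as a stand-in for backpropagating through a differentiable simulator.

```python
from typing import Callable, List


def simulate(design: List[float], num_steps: int = 10, dt: float = 0.1) -> float:
    # Hypothetical stand-in for the simulation neural network.
    # design = [initial position, constant velocity]; returns the final position.
    position, velocity = design
    for _ in range(num_steps):
        position += velocity * dt
    return position


def reward(design: List[float], target: float = 2.0) -> float:
    # Assumed reward: negative squared distance of the final state from a target.
    return -(simulate(design) - target) ** 2


def numerical_gradient(fn: Callable[[List[float]], float],
                       params: List[float],
                       eps: float = 1e-5) -> List[float]:
    # Central finite differences; a real system would use automatic differentiation.
    grads = []
    for i in range(len(params)):
        up, down = list(params), list(params)
        up[i] += eps
        down[i] -= eps
        grads.append((fn(up) - fn(down)) / (2 * eps))
    return grads


def optimize(initial: List[float],
             num_iterations: int = 200,
             learning_rate: float = 0.1) -> List[float]:
    params = list(initial)
    for _ in range(num_iterations):
        grads = numerical_gradient(reward, params)
        # Gradient ascent, because the reward is being maximized.
        params = [p + learning_rate * g for p, g in zip(params, grads)]
    return params


optimized = optimize([0.0, 0.0])
```

In this toy problem many designs reach the target, so the loop simply settles on one of them; the point is the structure of each iteration, not the particular solution.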
-
Publication number: US11663441B2
Publication date: 2023-05-30
Application number: US16586437
Filing date: 2019-09-27
Applicant: DeepMind Technologies Limited
Inventor: Scott Ellison Reed , Yusuf Aytar , Ziyu Wang , Tom Paine , Sergio Gomez Colmenarejo , David Budden , Tobias Pfaff , Aaron Gerard Antonius van den Oord , Oriol Vinyals , Alexander Novikov
IPC: G06N3/006 , G06F17/16 , G06N3/08 , G06F18/22 , G06N3/045 , G06N3/048 , G06V10/764 , G06V10/77 , G06V10/82
CPC classification number: G06N3/006 , G06F17/16 , G06F18/22 , G06N3/045 , G06N3/048 , G06N3/08 , G06V10/764 , G06V10/7715 , G06V10/82
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection policy neural network, wherein the action selection policy neural network is configured to process an observation characterizing a state of an environment to generate an action selection policy output, wherein the action selection policy output is used to select an action to be performed by an agent interacting with an environment. In one aspect, a method comprises: obtaining an observation characterizing a state of the environment subsequent to the agent performing a selected action; generating a latent representation of the observation; processing the latent representation of the observation using a discriminator neural network to generate an imitation score; determining a reward from the imitation score; and adjusting the current values of the action selection policy neural network parameters based on the reward using a reinforcement learning training technique.
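The reward pipeline in this abstract (observation, latent representation, imitation score, reward) can be sketched as follows. The abstract does not say how the latent is computed or how the reward is derived from the score, so every concrete choice below is an assumption: `encode` is a hand-written feature extractor, the discriminator is a single logistic unit with made-up weights, and `reward_from_score` uses the mapping -log(1 - score) that is common in adversarial imitation learning.

```python
import math
from typing import List, Sequence


def encode(observation: Sequence[float]) -> List[float]:
    # Hypothetical encoder: the latent is the mean and the range of the observation.
    mean = sum(observation) / len(observation)
    return [mean, max(observation) - min(observation)]


def imitation_score(latent: Sequence[float],
                    weights: Sequence[float],
                    bias: float) -> float:
    # Hypothetical discriminator: one logistic unit giving a score in (0, 1).
    logit = sum(w * z for w, z in zip(weights, latent)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def reward_from_score(score: float, eps: float = 1e-8) -> float:
    # Assumed mapping: reward grows as the discriminator judges the latent
    # to look more like expert behaviour.
    return -math.log(1.0 - score + eps)


WEIGHTS, BIAS = [2.0, -1.0], 0.0  # illustrative values, not learned

expert_like = reward_from_score(
    imitation_score(encode([1.0, 1.0, 1.0]), WEIGHTS, BIAS))
novice_like = reward_from_score(
    imitation_score(encode([0.0, 2.0, -2.0]), WEIGHTS, BIAS))
```

The resulting scalar would then be passed to a reinforcement learning update for the policy network, which is outside the scope of this sketch.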
-
Publication number: US20220366247A1
Publication date: 2022-11-17
Application number: US17763920
Filing date: 2020-09-23
Applicant: DeepMind Technologies Limited
Inventor: Jessica Blake Chandler Hamrick , Victor Constant Bapst , Alvaro Sanchez , Tobias Pfaff , Theophane Guillaume Weber , Lars Buesing , Peter William Battaglia
Abstract: A reinforcement learning system and method that selects actions to be performed by an agent interacting with an environment. The system uses a combination of reinforcement learning and a look ahead search: Reinforcement learning Q-values are used to guide the look ahead search and the search is used in turn to improve the Q-values. The system learns from a combination of real experience and simulated, model-based experience.
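The interplay described in this abstract, where Q-values inform a look-ahead search and the search results improve the Q-values, can be illustrated with a heavily simplified tabular sketch. It is not the patented method: the environment is a hypothetical four-state chain with a known model, the search is only one step deep and uses the Q-values as bootstrap estimates at the leaves, and only simulated (model-based) experience is used.

```python
from typing import Dict, Tuple

ACTIONS = (-1, +1)  # move left or right along the chain
GOAL = 3            # terminal state; states 0..2 are non-terminal


def model(state: int, action: int) -> Tuple[int, float]:
    # Hypothetical known model: reward 1.0 on reaching the goal, 0.0 otherwise.
    next_state = min(max(state + action, 0), GOAL)
    return next_state, (1.0 if next_state == GOAL else 0.0)


def lookahead(q: Dict[Tuple[int, int], float],
              state: int,
              gamma: float = 0.9) -> Dict[int, float]:
    """One-step search: simulate each action and bootstrap from current Q-values."""
    values = {}
    for action in ACTIONS:
        next_state, r = model(state, action)
        if next_state == GOAL:
            bootstrap = 0.0
        else:
            bootstrap = max(q[(next_state, a)] for a in ACTIONS)
        values[action] = r + gamma * bootstrap
    return values


def train(num_sweeps: int = 20, step_size: float = 0.5) -> Dict[Tuple[int, int], float]:
    q = {(s, a): 0.0 for s in range(GOAL) for a in ACTIONS}
    for _ in range(num_sweeps):
        for s in range(GOAL):
            # Move each Q-value part of the way toward its search estimate.
            for a, v in lookahead(q, s).items():
                q[(s, a)] += step_size * (v - q[(s, a)])
    return q


q_values = train()
best_action = max(ACTIONS, key=lambda a: q_values[(0, a)])
```

A deeper search or a learned model would change `lookahead` but leave the overall pattern of searching with Q-values and then regressing Q-values toward the search output unchanged.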