-
1.
Publication No.: US20220366247A1
Publication date: 2022-11-17
Application No.: US17763920
Filing date: 2020-09-23
Applicant: DeepMind Technologies Limited
Inventor: Jessica Blake Chandler Hamrick, Victor Constant Bapst, Alvaro Sanchez, Tobias Pfaff, Theophane Guillaume Weber, Lars Buesing, Peter William Battaglia
Abstract: A reinforcement learning system and method that selects actions to be performed by an agent interacting with an environment. The system uses a combination of reinforcement learning and a lookahead search: reinforcement learning Q-values are used to guide the lookahead search, and the search is used in turn to improve the Q-values. The system learns from a combination of real experience and simulated, model-based experience.
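The interplay described in this abstract can be illustrated with a minimal sketch. It is not the claimed method: the abstract does not say which search procedure is used, so a small depth-limited exhaustive lookahead stands in for it here. The chain environment, the tabular Q-values, and every name (`step`, `search_q`, `train`) are hypothetical. Learned Q-values score the leaves of the search, and the search results are then used as regression targets for the Q-values, alongside an ordinary update from real transitions.

```python
import random

N_STATES = 5      # hypothetical chain: states 0..4, state 4 is the goal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right
GAMMA = 0.9


def step(state, action):
    """Toy deterministic chain, used both as the real environment and as the model."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def search_q(q, state, depth):
    """Depth-limited lookahead; the learned Q-values are used at the leaves."""
    values = []
    for action in ACTIONS:
        next_state, reward, done = step(state, action)  # simulated experience
        if done:
            future = 0.0
        elif depth <= 1:
            future = max(q[next_state])  # Q-values guide the search
        else:
            future = max(search_q(q, next_state, depth - 1))
        values.append(reward + GAMMA * future)
    return values


def train(episodes=200, depth=2, alpha=0.5, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(50):  # cap on episode length
            searched = search_q(q, state, depth)
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: searched[a])
            next_state, reward, done = step(state, action)  # real experience
            # Update from the real transition (one-step Q-learning target).
            target = reward + (0.0 if done else GAMMA * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            # Update from the search: move Q toward the search estimates.
            for a in ACTIONS:
                q[state][a] += alpha * (searched[a] - q[state][a])
            state = next_state
            if done:
                break
    return q


q_table = train()
```

After training, `q_table[s][1]` should exceed `q_table[s][0]` in every non-goal state, meaning the agent prefers moving toward the goal.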
-
Publication No.: US20250103776A1
Publication date: 2025-03-27
Application No.: US18832787
Filing date: 2023-01-30
Applicant: DeepMind Technologies Limited
Inventor: Kelsey Rebecca Allen, Tatiana Lopez Guevara, Kimberly Stachenfeld, Jessica Blake Chandler Hamrick, Alvaro Sanchez, Peter William Battaglia, Tobias Pfaff
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for optimizing a set of design parameters. In one aspect, a method includes: obtaining a respective initial value for each design parameter, and iteratively optimizing current values of the design parameters over a sequence of optimization iterations. The method further includes, at each optimization iteration: generating a representation of an initial state of an environment using the current values of the design parameters, processing an input including the representation of the initial state of the environment using a simulation neural network to generate an output that defines a simulation of the state of the environment over a sequence of one or more time steps, determining a reward, determining gradients of the reward with respect to the current values of the design parameters, and updating the current values of the design parameters using the gradients.
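The optimization loop in this abstract can be sketched in a few lines, under strong simplifying assumptions. A hand-written scalar linear rollout stands in for the simulation neural network, the single design parameter is used directly as the initial state, the reward is the negative squared distance of the final state from a target, and the gradient is derived by hand with the chain rule rather than by automatic differentiation. All names and constants (`simulate`, `reward_and_grad`, `optimize_design`, `a`, `b`, `target`) are hypothetical and do not come from the application.

```python
def simulate(initial_state, steps=5, a=0.9, b=0.1):
    """Stand-in for the simulation neural network: rolls a scalar state forward."""
    state = initial_state
    trajectory = [state]
    for _ in range(steps):
        state = a * state + b
        trajectory.append(state)
    return trajectory


def reward_and_grad(design, target=2.0, steps=5, a=0.9, b=0.1):
    """Return the reward and its gradient with respect to the design parameter."""
    initial_state = design  # initial-state representation built from the design
    final_state = simulate(initial_state, steps, a, b)[-1]
    reward = -(final_state - target) ** 2
    # Chain rule through the rollout: d(final_state)/d(design) = a ** steps.
    grad = -2.0 * (final_state - target) * (a ** steps)
    return reward, grad


def optimize_design(initial_design=0.0, iterations=200, learning_rate=0.5):
    design = initial_design
    for _ in range(iterations):
        _, grad = reward_and_grad(design)
        design += learning_rate * grad  # gradient ascent on the reward
    return design


best_design = optimize_design()
final_reward, _ = reward_and_grad(best_design)
```

With a learned simulator, the hand-derived gradient would typically be replaced by automatic differentiation through the rollout; the loop structure stays the same.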
-