Patent search ap:("DeepMind Technologies Limited") AND inv:"Jessica Blake Chandler Hamrick"

1.

Invention Application
TRAINING ACTION SELECTION NEURAL NETWORKS USING Q-LEARNING COMBINED WITH LOOK AHEAD SEARCH (Status: Active)

Publication No.: US20220366247A1

Publication Date: 2022-11-17

Application No.: US17763920

Filing Date: 2020-09-23

Applicant: DeepMind Technologies Limited

Inventor: Jessica Blake Chandler Hamrick, Victor Constant Bapst, Alvaro Sanchez, Tobias Pfaff, Theophane Guillaume Weber, Lars Buesing, Peter William Battaglia

IPC: G06N3/08, G06N3/04

Abstract: A reinforcement learning system and method that selects actions to be performed by an agent interacting with an environment. The system uses a combination of reinforcement learning and a look ahead search: Reinforcement learning Q-values are used to guide the look ahead search and the search is used in turn to improve the Q-values. The system learns from a combination of real experience and simulated, model-based experience.
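Illustration (not part of the patent record): the interplay the abstract describes, where Q-values guide a look-ahead search and the search results in turn improve the Q-values, can be sketched on a toy problem. Everything below is an assumption made for illustration: the chain environment, the tabular Q-values standing in for a neural network, the exhaustive depth-limited search, and the names `model_step`, `search_value`, and `train` are hypothetical and are not taken from the application.

```python
import random

# Hypothetical toy environment: states 0..N_STATES-1 on a chain.
# Entering the last state ends the episode with reward 1.
N_STATES = 6
ACTIONS = (-1, +1)  # action index 0 moves left, index 1 moves right
GAMMA = 0.9


def model_step(state, action):
    """Deterministic transition, used both as the real environment and as the model."""
    next_state = min(max(state + ACTIONS[action], 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def search_value(q, state, action, depth):
    """Depth-limited look-ahead; learned Q-values supply the leaf estimates."""
    if depth == 0:
        return q[state][action]
    next_state, reward, done = model_step(state, action)
    if done:
        return reward
    best_next = max(
        search_value(q, next_state, a, depth - 1) for a in range(len(ACTIONS))
    )
    return reward + GAMMA * best_next


def train(episodes=200, depth=2, alpha=0.5, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # tabular stand-in for a Q-network
    for _ in range(episodes):
        state = 0
        for _ in range(50):  # step cap per episode
            # Look-ahead search seeded by the current Q-values.
            searched = [
                search_value(q, state, a, depth) for a in range(len(ACTIONS))
            ]
            if rng.random() < epsilon:
                action = rng.randrange(len(ACTIONS))
            else:
                best = max(searched)
                action = rng.choice(
                    [a for a, v in enumerate(searched) if v == best]
                )
            next_state, reward, done = model_step(state, action)
            # Real experience: one-step Q-learning update.
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            # Simulated experience: pull Q-values toward the search estimates.
            for a, value in enumerate(searched):
                q[state][a] += alpha * (value - q[state][a])
            state = next_state
            if done:
                break
    return q
```

Calling `train()` returns the learned table; in this toy setting the greedy action at every non-terminal state should end up being "move right". A practical system would replace the exhaustive search with a budgeted tree search and the table with a trained network.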

2.

Invention Application
OPTIMIZING DESIGN PARAMETERS USING A SIMULATION NEURAL NETWORK (Status: Active)

Publication No.: US20250103776A1

Publication Date: 2025-03-27

Application No.: US18832787

Filing Date: 2023-01-30

Applicant: DeepMind Technologies Limited

Inventor: Kelsey Rebecca Allen, Tatiana Lopez Guevara, Kimberly Stachenfeld, Jessica Blake Chandler Hamrick, Alvaro Sanchez, Peter William Battaglia, Tobias Pfaff

IPC: G06F30/27, G06F30/15, G06N3/042, G06N3/084

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for optimizing a set of design parameters. In one aspect, a method includes: obtaining a respective initial value for each design parameter, and iteratively optimizing current values of the design parameters over a sequence of optimization iterations. The method further includes, at each optimization iteration: generating a representation of an initial state of an environment using the current values of the design parameters, processing an input including the representation of the initial state of the environment using a simulation neural network to generate an output that defines a simulation of the state of the environment over a sequence of one or more time steps, determining a reward, determining gradients of the reward with respect to the current values of the design parameters, and updating the current values of the design parameters using the gradients.
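Illustration (not part of the patent record): the per-iteration loop in the abstract (build an initial state from the design parameters, simulate, compute a reward, take gradients, update) can be sketched with a toy example. The assumptions here are substantial and deliberate: a hand-written projectile simulator stands in for the simulation neural network, central finite differences stand in for backpropagated gradients, and the names `initial_state`, `simulate`, `reward`, `finite_difference_grad`, and `optimize` are hypothetical.

```python
def initial_state(design):
    """Build the initial environment state from the design parameters.

    Assumed design parameters: the launch velocity (vx, vy) of a point mass.
    """
    vx, vy = design
    return (0.0, 0.0, vx, vy)  # x, y, vx, vy


def simulate(state, steps=20, dt=0.05, gravity=-9.8):
    """Toy explicit-Euler rollout; a stand-in for a learned simulator."""
    x, y, vx, vy = state
    for _ in range(steps):
        x += vx * dt
        y += vy * dt
        vy += gravity * dt
    return (x, y, vx, vy)


def reward(design, target=(2.0, 1.0)):
    """Negative squared distance between the final position and a target."""
    x, y, _, _ = simulate(initial_state(design))
    return -((x - target[0]) ** 2 + (y - target[1]) ** 2)


def finite_difference_grad(f, params, eps=1e-5):
    """Central-difference gradient; a stand-in for automatic differentiation."""
    grads = []
    for i in range(len(params)):
        up = list(params)
        down = list(params)
        up[i] += eps
        down[i] -= eps
        grads.append((f(up) - f(down)) / (2 * eps))
    return grads


def optimize(design, iterations=100, learning_rate=0.2):
    """Gradient ascent on the reward with respect to the design parameters."""
    design = list(design)
    for _ in range(iterations):
        grads = finite_difference_grad(reward, design)
        design = [p + learning_rate * g for p, g in zip(design, grads)]
    return design
```

For example, `optimize([0.0, 0.0])` should move the launch velocity toward values that land the point mass on the target. With a differentiable learned simulator, the finite-difference step would be replaced by gradients obtained through the simulator itself.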
