-
Publication No.: US12190236B2
Publication date: 2025-01-07
Application No.: US17240554
Filing date: 2021-04-26
Applicant: DeepMind Technologies Limited
Inventor: Annette Ada Nkechinyere Obika, Tian Xie, Victor Constant Bapst, Alexander Lloyd Gaunt, James Kirkpatrick
Abstract: Methods, computer systems, and apparatus, including computer programs encoded on computer storage media, for predicting one or more properties of a material. One of the methods includes maintaining data specifying a set of known materials each having a respective known physical structure; receiving data specifying a new material; identifying a plurality of known materials in the set of known materials that are similar to the new material; determining a predicted embedding of the new material from at least respective embeddings corresponding to each of the similar known materials; and processing the predicted embedding of the new material using an experimental prediction neural network to predict one or more properties of the new material.
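The steps in this abstract (find similar known materials, combine their embeddings, feed the result to a property predictor) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not say how similarity is measured or how embeddings are combined, so the sketch uses Euclidean distance over hypothetical feature vectors and an inverse-distance weighted average, and a linear function stands in for the experimental prediction neural network.

```python
import math

def predict_embedding(new_features, known_materials, k=2):
    """Combine the embeddings of the k most similar known materials.

    known_materials is a list of (features, embedding) pairs. Similarity is
    Euclidean distance on the feature vectors (an assumption, not taken
    from the patent).
    """
    nearest = sorted(
        known_materials,
        key=lambda item: math.dist(new_features, item[0]),
    )[:k]
    # Inverse-distance weights; the small constant avoids division by zero.
    weights = [1.0 / (1e-8 + math.dist(new_features, f)) for f, _ in nearest]
    total = sum(weights)
    dim = len(nearest[0][1])
    return [
        sum(w * emb[i] for w, (_, emb) in zip(weights, nearest)) / total
        for i in range(dim)
    ]

def predict_property(embedding, head_weights, bias=0.0):
    """Hypothetical linear stand-in for the property prediction network."""
    return sum(w * x for w, x in zip(head_weights, embedding)) + bias

# Toy data: three known materials with 2-D features and 2-D embeddings.
known = [
    ([0.0, 0.0], [1.0, 0.0]),
    ([2.0, 0.0], [0.0, 1.0]),
    ([10.0, 10.0], [5.0, 5.0]),
]
embedding = predict_embedding([1.0, 0.0], known, k=2)
prediction = predict_property(embedding, head_weights=[2.0, 4.0])
```

In the toy data the new material is equidistant from the first two known materials, so its predicted embedding is the plain average of their embeddings.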
-
12.
Publication No.: US20220366247A1
Publication date: 2022-11-17
Application No.: US17763920
Filing date: 2020-09-23
Applicant: DeepMind Technologies Limited
Inventor: Jessica Blake Chandler Hamrick, Victor Constant Bapst, Alvaro Sanchez, Tobias Pfaff, Theophane Guillaume Weber, Lars Buesing, Peter William Battaglia
Abstract: A reinforcement learning system and method that selects actions to be performed by an agent interacting with an environment. The system uses a combination of reinforcement learning and a look ahead search: Reinforcement learning Q-values are used to guide the look ahead search and the search is used in turn to improve the Q-values. The system learns from a combination of real experience and simulated, model-based experience.
-
Publication No.: US20220083869A1
Publication date: 2022-03-17
Application No.: US17486842
Filing date: 2021-09-27
Applicant: DeepMind Technologies Limited
Inventor: Razvan Pascanu, Raia Thais Hadsell, Victor Constant Bapst, Wojciech Czarnecki, James Kirkpatrick, Yee Whye Teh, Nicolas Manfred Otto Heess
Abstract: A method is proposed for training a multitask computer system, such as a multitask neural network system. The system comprises a set of trainable workers and a shared module. The trainable workers and shared module are trained on a plurality of different tasks, such that each worker learns to perform a corresponding one of the tasks according to a respective task policy, and said shared policy network learns a multitask policy which represents common behavior for the tasks. The coordinated training is performed by optimizing an objective function comprising, for each task: a reward term indicative of an expected reward earned by a worker in performing the corresponding task according to the task policy; and at least one entropy term which regularizes the distribution of the task policy towards the distribution of the multitask policy.
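The objective described in this abstract, a per-task reward term plus a term pulling each task policy towards the shared multitask policy, can be sketched for a single-step, discrete-action setting. This is a simplified illustration under stated assumptions, not the patented formulation: the regularizer is written as a KL divergence scaled by a hypothetical coefficient `beta`, and policies are plain probability lists.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def task_objective(task_policy, shared_policy, rewards, beta=0.5):
    """Expected reward minus a penalty for drifting from the shared policy."""
    expected_reward = sum(p * r for p, r in zip(task_policy, rewards))
    return expected_reward - beta * kl_divergence(task_policy, shared_policy)

def multitask_objective(task_policies, shared_policy, task_rewards, beta=0.5):
    """Sum of the per-task objectives; all tasks share one multitask policy."""
    return sum(
        task_objective(policy, shared_policy, rewards, beta)
        for policy, rewards in zip(task_policies, task_rewards)
    )

# Two toy tasks over two actions, with opposite reward preferences.
shared = [0.5, 0.5]
policies = [[0.9, 0.1], [0.1, 0.9]]
rewards = [[1.0, 0.0], [0.0, 1.0]]
total = multitask_objective(policies, shared, rewards)
```

A task policy identical to the shared policy pays no penalty, while a policy that specializes earns more reward at the cost of a larger divergence term.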
-
Publication No.: US11132609B2
Publication date: 2021-09-28
Application No.: US16689020
Filing date: 2019-11-19
Applicant: DeepMind Technologies Limited
Inventor: Razvan Pascanu, Raia Thais Hadsell, Victor Constant Bapst, Wojciech Czarnecki, James Kirkpatrick, Yee Whye Teh, Nicolas Manfred Otto Heess
Abstract: A method is proposed for training a multitask computer system, such as a multitask neural network system. The system comprises a set of trainable workers and a shared module. The trainable workers and shared module are trained on a plurality of different tasks, such that each worker learns to perform a corresponding one of the tasks according to a respective task policy, and said shared policy network learns a multitask policy which represents common behavior for the tasks. The coordinated training is performed by optimizing an objective function comprising, for each task: a reward term indicative of an expected reward earned by a worker in performing the corresponding task according to the task policy; and at least one entropy term which regularizes the distribution of the task policy towards the distribution of the multitask policy.
-
15.
Publication No.: US10706352B2
Publication date: 2020-07-07
Application No.: US16402687
Filing date: 2019-05-03
Applicant: DeepMind Technologies Limited
Inventor: Ziyu Wang, Nicolas Manfred Otto Heess, Victor Constant Bapst
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection neural network. One of the methods includes maintaining a replay memory that stores trajectories generated as a result of interaction of an agent with an environment; and training an action selection neural network having policy parameters on the trajectories in the replay memory, wherein training the action selection neural network comprises: sampling a trajectory from the replay memory; and adjusting current values of the policy parameters by training the action selection neural network on the trajectory using an off-policy actor critic reinforcement learning technique.
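The replay memory and trajectory sampling named in this abstract can be sketched as follows. The names and the trajectory layout are hypothetical, and the clipped importance ratio is only one common ingredient of off-policy actor-critic methods, shown as an assumption; the abstract does not specify the correction used.

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-capacity store of trajectories; the oldest are evicted first."""

    def __init__(self, capacity):
        self._trajectories = deque(maxlen=capacity)

    def add(self, trajectory):
        self._trajectories.append(trajectory)

    def sample(self, rng=random):
        """Return one stored trajectory chosen uniformly at random."""
        return rng.choice(list(self._trajectories))

    def __len__(self):
        return len(self._trajectories)

def importance_weights(trajectory, policy_prob, clip=10.0):
    """Clipped ratios pi(a|s) / mu(a|s) for each step of a trajectory.

    Each step is (state, action, reward, behaviour_prob), where
    behaviour_prob is the probability the acting policy gave the action.
    """
    return [
        min(clip, policy_prob(state, action) / behaviour_prob)
        for state, action, _, behaviour_prob in trajectory
    ]

memory = ReplayMemory(capacity=2)
memory.add([("s0", 0, 1.0, 0.5)])
memory.add([("s1", 1, 0.0, 0.01)])
memory.add([("s2", 0, 1.0, 0.25)])  # evicts the first trajectory
sampled = memory.sample(random.Random(0))
weights = importance_weights(sampled, lambda state, action: 0.25)
```

The weights would then scale the policy-gradient update on the sampled trajectory, which is what lets the network learn from data generated by an older policy.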
-