Patent search ap:("DeepMind Technologies Limited") AND inv:"Victor Constant Bapst" Page 1

1.

Invention publication
REINFORCEMENT LEARNING USING A RELATIONAL NETWORK FOR GENERATING DATA ENCODING RELATIONSHIPS BETWEEN ENTITIES IN AN ENVIRONMENT

Legal status: Under examination (published)

Publication (announcement) number: US20230196146A1

Publication (announcement) date: 2023-06-22

Application number: US18168123

Application date: 2023-02-13

Applicant: DeepMind Technologies Limited

Inventor: Yujia Li, Victor Constant Bapst, Vinicius Zambaldi, David Nunes Raposo, Adam Anthony Santoro

IPC: G06N5/043, G06F17/16, G06N3/04, G06N3/08, G06N7/01

CPC classification number: G06N5/043, G06F17/16, G06N3/04, G06N3/08, G06N7/01

Abstract: A neural network system is proposed, including an input network for extracting, from state data, respective entity data for each of a plurality of entities which are present, or at least potentially present, in the environment. The entity data describes the entity. The neural network contains a relational network for parsing this data, which includes one or more attention blocks which may be stacked to perform successive actions on the entity data. The attention blocks each include a respective transform network for each of the entities. The transform network for each entity is able to transform data which the transform network receives for the entity into modified entity data for the entity, based on data for a plurality of the other entities. An output network is arranged to receive data output by the relational network, and use the received data to select a respective action.
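The data flow in this abstract (entity data, stacked attention blocks, output network) can be illustrated with a minimal sketch. It is not the patented implementation: it assumes plain dot-product self-attention with no learned projections, and the function names `attention_block`, `relational_network`, and `select_action` are hypothetical. Each entity here also attends to itself, which is a simplification.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_block(entities):
    """Transform each entity vector into a weighted mix of all entity vectors.

    Assumption: queries, keys, and values are the raw entity vectors
    (no learned weights), purely to keep the sketch short.
    """
    dim = len(entities[0])
    scale = math.sqrt(dim)
    modified = []
    for query in entities:
        # Score this entity against every entity, then normalise the scores.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in entities]
        weights = softmax(scores)
        modified.append(
            [sum(w * value[d] for w, value in zip(weights, entities)) for d in range(dim)]
        )
    return modified


def relational_network(entities, num_blocks=2):
    """Stack attention blocks so each one refines the previous block's output."""
    for _ in range(num_blocks):
        entities = attention_block(entities)
    return entities


def select_action(entities):
    """Hypothetical output network: max-pool over entities, then take the argmax."""
    dim = len(entities[0])
    pooled = [max(entity[d] for entity in entities) for d in range(dim)]
    return pooled.index(max(pooled))


entity_data = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
relational_output = relational_network(entity_data)
action = select_action(relational_output)
```

In a trained system the attention weights and the output network would be learned; the fixed arithmetic above only shows how each entity's data is modified using data from the other entities.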

2.

Invention application
PREDICTING PROPERTIES OF MATERIALS FROM PHYSICAL MATERIAL STRUCTURES

Legal status: In force

Publication (announcement) number: US20210334655A1

Publication (announcement) date: 2021-10-28

Application number: US17240554

Application date: 2021-04-26

Applicant: DeepMind Technologies Limited

Inventor: Annette Ada Nkechinyere Obika, Tian Xie, Victor Constant Bapst, Alexander Lloyd Gaunt, James Kirkpatrick

IPC: G06N3/08, G06N3/04

Abstract: Methods, computer systems, and apparatus, including computer programs encoded on computer storage media, for predicting one or more properties of a material. One of the methods includes maintaining data specifying a set of known materials each having a respective known physical structure; receiving data specifying a new material; identifying a plurality of known materials in the set of known materials that are similar to the new material; determining a predicted embedding of the new material from at least respective embeddings corresponding to each of the similar known materials; and processing the predicted embedding of the new material using an experimental prediction neural network to predict one or more properties of the new material.
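The steps listed in this abstract can be sketched as a nearest-neighbour lookup followed by a small predictor. This is an illustration under stated assumptions, not the claimed method: similarity is taken to be Euclidean distance between hypothetical structure descriptors, the predicted embedding is a plain average of the neighbours' embeddings, and a single linear layer stands in for the experimental prediction neural network. All material names and numbers are invented for the example.

```python
import math


def distance(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similar_materials(known, new_descriptor, k=2):
    """Identify the k known materials closest to the new material."""
    return sorted(known, key=lambda m: distance(m["descriptor"], new_descriptor))[:k]


def predicted_embedding(neighbors):
    """Assumption: average the embeddings of the similar known materials."""
    dim = len(neighbors[0]["embedding"])
    return [sum(n["embedding"][d] for n in neighbors) / len(neighbors) for d in range(dim)]


def predict_property(embedding, weights, bias):
    """Hypothetical stand-in for the prediction network: one linear layer."""
    return sum(w * e for w, e in zip(weights, embedding)) + bias


# Maintained set of known materials (hypothetical descriptors and embeddings).
known_materials = [
    {"name": "A", "descriptor": [0.0, 0.0], "embedding": [1.0, 0.0]},
    {"name": "B", "descriptor": [1.0, 0.0], "embedding": [0.0, 1.0]},
    {"name": "C", "descriptor": [10.0, 10.0], "embedding": [5.0, 5.0]},
]

neighbors = similar_materials(known_materials, new_descriptor=[0.4, 0.0], k=2)
embedding = predicted_embedding(neighbors)
prediction = predict_property(embedding, weights=[2.0, 4.0], bias=1.0)
```

The point of the design is that the new material never needs its own measured embedding; it borrows one from structurally similar materials that are already known.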

3.

Invention application
TRAINING ACTION SELECTION NEURAL NETWORKS USING OFF-POLICY ACTOR CRITIC REINFORCEMENT LEARNING AND STOCHASTIC DUELING NEURAL NETWORKS

Legal status: In force

Publication (announcement) number: US20250094772A1

Publication (announcement) date: 2025-03-20

Application number: US18962266

Application date: 2024-11-27

Applicant: DeepMind Technologies Limited

Inventor: Ziyu Wang, Nicolas Manfred Otto Heess, Victor Constant Bapst

IPC: G06N3/045, G06N3/006, G06N3/047, G06N3/084, G06N3/088

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection neural network. One of the methods includes maintaining a replay memory that stores trajectories generated as a result of interaction of an agent with an environment; and training an action selection neural network having policy parameters on the trajectories in the replay memory, wherein training the action selection neural network comprises: sampling a trajectory from the replay memory; and adjusting current values of the policy parameters by training the action selection neural network on the trajectory using an off-policy actor critic reinforcement learning technique.
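The loop described in this abstract (store trajectories, sample one, update the policy off-policy) can be sketched with a tabular actor-critic. This is a simplified illustration, not the patented technique: it assumes a one-step TD critic, a softmax policy over per-state logits, and importance weights truncated at a hypothetical `rho_max`. It does not implement the stochastic dueling networks named in the title, and the class names are invented for the example.

```python
import math
import random
from collections import defaultdict


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ReplayMemory:
    """Stores whole trajectories and drops the oldest when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.trajectories = []

    def add(self, trajectory):
        self.trajectories.append(trajectory)
        if len(self.trajectories) > self.capacity:
            self.trajectories.pop(0)

    def sample(self, rng):
        return rng.choice(self.trajectories)


class TabularActorCritic:
    def __init__(self, num_actions, gamma=0.9, lr=0.1, rho_max=1.0):
        self.gamma, self.lr, self.rho_max = gamma, lr, rho_max
        self.logits = defaultdict(lambda: [0.0] * num_actions)  # policy parameters
        self.values = defaultdict(float)                        # critic estimates

    def policy(self, state):
        return softmax(self.logits[state])

    def train_on_trajectory(self, trajectory):
        # Each step: (state, action, reward, next_state, behavior_prob).
        for state, action, reward, next_state, behavior_prob in trajectory:
            probs = self.policy(state)
            # Truncated importance weight corrects for the older behavior policy.
            rho = min(self.rho_max, probs[action] / behavior_prob)
            bootstrap = 0.0 if next_state is None else self.values[next_state]
            td_error = reward + self.gamma * bootstrap - self.values[state]
            self.values[state] += self.lr * rho * td_error
            for a in range(len(probs)):
                # Gradient of log softmax probability with respect to each logit.
                grad = (1.0 if a == action else 0.0) - probs[a]
                self.logits[state][a] += self.lr * rho * td_error * grad


rng = random.Random(0)
memory = ReplayMemory(capacity=100)
memory.add([("s0", 1, 1.0, None, 0.5)])  # one-step trajectory, action 1 rewarded
agent = TabularActorCritic(num_actions=2)
for _ in range(50):
    agent.train_on_trajectory(memory.sample(rng))
```

Storing the behavior policy's probability alongside each step is what makes the importance weight computable later, when the current policy has drifted from the one that generated the data.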

4.

Invention application
REINFORCEMENT LEARNING USING A RELATIONAL NETWORK FOR GENERATING DATA ENCODING RELATIONSHIPS BETWEEN ENTITIES IN AN ENVIRONMENT

Legal status: Under examination (published)

Publication (announcement) number: US20190354885A1

Publication (announcement) date: 2019-11-21

Application number: US16417580

Application date: 2019-05-20

Applicant: DeepMind Technologies Limited

Inventor: Yujia Li, Victor Constant Bapst, Vinicius Zambaldi, David Nunes Raposo, Adam Anthony Santoro

IPC: G06N5/04, G06N3/04, G06N3/08, G06F17/16, G06N7/00

Abstract: A neural network system is proposed, including an input network for extracting, from state data, respective entity data for each of a plurality of entities which are present, or at least potentially present, in the environment. The entity data describes the entity. The neural network contains a relational network for parsing this data, which includes one or more attention blocks which may be stacked to perform successive actions on the entity data. The attention blocks each include a respective transform network for each of the entities. The transform network for each entity is able to transform data which the transform network receives for the entity into modified entity data for the entity, based on data for a plurality of the other entities. An output network is arranged to receive data output by the relational network, and use the received data to select a respective action.

5.

Granted invention
Multi-task neural network systems with task-specific policies and a shared policy

Legal status: In force

Publication (announcement) number: US11983634B2

Publication (announcement) date: 2024-05-14

Application number: US17486842

Application date: 2021-09-27

Applicant: DeepMind Technologies Limited

Inventor: Razvan Pascanu, Raia Thais Hadsell, Victor Constant Bapst, Wojciech Czarnecki, James Kirkpatrick, Yee Whye Teh, Nicolas Manfred Otto Heess

IPC: G06N3/08, G06N3/084, G06N3/10, G06N5/043

CPC classification number: G06N3/084, G06N3/10, G06N5/043

Abstract: A method is proposed for training a multitask computer system, such as a multitask neural network system. The system comprises a set of trainable workers and a shared module. The trainable workers and shared module are trained on a plurality of different tasks, such that each worker learns to perform a corresponding one of the tasks according to a respective task policy, and the shared module (a shared policy network) learns a multitask policy which represents common behavior for the tasks. The coordinated training is performed by optimizing an objective function comprising, for each task: a reward term indicative of an expected reward earned by a worker in performing the corresponding task according to the task policy; and at least one entropy term which regularizes the distribution of the task policy towards the distribution of the multitask policy.
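The objective described in this abstract can be illustrated for a single-step setting where each policy is a probability distribution over a few actions. This is a sketch under assumptions, not the claimed objective: the entropy term is taken to be a KL divergence from the task policy to the shared policy, and the weighting coefficient `alpha` and the function names are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same actions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def multitask_objective(task_policies, task_rewards, shared_policy, alpha=0.5):
    """Sum over tasks of expected reward minus a penalty for leaving the shared policy."""
    total = 0.0
    for policy, rewards in zip(task_policies, task_rewards):
        # Reward term: expected reward of the worker under its own task policy.
        expected_reward = sum(p * r for p, r in zip(policy, rewards))
        # Regularizer: pulls the task policy towards the multitask policy.
        total += expected_reward - alpha * kl_divergence(policy, shared_policy)
    return total


shared = [0.5, 0.5]
rewards = [[1.0, 0.0], [0.0, 1.0]]  # task 0 prefers action 0, task 1 prefers action 1

# Workers that copy the shared policy pay no penalty but earn less reward.
conformist = multitask_objective([[0.5, 0.5], [0.5, 0.5]], rewards, shared)
# Specialised workers earn more reward but pay for diverging from the shared policy.
specialised = multitask_objective([[0.9, 0.1], [0.1, 0.9]], rewards, shared)
```

Raising `alpha` in this sketch makes the workers behave more alike, while lowering it lets each worker specialise on its own task.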

6.

Granted invention
Reinforcement learning using a relational network for generating data encoding relationships between entities in an environment

Legal status: In force

Publication (announcement) number: US11580429B2

Publication (announcement) date: 2023-02-14

Application number: US16417580

Application date: 2019-05-20

Applicant: DeepMind Technologies Limited

Inventor: Yujia Li, Victor Constant Bapst, Vinicius Zambaldi, David Nunes Raposo, Adam Anthony Santoro

IPC: G06N5/04, G06N3/04, G06N3/08, G06N7/00, G06F17/16, G06N5/043

Abstract: A neural network system is proposed, including an input network for extracting, from state data, respective entity data for each of a plurality of entities which are present, or at least potentially present, in the environment. The entity data describes the entity. The neural network contains a relational network for parsing this data, which includes one or more attention blocks which may be stacked to perform successive actions on the entity data. The attention blocks each include a respective transform network for each of the entities. The transform network for each entity is able to transform data which the transform network receives for the entity into modified entity data for the entity, based on data for a plurality of the other entities. An output network is arranged to receive data output by the relational network, and use the received data to select a respective action.

7.

Invention application
TRAINING ACTION SELECTION NEURAL NETWORKS USING OFF-POLICY ACTOR CRITIC REINFORCEMENT LEARNING

Legal status: Under examination (published)

Publication (announcement) number: US20200293862A1

Publication (announcement) date: 2020-09-17

Application number: US16885918

Application date: 2020-05-28

Applicant: DeepMind Technologies Limited

Inventor: Ziyu Wang, Nicolas Manfred Otto Heess, Victor Constant Bapst

IPC: G06N3/04, G06N3/08, G06N3/00

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection neural network. One of the methods includes maintaining a replay memory that stores trajectories generated as a result of interaction of an agent with an environment; and training an action selection neural network having policy parameters on the trajectories in the replay memory, wherein training the action selection neural network comprises: sampling a trajectory from the replay memory; and adjusting current values of the policy parameters by training the action selection neural network on the trajectory using an off-policy actor critic reinforcement learning technique.

8.

Granted invention
Training action selection neural networks using off-policy actor critic reinforcement learning and stochastic dueling neural networks

Legal status: In force

Publication (announcement) number: US12190223B2

Publication (announcement) date: 2025-01-07

Application number: US16885918

Application date: 2020-05-28

Applicant: DeepMind Technologies Limited

Inventor: Ziyu Wang, Nicolas Manfred Otto Heess, Victor Constant Bapst

IPC: G06N3/045, G06N3/006, G06N3/047, G06N3/084, G06N3/088

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection neural network. One of the methods includes maintaining a replay memory that stores trajectories generated as a result of interaction of an agent with an environment; and training an action selection neural network having policy parameters on the trajectories in the replay memory, wherein training the action selection neural network comprises: sampling a trajectory from the replay memory; and adjusting current values of the policy parameters by training the action selection neural network on the trajectory using an off-policy actor critic reinforcement learning technique.

9.

Invention application
MULTI-TASK NEURAL NETWORK SYSTEMS WITH TASK-SPECIFIC POLICIES AND A SHARED POLICY

Legal status: Under examination (published)

Publication (announcement) number: US20200090048A1

Publication (announcement) date: 2020-03-19

Application number: US16689020

Application date: 2019-11-19

Applicant: DeepMind Technologies Limited

Inventor: Razvan Pascanu, Raia Thais Hadsell, Victor Constant Bapst, Wojciech Czarnecki, James Kirkpatrick, Yee Whye Teh, Nicolas Manfred Otto Heess

IPC: G06N3/08, G06N3/10, G06N5/04

Abstract: A method is proposed for training a multitask computer system, such as a multitask neural network system. The system comprises a set of trainable workers and a shared module. The trainable workers and shared module are trained on a plurality of different tasks, such that each worker learns to perform a corresponding one of the tasks according to a respective task policy, and the shared module (a shared policy network) learns a multitask policy which represents common behavior for the tasks. The coordinated training is performed by optimizing an objective function comprising, for each task: a reward term indicative of an expected reward earned by a worker in performing the corresponding task according to the task policy; and at least one entropy term which regularizes the distribution of the task policy towards the distribution of the multitask policy.

10.

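Invention application
TRAINING ACTION SELECTION NEURAL NETWORKS

Legal status: Under examination (published)

Publication (announcement) number: US20190258918A1

Publication (announcement) date: 2019-08-22

Application number: US16402687

Application date: 2019-05-03

Applicant: DeepMind Technologies Limited

Inventor: Ziyu Wang, Nicolas Manfred Otto Heess, Victor Constant Bapst

IPC: G06N3/04, G06N3/08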

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection neural network. One of the methods includes maintaining a replay memory that stores trajectories generated as a result of interaction of an agent with an environment; and training an action selection neural network having policy parameters on the trajectories in the replay memory, wherein training the action selection neural network comprises: sampling a trajectory from the replay memory; and adjusting current values of the policy parameters by training the action selection neural network on the trajectory using an off-policy actor critic reinforcement learning technique.
