-
1.
Publication No.: WO2020234457A1
Publication Date: 2020-11-26
Application No.: PCT/EP2020/064286
Filing Date: 2020-05-22
Applicant: DEEPMIND TECHNOLOGIES LIMITED
Inventor: BANINO, Andrea , BLUNDELL, Charles , BADIA, Adrià Puigdomènech , KOSTER, Raphael , KUMARAN, Sudarshan
Abstract: A neural network based memory system with external memory for storing representations of knowledge items. The memory can be used to retrieve indirectly related knowledge items by recirculating queries, and is useful for relational reasoning. Implementations of the system control how many times queries are recirculated, and hence the degree of relational reasoning, to minimize computation.
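The query-recirculation idea in this abstract can be sketched with a toy key-value memory. This is an illustrative sketch only, not the patented system: items are stored as (key, value) vector pairs, retrieval is by dot-product similarity, and the `hops` argument plays the role of the controlled recirculation count that bounds the degree of relational reasoning.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def retrieve(memory, query):
    """Return the value whose key best matches the query (dot product)."""
    key, value = max(memory, key=lambda kv: dot(kv[0], query))
    return value

def recirculate(memory, query, hops):
    """Feed each retrieved value back in as the next query.

    Each extra hop allows one more step of indirect (relational)
    retrieval; capping `hops` bounds the computation.
    """
    for _ in range(hops):
        query = retrieve(memory, query)
    return query

# Items form a chain: each value is the key of the next item.
memory = [([1, 0, 0], [0, 1, 0]),
          ([0, 1, 0], [0, 0, 1]),
          ([0, 0, 1], [1, 0, 0])]
# Two hops reach an item only indirectly related to the first query.
result = recirculate(memory, [1, 0, 0], hops=2)  # [0, 0, 1]
```

One hop returns only the directly matching item; the second hop uses that result as a fresh query, which is what makes indirectly related items reachable.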
-
2.
Publication No.: WO2022167657A2
Publication Date: 2022-08-11
Application No.: PCT/EP2022/052893
Filing Date: 2022-02-07
Applicant: DEEPMIND TECHNOLOGIES LIMITED
Inventor: BANINO, Andrea , BADIA, Adrià Puigdomènech , WALKER, Jacob Charles , MITROVIC, Jovana , BLUNDELL, Charles , SCHOLTES, Timothy Anthony Julian
IPC: G06N3/00 , G06N3/04 , G06N3/08 , G06N3/006 , G06N3/0445 , G06N3/0454
Abstract: A system for controlling an agent interacting with an environment to perform a task. The system includes an action selection neural network configured to generate action selection outputs that are used to select actions to be performed by the agent. The action selection neural network includes an encoder sub network configured to generate encoded representations of the current observations; an attention sub network configured to generate attention sub network outputs using an attention mechanism; a recurrent sub network configured to generate recurrent sub network outputs; and an action selection sub network configured to generate the action selection outputs used to select the actions to be performed by the agent in response to the current observations.
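The four-stage structure described in this abstract (encoder → attention → recurrent core → action head) can be sketched in miniature. Everything below is a placeholder assumption, not DeepMind's implementation: the functions, dimensions, and weights are toy stand-ins that only show how the sub-networks compose.

```python
import math

def encode(observation):
    # Encoder sub-network: map a raw observation to a feature vector.
    return [x * 0.5 for x in observation]

def attend(query, keys, values):
    # Attention sub-network: softmax over query/key dot products,
    # returning a weighted combination of the values.
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

def recurrent_step(state, features):
    # Recurrent sub-network: fold new features into the running state.
    return [math.tanh(s + f) for s, f in zip(state, features)]

def select_action(state):
    # Action selection sub-network: greedy over one score per action.
    return max(range(len(state)), key=lambda a: state[a])

# One control step through the pipeline.
obs = [2.0, 4.0]
features = encode(obs)
context = attend(features, keys=[features], values=[features])
state = recurrent_step([0.0, 0.0], context)
action = select_action(state)  # -> 1
```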
-
3.
Publication No.: WO2019215269A1
Publication Date: 2019-11-14
Application No.: PCT/EP2019/061890
Filing Date: 2019-05-09
Applicant: DEEPMIND TECHNOLOGIES LIMITED
Inventor: BANINO, Andrea , KUMARAN, Sudarshan , HADSELL, Raia Thais , URIA-MARTINEZ, Benigno
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting actions to be performed by an agent interacting with an environment. In one aspect, a system comprises a grid cell neural network and an action selection neural network. The grid cell network is configured to: receive an input comprising data characterizing a velocity of the agent; process the input to generate a grid cell representation; and process the grid cell representation to generate an estimate of a position of the agent in the environment; the action selection neural network is configured to: receive an input comprising a grid cell representation and an observation characterizing a state of the environment; and process the input to generate an action selection network output.
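The grid cell network described here integrates velocity into a position estimate via an intermediate periodic representation. The sketch below is a loose illustration under stated assumptions: the trigonometric multi-scale code is a stand-in for a learned grid cell representation, and the simple Euler update is a stand-in for learned path integration.

```python
import math

def grid_cell_representation(position, scales=(1.0, 2.0, 4.0)):
    # Periodic code over several spatial scales, loosely mimicking
    # the multi-scale firing pattern of biological grid cells.
    return [f(s * p) for p in position for s in scales
            for f in (math.sin, math.cos)]

def integrate_velocity(position, velocity, dt=1.0):
    # Path integration: update the position estimate from velocity.
    return [p + v * dt for p, v in zip(position, velocity)]

# Velocity input -> updated position estimate -> grid code, which an
# action selection network would consume alongside an observation.
position = integrate_velocity([0.0, 0.0], [1.0, 0.5])
code = grid_cell_representation(position)
```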
-
4.
Publication No.: WO2023057512A1
Publication Date: 2023-04-13
Application No.: PCT/EP2022/077696
Filing Date: 2022-10-05
Applicant: DEEPMIND TECHNOLOGIES LIMITED
Inventor: GOYAL, Anirudh , BANINO, Andrea , FRIESEN, Abram Luke , WEBER, Theophane Guillaume , BADIA, Adrià Puigdomènech , KE, Nan , OSINDERO, Simon , LILLICRAP, Timothy Paul , BLUNDELL, Charles
IPC: G06N3/092 , G06N3/006 , G06N3/042 , G06N3/0442 , G06N3/0455 , G06N3/0464 , G06N3/084
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for controlling a reinforcement learning agent in an environment to perform a task using a retrieval-augmented action selection process. One of the methods includes receiving a current observation characterizing a current state of the environment; processing an encoder network input comprising the current observation to determine a policy neural network hidden state that corresponds to the current observation; maintaining a plurality of trajectories generated as a result of the reinforcement learning agent interacting with the environment; selecting one or more trajectories from the plurality of trajectories; updating the policy neural network hidden state using update data determined from the one or more selected trajectories; and processing the updated hidden state using a policy neural network to generate a policy output that specifies an action to be performed by the agent in response to the current observation.
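The retrieval-augmented loop in this abstract (maintain trajectories, select some, fold them into the policy hidden state) can be sketched as follows. The similarity measure, trajectory summaries, and mixing rule are all illustrative assumptions, not details from the patent.

```python
def similarity(a, b):
    return sum(x * y for x, y in zip(a, b))

def select_trajectories(trajectories, hidden, k=1):
    # Rank stored trajectory summaries by similarity to the current
    # hidden state and keep the top k.
    ranked = sorted(trajectories,
                    key=lambda t: similarity(t, hidden),
                    reverse=True)
    return ranked[:k]

def update_hidden(hidden, retrieved, mix=0.5):
    # Blend the retrieved summaries into the hidden state before the
    # policy head produces its output.
    for summary in retrieved:
        hidden = [(1 - mix) * h + mix * s
                  for h, s in zip(hidden, summary)]
    return hidden
```

A policy network would then process the updated hidden state to produce the action, as the abstract's final step describes.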
-
5.
Publication No.: WO2021228985A1
Publication Date: 2021-11-18
Application No.: PCT/EP2021/062704
Filing Date: 2021-05-12
Applicant: DEEPMIND TECHNOLOGIES LIMITED
Inventor: URIA-MARTÍNEZ, Benigno , BANINO, Andrea , IBARZ GABARDOS, Borja , ZAMBALDI, Vinicius , BLUNDELL, Charles
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a spatial embedding neural network that is configured to process data characterizing motion of an agent that is interacting with an environment to generate spatial embeddings. In one aspect, a method comprises: processing data characterizing the motion of the agent in the environment at the current time step using a spatial embedding neural network to generate a current spatial embedding for the current time step; determining a predicted score and a target score for each of a plurality of slots in an external memory, wherein each slot stores: (i) a representation of an observation characterizing a state of the environment, and (ii) a spatial embedding; and determining an update to values of the set of spatial embedding neural network parameters based on an error between the predicted scores and the target scores.
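The training signal described here, an error between predicted and target scores over memory slots, can be reduced to a minimal sketch. The dot-product scoring function and squared-error loss are assumptions for illustration; the patent does not specify them here.

```python
def slot_scores(embedding, slot_embeddings):
    # One score per external-memory slot: similarity between the
    # current spatial embedding and the embedding stored in the slot.
    return [sum(e * s for e, s in zip(embedding, slot))
            for slot in slot_embeddings]

def score_loss(predicted, target):
    # Error between predicted and target scores; its gradient would
    # drive the update to the spatial embedding network parameters.
    return sum((p - t) ** 2 for p, t in zip(predicted, target))
```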
-
6.
Publication No.: WO2018083672A1
Publication Date: 2018-05-11
Application No.: PCT/IB2017/056907
Filing Date: 2017-11-04
Applicant: DEEPMIND TECHNOLOGIES LIMITED
Inventor: VIOLA, Fabio , MIROWSKI, Piotr Wojciech , BANINO, Andrea , PASCANU, Razvan , SOYER, Hubert Josef , BALLARD, Andrew James , KUMARAN, Sudarshan , HADSELL, Raia Thais , SIFRE, Laurent , GOROSHIN, Rostislav , KAVUKCUOGLU, Koray , DENIL, Misha Man Ray
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. In one aspect, a method of training an action selection policy neural network for use in selecting actions to be performed by an agent navigating through an environment to accomplish one or more goals comprises: receiving an observation image characterizing a current state of the environment; processing, using the action selection policy neural network, an input comprising the observation image to generate an action selection output; processing, using a geometry-prediction neural network, an intermediate output generated by the action selection policy neural network to predict a value of a feature of a geometry of the environment when in the current state; and backpropagating a gradient of a geometry-based auxiliary loss into the action selection policy neural network to determine a geometry-based auxiliary update for current values of the network parameters.
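The auxiliary-loss arrangement in this abstract, where a geometry prediction error is backpropagated into the shared policy network, can be sketched at the loss level. The squared-error form and the auxiliary weight are illustrative assumptions only.

```python
def geometry_auxiliary_loss(predicted_geometry, true_geometry):
    # Squared error on a predicted geometric feature of the
    # environment (e.g. depth from the current viewpoint).
    return sum((p - t) ** 2
               for p, t in zip(predicted_geometry, true_geometry))

def total_loss(policy_loss, aux_loss, aux_weight=0.1):
    # The weighted auxiliary term is added to the main objective, so
    # its gradient also shapes the shared policy network features.
    return policy_loss + aux_weight * aux_loss
```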