Patent search ap:("DEEPMIND TECHNOLOGIES LIMITED") AND inv:"Misha Man Ray Denil" Page 1

1.

Invention application
PROGRAMMABLE REINFORCEMENT LEARNING SYSTEMS [In force]

Publication (announcement) number: US20240394504A1

Publication (announcement) date: 2024-11-28

Application number: US18637279

Application date: 2024-04-16

Applicant: DeepMind Technologies Limited

Inventor: Misha Man Ray Denil, Sergio Gomez Colmenarejo, Serkan Cabi, David William Saxton, Joao Ferdinando Gomes de Freitas

IPC: G06N3/006, G06F18/21, G06F18/2451, G06N3/045, G06N3/047, G06N3/084

Abstract: A reinforcement learning system is proposed comprising a plurality of property detector neural networks. Each property detector neural network is arranged to receive data representing an object within an environment, and to generate property data associated with a property of the object. A processor is arranged to receive an instruction indicating a task associated with an object having an associated property, and process the output of the plurality of property detector neural networks based upon the instruction to generate a relevance data item. The relevance data item indicates objects within the environment associated with the task. The processor also generates a plurality of weights based upon the relevance data item, and, based on the weights, generates modified data representing the plurality of objects within the environment. A neural network is arranged to receive the modified data and to output an action associated with the task.
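The data flow described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the detector functions, the property names "red" and "cube", the product rule for relevance, and the softmax used to turn relevance into weights are all assumptions chosen for illustration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical stand-ins for trained property detector neural networks.
# Each maps an object representation to a score for one property.
def detect_red(obj):
    return obj["color_score"]

def detect_cube(obj):
    return obj["shape_score"]

DETECTORS = {"red": detect_red, "cube": detect_cube}

def relevance(objects, instruction):
    """Combine detector outputs per object according to the instruction.

    `instruction` is a list of property names (an assumed encoding);
    the product of the named detector outputs is the relevance score.
    """
    return [math.prod(DETECTORS[p](o) for p in instruction) for o in objects]

def modify(objects, weights):
    """Scale each object's feature vector by its weight (modified data)."""
    return [[w * f for f in o["features"]] for o, w in zip(objects, weights)]

objects = [
    {"color_score": 0.9, "shape_score": 0.8, "features": [1.0, 2.0]},
    {"color_score": 0.1, "shape_score": 0.9, "features": [3.0, 4.0]},
]
weights = softmax(relevance(objects, ["red", "cube"]))
modified = modify(objects, weights)  # would be fed to the action network
```

In this sketch the object that best matches the instruction receives the largest weight, so its features dominate the input passed on to the network that outputs an action.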

2.

Invention publication
AUTOREGRESSIVELY GENERATING SEQUENCES OF DATA ELEMENTS DEFINING ACTIONS TO BE PERFORMED BY AN AGENT [Under examination, published]

Publication (announcement) number: US20240281654A1

Publication (announcement) date: 2024-08-22

Application number: US18292165

Application date: 2022-08-12

Applicant: DeepMind Technologies Limited

Inventor: Scott Ellison Reed, Konrad Zolna, Emilio Parisotto, Tom Erez, Alexander Novikov, Jack William Rae, Misha Man Ray Denil, Joao Ferdinando Gomes de Freitas, Oriol Vinyals, Sergio Gomez, Ashley Deloris Edwards, Jacob Bruce, Gabriel Barth-Maron

IPC: G06N3/08, G06N3/04

CPC classification number: G06N3/08, G06N3/04

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting actions to be performed by an agent to interact with an environment using an action selection neural network. In one aspect, a method comprises, at each time step in a sequence of time steps: generating a current representation of a state of a task being performed by the agent in the environment as of the current time step as a sequence of data elements; autoregressively generating a sequence of data elements representing a current action to be performed by the agent at the current time step; and after autoregressively generating the sequence of data elements representing the current action, causing the agent to perform the current action at the current time step.
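The per-time-step loop in this abstract can be sketched as follows. The function `toy_next_token` is a hypothetical, deterministic stand-in for the action selection neural network, and the integer token encoding is an assumption; only the autoregressive structure (each generated element is fed back as context for the next) reflects the abstract.

```python
def toy_next_token(context):
    """Hypothetical stand-in for the action selection neural network.

    A real system would sample from a learned distribution over the
    next data element; this toy rule is deterministic for clarity.
    """
    return (sum(context) + len(context)) % 8

def generate_action(state_tokens, action_len, next_token=toy_next_token):
    """Autoregressively generate `action_len` data elements for one action."""
    context = list(state_tokens)  # copy so the caller's state is untouched
    action = []
    for _ in range(action_len):
        tok = next_token(context)
        action.append(tok)
        context.append(tok)  # condition later elements on earlier ones
    return action

# One time step: the state is represented as a sequence of data elements,
# the action is generated element by element, then the agent would act.
state = [1, 2, 3]
action = generate_action(state, action_len=3)
```

After the full action sequence has been generated, the agent performs the action and the loop repeats at the next time step with an updated state representation.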

3.

Invention publication
TRAINING MACHINE LEARNING MODELS BY DETERMINING UPDATE RULES USING NEURAL NETWORKS [Under examination, published]

Publication (announcement) number: US20230376771A1

Publication (announcement) date: 2023-11-23

Application number: US18180754

Application date: 2023-03-08

Applicant: DeepMind Technologies Limited

Inventor: Misha Man Ray Denil, Tom Schaul, Marcin Andrychowicz, Joao Ferdinando Gomes de Freitas, Sergio Gomez Colmenarejo, Matthew William Hoffman, David Benjamin Pfau

IPC: G06N3/084, G06N3/044, G06N3/045

CPC classification number: G06N3/084, G06N3/044, G06N3/045

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for training machine learning models. One method includes obtaining a machine learning model, wherein the machine learning model comprises one or more model parameters, and the machine learning model is trained using gradient descent techniques to optimize an objective function; determining an update rule for the model parameters using a recurrent neural network (RNN); and applying a determined update rule for a final time step in a sequence of multiple time steps to the model parameters.
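The idea of letting a recurrent network determine the parameter update can be illustrated with a toy sketch. Everything concrete here is an assumption: the one-unit recurrent cell, its hand-set coefficients `a`, `b`, and `c` (which would be learned in practice), and the quadratic objective. For simplicity the sketch applies the update at every step of the sequence.

```python
import math

def objective(x):
    """Toy objective function to minimise."""
    return x * x

def gradient(x):
    """Gradient of the toy objective."""
    return 2.0 * x

def rnn_update_rule(hidden, grad, a=0.5, b=0.1, c=0.5):
    """One step of a one-unit recurrent cell that emits a parameter update.

    The coefficients a, b, c are hand-set here for illustration; in a
    learned optimizer they would be trained.
    """
    hidden = math.tanh(a * hidden + b * grad)
    update = -c * hidden
    return hidden, update

def optimise(x0, steps=50):
    """Run the recurrent update rule over a sequence of time steps."""
    x, hidden = x0, 0.0
    for _ in range(steps):
        hidden, update = rnn_update_rule(hidden, gradient(x))
        x += update  # apply the determined update to the model parameter
    return x

x_final = optimise(3.0)
```

The hidden state lets the rule depend on the history of gradients rather than on the current gradient alone, which is what distinguishes it from a fixed rule such as plain gradient descent.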

4.

Invention grant
Environment navigation using reinforcement learning [In force]

Publication (announcement) number: US11074481B2

Publication (announcement) date: 2021-07-27

Application number: US16745757

Application date: 2020-01-17

Applicant: DeepMind Technologies Limited

Inventor: Fabio Viola, Piotr Wojciech Mirowski, Andrea Banino, Razvan Pascanu, Hubert Josef Soyer, Andrew James Ballard, Sudarshan Kumaran, Raia Thais Hadsell, Laurent Sifre, Rostislav Goroshin, Koray Kavukcuoglu, Misha Man Ray Denil

IPC: G06K9/00, G06K9/62, G06N3/04, G06N3/08, G06N3/00, G06T7/50, G06T7/70

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. In one aspect, a method of training an action selection policy neural network for use in selecting actions to be performed by an agent navigating through an environment to accomplish one or more goals comprises: receiving an observation image characterizing a current state of the environment; processing, using the action selection policy neural network, an input comprising the observation image to generate an action selection output; processing, using a geometry-prediction neural network, an intermediate output generated by the action selection policy neural network to predict a value of a feature of a geometry of the environment when in the current state; and backpropagating a gradient of a geometry-based auxiliary loss into the action selection policy neural network to determine a geometry-based auxiliary update for current values of the network parameters.
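The geometry-based auxiliary update in this abstract can be sketched with scalar parameters. This is an illustrative assumption throughout: the shared weight `w`, the head weights `u` and `v`, the squared-error depth loss, and the learning rate are hypothetical, and a real system would use deep networks with automatic differentiation.

```python
def forward(params, x):
    """Shared layer followed by an action head and a geometry head."""
    z = params["w"] * x            # intermediate output of the policy network
    logit = params["u"] * z        # action selection output
    depth_hat = params["v"] * z    # geometry prediction (e.g. depth)
    return z, logit, depth_hat

def aux_loss(params, x, depth):
    """Squared error between predicted and true geometry feature."""
    _, _, depth_hat = forward(params, x)
    return (depth_hat - depth) ** 2

def aux_grad_shared(params, x, depth):
    """Gradient of the auxiliary loss with respect to the shared weight w."""
    _, _, depth_hat = forward(params, x)
    return 2.0 * (depth_hat - depth) * params["v"] * x

def aux_update(params, x, depth, lr=0.1):
    """Backpropagate the auxiliary loss into the shared policy parameter."""
    new_params = dict(params)
    new_params["w"] = params["w"] - lr * aux_grad_shared(params, x, depth)
    return new_params

params = {"w": 0.5, "u": 1.0, "v": 2.0}
updated = aux_update(params, x=1.0, depth=3.0)
```

Because the geometry head reads the same intermediate output as the action head, reducing the auxiliary loss changes the shared representation that the policy also uses.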

5.

Invention application
ENVIRONMENT NAVIGATION USING REINFORCEMENT LEARNING [Under examination, published]

Publication (announcement) number: US20190266449A1

Publication (announcement) date: 2019-08-29

Application number: US16403343

Application date: 2019-05-03

Applicant: DeepMind Technologies Limited

Inventor: Fabio Viola, Piotr Wojciech Mirowski, Andrea Banino, Razvan Pascanu, Hubert Josef Soyer, Andrew James Ballard, Sudarshan Kumaran, Raia Thais Hadsell, Laurent Sifre, Rostislav Goroshin, Koray Kavukcuoglu, Misha Man Ray Denil

IPC: G06K9/62, G06N3/08, G06N3/04, G06K9/00

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. In one aspect, a method of training an action selection policy neural network for use in selecting actions to be performed by an agent navigating through an environment to accomplish one or more goals comprises: receiving an observation image characterizing a current state of the environment; processing, using the action selection policy neural network, an input comprising the observation image to generate an action selection output; processing, using a geometry-prediction neural network, an intermediate output generated by the action selection policy neural network to predict a value of a feature of a geometry of the environment when in the current state; and backpropagating a gradient of a geometry-based auxiliary loss into the action selection policy neural network to determine a geometry-based auxiliary update for current values of the network parameters.

6.

Invention grant
Training machine learning models by determining update rules using recurrent neural networks [In force]

Publication (announcement) number: US11615310B2

Publication (announcement) date: 2023-03-28

Application number: US16302592

Application date: 2017-05-19

Applicant: DEEPMIND TECHNOLOGIES LIMITED

Inventor: Misha Man Ray Denil, Tom Schaul, Marcin Andrychowicz, Joao Ferdinando Gomes de Freitas, Sergio Gomez Colmenarejo, Matthew William Hoffman, David Benjamin Pfau

IPC: G06N3/08, G06N3/04, G06N3/084

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for training machine learning models. One method includes obtaining a machine learning model, wherein the machine learning model comprises one or more model parameters, and the machine learning model is trained using gradient descent techniques to optimize an objective function; determining an update rule for the model parameters using a recurrent neural network (RNN); and applying a determined update rule for a final time step in a sequence of multiple time steps to the model parameters.

7.

Invention application
PROGRAMMABLE REINFORCEMENT LEARNING SYSTEMS [Under examination, published]

Publication (announcement) number: US20200167633A1

Publication (announcement) date: 2020-05-28

Application number: US16615061

Application date: 2018-05-22

Applicant: DEEPMIND TECHNOLOGIES LIMITED

Inventor: Misha Man Ray Denil, Sergio Gomez Colmenarejo, Serkan Cabi, David William Saxton, Joao Ferdinando Gomes de Freitas

IPC: G06N3/04, G06N3/08, G06K9/62

Abstract: A reinforcement learning system is proposed comprising a plurality of property detector neural networks. Each property detector neural network is arranged to receive data representing an object within an environment, and to generate property data associated with a property of the object. A processor is arranged to receive an instruction indicating a task associated with an object having an associated property, and process the output of the plurality of property detector neural networks based upon the instruction to generate a relevance data item. The relevance data item indicates objects within the environment associated with the task. The processor also generates a plurality of weights based upon the relevance data item, and, based on the weights, generates modified data representing the plurality of objects within the environment. A neural network is arranged to receive the modified data and to output an action associated with the task.

8.

Invention application
ENVIRONMENT NAVIGATION USING REINFORCEMENT LEARNING [Under examination, published]

Publication (announcement) number: US20200151515A1

Publication (announcement) date: 2020-05-14

Application number: US16745757

Application date: 2020-01-17

Applicant: DeepMind Technologies Limited

Inventor: Fabio Viola, Piotr Wojciech Mirowski, Andrea Banino, Razvan Pascanu, Hubert Josef Soyer, Andrew James Ballard, Sudarshan Kumaran, Raia Thais Hadsell, Laurent Sifre, Rostislav Goroshin, Koray Kavukcuoglu, Misha Man Ray Denil

IPC: G06K9/62, G06N3/00, G06N3/08, G06N3/04, G06K9/00

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. In one aspect, a method of training an action selection policy neural network for use in selecting actions to be performed by an agent navigating through an environment to accomplish one or more goals comprises: receiving an observation image characterizing a current state of the environment; processing, using the action selection policy neural network, an input comprising the observation image to generate an action selection output; processing, using a geometry-prediction neural network, an intermediate output generated by the action selection policy neural network to predict a value of a feature of a geometry of the environment when in the current state; and backpropagating a gradient of a geometry-based auxiliary loss into the action selection policy neural network to determine a geometry-based auxiliary update for current values of the network parameters.

9.

Invention grant
Training machine learning models by determining update rules using neural networks [In force]

Publication (announcement) number: US12271823B2

Publication (announcement) date: 2025-04-08

Application number: US18180754

Application date: 2023-03-08

Applicant: DeepMind Technologies Limited

Inventor: Misha Man Ray Denil, Tom Schaul, Marcin Andrychowicz, Joao Ferdinando Gomes de Freitas, Sergio Gomez Colmenarejo, Matthew William Hoffman, David Benjamin Pfau

IPC: G06N3/084, G06N3/044, G06N3/045

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for training machine learning models. One method includes obtaining a machine learning model, wherein the machine learning model comprises one or more model parameters, and the machine learning model is trained using gradient descent techniques to optimize an objective function; determining an update rule for the model parameters using a recurrent neural network (RNN); and applying a determined update rule for a final time step in a sequence of multiple time steps to the model parameters.

10.

Invention publication
INTERACTIVE DECODING OF WORDS FROM PHONEME SCORE DISTRIBUTIONS [Under examination, published]

Publication (announcement) number: US20240185842A1

Publication (announcement) date: 2024-06-06

Application number: US18285345

Application date: 2022-04-07

Applicant: DeepMind Technologies Limited

Inventor: Ioannis Alexandros Assael, Brendan Shillingford, Misha Man Ray Denil

IPC: G10L15/16, G06V40/20, G10L15/187

CPC classification number: G10L15/16, G06V40/20, G10L15/187

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for interactive decoding of a word sequence.
