Patent search ap:("DeepMind Technologies Limited") AND inv:"Gregory Duncan Wayne" Page 2

11.

发明授权
Selecting reinforcement learning actions using a low-level controller 有权

公开(公告)号：US11875258B1

公开(公告)日：2024-01-16

申请号：US17541186

申请日：2021-12-02

Applicant: DeepMind Technologies Limited

Inventor： Nicolas Manfred Otto Heess , Timothy Paul Lillicrap , Gregory Duncan Wayne , Yuval Tassa

IPC: G06N3/08 , G06N3/006 , G06N3/044 , G06N3/045

CPC classification number: G06N3/08 , G06N3/006 , G06N3/044 , G06N3/045

Abstract: Methods, systems, and apparatus for selecting actions to be performed by an agent interacting with an environment. One system includes a high-level controller neural network, low-level controller network, and subsystem. The high-level controller neural network receives an input observation and processes the input observation to generate a high-level output defining a control signal for the low-level controller. The low-level controller neural network receives a designated component of an input observation and processes the designated component and an input control signal to generate a low-level output that defines an action to be performed by the agent in response to the input observation. The subsystem receives a current observation characterizing a current state of the environment, determines whether criteria are satisfied for generating a new control signal, and based on the determination, provides appropriate inputs to the high-level and low-level controllers for selecting an action to be performed by the agent.

12.

发明申请
MEMORY AUGMENTED GENERATIVE TEMPORAL MODELS 有权

公开(公告)号：US20210089968A1

公开(公告)日：2021-03-25

申请号：US17113669

申请日：2020-12-07

Applicant: DeepMind Technologies Limited

Inventor： Gregory Duncan Wayne , Chia-Chun Hung , Mevlana Celaleddin Gemici , Adam Anthony Santoro

IPC: G06N20/00

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating sequences of predicted observations, for example images. In one aspect, a system comprises a controller recurrent neural network, and a decoder neural network to process a set of latent variables to generate an observation. An external memory and a memory interface subsystem is configured to, for each of a plurality of time steps, receive an updated hidden state from the controller, generate a memory context vector by reading data from the external memory using the updated hidden state, determine a set of latent variables from the memory context vector, generate a predicted observation by providing the set of latent variables to the decoder neural network, write data to the external memory using the latent variables, the updated hidden state, or both, and generate a controller input for a subsequent time step from the latent variables.

13.

发明申请
CONTROLLING AGENTS OVER LONG TIME SCALES USING TEMPORAL VALUE TRANSPORT 有权

公开(公告)号：US20210081723A1

公开(公告)日：2021-03-18

申请号：US17035546

申请日：2020-09-28

Applicant: DeepMind Technologies Limited

Inventor： Gregory Duncan Wayne , Timothy Paul Lillicrap , Chia-Chun Hung , Joshua Simon Abramson

IPC: G06K9/62 , G06F11/30 , G06N3/08

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network system used to control an agent interacting with an environment to perform a specified task. One of the methods includes causing the agent to perform a task episode in which the agent attempts to perform the specified task; for each of one or more particular time steps in the sequence: generating a modified reward for the particular time step from (i) the actual reward at the time step and (ii) value predictions at one or more time steps that are more than a threshold number of time steps after the particular time step in the sequence; and training, through reinforcement learning, the neural network system using at least the modified rewards for the particular time steps.

14.

发明申请
TRAINING AN UNSUPERVISED MEMORY-BASED PREDICTION SYSTEM TO LEARN COMPRESSED REPRESENTATIONS OF AN ENVIRONMENT 有权

公开(公告)号：US20210034969A1

公开(公告)日：2021-02-04

申请号：US16766945

申请日：2019-03-11

Applicant: DeepMind Technologies Limited

Inventor： Gregory Duncan Wayne , Chia-Chun Hung , David Antony Amos , Mehdi Mirza Mohammadi , Arun Ahuja , Timothy Paul Lillicrap

IPC: G06N3/08 , G06N3/04

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a memory-based prediction system configured to receive an input observation characterizing a state of an environment interacted with by an agent and to process the input observation and data read from a memory to update data stored in the memory and to generate a latent representation of the state of the environment. The method comprises: for each of a plurality of time steps: processing an observation for the time step and data read from the memory to: (i) update the data stored in the memory, and (ii) generate a latent representation of the current state of the environment as of the time step; and generating a predicted return that will be received by the agent as a result of interactions with the environment after the observation for the time step is received.

15.

发明公开
IMITATION LEARNING BASED ON PREDICTION OF OUTCOMES 审中-公开

公开(公告)号：US20240185082A1

公开(公告)日：2024-06-06

申请号：US18275722

申请日：2022-02-04

Applicant: DeepMind Technologies Limited

Inventor： Andrew Coulter Jaegle , Yury Sulsky , Gregory Duncan Wayne , Robert David Fergus

IPC: G06N3/092

CPC classification number: G06N3/092

Abstract: A method is proposed of training a policy model to generate action data for controlling an agent to perform a task in an environment. The method comprises: obtaining, for each of a plurality of performances of the task, a corresponding demonstrator trajectory comprising a plurality of sets of state data characterizing the environment at each of a plurality of corresponding successive time steps during the performance of the task; using the demonstrator trajectories to generate a demonstrator model, the demonstrator model being operative to generate, for any said demonstrator trajectory, a value indicative of the probability of the demonstrator trajectory occurring; and jointly training an imitator model and a policy model. The joint training is performed by: generating a plurality of imitation trajectories, each imitation trajectory being generated by repeatedly receiving state data indicating a state of the environment, using the policy model to generate action data indicative of an action, and causing the action to be performed by the agent; training the imitator model using the imitation trajectories, the imitator model being operative to generate, for any said imitation trajectory, a value indicative of the probability of the imitation trajectory occurring; and training the policy model using a reward function which is a measure of the similarity of the demonstrator model and the imitator model.

16.

发明授权
Controlling agents over long time scales using temporal value transport 有权

公开(公告)号：US11769049B2

公开(公告)日：2023-09-26

申请号：US17035546

申请日：2020-09-28

Applicant: DeepMind Technologies Limited

Inventor： Gregory Duncan Wayne , Timothy Paul Lillicrap , Chia-Chun Hung , Joshua Simon Abramson

IPC: G06K9/62 , G06F11/30 , G06N3/08 , G06F18/21 , G06V10/764 , G06V10/774 , G06V10/778 , G06V10/82

CPC classification number: G06N3/08 , G06F11/3037 , G06F11/3072 , G06F18/2193 , G06V10/764 , G06V10/774 , G06V10/7796 , G06V10/82

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network system used to control an agent interacting with an environment to perform a specified task. One of the methods includes causing the agent to perform a task episode in which the agent attempts to perform the specified task; for each of one or more particular time steps in the sequence: generating a modified reward for the particular time step from (i) the actual reward at the time step and (ii) value predictions at one or more time steps that are more than a threshold number of time steps after the particular time step in the sequence; and training, through reinforcement learning, the neural network system using at least the modified rewards for the particular time steps.

17.

发明公开
CONTROLLING INTERACTIVE AGENTS USING MULTI-MODAL INPUTS 审中-公开

公开(公告)号：US20230178076A1

公开(公告)日：2023-06-08

申请号：US18077194

申请日：2022-12-07

Applicant: DeepMind Technologies Limited

Inventor： Joshua Simon Abramson , Arun Ahuja , Federico Javier Carnevale , Petko Ivanov Georgiev , Chia-Chun Hung , Timothy Paul Lillicrap , Alistair Michael Muldal , Adam Anthony Santoro , Tamara Louise von Glehn , Jessica Paige Landon , Gregory Duncan Wayne , Chen Yan , Rui Zhu

IPC: G10L15/22 , G10L15/16 , G10L13/02 , G06V10/82 , G06V20/50 , G06F40/284 , G06F40/40 , G06V10/774 , G10L15/06

CPC classification number: G10L15/22 , G10L15/16 , G10L13/02 , G06V10/82 , G06V20/50 , G06F40/284 , G06F40/40 , G06V10/774 , G10L15/063 , G10L2015/223

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for controlling agents. In particular, an interactive agent can be controlled based on multi-modal inputs that include both an observation image and a natural language text sequence.

18.

发明授权
Selecting reinforcement learning actions using a low-level controller 有权

公开(公告)号：US11210585B1

公开(公告)日：2021-12-28

申请号：US15594228

申请日：2017-05-12

Applicant: DeepMind Technologies Limited

Inventor： Nicolas Manfred Otto Heess , Timothy Paul Lillicrap , Gregory Duncan Wayne , Yuval Tassa

IPC: G06N3/08 , G06N3/00

Abstract: Methods, systems, and apparatus for selecting actions to be performed by an agent interacting with an environment. One system includes a high-level controller neural network, low-level controller network, and subsystem. The high-level controller neural network receives an input observation and processes the input observation to generate a high-level output defining a control signal for the low-level controller. The low-level controller neural network receives a designated component of an input observation and processes the designated component and an input control signal to generate a low-level output that defines an action to be performed by the agent in response to the input observation. The subsystem receives a current observation characterizing a current state of the environment, determines whether criteria are satisfied for generating a new control signal, and based on the determination, provides appropriate inputs to the high-level and low-level controllers for selecting an action to be performed by the agent.

19.

发明授权
Augmenting neural networks with external memory 有权

公开(公告)号：US11210579B2

公开(公告)日：2021-12-28

申请号：US16831566

申请日：2020-03-26

Applicant: DeepMind Technologies Limited

Inventor： Alexander Benjamin Graves , Ivo Danihelka , Gregory Duncan Wayne

IPC: G06N3/04 , G06N3/08

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for augmenting neural networks with an external memory. One of the methods includes providing an output derived from a first portion of a neural network output as a system output; determining one or more sets of writing weights for each of a plurality of locations in an external memory; writing data defined by a third portion of the neural network output to the external memory in accordance with the sets of writing weights; determining one or more sets of reading weights for each of the plurality of locations in the external memory from a fourth portion of the neural network output; reading data from the external memory in accordance with the sets of reading weights; and combining the data read from the external memory with a next system input to generate the next neural network input.

20.

发明授权
Augmenting neural networks with sparsely-accessed external memory 有权

公开(公告)号：US11151443B2

公开(公告)日：2021-10-19

申请号：US15424685

申请日：2017-02-03

Applicant: DeepMind Technologies Limited

Inventor： Ivo Danihelka , Gregory Duncan Wayne , Fu-min Wang , Edward Thomas Grefenstette , Jack William Rae , Alexander Benjamin Graves , Timothy Paul Lillicrap , Timothy James Alexander Harley , Jonathan James Hunt

IPC: G06N3/063 , G06N3/04 , G06N3/08

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for augmenting neural networks with an external memory. One of the systems includes a sparse memory access subsystem that is configured to perform operations comprising generating a sparse set of reading weights that includes a respective reading weight for each of the plurality of locations in the external memory using the read key, reading data from the plurality of locations in the external memory in accordance with the sparse set of reading weights, generating a set of writing weights that includes a respective writing weight for each of the plurality of locations in the external memory, and writing the write vector to the plurality of locations in the external memory in accordance with the writing weights.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification