-
Publication No.: US20250051289A1
Publication Date: 2025-02-13
Application No.: US18929321
Application Date: 2024-10-28
Applicant: DeepMind Technologies Limited
Inventor: Gregory Duncan Wayne , Chia-Chun Hung , David Antony Amos , Mehdi Mirza Mohammadi , Arun Ahuja , Timothy Paul Lillicrap
IPC: C07D239/47 , A61P35/00 , C07C275/36
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a memory-based prediction system configured to receive an input observation characterizing a state of an environment interacted with by an agent and to process the input observation and data read from a memory to update data stored in the memory and to generate a latent representation of the state of the environment. The method comprises: for each of a plurality of time steps: processing an observation for the time step and data read from the memory to: (i) update the data stored in the memory, and (ii) generate a latent representation of the current state of the environment as of the time step; and generating a predicted return that will be received by the agent as a result of interactions with the environment after the observation for the time step is received.
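This abstract describes a per-time-step loop in which the prediction system reads from an external memory, updates that memory, produces a latent representation of the environment state, and predicts the return the agent will go on to receive. The sketch below, in PyTorch, is only a minimal illustration of that loop under assumed details; the class name MemoryPredictor, the attention-based read, and the slot-overwrite write rule are assumptions, not the claimed implementation.

```python
# Minimal, illustrative sketch of one step of a memory-based prediction system
# (PyTorch). The attention read, slot-overwrite write, and all names are
# assumptions for illustration, not the patented implementation.
import torch
import torch.nn as nn


class MemoryPredictor(nn.Module):
    def __init__(self, obs_dim: int, latent_dim: int, memory_slots: int):
        super().__init__()
        self.encoder = nn.Linear(obs_dim + latent_dim, latent_dim)
        self.return_head = nn.Linear(latent_dim, 1)  # predicted return
        # External memory: one latent vector per slot.
        self.register_buffer("memory", torch.zeros(memory_slots, latent_dim))
        self.write_ptr = 0

    def forward(self, observation: torch.Tensor):
        # Form a read query from the observation (memory half zero-padded).
        query = self.encoder(torch.cat([observation, torch.zeros(self.memory.shape[1])]))
        # Content-based read: attention weights over the memory slots.
        weights = torch.softmax(self.memory @ query, dim=0)
        read_vec = weights @ self.memory
        # (ii) Latent representation of the current environment state.
        latent = torch.tanh(self.encoder(torch.cat([observation, read_vec])))
        # (i) Update the data stored in memory (cyclically overwrite slots).
        self.memory[self.write_ptr % self.memory.shape[0]] = latent.detach()
        self.write_ptr += 1
        # Predicted return the agent will receive after this observation.
        predicted_return = self.return_head(latent)
        return latent, predicted_return


model = MemoryPredictor(obs_dim=16, latent_dim=32, memory_slots=8)
latent, pred_return = model(torch.randn(16))
```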
-
Publication No.: US20210089968A1
Publication Date: 2021-03-25
Application No.: US17113669
Application Date: 2020-12-07
Applicant: DeepMind Technologies Limited
IPC: G06N20/00
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating sequences of predicted observations, for example, images. In one aspect, a system comprises a controller recurrent neural network and a decoder neural network that processes a set of latent variables to generate an observation. An external memory and a memory interface subsystem are configured to, for each of a plurality of time steps, receive an updated hidden state from the controller, generate a memory context vector by reading data from the external memory using the updated hidden state, determine a set of latent variables from the memory context vector, generate a predicted observation by providing the set of latent variables to the decoder neural network, write data to the external memory using the latent variables, the updated hidden state, or both, and generate a controller input for a subsequent time step from the latent variables.
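The system described here generates an observation sequence step by step: the controller updates its hidden state, a memory context vector is read with that state, latent variables are derived from the context, the decoder maps the latents to a predicted observation, data are written back to memory, and the latents become the next controller input. The following PyTorch sketch mirrors that loop under assumed sizes and read/write rules; the names (MemoryGenerator, step, write_ptr) are illustrative and not taken from the patent.

```python
# Illustrative sketch of one generation step for a controller RNN with an
# external memory and a decoder, loosely following the steps listed in the
# abstract above. Sizes, read/write rules, and all names are assumptions.
import torch
import torch.nn as nn


class MemoryGenerator(nn.Module):
    def __init__(self, latent_dim: int, hidden_dim: int, obs_dim: int, memory_slots: int):
        super().__init__()
        self.controller = nn.LSTMCell(latent_dim, hidden_dim)
        self.to_latent = nn.Linear(hidden_dim, latent_dim)  # memory context -> latents
        self.decoder = nn.Linear(latent_dim, obs_dim)       # latents -> observation
        self.register_buffer("memory", torch.zeros(memory_slots, hidden_dim))
        self.write_ptr = 0

    def step(self, controller_input, state):
        # 1. The controller produces an updated hidden state.
        h, c = self.controller(controller_input, state)
        # 2. Read a memory context vector using the updated hidden state.
        weights = torch.softmax(self.memory @ h.squeeze(0), dim=0)
        context = weights @ self.memory
        # 3. Determine latent variables from the memory context vector.
        latents = self.to_latent(context)
        # 4. Decode the latents into a predicted observation.
        predicted_obs = self.decoder(latents)
        # 5. Write to the external memory using the updated hidden state.
        self.memory[self.write_ptr % self.memory.shape[0]] = h.squeeze(0).detach()
        self.write_ptr += 1
        # 6. The latents form the controller input for the next time step.
        return predicted_obs, latents.unsqueeze(0), (h, c)


# Roll out a short sequence of predicted observations (batch size 1).
model = MemoryGenerator(latent_dim=8, hidden_dim=16, obs_dim=32, memory_slots=6)
controller_input = torch.zeros(1, 8)
state = (torch.zeros(1, 16), torch.zeros(1, 16))
for _ in range(5):
    predicted_obs, controller_input, state = model.step(controller_input, state)
```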
-
Publication No.: US20210081723A1
Publication Date: 2021-03-18
Application No.: US17035546
Application Date: 2020-09-28
Applicant: DeepMind Technologies Limited
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network system used to control an agent interacting with an environment to perform a specified task. One of the methods includes causing the agent to perform a task episode in which the agent attempts to perform the specified task; for each of one or more particular time steps in the sequence: generating a modified reward for the particular time step from (i) the actual reward at the time step and (ii) value predictions at one or more time steps that are more than a threshold number of time steps after the particular time step in the sequence; and training, through reinforcement learning, the neural network system using at least the modified rewards for the particular time steps.
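The core idea in this abstract is to reshape the reward at a time step using value predictions from time steps more than a threshold number of steps later in the episode, and then to train the system with the reshaped rewards. The function below is a hypothetical sketch of that reshaping step; the specific mixing rule (a scaled mean of the later value predictions) is an assumption for illustration only, not the patented method.

```python
# Hypothetical sketch of the reward-modification idea in the abstract above:
# the reward at step t is augmented with value predictions from steps more
# than `threshold` steps later in the episode.
from typing import List


def modified_rewards(rewards: List[float],
                     value_predictions: List[float],
                     threshold: int,
                     mix: float = 0.1) -> List[float]:
    out = []
    for t, reward in enumerate(rewards):
        # Value predictions more than `threshold` steps after step t.
        future = value_predictions[t + threshold + 1:]
        bonus = mix * sum(future) / len(future) if future else 0.0
        out.append(reward + bonus)
    return out


# Example: sparse rewards reshaped with later value estimates.
rewards = [0.0, 0.0, 0.0, 1.0]
values = [0.2, 0.4, 0.6, 0.9]
print(modified_rewards(rewards, values, threshold=1))
```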
-
Publication No.: US20210034969A1
Publication Date: 2021-02-04
Application No.: US16766945
Application Date: 2019-03-11
Applicant: DeepMind Technologies Limited
Inventor: Gregory Duncan Wayne , Chia-Chun Hung , David Antony Amos , Mehdi Mirza Mohammadi , Arun Ahuja , Timothy Paul Lillicrap
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a memory-based prediction system configured to receive an input observation characterizing a state of an environment interacted with by an agent and to process the input observation and data read from a memory to update data stored in the memory and to generate a latent representation of the state of the environment. The method comprises: for each of a plurality of time steps: processing an observation for the time step and data read from the memory to: (i) update the data stored in the memory, and (ii) generate a latent representation of the current state of the environment as of the time step; and generating a predicted return that will be received by the agent as a result of interactions with the environment after the observation for the time step is received.
-
Publication No.: US12159221B2
Publication Date: 2024-12-03
Application No.: US16766945
Application Date: 2019-03-11
Applicant: DeepMind Technologies Limited
Inventor: Gregory Duncan Wayne , Chia-Chun Hung , David Antony Amos , Mehdi Mirza Mohammadi , Arun Ahuja , Timothy Paul Lillicrap
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a memory-based prediction system configured to receive an input observation characterizing a state of an environment interacted with by an agent and to process the input observation and data read from a memory to update data stored in the memory and to generate a latent representation of the state of the environment. The method comprises: for each of a plurality of time steps: processing an observation for the time step and data read from the memory to: (i) update the data stored in the memory, and (ii) generate a latent representation of the current state of the environment as of the time step; and generating a predicted return that will be received by the agent as a result of interactions with the environment after the observation for the time step is received.
-
Publication No.: US10789511B2
Publication Date: 2020-09-29
Application No.: US16601324
Application Date: 2019-10-14
Applicant: DeepMind Technologies Limited
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network system used to control an agent interacting with an environment to perform a specified task. One of the methods includes causing the agent to perform a task episode in which the agent attempts to perform the specified task; for each of one or more particular time steps in the sequence: generating a modified reward for the particular time step from (i) the actual reward at the time step and (ii) value predictions at one or more time steps that are more than a threshold number of time steps after the particular time step in the sequence; and training, through reinforcement learning, the neural network system using at least the modified rewards for the particular time steps.
-
Publication No.: US11977967B2
Publication Date: 2024-05-07
Application No.: US17113669
Application Date: 2020-12-07
Applicant: DeepMind Technologies Limited
IPC: G06N3/06 , G06N3/0455 , G06N3/049 , G06N20/00 , G06F16/908 , G06N3/084
CPC classification number: G06N3/06 , G06N3/0455 , G06N3/049 , G06N20/00 , G05B2219/33025 , G06F16/908 , G06N3/084
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating sequences of predicted observations, for example, images. In one aspect, a system comprises a controller recurrent neural network and a decoder neural network that processes a set of latent variables to generate an observation. An external memory and a memory interface subsystem are configured to, for each of a plurality of time steps, receive an updated hidden state from the controller, generate a memory context vector by reading data from the external memory using the updated hidden state, determine a set of latent variables from the memory context vector, generate a predicted observation by providing the set of latent variables to the decoder neural network, write data to the external memory using the latent variables, the updated hidden state, or both, and generate a controller input for a subsequent time step from the latent variables.
-
Publication No.: US11769049B2
Publication Date: 2023-09-26
Application No.: US17035546
Application Date: 2020-09-28
Applicant: DeepMind Technologies Limited
IPC: G06K9/62 , G06F11/30 , G06N3/08 , G06F18/21 , G06V10/764 , G06V10/774 , G06V10/778 , G06V10/82
CPC classification number: G06N3/08 , G06F11/3037 , G06F11/3072 , G06F18/2193 , G06V10/764 , G06V10/774 , G06V10/7796 , G06V10/82
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network system used to control an agent interacting with an environment to perform a specified task. One of the methods includes causing the agent to perform a task episode in which the agent attempts to perform the specified task; for each of one or more particular time steps in the sequence: generating a modified reward for the particular time step from (i) the actual reward at the time step and (ii) value predictions at one or more time steps that are more than a threshold number of time steps after the particular time step in the sequence; and training, through reinforcement learning, the neural network system using at least the modified rewards for the particular time steps.
-
Publication No.: US20230178076A1
Publication Date: 2023-06-08
Application No.: US18077194
Application Date: 2022-12-07
Applicant: DeepMind Technologies Limited
Inventor: Joshua Simon Abramson , Arun Ahuja , Federico Javier Carnevale , Petko Ivanov Georgiev , Chia-Chun Hung , Timothy Paul Lillicrap , Alistair Michael Muldal , Adam Anthony Santoro , Tamara Louise von Glehn , Jessica Paige Landon , Gregory Duncan Wayne , Chen Yan , Rui Zhu
IPC: G10L15/22 , G10L15/16 , G10L13/02 , G06V10/82 , G06V20/50 , G06F40/284 , G06F40/40 , G06V10/774 , G10L15/06
CPC classification number: G10L15/22 , G10L15/16 , G10L13/02 , G06V10/82 , G06V20/50 , G06F40/284 , G06F40/40 , G06V10/774 , G10L15/063 , G10L2015/223
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for controlling agents. In particular, an interactive agent can be controlled based on multi-modal inputs that include both an observation image and a natural language text sequence.
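Here the agent's policy is conditioned jointly on an observation image and a natural-language text sequence. The sketch below shows one plausible way to fuse the two modalities into action logits; the small CNN, the bag-of-tokens text encoder, and all names are assumptions rather than the architecture claimed in the application.

```python
# Minimal, illustrative sketch of an agent policy conditioned on both an
# observation image and a natural-language token sequence, in the spirit of
# the abstract above. The architecture and all names are assumptions.
import torch
import torch.nn as nn


class MultiModalPolicy(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int, num_actions: int):
        super().__init__()
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.text_encoder = nn.Embedding(vocab_size, embed_dim)
        self.policy_head = nn.Linear(16 + embed_dim, num_actions)

    def forward(self, image, token_ids):
        img_feat = self.image_encoder(image)              # (B, 16)
        txt_feat = self.text_encoder(token_ids).mean(1)   # (B, embed_dim)
        return self.policy_head(torch.cat([img_feat, txt_feat], dim=-1))


policy = MultiModalPolicy(vocab_size=1000, embed_dim=32, num_actions=8)
image = torch.randn(1, 3, 64, 64)             # observation image
tokens = torch.randint(0, 1000, (1, 6))       # natural-language instruction
action_logits = policy(image, tokens)
```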
-
Publication No.: US20200117956A1
Publication Date: 2020-04-16
Application No.: US16601324
Application Date: 2019-10-14
Applicant: DeepMind Technologies Limited
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network system used to control an agent interacting with an environment to perform a specified task. One of the methods includes causing the agent to perform a task episode in which the agent attempts to perform the specified task; for each of one or more particular time steps in the sequence: generating a modified reward for the particular time step from (i) the actual reward at the time step and (ii) value predictions at one or more time steps that are more than a threshold number of time steps after the particular time step in the sequence; and training, through reinforcement learning, the neural network system using at least the modified rewards for the particular time steps.