-
Publication No.: US10789511B2
Publication date: 2020-09-29
Application No.: US16601324
Application date: 2019-10-14
Applicant: DeepMind Technologies Limited
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network system used to control an agent interacting with an environment to perform a specified task. One of the methods includes causing the agent to perform a task episode in which the agent attempts to perform the specified task; for each of one or more particular time steps in the sequence: generating a modified reward for the particular time step from (i) the actual reward at the time step and (ii) value predictions at one or more time steps that are more than a threshold number of time steps after the particular time step in the sequence; and training, through reinforcement learning, the neural network system using at least the modified rewards for the particular time steps.
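The reward modification described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the actual reward and the later value predictions are combined, so everything specific here is an assumption made for illustration: the function name `modified_rewards`, the additive combination rule, the `weight` parameter, and the choice of a single later step (the first one more than `threshold` steps ahead). It is not the patented method.

```python
from typing import List, Sequence


def modified_rewards(
    rewards: Sequence[float],
    values: Sequence[float],
    threshold: int,
    weight: float = 0.5,
) -> List[float]:
    """Hypothetical sketch: combine each actual reward with a value
    prediction taken more than `threshold` steps later in the episode."""
    if len(rewards) != len(values):
        raise ValueError("rewards and values must have the same length")
    out = []
    for t, r in enumerate(rewards):
        # First time step that is strictly more than `threshold` steps after t.
        later = t + threshold + 1
        if later < len(values):
            # Assumed combination rule: additive, scaled by `weight`.
            out.append(r + weight * values[later])
        else:
            # No sufficiently distant step exists; keep the actual reward.
            out.append(r)
    return out


# Toy episode with four time steps (made-up numbers).
episode_rewards = [1.0, 0.0, 0.0, 2.0]
episode_values = [0.5, 1.0, 2.0, 4.0]
print(modified_rewards(episode_rewards, episode_values, threshold=1))
```

In a training loop, the returned list would take the place of the actual rewards when computing the reinforcement learning update. Which update rule is used is left open here, as it is in the abstract.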
-
Publication No.: US20210081723A1
Publication date: 2021-03-18
Application No.: US17035546
Application date: 2020-09-28
Applicant: DeepMind Technologies Limited
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network system used to control an agent interacting with an environment to perform a specified task. One of the methods includes causing the agent to perform a task episode in which the agent attempts to perform the specified task; for each of one or more particular time steps in the sequence: generating a modified reward for the particular time step from (i) the actual reward at the time step and (ii) value predictions at one or more time steps that are more than a threshold number of time steps after the particular time step in the sequence; and training, through reinforcement learning, the neural network system using at least the modified rewards for the particular time steps.
-
Publication No.: US11769049B2
Publication date: 2023-09-26
Application No.: US17035546
Application date: 2020-09-28
Applicant: DeepMind Technologies Limited
IPC: G06K9/62 , G06F11/30 , G06N3/08 , G06F18/21 , G06V10/764 , G06V10/774 , G06V10/778 , G06V10/82
CPC classification number: G06N3/08 , G06F11/3037 , G06F11/3072 , G06F18/2193 , G06V10/764 , G06V10/774 , G06V10/7796 , G06V10/82
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network system used to control an agent interacting with an environment to perform a specified task. One of the methods includes causing the agent to perform a task episode in which the agent attempts to perform the specified task; for each of one or more particular time steps in the sequence: generating a modified reward for the particular time step from (i) the actual reward at the time step and (ii) value predictions at one or more time steps that are more than a threshold number of time steps after the particular time step in the sequence; and training, through reinforcement learning, the neural network system using at least the modified rewards for the particular time steps.
-
Publication No.: US20230178076A1
Publication date: 2023-06-08
Application No.: US18077194
Application date: 2022-12-07
Applicant: DeepMind Technologies Limited
Inventor: Joshua Simon Abramson , Arun Ahuja , Federico Javier Carnevale , Petko Ivanov Georgiev , Chia-Chun Hung , Timothy Paul Lillicrap , Alistair Michael Muldal , Adam Anthony Santoro , Tamara Louise von Glehn , Jessica Paige Landon , Gregory Duncan Wayne , Chen Yan , Rui Zhu
IPC: G10L15/22 , G10L15/16 , G10L13/02 , G06V10/82 , G06V20/50 , G06F40/284 , G06F40/40 , G06V10/774 , G10L15/06
CPC classification number: G10L15/22 , G10L15/16 , G10L13/02 , G06V10/82 , G06V20/50 , G06F40/284 , G06F40/40 , G06V10/774 , G10L15/063 , G10L2015/223
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for controlling agents. In particular, an interactive agent can be controlled based on multi-modal inputs that include both an observation image and a natural language text sequence.
-
Publication No.: US20200117956A1
Publication date: 2020-04-16
Application No.: US16601324
Application date: 2019-10-14
Applicant: DeepMind Technologies Limited
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network system used to control an agent interacting with an environment to perform a specified task. One of the methods includes causing the agent to perform a task episode in which the agent attempts to perform the specified task; for each of one or more particular time steps in the sequence: generating a modified reward for the particular time step from (i) the actual reward at the time step and (ii) value predictions at one or more time steps that are more than a threshold number of time steps after the particular time step in the sequence; and training, through reinforcement learning, the neural network system using at least the modified rewards for the particular time steps.