-
Publication No.: US11783182B2
Publication Date: 2023-10-10
Application No.: US17170316
Filing Date: 2021-02-08
Applicant: DeepMind Technologies Limited
Inventor: Volodymyr Mnih , Adrià Puigdomènech Badia , Alexander Benjamin Graves , Timothy James Alexander Harley , David Silver , Koray Kavukcuoglu
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for asynchronous deep reinforcement learning. One of the systems includes a plurality of workers, wherein each worker is configured to operate independently of each other worker, and wherein each worker is associated with a respective actor that interacts with a respective replica of the environment during the training of the deep neural network.
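The worker arrangement above can be sketched as a minimal Hogwild-style loop: several threads, each with its own environment replica, apply lock-free updates to a shared parameter vector. The `EnvReplica` class and the random "gradient" are illustrative stand-ins, not the patent's training procedure.

```python
import threading
import random

# Toy shared "network": a parameter vector updated asynchronously by workers.
params = [0.0, 0.0]

class EnvReplica:
    """Each worker interacts with its own replica of the environment."""
    def __init__(self, seed):
        self.rng = random.Random(seed)
    def step(self):
        # Stand-in for an actor's interaction yielding a gradient estimate.
        return [self.rng.uniform(-1, 1) for _ in params]

def worker(worker_id, steps, lr=0.01):
    env = EnvReplica(seed=worker_id)
    for _ in range(steps):
        grad = env.step()
        for i, g in enumerate(grad):
            params[i] += lr * g  # asynchronous, lock-free in-place update

threads = [threading.Thread(target=worker, args=(w, 100)) for w in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each worker operates independently; the shared parameters accumulate all workers' updates without synchronization, which is the structural point of the asynchronous scheme.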
-
Publication No.: US20230316729A1
Publication Date: 2023-10-05
Application No.: US17711951
Filing Date: 2022-04-01
Applicant: DeepMind Technologies Limited
Inventor: Dan-Andrei Calian , Sven Adrian Gowal , Timothy Arthur Mann , András György
IPC: G06V10/774 , G06V10/82 , G06V10/776
CPC classification number: G06V10/7747 , G06V10/82 , G06V10/776
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing a network input using a trained neural network with network parameters to generate an output for a machine learning task. The training includes: receiving a set of training examples each including a training network input and a reference output; for each training iteration, generating a corrupted network input for each training network input using a corruption neural network; updating perturbation parameters of the corruption neural network using a first objective function based on the corrupted network inputs; generating an updated corrupted network input for each training network input based on the updated perturbation parameters; and generating a network output for each updated corrupted network input using the neural network; for each training example, updating the network parameters using a second objective function based on the network output and the reference output.
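The alternating two-objective structure can be sketched with a linear toy model: an inner gradient-ascent step updates a shared perturbation (the "corruption network") to increase the task loss, then an outer gradient-descent step updates the network parameters on the re-corrupted inputs. The linear model, analytic gradients, and step sizes are illustrative assumptions, not the patent's networks.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))                 # training network inputs
y = X @ np.array([1.0, -2.0, 0.5])          # reference outputs
w = np.zeros(3)                             # network parameters
delta = np.zeros(3)                         # perturbation parameters
losses = []

for _ in range(200):
    # First objective: gradient *ascent* on the perturbation parameters.
    err = (X + delta) @ w - y
    delta += 0.05 * 2.0 * err.mean() * w    # d(loss)/d(delta) = 2*mean(err)*w

    # Updated corrupted inputs, then the network's outputs on them.
    Xc = X + delta
    err = Xc @ w - y
    losses.append(float(np.mean(err ** 2)))

    # Second objective: gradient descent on the network parameters.
    w -= 0.05 * 2.0 * (Xc.T @ err) / len(y)
```

Training on adversarially corrupted inputs like this is what lets the task loss fall while the corruption keeps pressure on the network.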
-
Publication No.: US20230256593A1
Publication Date: 2023-08-17
Application No.: US18018421
Filing Date: 2021-07-27
Applicant: DeepMind Technologies Limited
Inventor: Konrad Zolna , Scott Ellison Reed
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for off-line learning using a reward prediction model. One of the methods includes obtaining robot experience data; training, on a first subset of the robot experience data, a reward prediction model that receives a reward input comprising an input observation and generates as output a reward prediction that is a prediction of a task-specific reward for the particular task that should be assigned to the input observation; processing experiences in the robot experience data using the trained reward prediction model to generate a respective reward prediction for each of the processed experiences; and training a policy neural network on (i) the processed experiences and (ii) the respective reward predictions for the processed experiences.
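The relabeling pipeline can be sketched with a linear reward model in place of the reward prediction network: fit it on the labeled subset, then use it to assign predicted rewards to the remaining experiences before policy training. The observation dimensions, subset split, and least-squares fit are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
obs = rng.normal(size=(100, 4))             # observations from robot experience data
true_w = np.array([0.5, -1.0, 0.0, 2.0])    # hidden task-specific reward structure
labeled = slice(0, 20)                      # first subset with known rewards
rewards_labeled = obs[labeled] @ true_w

# Train the reward prediction model on the first subset (least squares here).
w_hat, *_ = np.linalg.lstsq(obs[labeled], rewards_labeled, rcond=None)

# Process the remaining experiences to generate a reward prediction for each;
# a policy network would then be trained on (experience, predicted reward) pairs.
predicted = obs[20:] @ w_hat
```

Because the toy rewards are noiseless and the labeled subset is larger than the input dimension, the fitted model recovers the reward structure exactly; a learned model on real data would only approximate it.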
-
Publication No.: US11720796B2
Publication Date: 2023-08-08
Application No.: US16856527
Filing Date: 2020-04-23
Applicant: DeepMind Technologies Limited
Abstract: A method includes maintaining respective episodic memory data for each of multiple actions; receiving a current observation characterizing a current state of an environment being interacted with by an agent; processing the current observation using an embedding neural network in accordance with current values of parameters of the embedding neural network to generate a current key embedding for the current observation; for each action of the plurality of actions: determining the p nearest key embeddings in the episodic memory data for the action to the current key embedding according to a distance measure, and determining a Q value for the action from the return estimates mapped to by the p nearest key embeddings in the episodic memory data for the action; and selecting, using the Q values for the actions, an action from the multiple actions as the action to be performed by the agent.
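The per-action Q-value computation can be sketched directly: given an action's episodic memory of (key embedding, return estimate) pairs and a current key embedding, find the p nearest keys and combine their returns. Inverse-distance weighting is one illustrative choice of combination; the patent only requires that the Q value be determined from the p nearest keys' returns.

```python
import numpy as np

def q_from_memory(memory_keys, memory_returns, query, p=3, eps=1e-3):
    """Q estimate from the p nearest keys' stored return estimates
    (Euclidean distance and inverse-distance weights are assumptions)."""
    d = np.linalg.norm(memory_keys - query, axis=1)
    nearest = np.argsort(d)[:p]
    weights = 1.0 / (d[nearest] + eps)
    return float(np.sum(weights * memory_returns[nearest]) / np.sum(weights))

# Episodic memory for one action: key embeddings mapped to return estimates.
keys = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
rets = np.array([1.0, 2.0, 10.0])
q = q_from_memory(keys, rets, query=np.array([0.1, 0.0]), p=2)
```

Repeating this for every action and taking the argmax over the resulting Q values implements the action-selection step of the method.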
-
Publication No.: US11720781B2
Publication Date: 2023-08-08
Application No.: US16756363
Filing Date: 2017-10-20
Applicant: DeepMind Technologies Limited
Inventor: Erich Konrad Elsen
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for interleaving matrix operations of a gated activation unit. One of the methods includes receiving a plurality of weight matrices of a gated activation unit of the neural network, the gated activation unit having two or more layers, each layer defining operations comprising: (i) a matrix operation between a weight matrix for the layer and concatenated input vectors and (ii) a nonlinear activation operation using a result of the matrix operation. Rows of the plurality of weight matrices are interleaved by assigning groups of corresponding rows to respective thread blocks, each thread block being a computation unit for execution by an independent processing unit of a plurality of independent processing units of a parallel processing device.
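The row-interleaving layout can be sketched on the CPU: corresponding rows from each of the gated unit's weight matrices are grouped together and assigned round-robin to "thread blocks" (plain Python lists here standing in for the GPU computation units). The round-robin assignment and two-matrix example are illustrative; real kernels would also fuse the matrix and activation operations.

```python
import numpy as np

def interleave_rows(matrices, n_blocks):
    """Assign corresponding groups of rows from each weight matrix to
    thread blocks (a CPU sketch of the data layout, not GPU code)."""
    n_rows = matrices[0].shape[0]
    blocks = [[] for _ in range(n_blocks)]
    for r in range(n_rows):
        for m in matrices:
            blocks[r % n_blocks].append(m[r])
    return [np.stack(b) for b in blocks]

# Two weight matrices of a gated activation unit (e.g. update and gate paths).
W_update = np.arange(12).reshape(4, 3)
W_gate = np.arange(12, 24).reshape(4, 3)
blocks = interleave_rows([W_update, W_gate], n_blocks=2)
```

Grouping corresponding rows means each block can compute its slice of both matrix products over the same concatenated input vector without waiting on other blocks.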
-
Publication No.: US20230244936A1
Publication Date: 2023-08-03
Application No.: US18131567
Filing Date: 2023-04-06
Applicant: DeepMind Technologies Limited
Inventor: David Silver , Oriol Vinyals , Maxwell Elliot Jaderberg
IPC: G06N3/08 , H04L9/40 , G06F18/214
CPC classification number: G06N3/08 , H04L63/205 , G06F18/214
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a policy neural network having a plurality of policy parameters and used to select actions to be performed by an agent to control the agent to perform a particular task while interacting with one or more other agents in an environment. In one aspect, the method includes: maintaining data specifying a pool of candidate action selection policies; maintaining data specifying a respective matchmaking policy; and training the policy neural network using a reinforcement learning technique to update the policy parameters. The policy parameters define policies to be used in controlling the agent to perform the particular task.
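The pool-plus-matchmaking arrangement can be sketched as weighted opponent sampling: a pool of candidate policies, and a matchmaking policy that draws training opponents from it. Sampling in proportion to each candidate's win rate against the learner is one plausible choice; the abstract does not fix the matchmaking distribution, and the names below are illustrative.

```python
import random

rng = random.Random(0)

# Pool of candidate action selection policies with an assumed score
# (e.g. win rate against the learner) used by the matchmaking policy.
pool = {"policy_a": 0.2, "policy_b": 0.5, "policy_c": 0.3}

def sample_opponent():
    """Matchmaking policy: sample an opponent in proportion to its score."""
    names = list(pool)
    weights = [pool[n] for n in names]
    return rng.choices(names, weights=weights, k=1)[0]

# Each training episode pairs the learner against a sampled opponent.
opponents = [sample_opponent() for _ in range(1000)]
```

Biasing matchmaking toward stronger opponents keeps the reinforcement learning updates focused on the candidates the current policy still loses to.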
-
Publication No.: US20230196108A1
Publication Date: 2023-06-22
Application No.: US18169803
Filing Date: 2023-02-15
Applicant: DeepMind Technologies Limited
Inventor: Georg Ostrovski , William Clinton Dabney
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting an action to be performed by a reinforcement learning agent interacting with an environment. In one aspect, a method comprises: receiving a current observation; for each action of a plurality of actions: randomly sampling one or more probability values; for each probability value: processing the action, the current observation, and the probability value using a quantile function network to generate an estimated quantile value for the probability value with respect to a probability distribution over possible returns that would result from the agent performing the action in response to the current observation; determining a measure of central tendency of the one or more estimated quantile values; and selecting an action to be performed by the agent in response to the current observation using the measures of central tendency for the actions.
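The sampling-and-averaging step can be sketched with a hand-rolled stand-in for the trained quantile function network: for each action, sample probability values, map each to an estimated quantile of the return distribution, and use their mean as the measure of central tendency. The toy network below (a function monotone in tau with a per-action offset) is an assumption, not the patent's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantile_network(action, observation, tau):
    """Stand-in for the quantile function network: maps a probability
    value tau to an estimated quantile of the return distribution."""
    base = {0: 1.0, 1: 2.5}[action]        # hypothetical per-action return level
    return base + 0.5 * (tau - 0.5)        # monotone in tau, as a quantile must be

observation = None                         # unused by this toy network
q_values = {}
for action in (0, 1):
    taus = rng.uniform(0.0, 1.0, size=32)  # randomly sampled probability values
    quantiles = np.array([quantile_network(action, observation, t) for t in taus])
    q_values[action] = quantiles.mean()    # measure of central tendency

best_action = max(q_values, key=q_values.get)
```

Averaging sampled quantiles approximates the mean of the return distribution, so the selection step reduces to a greedy choice over these estimates.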
-
Publication No.: US11662210B2
Publication Date: 2023-05-30
Application No.: US17747144
Filing Date: 2022-05-18
Applicant: DeepMind Technologies Limited
Inventor: Andrea Banino , Sudarshan Kumaran , Raia Thais Hadsell , Benigno Uria-Martínez
CPC classification number: G01C21/20 , G06N3/04 , G06N3/0445 , G06N3/0454 , G06N3/08 , G06N3/082 , G06T7/73 , G06N3/006 , G06T2207/20081 , G06T2207/20084
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting actions to be performed by an agent interacting with an environment. In one aspect, a system comprises a grid cell neural network and an action selection neural network. The grid cell network is configured to: receive an input comprising data characterizing a velocity of the agent; process the input to generate a grid cell representation; and process the grid cell representation to generate an estimate of a position of the agent in the environment; the action selection neural network is configured to: receive an input comprising a grid cell representation and an observation characterizing a state of the environment; and process the input to generate an action selection network output.
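The grid cell network's velocity-to-position mapping amounts to path integration, which can be sketched explicitly: accumulate a stream of velocity inputs into a position estimate. A real grid cell network learns an intermediate representation rather than integrating directly; this closed-form version is only an illustration of the computation it approximates.

```python
import numpy as np

def integrate_position(velocities, dt=0.1, start=(0.0, 0.0)):
    """Path integration: turn a stream of agent velocities into an
    estimate of the agent's position in the environment."""
    position = np.array(start, dtype=float)
    for v in velocities:
        position += dt * np.asarray(v, dtype=float)
    return position

# 10 steps moving east at speed 1, then 5 steps moving north at speed 2.
velocities = [(1.0, 0.0)] * 10 + [(0.0, 2.0)] * 5
pos = integrate_position(velocities)
```

In the patented system the grid cell representation feeding this estimate is also passed, alongside the observation, to the action selection network.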
-
Publication No.: US11651208B2
Publication Date: 2023-05-16
Application No.: US16615042
Filing Date: 2018-05-22
Applicant: DeepMind Technologies Limited
Inventor: Zhongwen Xu , Hado Phillip van Hasselt , Joseph Varughese Modayil , Andre da Motta Salles Barreto , David Silver
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for reinforcement learning. A reinforcement learning neural network selects actions to be performed by an agent interacting with an environment to perform a task in an attempt to achieve a specified result. The reinforcement learning neural network has at least one input to receive an input observation characterizing a state of the environment and at least one output for determining an action to be performed by the agent in response to the input observation. The system includes a reward function network coupled to the reinforcement learning neural network. The reward function network has an input to receive reward data characterizing a reward provided by one or more states of the environment and is configured to determine a reward function to provide one or more target values for training the reinforcement learning neural network.
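The coupling can be sketched as a reward function module that transforms observed reward data into target values for the RL network's update, here a one-step TD target. The identity transform and tabular-style target are stand-ins; the patent's reward function network is learned.

```python
def reward_function(reward_data):
    """Stand-in for the reward function network; a trained network would
    map reward data to a learned reward signal (identity here)."""
    return reward_data

def training_target(reward_data, next_state_value, discount=0.9):
    """One target value for training the reinforcement learning network."""
    return reward_function(reward_data) + discount * next_state_value

target = training_target(reward_data=1.0, next_state_value=2.0)
```

The point of the arrangement is that the targets fed to the RL network depend on the reward function network's output, so the reward signal itself can be shaped or learned rather than taken raw from the environment.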
-
Publication No.: US20230145129A1
Publication Date: 2023-05-11
Application No.: US18095925
Filing Date: 2023-01-11
Applicant: DeepMind Technologies Limited
Inventor: Andrew Coulter Jaegle , Joao Carreira
IPC: G06N3/092
CPC classification number: G06N3/092
Abstract: This specification describes a method for using a neural network to generate a network output that characterizes an entity. The method includes: obtaining a representation of the entity as a set of data element embeddings, obtaining a set of latent embeddings, and processing: (i) the set of data element embeddings, and (ii) the set of latent embeddings, using the neural network to generate the network output characterizing the entity. The neural network includes: (i) one or more cross-attention blocks, (ii) one or more self-attention blocks, and (iii) an output block. Each cross-attention block updates each latent embedding using attention over some or all of the data element embeddings. Each self-attention block updates each latent embedding using attention over the set of latent embeddings. The output block processes one or more latent embeddings to generate the network output that characterizes the entity.
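The block structure can be sketched with plain scaled dot-product attention: cross-attention lets a small set of latent embeddings read from the full set of data element embeddings, self-attention mixes the latents among themselves, and an output block pools them. Single-head attention without learned projections, and mean-pooling as the output block, are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def attend(queries, keys, values):
    """Scaled dot-product attention (single head, no learned projections,
    a simplification of the blocks described above)."""
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values

data = rng.normal(size=(100, 16))     # data element embeddings (large set)
latents = rng.normal(size=(8, 16))    # latent embeddings (small, fixed set)

latents = attend(latents, data, data)        # cross-attention over the data
latents = attend(latents, latents, latents)  # self-attention over the latents
output = latents.mean(axis=0)                # output block: pool the latents
```

Because self-attention only ever runs over the fixed-size latent set, cost in the data size is confined to the cross-attention reads, which is the architectural payoff of this design.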