-
1.
Publication/Grant No.: US20190354867A1
Publication/Grant Date: 2019-11-21
Application No.: US16417522
Application Date: 2019-05-20
Applicant: DeepMind Technologies Limited
Inventor: Wojciech Czarnecki, Siddhant Jayakumar
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for reinforcement learning using agent curricula. One of the methods includes maintaining data specifying a plurality of candidate agent policy neural networks; initializing mixing data that assigns a respective weight to each of the candidate agent policy neural networks; training the candidate agent policy neural networks using a reinforcement learning technique to generate combined action selection policies that result in improved performance on a reinforcement learning task; and during the training, repeatedly adjusting the weights in the mixing data to favor higher-performing candidate agent policy neural networks.
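As a minimal sketch of the mixing idea, the Python code here assumes discrete actions, a combined policy formed as the weighted mixture of the candidates' action distributions, and a softmax over each candidate's exponentially averaged return as the weight-adjustment rule. The abstract does not say how performance is measured or how the weights are updated, so these choices, and the names MixingData, mix_policies, temperature and decay, are illustrative assumptions rather than the patented method.

```python
import math


def mix_policies(action_probs, weights):
    """Combine candidate action distributions into one distribution.

    action_probs[k][a] is the probability that candidate k selects action a;
    weights[k] is candidate k's mixing weight (the weights sum to 1).
    """
    num_actions = len(action_probs[0])
    return [
        sum(w * probs[a] for w, probs in zip(weights, action_probs))
        for a in range(num_actions)
    ]


class MixingData:
    """Hypothetical mixing data: one weight per candidate policy network."""

    def __init__(self, num_candidates, temperature=1.0, decay=0.9):
        # Initialise with uniform weights over the candidates.
        self.weights = [1.0 / num_candidates] * num_candidates
        self.avg_returns = [0.0] * num_candidates
        self.temperature = temperature
        self.decay = decay

    def record_return(self, k, episode_return):
        # Exponential moving average of candidate k's observed return
        # (assumes each candidate's performance is measured separately).
        self.avg_returns[k] = (
            self.decay * self.avg_returns[k] + (1.0 - self.decay) * episode_return
        )

    def refresh(self):
        # Assumed update rule: softmax over average returns, so that
        # higher-performing candidates receive larger weights.
        top = max(self.avg_returns)
        exps = [math.exp((r - top) / self.temperature) for r in self.avg_returns]
        total = sum(exps)
        self.weights = [e / total for e in exps]


# Usage: two candidates, where candidate 1 earns higher returns.
mixing = MixingData(num_candidates=2)
for _ in range(20):
    mixing.record_return(0, 1.0)
    mixing.record_return(1, 3.0)
mixing.refresh()
combined = mix_policies([[0.9, 0.1], [0.2, 0.8]], mixing.weights)
```

With a softmax rule, a lower temperature shifts weight toward the best-performing candidate more aggressively, while a higher temperature keeps the mixture closer to uniform for longer.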
-
2.
Publication/Grant No.: US12242947B2
Publication/Grant Date: 2025-03-04
Application No.: US16759561
Application Date: 2018-10-29
Applicant: DeepMind Technologies Limited
Inventor: Pablo Sprechmann, Siddhant Jayakumar, Jack William Rae, Alexander Pritzel, Adrià Puigdomènech Badia, Oriol Vinyals, Razvan Pascanu, Charles Blundell
Abstract: There is described herein a computer-implemented method of processing an input data item. The method comprises processing the input data item using a parametric model to generate output data, wherein the parametric model comprises a first sub-model and a second sub-model. The processing comprises processing, by the first sub-model, the input data to generate a query data item, retrieving, from a memory storing data point-value pairs, at least one data point-value pair based upon the query data item and modifying weights of the second sub-model based upon the retrieved at least one data point-value pair. The output data is then generated based upon the modified second sub-model.
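A minimal sketch of the processing flow in this abstract, under stated assumptions: the first sub-model is replaced by an identity embedding, the second sub-model is a linear regressor, retrieval is k-nearest-neighbour by Euclidean distance, and the weight modification is a few steps of SGD on squared error applied to a copy of the weights. None of these specifics, nor the names embed, retrieve, adapt and predict, come from the patent; they are hypothetical choices for illustration.

```python
import math


def embed(x):
    # Stand-in for the first sub-model: the identity mapping from the
    # input to the query vector (a hypothetical choice).
    return list(x)


def retrieve(memory, query, k):
    # memory is a list of (key, value) pairs; return the k pairs whose keys
    # are nearest to the query in Euclidean distance.
    return sorted(memory, key=lambda pair: math.dist(pair[0], query))[:k]


def adapt(weights, bias, neighbours, lr=0.1, steps=20):
    # Copy the second sub-model's parameters, then fit the copy to the
    # retrieved pairs with SGD on squared error. The parameters passed in
    # are not modified.
    w, b = list(weights), bias
    for _ in range(steps):
        for key, target in neighbours:
            err = sum(wi * xi for wi, xi in zip(w, key)) + b - target
            w = [wi - lr * err * xi for wi, xi in zip(w, key)]
            b -= lr * err
    return w, b


def predict(x, weights, bias, memory, k=2):
    query = embed(x)
    w, b = adapt(weights, bias, retrieve(memory, query, k))
    # The output is produced by the locally modified second sub-model.
    return sum(wi * qi for wi, qi in zip(w, query)) + b


# Usage: an untrained global model (w = 0, b = 0) adapted on nearby memories.
global_w, global_b = [0.0], 0.0
memory = [([1.0], 2.0), ([1.1], 2.2), ([5.0], -1.0)]
out = predict([1.05], global_w, global_b, memory)
```

Adapting a copy rather than the shared parameters means each query gets its own locally modified model, while the global weights stay fixed between queries.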
-
3.
Publication/Grant No.: US11113605B2
Publication/Grant Date: 2021-09-07
Application No.: US16417522
Application Date: 2019-05-20
Applicant: DeepMind Technologies Limited
Inventor: Wojciech Czarnecki, Siddhant Jayakumar
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for reinforcement learning using agent curricula. One of the methods includes maintaining data specifying a plurality of candidate agent policy neural networks; initializing mixing data that assigns a respective weight to each of the candidate agent policy neural networks; training the candidate agent policy neural networks using a reinforcement learning technique to generate combined action selection policies that result in improved performance on a reinforcement learning task; and during the training, repeatedly adjusting the weights in the mixing data to favor higher-performing candidate agent policy neural networks.
-
4.
Publication/Grant No.: US11423300B1
Publication/Grant Date: 2022-08-23
Application No.: US16271533
Application Date: 2019-02-08
Applicant: DeepMind Technologies Limited
Inventor: Samuel Ritter, Xiao Jing Wang, Siddhant Jayakumar, Razvan Pascanu, Charles Blundell, Matthew Botvinick
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating a system output using a remembered value of a neural network hidden state. In one aspect, a system comprises an external memory that maintains context experience tuples respectively comprising: (i) a key embedding of context data, and (ii) a value of a hidden state of a neural network at the respective previous time step. The neural network is configured to receive a system input and a remembered value of the hidden state of the neural network and to generate a system output. The system comprises a memory interface subsystem that is configured to determine a key embedding for current context data, determine a remembered value of the hidden state of the neural network based on the key embedding, and provide the remembered value of the hidden state as an input to the neural network.
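The memory interface in this abstract can be sketched in Python as follows, assuming a key embedding that simply L2-normalises a context vector, single-nearest-neighbour lookup by cosine similarity, and a toy elementwise tanh cell that takes the remembered hidden state as an extra input. These choices and the names EpisodicMemory, embed_context, rnn_step and step_with_recall are hypothetical; the patent abstract does not fix any of them.

```python
import math


def cosine(a, b):
    # Cosine similarity between two vectors (0.0 if either is all zeros).
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class EpisodicMemory:
    """External memory of (context key embedding, hidden-state value) tuples."""

    def __init__(self):
        self.keys = []
        self.values = []

    def write(self, key, hidden):
        self.keys.append(list(key))
        self.values.append(list(hidden))

    def read(self, key):
        # Return the hidden state stored under the most similar key, if any.
        if not self.keys:
            return None
        best = max(range(len(self.keys)), key=lambda i: cosine(self.keys[i], key))
        return self.values[best]


def embed_context(context):
    # Hypothetical key embedding: L2-normalise the raw context vector.
    n = math.sqrt(sum(c * c for c in context)) or 1.0
    return [c / n for c in context]


def rnn_step(x, h_prev, h_remembered, w_in=0.5, w_h=0.5, w_mem=0.5):
    # Toy elementwise recurrent cell; the remembered state is an extra input.
    if h_remembered is None:
        h_remembered = [0.0] * len(h_prev)
    return [
        math.tanh(w_in * xi + w_h * hi + w_mem * ri)
        for xi, hi, ri in zip(x, h_prev, h_remembered)
    ]


def step_with_recall(memory, context, x, h_prev):
    key = embed_context(context)
    remembered = memory.read(key)        # recall a past hidden state
    h = rnn_step(x, h_prev, remembered)  # feed it to the network
    memory.write(key, h)                 # store the new state for later recall
    return h
```

Because keys are normalised, contexts that point in the same direction (for example [1.0, 0.0] and [2.0, 0.1]) retrieve the same stored hidden state even when their magnitudes differ.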
-
5.
Publication/Grant No.: US20200285940A1
Publication/Grant Date: 2020-09-10
Application No.: US16759561
Application Date: 2018-10-29
Applicant: DeepMind Technologies Limited
Inventor: Pablo Sprechmann, Siddhant Jayakumar, Jack William Rae, Alexander Pritzel, Adrià Puigdomènech Badia, Oriol Vinyals, Razvan Pascanu, Charles Blundell
Abstract: There is described herein a computer-implemented method of processing an input data item. The method comprises processing the input data item using a parametric model to generate output data, wherein the parametric model comprises a first sub-model and a second sub-model. The processing comprises processing, by the first sub-model, the input data to generate a query data item, retrieving, from a memory storing data point-value pairs, at least one data point-value pair based upon the query data item and modifying weights of the second sub-model based upon the retrieved at least one data point-value pair. The output data is then generated based upon the modified second sub-model.