-
Publication No.: US20210065012A1
Publication date: 2021-03-04
Application No.: US17020248
Filing date: 2020-09-14
Applicant: DeepMind Technologies Limited
Inventor: Mohammad Gheshlaghi Azar, Meire Fortunato, Bilal Piot, Olivier Claude Pietquin, Jacob Lee Menick, Volodymyr Mnih, Charles Blundell, Remi Munos
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting an action to be performed by a reinforcement learning agent. The method includes obtaining an observation characterizing a current state of an environment. For each layer parameter of each noisy layer of a neural network, a respective noise value is determined. For each layer parameter of each noisy layer, a noisy current value for the layer parameter is determined from a current value of the layer parameter, a current value of a corresponding noise parameter, and the noise value. A network input including the observation is processed using the neural network in accordance with the noisy current values to generate a network output for the network input. An action is selected from a set of possible actions to be performed by the agent in response to the observation using the network output.
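Illustrative sketch (not part of the patent text): the abstract states only that each noisy parameter value is determined from the parameter's current value, a corresponding noise parameter, and a sampled noise value. The Python below assumes one plausible reading, an additive form `value + noise_param * noise` with standard Gaussian noise, a single fully connected layer, and greedy action selection over the outputs. The names `noisy_value`, `noisy_layer`, `select_action`, and the `layer` dictionary keys are hypothetical and chosen only for this example.

```python
import random


def noisy_value(value, noise_param, noise):
    # Assumption: the noisy value is the current value plus the noise
    # parameter scaled by the sampled noise. The abstract does not fix this form.
    return value + noise_param * noise


def noisy_layer(x, layer, rng):
    """Apply one noisy fully connected layer to the input vector x.

    layer holds weights "w", biases "b", and matching noise parameters
    "sigma_w" and "sigma_b" (hypothetical names). A fresh noise value is
    drawn for every weight and bias on each call.
    """
    outputs = []
    for w_row, sw_row, b, sb in zip(
        layer["w"], layer["sigma_w"], layer["b"], layer["sigma_b"]
    ):
        total = noisy_value(b, sb, rng.gauss(0.0, 1.0))
        for xi, w, sw in zip(x, w_row, sw_row):
            total += noisy_value(w, sw, rng.gauss(0.0, 1.0)) * xi
        outputs.append(total)
    return outputs


def select_action(observation, layer, rng):
    # Assumption: the network output is one score per action and the agent
    # picks the highest-scoring action.
    scores = noisy_layer(observation, layer, rng)
    return max(range(len(scores)), key=scores.__getitem__)
```

With every noise parameter set to zero the layer reduces to an ordinary linear layer; nonzero noise parameters perturb the outputs on each call, so the selected action can vary between calls for the same observation.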
-
Publication No.: US20190332938A1
Publication date: 2019-10-31
Application No.: US16508042
Filing date: 2019-07-10
Applicant: DeepMind Technologies Limited
Inventor: Marc Gendron-Bellemare, Jacob Lee Menick, Alexander Benjamin Graves, Koray Kavukcuoglu, Remi Munos
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a machine learning model. In one aspect, a method includes receiving training data for training the machine learning model on a plurality of tasks, where each task includes multiple batches of training data. A task is selected in accordance with a current task selection policy. A batch of training data is selected from the selected task. The machine learning model is trained on the selected batch of training data to determine updated values of the model parameters. A learning progress measure that represents a progress of the training of the machine learning model as a result of training the machine learning model on the selected batch of training data is determined. The current task selection policy is updated using the learning progress measure.
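Illustrative sketch (not part of the patent text): the loop described in the abstract (select a task, select a batch, train, measure learning progress, update the selection policy) can be written out under several assumptions. Here the task selection policy is a softmax over per-task scores, learning progress is the drop in loss on the selected batch after one training step, and the policy update simply adds the scaled progress to the selected task's score. The abstract does not specify any of these choices, and the names `selection_probs`, `curriculum_step`, and `make_toy_trainer` are hypothetical.

```python
import math
import random


def selection_probs(scores):
    # Assumed policy: numerically stable softmax over per-task scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def curriculum_step(tasks, scores, train_fn, rng, step_size=0.5):
    """Run one iteration: pick a task, pick a batch, train, update the policy.

    tasks is a list of tasks, each a list of batches. train_fn(batch) trains
    the model on the batch and returns (loss_before, loss_after).
    """
    probs = selection_probs(scores)
    task_id = rng.choices(range(len(tasks)), weights=probs)[0]
    batch = rng.choice(tasks[task_id])
    loss_before, loss_after = train_fn(batch)
    # Assumed learning progress measure: loss reduction on the trained batch.
    progress = loss_before - loss_after
    # Simplified policy update: raise the score of tasks that yield progress.
    scores[task_id] += step_size * progress
    return task_id, progress


def make_toy_trainer(lr=0.1):
    # Toy one-parameter model y = w * x trained by gradient descent on
    # mean squared error, used only to exercise the loop above.
    state = {"w": 0.0}

    def loss(batch):
        return sum((state["w"] * x - y) ** 2 for x, y in batch) / len(batch)

    def train_fn(batch):
        before = loss(batch)
        grad = sum(2 * (state["w"] * x - y) * x for x, y in batch) / len(batch)
        state["w"] -= lr * grad
        return before, loss(batch)

    return train_fn
```

Calling `curriculum_step` repeatedly shifts selection probability toward tasks whose batches currently reduce the loss the most. A practical system would likely use a more careful bandit update than this additive rule.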
-