-
Publication No.: US11842264B2
Publication Date: 2023-12-12
Application No.: US16759993
Filing Date: 2018-11-30
Applicant: DeepMind Technologies Limited
Inventor: Agnieszka Grabska-Barwinska, Peter Toth, Christopher Mattern, Avishkar Bhoopchand, Tor Lattimore, Joel William Veness
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for a neural network system comprising one or more gated linear networks. A system includes: one or more gated linear networks, wherein each gated linear network corresponds to a respective data value in an output data sample and is configured to generate a network probability output that defines a probability distribution over possible values for the corresponding data value, wherein each gated linear network comprises a plurality of layers, wherein the plurality of layers comprises a plurality of gated linear layers, wherein each gated linear layer has one or more nodes, and wherein each node is configured to: receive a plurality of inputs, receive side information for the node; combine the plurality of inputs according to a set of weights defined by the side information, and generate and output a node probability output for the corresponding data value.
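Illustrative sketch (not part of the patent record): the abstract describes a node that receives inputs and side information, selects a set of weights from the side information, and outputs a probability. The Python below is one possible reading of that description, not the claimed implementation. The abstract does not say how side information selects weights or how inputs are combined; the half-space gating, the logit-domain mixing, the log-loss update, and every name here (`GatedLinearNode`, `hyperplanes`, `context`, `predict`, `update`) are assumptions made for illustration.

```python
import math


def logit(p):
    """Map a probability in (0, 1) to the real line."""
    return math.log(p / (1.0 - p))


def sigmoid(x):
    """Map a real number back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


class GatedLinearNode:
    """Hypothetical node: side information picks one weight vector,
    and that vector is used to mix the input probabilities."""

    def __init__(self, num_inputs, hyperplanes):
        # hyperplanes: list of (normal_vector, offset) pairs.
        # ASSUMPTION: gating by half-spaces; the abstract does not specify this.
        self.hyperplanes = hyperplanes
        num_contexts = 2 ** len(hyperplanes)
        # One weight vector per context, initialised to a uniform average.
        self.weights = [[1.0 / num_inputs] * num_inputs
                        for _ in range(num_contexts)]

    def context(self, side_info):
        """Turn side information into the index of a weight vector."""
        index = 0
        for bit, (normal, offset) in enumerate(self.hyperplanes):
            dot = sum(a * b for a, b in zip(normal, side_info))
            if dot > offset:
                index |= 1 << bit
        return index

    def predict(self, input_probs, side_info):
        """Combine the inputs with the weights chosen by the side information."""
        w = self.weights[self.context(side_info)]
        # ASSUMPTION: inputs are mixed in the logit domain.
        return sigmoid(sum(wi * logit(p) for wi, p in zip(w, input_probs)))

    def update(self, input_probs, side_info, target, lr=0.1):
        """One gradient step on log loss, touching only the active weights."""
        w = self.weights[self.context(side_info)]
        p = sigmoid(sum(wi * logit(pi) for wi, pi in zip(w, input_probs)))
        for i, pi in enumerate(input_probs):
            w[i] -= lr * (p - target) * logit(pi)
        return p
```

In this sketch, only the weight vector selected by the side information changes during an update, so the node's behaviour for other side-information regions is left untouched.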
-
Publication No.: US20230079338A1
Publication Date: 2023-03-16
Application No.: US17766854
Filing Date: 2020-10-08
Applicant: DeepMind Technologies Limited
Inventor: Eren Sezener, Joel William Veness, Marcus Hutter, Jianan Wang, David Budden
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer-readable storage media, for training a neural network to control a real-world agent interacting with a real-world environment to cause the real-world agent to perform a particular task. One of the methods includes training the neural network to determine first values of the parameters by optimizing a first task-specific objective that measures a performance of the policy neural network in controlling a simulated version of the real-world agent; obtaining real-world data generated from interactions of the real-world agent with the real-world environment; and training the neural network to determine trained values of the parameters from the first values of the parameters by jointly optimizing (i) a self-supervised objective that measures at least a performance of internal representations generated by the neural network on a self-supervised task performed on the real-world data and (ii) a second task-specific objective.
-
Publication No.: US20200349418A1
Publication Date: 2020-11-05
Application No.: US16759993
Filing Date: 2018-11-30
Applicant: DeepMind Technologies Limited
Inventor: Agnieszka Grabska-Barwinska, Peter Toth, Christopher Mattern, Avishkar Bhoopchand, Tor Lattimore, Joel William Veness
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for a neural network system comprising one or more gated linear networks. A system includes: one or more gated linear networks, wherein each gated linear network corresponds to a respective data value in an output data sample and is configured to generate a network probability output that defines a probability distribution over possible values for the corresponding data value, wherein each gated linear network comprises a plurality of layers, wherein the plurality of layers comprises a plurality of gated linear layers, wherein each gated linear layer has one or more nodes, and wherein each node is configured to: receive a plurality of inputs, receive side information for the node; combine the plurality of inputs according to a set of weights defined by the side information, and generate and output a node probability output for the corresponding data value.
-
Publication No.: US10445653B1
Publication Date: 2019-10-15
Application No.: US14821549
Filing Date: 2015-08-07
Applicant: DeepMind Technologies Limited
Inventor: Joel William Veness, Marc Gendron-Bellemare
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for evaluating reinforcement learning policies. One of the methods includes receiving a plurality of training histories for a reinforcement learning agent; determining a total reward for each training observation in the training histories; partitioning the training observations into a plurality of partitions; determining, for each partition and from the partitioned training observations, a probability that the reinforcement learning agent will receive the total reward for the partition if the reinforcement learning agent performs the action for the partition in response to receiving the current observation; determining, from the probabilities and for each total reward, a respective estimated value of performing each action in response to receiving the current observation; and selecting an action from the pre-determined set of actions from the estimated values in accordance with an action selection policy.
-
Publication No.: US20240202511A1
Publication Date: 2024-06-20
Application No.: US18536127
Filing Date: 2023-12-11
Applicant: DeepMind Technologies Limited
Inventor: Agnieszka Grabska-Barwinska, Peter Toth, Christopher Mattern, Avishkar Bhoopchand, Tor Lattimore, Joel William Veness
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for a neural network system comprising one or more gated linear networks. A system includes: one or more gated linear networks, wherein each gated linear network corresponds to a respective data value in an output data sample and is configured to generate a network probability output that defines a probability distribution over possible values for the corresponding data value, wherein each gated linear network comprises a plurality of layers, wherein the plurality of layers comprises a plurality of gated linear layers, wherein each gated linear layer has one or more nodes, and wherein each node is configured to: receive a plurality of inputs, receive side information for the node; combine the plurality of inputs according to a set of weights defined by the side information, and generate and output a node probability output for the corresponding data value.
-
Publication No.: US11429898B1
Publication Date: 2022-08-30
Application No.: US16601547
Filing Date: 2019-10-14
Applicant: DeepMind Technologies Limited
Inventor: Joel William Veness, Marc Gendron-Bellemare
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for evaluating reinforcement learning policies. One of the methods includes receiving a plurality of training histories for a reinforcement learning agent; determining a total reward for each training observation in the training histories; partitioning the training observations into a plurality of partitions; determining, for each partition and from the partitioned training observations, a probability that the reinforcement learning agent will receive the total reward for the partition if the reinforcement learning agent performs the action for the partition in response to receiving the current observation; determining, from the probabilities and for each total reward, a respective estimated value of performing each action in response to receiving the current observation; and selecting an action from the pre-determined set of actions from the estimated values in accordance with an action selection policy.