-
Publication No.: US20210256072A1
Publication Date: 2021-08-19
Application No.: US17177097
Filing Date: 2021-02-16
Applicant: DeepMind Technologies Limited
Inventor: Timothy Arthur Mann , Ivan Lobov , Anton Zhernov , Krishnamurthy Dvijotham , Xiaohong Gong , Dan-Andrei Calian
IPC: G06F16/903 , G06F17/16 , G06F17/11
Abstract: Methods and systems for low-latency multi-constraint ranking of content items. One of the methods includes receiving a request to rank a plurality of content items for presentation to a user to maximize a primary objective subject to a plurality of constraints; initializing a dual variable vector; updating the dual variable vector, comprising: determining an overall objective score for the dual variable vector; identifying a plurality of candidate dual variable vectors that includes one or more neighboring node dual variable vectors; determining respective overall objective scores for each of the one or more candidate dual variable vectors; identifying the candidate with the best overall objective score; and determining whether to update the dual variable vector based on whether the identified candidate has a better overall objective score than the dual variable vector; and determining a final ranking for the content items based on the dual variable vector.
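The claimed update loop amounts to a greedy local search over a grid of dual variable vectors. Below is a minimal Python sketch of that idea, assuming Lagrangian-style item scoring, a fixed grid step, and a simple penalty-based overall objective; the function names, the top-k objective, and the non-negativity clipping are illustrative assumptions, not the patented method.

```python
import numpy as np

def rank_items(primary, constraint_scores, lam):
    """Rank items by a Lagrangian-combined score: primary + lam . constraints."""
    combined = primary + constraint_scores @ lam
    return np.argsort(-combined)  # best first

def overall_objective(primary, constraint_scores, thresholds, lam, top_k):
    """Score a dual vector: primary objective of the induced top-k ranking,
    minus penalties for violated constraints (illustrative scoring only)."""
    order = rank_items(primary, constraint_scores, lam)[:top_k]
    obj = primary[order].sum()
    violation = np.maximum(thresholds - constraint_scores[order].sum(axis=0), 0.0)
    return obj - violation.sum()

def local_search(primary, constraint_scores, thresholds, top_k,
                 grid_step=0.5, num_steps=20):
    """Greedy local search over a lattice of dual vectors: at each step,
    move to the neighboring grid point with the best overall objective."""
    num_constraints = constraint_scores.shape[1]
    lam = np.zeros(num_constraints)          # initialize the dual variable vector
    best = overall_objective(primary, constraint_scores, thresholds, lam, top_k)
    for _ in range(num_steps):
        # Candidate dual vectors: +/- one grid step along each coordinate (clipped at 0).
        candidates = []
        for j in range(num_constraints):
            for delta in (-grid_step, grid_step):
                cand = lam.copy()
                cand[j] = max(0.0, cand[j] + delta)
                candidates.append(cand)
        scores = [overall_objective(primary, constraint_scores, thresholds, c, top_k)
                  for c in candidates]
        i = int(np.argmax(scores))
        if scores[i] <= best:                # no neighbor improves: stop
            break
        lam, best = candidates[i], scores[i]
    # Final ranking is determined by the resulting dual variable vector.
    return rank_items(primary, constraint_scores, lam)
```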
-
Publication No.: US20220343157A1
Publication Date: 2022-10-27
Application No.: US17620164
Filing Date: 2020-06-17
Applicant: DEEPMIND TECHNOLOGIES LIMITED
Inventor: Daniel J. Mankowitz , Nir Levine , Rae Chan Jeong , Abbas Abdolmaleki , Jost Tobias Springenberg , Todd Andrew Hester , Timothy Arthur Mann , Martin Riedmiller
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a policy neural network having policy parameters. One of the methods includes sampling a mini-batch comprising one or more observation-action-reward tuples generated as a result of interactions of a first agent with a first environment; determining an update to current values of the Q network parameters by minimizing a robust entropy-regularized temporal difference (TD) error that accounts for possible perturbations of the states of the first environment represented by the observations in the observation-action-reward tuples; and determining, using the Q-value neural network, an update to the policy network parameters using the sampled mini-batch of observation-action-reward tuples.
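As a rough illustration of a robust, entropy-regularized TD error, the sketch below takes the worst-case soft value over random perturbations of the next observation. It assumes the sampled tuples also carry a next observation, that the policy network outputs action probabilities, and that perturbations are drawn uniformly from an epsilon-ball; the actual perturbation model and value backup in the described method may differ.

```python
import torch

def robust_soft_td_error(q_net, target_q_net, policy, batch,
                         gamma=0.99, alpha=0.1, epsilon=0.05, num_perturbations=8):
    """Robust entropy-regularized TD error on a mini-batch (illustrative sketch).

    Robustness is approximated by taking the worst (lowest) soft value over
    random perturbations of the next observation within an epsilon-ball.
    """
    obs, action, reward, next_obs = batch
    q_pred = q_net(obs).gather(1, action.unsqueeze(1)).squeeze(1)

    with torch.no_grad():
        worst_value = None
        for _ in range(num_perturbations):
            noise = (torch.rand_like(next_obs) * 2 - 1) * epsilon
            probs = policy(next_obs + noise)           # assumed: action probabilities
            q_next = target_q_net(next_obs + noise)
            # Entropy-regularized (soft) state value under the current policy.
            entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1)
            v = (probs * q_next).sum(dim=1) + alpha * entropy
            worst_value = v if worst_value is None else torch.minimum(worst_value, v)
        target = reward + gamma * worst_value          # worst-case bootstrapped target

    return torch.nn.functional.mse_loss(q_pred, target)
```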
-
Publication No.: US20210158196A1
Publication Date: 2021-05-27
Application No.: US17103843
Filing Date: 2020-11-24
Applicant: DeepMind Technologies Limited
Inventor: Claire Vernade , András György , Timothy Arthur Mann
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, of selecting actions from a set of actions to be performed in an environment. One of the methods includes, at each time step: maintaining count data; determining, for each action, a respective current transition probability distribution that includes a respective current transition probability for each of the intermediate signals that represents an estimate of a current likelihood that the intermediate signal will be observed if the action is performed; determining, for each intermediate signal, a respective reward estimate that is an estimate of a reward that will be received as a result of the intermediate signal being observed; determining, from the respective current transition probability distributions and the respective reward estimates, a respective action score for each action; and selecting an action to be performed based on the respective action scores.
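A compressed way to read the claim: keep counts of which intermediate signal each action produces, estimate a per-signal reward, and score each action by its expected reward under the estimated transition model. The Python sketch below implements that reading with simple count-based estimators; the class name, the Laplace-style priors, and the update rule are illustrative assumptions rather than the claimed procedure.

```python
import numpy as np

class IntermediateSignalBandit:
    """Action selection via estimated transition probabilities over
    intermediate signals and per-signal reward estimates (simplified sketch)."""

    def __init__(self, num_actions, num_signals):
        # Count data: how often each (action, signal) pair has been observed.
        self.counts = np.ones((num_actions, num_signals))   # Laplace-style prior
        self.reward_sums = np.zeros(num_signals)
        self.signal_counts = np.ones(num_signals)

    def action_scores(self):
        # Current transition probability distribution for each action.
        trans_probs = self.counts / self.counts.sum(axis=1, keepdims=True)
        # Reward estimate for each intermediate signal.
        reward_est = self.reward_sums / self.signal_counts
        # Action score = expected reward under the estimated transition model.
        return trans_probs @ reward_est

    def select_action(self):
        return int(np.argmax(self.action_scores()))

    def update(self, action, signal, reward=None):
        self.counts[action, signal] += 1
        if reward is not None:            # the reward may arrive later than the signal
            self.reward_sums[signal] += reward
            self.signal_counts[signal] += 1
```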
-
Publication No.: US12254678B2
Publication Date: 2025-03-18
Application No.: US17711951
Filing Date: 2022-04-01
Applicant: DeepMind Technologies Limited
Inventor: Dan-Andrei Calian , Sven Adrian Gowal , Timothy Arthur Mann , András György
IPC: G06V10/774 , G06V10/776 , G06V10/82
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for processing a network input using a trained neural network with network parameters to generate an output for a machine learning task. The training includes: receiving a set of training examples each including a training network input and a reference output; for each training iteration, generating a corrupted network input for each training network input using a corruption neural network; updating perturbation parameters of the corruption neural network using a first objective function based on the corrupted network inputs; generating an updated corrupted network input for each training network input based on the updated perturbation parameters; and generating a network output for each updated corrupted network input using the neural network; for each training example, updating the network parameters using a second objective function based on the network output and the reference output.
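The training described alternates between updating the corruption network and the main network. A hedged PyTorch-style sketch of one such iteration is below, assuming a classification task, a cross-entropy loss, and an adversarial first objective (the corruption network is rewarded for increasing the task loss); the actual objective functions in the patent may differ.

```python
import torch

def train_step(model, corruption_net, batch, model_opt, corr_opt):
    """One training iteration with a learned corruption network (sketch)."""
    inputs, targets = batch
    task_loss = torch.nn.functional.cross_entropy

    # 1) Update the perturbation parameters of the corruption network
    #    using a first objective based on the corrupted inputs.
    corrupted = corruption_net(inputs)
    first_objective = -task_loss(model(corrupted), targets)   # adversarial: raise the task loss
    corr_opt.zero_grad()
    first_objective.backward()
    corr_opt.step()

    # 2) Regenerate corrupted inputs with the updated perturbation parameters,
    #    then update the main network with a second objective against the references.
    with torch.no_grad():
        corrupted = corruption_net(inputs)
    second_objective = task_loss(model(corrupted), targets)
    model_opt.zero_grad()
    second_objective.backward()
    model_opt.step()
    return second_objective.item()
```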
-
Publication No.: US20230244912A1
Publication Date: 2023-08-03
Application No.: US18131580
Filing Date: 2023-04-06
Applicant: DeepMind Technologies Limited
Inventor: Huiyi Hu , Ray Jiang , Timothy Arthur Mann , Sven Adrian Gowal , Balaji Lakshminarayanan , András György
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for learning from delayed outcomes using neural networks. One of the methods includes receiving an input observation; generating, from the input observation, an output label distribution over possible labels for the input observation at a final time, comprising: processing the input observation using a first neural network configured to process the input observation to generate a distribution over possible values for an intermediate indicator at a first time earlier than the final time; generating, from the distribution, an input value for the intermediate indicator; and processing the input value for the intermediate indicator using a second neural network configured to process the input value for the intermediate indicator to determine the output label distribution over possible values for the input observation at the final time; and providing an output derived from the output label distribution.
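In outline, the described pipeline chains two networks: one that predicts a distribution over an intermediate indicator observed before the final outcome, and one that maps that indicator to the final label distribution. The sketch below is a minimal PyTorch rendering of that structure; the layer sizes, the softmax heads, and the choice to feed the indicator distribution directly into the second network are assumptions made for illustration only.

```python
import torch

class DelayedOutcomePredictor(torch.nn.Module):
    """Two-stage predictor: observation -> intermediate indicator distribution
    (first network), then indicator -> final label distribution (second network)."""

    def __init__(self, obs_dim, num_indicator_values, num_labels, hidden=64):
        super().__init__()
        self.first_net = torch.nn.Sequential(
            torch.nn.Linear(obs_dim, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, num_indicator_values))
        self.second_net = torch.nn.Sequential(
            torch.nn.Linear(num_indicator_values, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, num_labels))

    def forward(self, observation):
        # Distribution over possible values of the intermediate indicator
        # at a time earlier than the final outcome.
        indicator_dist = torch.softmax(self.first_net(observation), dim=-1)
        # One choice of "input value" for the indicator: feed the full
        # distribution into the second network (an assumption of this sketch).
        label_logits = self.second_net(indicator_dist)
        # Output label distribution for the observation at the final time.
        return torch.softmax(label_logits, dim=-1)
```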
-
Publication No.: US12124938B2
Publication Date: 2024-10-22
Application No.: US18131580
Filing Date: 2023-04-06
Applicant: DeepMind Technologies Limited
Inventor: Huiyi Hu , Ray Jiang , Timothy Arthur Mann , Sven Adrian Gowal , Balaji Lakshminarayanan , András György
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for learning from delayed outcomes using neural networks. One of the methods includes receiving an input observation; generating, from the input observation, an output label distribution over possible labels for the input observation at a final time, comprising: processing the input observation using a first neural network configured to process the input observation to generate a distribution over possible values for an intermediate indicator at a first time earlier than the final time; generating, from the distribution, an input value for the intermediate indicator; and processing the input value for the intermediate indicator using a second neural network configured to process the input value for the intermediate indicator to determine the output label distribution over possible values for the input observation at the final time; and providing an output derived from the output label distribution.
-
Publication No.: US11714994B2
Publication Date: 2023-08-01
Application No.: US16298448
Filing Date: 2019-03-11
Applicant: DeepMind Technologies Limited
Inventor: Huiyi Hu , Ray Jiang , Timothy Arthur Mann , Sven Adrian Gowal , Balaji Lakshminarayanan , András György
CPC classification number: G06N3/0454 , G06N3/0472 , G06N3/08
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for learning from delayed outcomes using neural networks. One of the methods includes receiving an input observation; generating, from the input observation, an output label distribution over possible labels for the input observation at a final time, comprising: processing the input observation using a first neural network configured to process the input observation to generate a distribution over possible values for an intermediate indicator at a first time earlier than the final time; generating, from the distribution, an input value for the intermediate indicator; and processing the input value for the intermediate indicator using a second neural network configured to process the input value for the intermediate indicator to determine the output label distribution over possible values for the input observation at the final time; and providing an output derived from the output label distribution.
-
Publication No.: US12001484B2
Publication Date: 2024-06-04
Application No.: US17177097
Filing Date: 2021-02-16
Applicant: DeepMind Technologies Limited
Inventor: Timothy Arthur Mann , Ivan Lobov , Anton Zhernov , Krishnamurthy Dvijotham , Xiaohong Gong , Dan-Andrei Calian
IPC: G06F16/95 , G06F16/903 , G06F17/11 , G06F17/16
CPC classification number: G06F16/90335 , G06F17/11 , G06F17/16
Abstract: Methods and systems for low-latency multi-constraint ranking of content items. One of the methods includes receiving a request to rank a plurality of content items for presentation to a user to maximize a primary objective subject to a plurality of constraints; initializing a dual variable vector; updating the dual variable vector, comprising: determining an overall objective score for the dual variable vector; identifying a plurality of candidate dual variable vectors that includes one or more neighboring node dual variable vectors; determining respective overall objective scores for each of the one or more candidate dual variable vectors; identifying the candidate with the best overall objective score; and determining whether to update the dual variable vector based on whether the identified candidate has a better overall objective score than the dual variable vector; and determining a final ranking for the content items based on the dual variable vector.
-
Publication No.: US20230316729A1
Publication Date: 2023-10-05
Application No.: US17711951
Filing Date: 2022-04-01
Applicant: DeepMind Technologies Limited
Inventor: Dan-Andrei Calian , Sven Adrian Gowal , Timothy Arthur Mann , András György
IPC: G06V10/774 , G06V10/82 , G06V10/776
CPC classification number: G06V10/7747 , G06V10/82 , G06V10/776
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for processing a network input using a trained neural network with network parameters to generate an output for a machine learning task. The training includes: receiving a set of training examples each including a training network input and a reference output; for each training iteration, generating a corrupted network input for each training network input using a corruption neural network; updating perturbation parameters of the corruption neural network using a first objective function based on the corrupted network inputs; generating an updated corrupted network input for each training network input based on the updated perturbation parameters; and generating a network output for each updated corrupted network input using the neural network; for each training example, updating the network parameters using a second objective function based on the network output and the reference output.
-
Publication No.: US20190279076A1
Publication Date: 2019-09-12
Application No.: US16298448
Filing Date: 2019-03-11
Applicant: DeepMind Technologies Limited
Inventor: Huiyi Hu , Ray Jiang , Timothy Arthur Mann , Sven Adrian Gowal , Balaji Lakshminarayanan , Andras Gyorgy
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for learning from delayed outcomes using neural networks. One of the methods includes receiving an input observation; generating, from the input observation, an output label distribution over possible labels for the input observation at a final time, comprising: processing the input observation using a first neural network configured to process the input observation to generate a distribution over possible values for an intermediate indicator at a first time earlier than the final time; generating, from the distribution, an input value for the intermediate indicator; and processing the input value for the intermediate indicator using a second neural network configured to process the input value for the intermediate indicator to determine the output label distribution over possible values for the input observation at the final time; and providing an output derived from the output label distribution.