-
Publication No.: US20240311639A1
Publication date: 2024-09-19
Application No.: US18281711
Filing date: 2022-05-27
Applicant: DeepMind Technologies Limited
Inventor: Steven Stenberg Hansen, Daniel Joseph Strouse
Abstract: This specification describes a method performed by one or more data processing apparatus that includes: sampling a latent from a set of possible latents, selecting actions to be performed by an agent to interact with an environment over a sequence of time steps using an action selection neural network that is conditioned on the sampled latent, determining a respective reward received for each time step in the sequence of time steps using an ensemble of discriminator models, and training the action selection neural network based on the rewards using a reinforcement learning technique. Each discriminator model can process an observation to generate a respective prediction output that predicts which latent the action selection neural network was conditioned on to cause the environment to enter the state characterized by the observation.
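The reward step in this abstract can be illustrated with a minimal sketch. The abstract does not give a reward formula, so the one below is an assumption: the ensemble's mean predicted probability of the sampled latent, in log space, minus the log of a uniform prior over latents. The names `softmax`, `sample_latent`, and `ensemble_reward` are hypothetical, and each discriminator is represented only by the logits it would output for one observation.

```python
import math
import random


def softmax(logits):
    """Convert one discriminator's logits into probabilities over latents."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_latent(num_latents, rng):
    """Sample a latent index uniformly from the set of possible latents."""
    return rng.randrange(num_latents)


def ensemble_reward(ensemble_logits, latent, num_latents):
    """Assumed reward: log of the ensemble-mean probability assigned to the
    latent the policy was conditioned on, minus the log of a uniform prior.

    ensemble_logits holds one list of logits per discriminator model.
    """
    probs = [softmax(logits)[latent] for logits in ensemble_logits]
    mean_prob = sum(probs) / len(probs)
    return math.log(mean_prob) - math.log(1.0 / num_latents)


rng = random.Random(0)
z = sample_latent(4, rng)
# Two discriminators that are both uninformative give zero reward.
uninformed = ensemble_reward([[0.0] * 4, [0.0] * 4], z, 4)
```

Under this assumed formula the reward is positive when the discriminators identify the latent better than chance and negative when they do worse. A reinforcement learning update on the action selection neural network would then consume these per-step rewards; that part is omitted here.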
-
Publication No.: US11250475B2
Publication date: 2022-02-15
Application No.: US16918805
Filing date: 2020-07-01
Applicant: DeepMind Technologies Limited
Inventor: Andrea Tacchetti, Daniel Joseph Strouse, Marta Garnelo Abellanas, Thore Kurt Hartwig Graepel, Yoram Bachrach
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for efficiently allocating resources among participants. Methods can include receiving valuation data specifying, for each of a plurality of entities, a respective valuation for each of a plurality of resource subsets, each resource subset comprising a different combination of one or more resources of a plurality of resources. After receiving the valuation data, the methods can include assigning each resource in the plurality of resources to a respective entity of the plurality of entities based on the valuations and generating, for each particular entity, a respective input representation that is derived from the valuations of every entity in the plurality of entities other than the particular entity. The input representation for each particular entity is processed using a neural network to generate a rule output for the particular entity, and a payment is determined based on the rule outputs for the entities.
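The flow in this abstract (assign resources from the valuations, build a per-entity input from everyone else's valuations, map it to a rule output, derive a payment) can be sketched as follows. Several pieces are assumptions not stated in the abstract: the assignment maximizes total value by brute force, the payment has a Groves-style form (rule output minus the value the other entities obtain), and the classical Clarke pivot rule stands in for the neural network. All function names are hypothetical.

```python
from itertools import product


def bundle_value(valuation, bundle):
    """Value an entity places on a bundle; unlisted bundles are worth 0."""
    return valuation.get(frozenset(bundle), 0.0)


def welfare(valuations, assignment, skip=None):
    """Total value of an assignment, optionally ignoring entity `skip`."""
    total = 0.0
    for i, valuation in enumerate(valuations):
        if i == skip:
            continue
        bundle = [r for r, owner in assignment.items() if owner == i]
        total += bundle_value(valuation, bundle)
    return total


def best_assignment(valuations, resources):
    """Brute force: try every way of giving each resource to one entity."""
    best, best_w = None, float("-inf")
    for owners in product(range(len(valuations)), repeat=len(resources)):
        assignment = dict(zip(resources, owners))
        w = welfare(valuations, assignment)
        if w > best_w:
            best, best_w = assignment, w
    return best


def clarke_pivot_rule(others, resources):
    """Stand-in for the neural network (assumption): the best total value
    the other entities could reach if the particular entity were absent."""
    return welfare(others, best_assignment(others, resources))


def allocate_and_charge(valuations, resources, rule=clarke_pivot_rule):
    assignment = best_assignment(valuations, resources)
    payments = []
    for i in range(len(valuations)):
        # Input representation: valuations of every entity except entity i.
        others = valuations[:i] + valuations[i + 1:]
        # Assumed Groves-style payment: rule output minus others' value.
        payments.append(rule(others, resources)
                        - welfare(valuations, assignment, skip=i))
    return assignment, payments


v0 = {frozenset({"a"}): 5, frozenset({"a", "b"}): 5}
v1 = {frozenset({"a"}): 4, frozenset({"b"}): 3, frozenset({"a", "b"}): 6}
assignment, payments = allocate_and_charge([v0, v1], ["a", "b"])
# assignment == {"a": 0, "b": 1}; payments == [3.0, 0.0]
```

Because the `rule` argument only ever sees the other entities' valuations, any callable with the same signature, including a trained network, could replace `clarke_pivot_rule` without letting an entity's own report influence its rule output.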
-
Publication No.: US20220005079A1
Publication date: 2022-01-06
Application No.: US16918805
Filing date: 2020-07-01
Applicant: DeepMind Technologies Limited
Inventor: Andrea Tacchetti, Daniel Joseph Strouse, Marta Garnelo Abellanas, Thore Kurt Hartwig Graepel, Yoram Bachrach
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for efficiently allocating resources among participants. Methods can include receiving valuation data specifying, for each of a plurality of entities, a respective valuation for each of a plurality of resource subsets, each resource subset comprising a different combination of one or more resources of a plurality of resources. After receiving the valuation data, the methods can include assigning each resource in the plurality of resources to a respective entity of the plurality of entities based on the valuations and generating, for each particular entity, a respective input representation that is derived from the valuations of every entity in the plurality of entities other than the particular entity. The input representation for each particular entity is processed using a neural network to generate a rule output for the particular entity, and a payment is determined based on the rule outputs for the entities.
-