REINFORCEMENT LEARNING USING AN ENSEMBLE OF DISCRIMINATOR MODELS

    Publication number: US20240311639A1

    Publication date: 2024-09-19

    Application number: US18281711

    Application date: 2022-05-27

    CPC classification number: G06N3/092 G06N3/045

    Abstract: This specification describes a method performed by one or more data processing apparatus that includes: sampling a latent from a set of possible latents, selecting actions to be performed by an agent to interact with an environment over a sequence of time steps using an action selection neural network that is conditioned on the sampled latent, determining a respective reward received for each time step in the sequence of time steps using an ensemble of discriminator models, and training the action selection neural network based on the rewards using a reinforcement learning technique. Each discriminator model can process an observation to generate a respective prediction output that predicts which latent the action selection neural network was conditioned on to cause the environment to enter the state characterized by the observation.
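    The abstract does not say how the discriminator predictions become a reward. The sketch below is one plausible reading under stated assumptions: a DIAYN-style term (the log-probability the ensemble, on average, assigns to the sampled latent, relative to a uniform prior over latents) plus a bonus for disagreement among ensemble members. The function names and the `bonus_weight` parameter are hypothetical and are not taken from the patent.

```python
import numpy as np


def _entropy(p: np.ndarray) -> np.ndarray:
    """Shannon entropy (nats) along the last axis; zero-probability entries contribute 0."""
    return -np.sum(p * np.log(np.clip(p, 1e-12, None)), axis=-1)


def ensemble_reward(probs, latent: int, bonus_weight: float = 1.0) -> float:
    """Hypothetical per-time-step reward from an ensemble of discriminator predictions.

    probs: shape (ensemble_size, num_latents); row k is discriminator k's predicted
        distribution over latents for the observation at this time step.
    latent: index of the latent the action selection network was conditioned on.
    bonus_weight: hypothetical weight on the ensemble-disagreement bonus.
    """
    probs = np.asarray(probs, dtype=float)
    num_latents = probs.shape[1]
    mean_probs = probs.mean(axis=0)
    # Assumption: DIAYN-style term log q(z|s) - log p(z) with a uniform prior
    # p(z) = 1 / num_latents, where q is the ensemble-average prediction.
    skill_term = np.log(max(mean_probs[latent], 1e-12)) + np.log(num_latents)
    # Assumption: disagreement bonus = entropy of the mean prediction minus the mean
    # entropy of the individual members; zero when all members agree, non-negative otherwise.
    disagreement = _entropy(mean_probs) - _entropy(probs).mean()
    return float(skill_term + bonus_weight * disagreement)


# Example: two discriminators, three possible latents, policy conditioned on latent 0.
r = ensemble_reward([[0.7, 0.2, 0.1], [0.5, 0.3, 0.2]], latent=0, bonus_weight=0.5)
print(r)
```

    The reward from such a function would then be fed, one value per time step, to whatever reinforcement learning technique trains the action selection network; the abstract leaves that technique unspecified.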

    Neural network architecture for efficient resource allocation

    Publication number: US11250475B2

    Publication date: 2022-02-15

    Application number: US16918805

    Application date: 2020-07-01

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for efficiently allocating resources among participants. Methods can include receiving valuation data specifying, for each of a plurality of entities, a respective valuation for each of a plurality of resource subsets, each resource subset comprising a different combination of one or more resources of a plurality of resources. After the valuation data is received, each resource in the plurality of resources is assigned to a respective entity of the plurality of entities based on the valuations, and, for each particular entity, a respective input representation is generated that is derived from the valuations of every entity in the plurality of entities other than the particular entity. The input representation for each particular entity is processed using a neural network to generate a rule for the particular entity and a payment based on the rule output for the entities.
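    The abstract describes a data flow but not the network, the assignment procedure, or what the per-entity "rule" contains. The sketch below illustrates only that data flow under stated assumptions: a brute-force assignment that maximizes total valuation, and a per-entity input representation built solely from the other entities' valuations, passed through a small untrained MLP whose non-negative scalar output stands in for the payment. `allocate`, `input_representation`, `TinyMLP`, and the bitmask encoding of resource subsets are all hypothetical.

```python
import itertools

import numpy as np


def allocate(valuations: np.ndarray, num_resources: int) -> list[int]:
    """Assign every resource to one entity so that total valuation is maximized.

    valuations[e, mask] is entity e's value for the subset encoded by bitmask `mask`
    (bit r set means resource r is included; mask 0 is the empty set).
    Returns one bitmask per entity. Brute force over num_entities**num_resources
    assignments, so only suitable for tiny examples.
    """
    num_entities = valuations.shape[0]
    best_welfare, best_masks = -np.inf, None
    for owners in itertools.product(range(num_entities), repeat=num_resources):
        masks = [0] * num_entities
        for resource, owner in enumerate(owners):
            masks[owner] |= 1 << resource
        welfare = sum(valuations[e, masks[e]] for e in range(num_entities))
        if welfare > best_welfare:
            best_welfare, best_masks = welfare, masks
    return best_masks


def input_representation(valuations: np.ndarray, entity: int) -> np.ndarray:
    """Flatten the valuations of every entity except `entity`."""
    return np.delete(valuations, entity, axis=0).ravel()


class TinyMLP:
    """Hypothetical, untrained stand-in for the neural network in the abstract."""

    def __init__(self, input_dim: int, hidden_dim: int = 8, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(size=(input_dim, hidden_dim))
        self.w2 = rng.normal(size=(hidden_dim,))

    def __call__(self, x: np.ndarray) -> float:
        hidden = np.maximum(x @ self.w1, 0.0)  # ReLU hidden layer
        # Clip at zero so the scalar output can serve as a non-negative payment.
        return float(np.maximum(hidden @ self.w2, 0.0))


# Two entities, two resources; the column index is the subset bitmask.
vals = np.array([
    [0.0, 3.0, 1.0, 5.0],  # entity 0: values for {}, {r0}, {r1}, {r0, r1}
    [0.0, 1.0, 4.0, 4.5],  # entity 1
])
masks = allocate(vals, num_resources=2)
print(masks)  # [1, 2]: entity 0 gets r0, entity 1 gets r1 (total value 7.0)
net = TinyMLP(input_dim=vals.shape[1] * (vals.shape[0] - 1))
payments = [net(input_representation(vals, e)) for e in range(vals.shape[0])]
```

    Because an entity's input representation excludes its own row of valuations, changing the valuations that entity reports cannot change the network output computed for it in this sketch.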

    NEURAL NETWORK ARCHITECTURE FOR EFFICIENT RESOURCE ALLOCATION

    Publication number: US20220005079A1

    Publication date: 2022-01-06

    Application number: US16918805

    Application date: 2020-07-01

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for efficiently allocating resources among participants. Methods can include receiving valuation data specifying, for each of a plurality of entities, a respective valuation for each of a plurality of resource subsets, each resource subset comprising a different combination of one or more resources of a plurality of resources. After the valuation data is received, each resource in the plurality of resources is assigned to a respective entity of the plurality of entities based on the valuations, and, for each particular entity, a respective input representation is generated that is derived from the valuations of every entity in the plurality of entities other than the particular entity. The input representation for each particular entity is processed using a neural network to generate a rule for the particular entity and a payment based on the rule output for the entities.
