LEARNING ENVIRONMENT REPRESENTATIONS FOR AGENT CONTROL USING PREDICTIONS OF BOOTSTRAPPED LATENTS

    Publication Number: US20230083486A1

    Publication Date: 2023-03-16

    Application Number: US17797886

    Application Date: 2021-02-08

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an environment representation neural network of a reinforcement learning system that controls an agent to perform a given task. In one aspect, the method includes: receiving a current observation input and a future observation input; generating, from the future observation input, a future latent representation of the future state of the environment; processing, using the environment representation neural network, the current observation input to generate a current internal representation of the current state of the environment; generating, from the current internal representation, a predicted future latent representation; evaluating an objective function measuring a difference between the future latent representation and the predicted future latent representation; and determining, based on a determined gradient of the objective function, an update to the current values of the environment representation parameters.
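
    The abstract above describes a bootstrapped-latent prediction objective: a representation network is trained so that a prediction made from the current internal state matches a latent encoding of a future observation. The following Python (PyTorch) sketch illustrates one plausible reading of that loop; the network architectures, the frozen target encoder, the mean-squared-error objective, and all names and sizes are assumptions for illustration and are not taken from the patent.

```python
# Minimal sketch (not the patented implementation) of training an environment
# representation network with a bootstrapped-latent prediction objective.
import torch
import torch.nn as nn
import torch.nn.functional as F

obs_dim, latent_dim = 32, 64

# Environment representation network: maps the current observation (plus a
# recurrent state) to an internal representation of the current environment state.
env_repr_net = nn.GRUCell(obs_dim, latent_dim)

# Target encoder: maps the future observation to a future latent representation.
# Holding its parameters fixed (no gradient) is a common "bootstrapped target"
# choice; this detail is an assumption, not stated in the abstract.
target_encoder = nn.Linear(obs_dim, latent_dim)
for p in target_encoder.parameters():
    p.requires_grad_(False)

# Predictor: generates a predicted future latent representation from the
# current internal representation.
predictor = nn.Sequential(nn.Linear(latent_dim, latent_dim), nn.ReLU(),
                          nn.Linear(latent_dim, latent_dim))

optimizer = torch.optim.Adam(
    list(env_repr_net.parameters()) + list(predictor.parameters()), lr=1e-3)

def training_step(current_obs, future_obs, hidden):
    # Future latent representation of the future environment state.
    future_latent = target_encoder(future_obs)
    # Current internal representation of the current environment state.
    current_repr = env_repr_net(current_obs, hidden)
    # Predicted future latent representation from the current internal representation.
    predicted_latent = predictor(current_repr)
    # Objective measuring the difference between the future latent and its
    # prediction; normalized MSE is one plausible instantiation.
    loss = F.mse_loss(F.normalize(predicted_latent, dim=-1),
                      F.normalize(future_latent, dim=-1))
    # The gradient of the objective determines the update to the environment
    # representation parameters.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item(), current_repr.detach()

# Example usage with random tensors standing in for observation inputs.
batch = 16
hidden = torch.zeros(batch, latent_dim)
loss, hidden = training_step(torch.randn(batch, obs_dim),
                             torch.randn(batch, obs_dim), hidden)
```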

    REINFORCEMENT LEARNING WITH ADAPTIVE RETURN COMPUTATION SCHEMES

    Publication Number: US20230059004A1

    Publication Date: 2023-02-23

    Application Number: US17797878

    Application Date: 2021-02-08

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for reinforcement learning with adaptive return computation schemes. In one aspect, a method includes: maintaining data specifying a policy for selecting between multiple different return computation schemes, each return computation scheme assigning a different importance to exploring the environment while performing an episode of a task; selecting, using the policy, a return computation scheme from the multiple different return computation schemes; controlling an agent to perform the episode of the task to maximize a return computed according to the selected return computation scheme; identifying rewards that were generated as a result of the agent performing the episode of the task; and updating, using the identified rewards, the policy for selecting between multiple different return computation schemes.
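
    The abstract describes a meta-level policy that adaptively chooses among return computation schemes with different exploration emphasis, then is updated from the rewards the agent actually collects. The Python sketch below illustrates one simple way such a selector could be structured; the epsilon-greedy bandit rule, the scheme parameters (intrinsic-reward weight and discount), and the stand-in reward data are illustrative assumptions, not the patented method.

```python
# Minimal sketch of an adaptive return-computation-scheme selector: a bandit
# policy picks a scheme, an episode is run to maximize the return computed
# under that scheme, and the policy is updated with the identified rewards.
import random

# Each scheme assigns a different importance to exploring the environment,
# here via the weight on an intrinsic (exploration) reward and the discount.
SCHEMES = [
    {"intrinsic_weight": 0.0, "discount": 0.997},  # purely exploitative
    {"intrinsic_weight": 0.1, "discount": 0.99},
    {"intrinsic_weight": 0.3, "discount": 0.97},   # strongly exploratory
]

class SchemeSelector:
    """Epsilon-greedy bandit policy over return computation schemes."""

    def __init__(self, n_schemes, epsilon=0.2, step_size=0.1):
        self.values = [0.0] * n_schemes
        self.epsilon = epsilon
        self.step_size = step_size

    def select(self):
        # Select a scheme: explore with probability epsilon, else pick the
        # scheme with the highest estimated value.
        if random.random() < self.epsilon:
            return random.randrange(len(self.values))
        return max(range(len(self.values)), key=lambda i: self.values[i])

    def update(self, scheme_idx, episode_extrinsic_return):
        # Update the policy using the rewards identified for the episode,
        # moving the value estimate toward the observed return.
        self.values[scheme_idx] += self.step_size * (
            episode_extrinsic_return - self.values[scheme_idx])

def compute_return(extrinsic, intrinsic, scheme):
    """Discounted return under the selected scheme's exploration weighting."""
    ret = 0.0
    for r_e, r_i in zip(reversed(extrinsic), reversed(intrinsic)):
        ret = (r_e + scheme["intrinsic_weight"] * r_i) + scheme["discount"] * ret
    return ret

# Demo with stand-in per-step rewards; a real loop would obtain these by
# controlling the agent for an episode under the selected scheme.
selector = SchemeSelector(len(SCHEMES))
idx = selector.select()
extrinsic = [0.0, 0.0, 1.0]
intrinsic = [0.2, 0.1, 0.0]
episode_return = compute_return(extrinsic, intrinsic, SCHEMES[idx])
selector.update(idx, sum(extrinsic))
```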
