CONTROLLING AGENTS USING RELATIVE VARIATIONAL INTRINSIC CONTROL

    Publication Number: US20230325635A1

    Publication Date: 2023-10-12

    Application Number: US18025304

    Application Date: 2021-09-10

    CPC classification number: G06N3/045 G06N3/08

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a policy neural network for use in controlling an agent using relative variational intrinsic control. In one aspect, a method includes: selecting a skill from a set of skills; generating a trajectory by controlling the agent using the policy neural network while the policy neural network is conditioned on the selected skill; processing an initial observation and a last observation using a relative discriminator neural network to generate a relative score; processing the last observation using an absolute discriminator neural network to generate an absolute score; generating a reward for the trajectory from the absolute score corresponding to the selected skill and the relative score corresponding to the selected skill; and training the policy neural network on the reward for the trajectory.
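
    Below is a minimal sketch of the reward computation this abstract describes, assuming the intrinsic reward is the relative log-score minus the absolute log-score at the index of the selected skill (one common formulation of relative variational intrinsic control; the patent's exact combination may differ). All names (Discriminator, relative_disc, absolute_disc, rvic_reward) and dimensions are hypothetical:

        # Hedged sketch in PyTorch; the reward form and every name here are
        # assumptions, not the patent's definitive implementation.
        import torch
        import torch.nn as nn

        NUM_SKILLS = 8   # hypothetical size of the skill set
        OBS_DIM = 16     # hypothetical observation feature size

        class Discriminator(nn.Module):
            """Maps observation features to log-probabilities over skills."""
            def __init__(self, in_dim: int):
                super().__init__()
                self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                         nn.Linear(64, NUM_SKILLS))

            def forward(self, x: torch.Tensor) -> torch.Tensor:
                return self.net(x).log_softmax(dim=-1)

        relative_disc = Discriminator(2 * OBS_DIM)  # sees (initial, last) pair
        absolute_disc = Discriminator(OBS_DIM)      # sees only the last observation

        def rvic_reward(initial_obs, last_obs, skill: int) -> torch.Tensor:
            """Trajectory reward: relative score minus absolute score, each
            taken at the index of the selected skill."""
            rel = relative_disc(torch.cat([initial_obs, last_obs], dim=-1))[skill]
            ab = absolute_disc(last_obs)[skill]
            return rel - ab

        reward = rvic_reward(torch.randn(OBS_DIM), torch.randn(OBS_DIM), skill=3)

    The policy neural network would then be trained on this scalar reward with any standard reinforcement-learning update; that step is omitted here.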

    REINFORCEMENT LEARNING USING AN ENSEMBLE OF DISCRIMINATOR MODELS

    Publication Number: US20240311639A1

    Publication Date: 2024-09-19

    Application Number: US18281711

    Application Date: 2022-05-27

    CPC classification number: G06N3/092 G06N3/045

    Abstract: This specification describes a method performed by one or more data processing apparatus that includes: sampling a latent from a set of possible latents, selecting actions to be performed by an agent to interact with an environment over a sequence of time steps using an action selection neural network that is conditioned on the sampled latent, determining a respective reward received for each time step in the sequence of time steps using an ensemble of discriminator models, and training the action selection neural network based on the rewards using a reinforcement learning technique. Each discriminator model can process an observation to generate a respective prediction output that predicts which latent the action selection neural network was conditioned on to cause the environment to enter the state characterized by the observation.
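
    The per-timestep reward here can be sketched as an aggregation over the discriminator ensemble. The version below assumes the reward is the mean log-probability the ensemble assigns to the latent the policy was conditioned on; other aggregations (e.g. ones based on ensemble disagreement) are equally plausible, and all names (LatentPredictor, ensemble_reward) are hypothetical:

        # Hedged sketch in PyTorch; the aggregation rule and names are assumptions.
        import torch
        import torch.nn as nn

        NUM_LATENTS = 8     # hypothetical size of the set of possible latents
        OBS_DIM = 16        # hypothetical observation feature size
        ENSEMBLE_SIZE = 5   # hypothetical number of discriminator models

        class LatentPredictor(nn.Module):
            """One discriminator: predicts which latent the action selection
            network was conditioned on, from an observation."""
            def __init__(self):
                super().__init__()
                self.net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                                         nn.Linear(64, NUM_LATENTS))

            def forward(self, obs: torch.Tensor) -> torch.Tensor:
                return self.net(obs).log_softmax(dim=-1)

        ensemble = [LatentPredictor() for _ in range(ENSEMBLE_SIZE)]

        def ensemble_reward(obs: torch.Tensor, latent: int) -> torch.Tensor:
            """Reward for one time step: mean log-probability the ensemble
            assigns to the sampled latent, given the current observation."""
            log_probs = torch.stack([disc(obs)[latent] for disc in ensemble])
            return log_probs.mean()

        r_t = ensemble_reward(torch.randn(OBS_DIM), latent=2)

    Summing or averaging r_t over the sequence of time steps would give the return used by the reinforcement-learning update.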

    LEARNING DIVERSE SKILLS FOR TASKS USING SEQUENTIAL LATENT VARIABLES FOR ENVIRONMENT DYNAMICS

    Publication Number: US20240185083A1

    Publication Date: 2024-06-06

    Application Number: US18285519

    Application Date: 2022-05-27

    CPC classification number: G06N3/092

    Abstract: This specification relates to methods for controlling agents to perform actions according to a goal (or option) comprising a sequence of local goals (or local options), and to corresponding training methods. As discussed herein, environment dynamics may be modelled sequentially by sampling latent variables, each latent variable relating to a local goal and depending on the previous latent variable. These latent variables are used to condition an action-selection policy neural network to select actions according to the local goal. This allows the agent to reach more diverse states than would be possible with a fixed latent variable or goal, thereby encouraging exploratory behavior. In addition, specific methods described herein model the sequence of latent variables through a simple linear and recurrent relationship that allows the system to be trained more efficiently. This avoids the need to learn a state-dependent higher-level policy for selecting the latent variables, which can be difficult to train in practice.
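
    The "simple linear and recurrent relationship" for the latent sequence can be sketched as a first-order linear recurrence with Gaussian noise. The transition matrix, noise scale, and unit-norm projection below are assumptions chosen for illustration, as are all names:

        # Hedged sketch in PyTorch; the recurrence details are assumptions.
        import torch

        LATENT_DIM = 4        # hypothetical latent dimensionality
        NUM_LOCAL_GOALS = 5   # hypothetical number of local goals per episode

        def sample_latent_sequence(num_steps: int = NUM_LOCAL_GOALS) -> list:
            """Samples latents where each one depends linearly on its
            predecessor plus Gaussian noise (no state-dependent policy)."""
            transition = torch.eye(LATENT_DIM)  # assumed fixed linear map
            z = torch.randn(LATENT_DIM)
            z = z / z.norm()
            latents = [z]
            for _ in range(num_steps - 1):
                z = transition @ z + 0.1 * torch.randn(LATENT_DIM)
                z = z / z.norm()  # keep latents well scaled across steps
                latents.append(z)
            return latents

        # Each latent in the sequence would condition the action-selection
        # policy network for one local-goal segment of the episode.
        skill_sequence = sample_latent_sequence()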
