REINFORCEMENT LEARNING BY DIRECTLY LEARNING AN ADVANTAGE FUNCTION

    Publication No.: US20240256882A1

    Publication Date: 2024-08-01

    Application No.: US18424520

    Application Date: 2024-01-26

    CPC classification number: G06N3/092

    Abstract: A system and method, implemented by one or more computers, for controlling an agent to take actions in an environment to perform a task is provided. The method comprises maintaining a value function neural network and an advantage function neural network, the latter being an estimate of a state-action advantage function that represents the relative advantage of performing one possible action relative to the other possible actions. The method further comprises using the advantage function neural network to control the agent to take actions in the environment to perform the task. The method also comprises training the value function neural network and the advantage function neural network in a way that takes into account a behavior policy, defined by the distribution of actions taken by the agent in the training data.
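    The abstract does not specify network architectures, targets, or the training loss. The sketch below is purely illustrative and rests on several assumptions that do not come from the patent. It uses tables instead of neural networks and Monte Carlo returns as regression targets. It also imposes the constraint sum_a mu(a|s) A(s,a) = 0, where mu is the empirical behavior policy estimated from action counts in the training data. This constraint is one common way to "take the behavior policy into account", but it is not necessarily the claimed method. Under these assumptions, V(s) + A(s,a) reproduces the mean observed return for each visited (s, a) pair, and the agent acts greedily on A. The function names `fit_value_and_advantage` and `greedy_action` are hypothetical.

    ```python
    import numpy as np

    def fit_value_and_advantage(states, actions, returns, n_states, n_actions):
        """Hypothetical tabular stand-in for the value and advantage networks.

        Assumes Monte Carlo returns as targets and centers the advantage under
        the empirical behavior policy mu(a|s) taken from the training data.
        """
        states = np.asarray(states)
        actions = np.asarray(actions)
        returns = np.asarray(returns, dtype=float)

        # Count each (s, a) pair and accumulate its returns.
        counts = np.zeros((n_states, n_actions))
        np.add.at(counts, (states, actions), 1.0)
        ret_sums = np.zeros((n_states, n_actions))
        np.add.at(ret_sums, (states, actions), returns)

        # Empirical behavior policy mu(a|s); rows for unseen states stay zero.
        state_counts = counts.sum(axis=1, keepdims=True)
        mu = np.divide(counts, state_counts, out=np.zeros_like(counts),
                       where=state_counts > 0)

        # Mean return per visited (s, a): a Monte Carlo estimate of Q under mu.
        q = np.divide(ret_sums, counts, out=np.zeros_like(counts), where=counts > 0)

        # V(s) = E_{a~mu}[Q(s, a)], so A = Q - V satisfies sum_a mu(a|s) A(s, a) = 0.
        v = (mu * q).sum(axis=1)
        # Actions never taken in the data get advantage 0 (an arbitrary choice here).
        adv = np.where(counts > 0, q - v[:, None], 0.0)
        return v, adv, mu

    def greedy_action(adv, state):
        """Control step: choose the action with the largest estimated advantage."""
        return int(np.argmax(adv[state]))

    if __name__ == "__main__":
        v, adv, mu = fit_value_and_advantage(
            states=[0, 0, 0, 1, 1], actions=[0, 0, 1, 0, 1],
            returns=[1.0, 3.0, 5.0, 2.0, 4.0], n_states=2, n_actions=2)
        print(greedy_action(adv, 0), greedy_action(adv, 1))  # prints "1 1"
    ```

    For the toy data above, state 0 has mu = [2/3, 1/3] and mean returns [2, 5]. This gives V(0) = 3 and A(0, ·) = [-1, 2], and the behavior-weighted mean of A(0, ·) is exactly zero. Centering under mu rather than under a uniform distribution ties the decomposition to the actions actually present in the data. That is one plausible reading of how a behavior policy could enter training.
    
    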
