REINFORCEMENT LEARNING WITH ADAPTIVE RETURN COMPUTATION SCHEMES

    Publication No.: US20230059004A1

    Publication Date: 2023-02-23

    Application No.: US17797878

    Filing Date: 2021-02-08

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for reinforcement learning with adaptive return computation schemes. In one aspect, a method includes: maintaining data specifying a policy for selecting between multiple different return computation schemes, each return computation scheme assigning a different importance to exploring the environment while performing an episode of a task; selecting, using the policy, a return computation scheme from the multiple different return computation schemes; controlling an agent to perform the episode of the task to maximize a return computed according to the selected return computation scheme; identifying rewards that were generated as a result of the agent performing the episode of the task; and updating, using the identified rewards, the policy for selecting between multiple different return computation schemes.
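
    The loop described in the abstract (select a scheme, run an episode, observe rewards, update the selection policy) can be illustrated with a minimal sketch. The abstract does not say how the policy is represented or updated, so the epsilon-greedy bandit below is an assumption made purely for illustration, as is the parameterization of a return computation scheme by a discount factor and an exploration weight. All names (`ReturnScheme`, `SchemeSelector`, `compute_return`, `train`, `run_episode`) are hypothetical and not taken from the patent.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ReturnScheme:
    """One return computation scheme (hypothetical parameterization)."""
    discount: float            # discount factor applied to future rewards
    exploration_weight: float  # weight placed on an exploration bonus


def compute_return(scheme, task_rewards, exploration_bonuses):
    """Discounted return that mixes task reward with an exploration bonus."""
    g = 0.0
    # Accumulate backwards so each step applies the discount exactly once.
    for r, b in zip(reversed(task_rewards), reversed(exploration_bonuses)):
        g = (r + scheme.exploration_weight * b) + scheme.discount * g
    return g


class SchemeSelector:
    """Epsilon-greedy bandit over schemes (an assumed policy form)."""

    def __init__(self, schemes, epsilon=0.1, seed=0):
        self.schemes = list(schemes)
        self.epsilon = epsilon
        self.counts = [0] * len(self.schemes)
        self.values = [0.0] * len(self.schemes)
        self.rng = random.Random(seed)

    def select(self):
        """Return the index of the scheme to use for the next episode."""
        # Try every scheme once before exploiting the estimates.
        for i, count in enumerate(self.counts):
            if count == 0:
                return i
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.schemes))
        return max(range(len(self.schemes)), key=lambda i: self.values[i])

    def update(self, index, rewards):
        """Update the running mean of the summed episode rewards."""
        episode_reward = sum(rewards)
        self.counts[index] += 1
        self.values[index] += (episode_reward - self.values[index]) / self.counts[index]


def train(selector, run_episode, num_episodes):
    """run_episode(scheme) is a caller-supplied function that controls the
    agent under the given scheme and returns the list of task rewards."""
    for _ in range(num_episodes):
        index = selector.select()
        rewards = run_episode(selector.schemes[index])
        selector.update(index, rewards)
    return selector
```

    In this sketch the agent is controlled to maximize the scheme-specific return from `compute_return`, while the selector is updated only from the task rewards actually observed, which mirrors the abstract's separation between the return being maximized and the rewards used to update the policy.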
