TRAINING AN ACTION SELECTION SYSTEM USING RELATIVE ENTROPY Q-LEARNING

    Publication number: US20230214649A1

    Publication date: 2023-07-06

    Application number: US18008838

    Filing date: 2021-07-27

    CPC classification number: G06N3/08

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection system using reinforcement learning techniques. In one aspect, a method comprises at each of multiple iterations: obtaining a batch of experience, each experience tuple comprising: a first observation, an action, a second observation, and a reward; for each experience tuple, determining a state value for the second observation, comprising: processing the first observation using a policy neural network to generate an action score for each action in a set of possible actions; sampling multiple actions from the set of possible actions in accordance with the action scores; processing the second observation using a Q neural network to generate a Q value for each sampled action; and determining the state value for the second observation; and determining an update to current values of the Q neural network parameters using the state values.
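The per-tuple state value step in the abstract above can be illustrated with a minimal sketch. The abstract does not say how the sampled Q values are combined, so the temperature-scaled log-mean-exp used here is an assumption, as are the softmax over action scores, the `temperature` and `discount` parameters, and the names `sampled_state_value`, `td_target`, and `q_fn` (a hypothetical stand-in for the Q neural network). It is not the claimed method.

```python
import numpy as np


def sampled_state_value(action_scores, q_fn, num_samples, temperature, rng):
    """Estimate a state value from Q values of actions sampled from the policy."""
    # Assumption: action scores are logits, turned into probabilities
    # with a numerically stable softmax.
    shifted = action_scores - np.max(action_scores)
    probs = np.exp(shifted) / np.sum(np.exp(shifted))

    # Sample multiple actions in accordance with the action scores.
    actions = rng.choice(len(probs), size=num_samples, p=probs)

    # Q value for each sampled action (q_fn stands in for the Q network
    # applied to the second observation).
    q_values = np.asarray(q_fn(actions), dtype=float)

    # Assumption: combine the sampled Q values with a temperature-scaled
    # log-mean-exp, computed stably by subtracting the maximum first.
    scaled = q_values / temperature
    peak = np.max(scaled)
    return float(temperature * (peak + np.log(np.mean(np.exp(scaled - peak)))))


def td_target(reward, next_state_value, discount):
    """Assumed regression target for the Q network: reward plus discounted state value."""
    return reward + discount * next_state_value
```

With a generator such as `np.random.default_rng(0)`, `sampled_state_value` returns a value that lies between the smallest and largest sampled Q value. A Q network update could then regress the Q value of each tuple's recorded action toward `td_target`.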

    DOMAIN ADAPTATION FOR ROBOTIC CONTROL USING SELF-SUPERVISED LEARNING

    Publication number: US20210103815A1

    Publication date: 2021-04-08

    Application number: US17065489

    Filing date: 2020-10-07

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a policy neural network for use in controlling a real-world agent in a real-world environment. One of the methods includes training the policy neural network by optimizing a first task-specific objective that measures a performance of the policy neural network in controlling a simulated version of the real-world agent; and then training the policy neural network by jointly optimizing (i) a self-supervised objective that measures at least a performance of internal representations generated by the policy neural network on a self-supervised task performed on real-world data and (ii) a second task-specific objective that measures the performance of the policy neural network in controlling the simulated version of the real-world agent.
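The two-phase schedule in the abstract above can be illustrated with a toy sketch. Everything concrete here is an assumption made for illustration: the quadratic stand-in objectives, the plain gradient descent, the `ssl_weight` mixing coefficient, the reuse of one task objective for both phases, and the names `task_loss_grad`, `self_supervised_loss_grad`, and `train_two_phase`. A real system would use a policy neural network, simulated rollouts, and real-world data in place of these stand-ins.

```python
import numpy as np


def task_loss_grad(params, sim_target):
    """Hypothetical task-specific objective measured in simulation (quadratic stand-in)."""
    diff = params - sim_target
    return float(diff @ diff), 2.0 * diff


def self_supervised_loss_grad(params, real_target):
    """Hypothetical self-supervised objective on real-world data (quadratic stand-in)."""
    diff = params - real_target
    return float(diff @ diff), 2.0 * diff


def train_two_phase(params, sim_target, real_target,
                    steps_phase1, steps_phase2, lr=0.1, ssl_weight=1.0):
    params = np.array(params, dtype=float)

    # Phase 1: optimize only the task-specific objective in simulation.
    for _ in range(steps_phase1):
        _, grad = task_loss_grad(params, sim_target)
        params -= lr * grad

    # Phase 2: jointly optimize the self-supervised objective and a
    # task-specific objective (assumed here to be the same one as in phase 1).
    for _ in range(steps_phase2):
        _, grad_task = task_loss_grad(params, sim_target)
        _, grad_ssl = self_supervised_loss_grad(params, real_target)
        params -= lr * (grad_task + ssl_weight * grad_ssl)

    return params
```

In this toy setting, phase 1 alone drives the parameters to `sim_target`, and the joint phase with equal weighting settles at the midpoint of the two targets, which shows how the second phase trades off the two objectives instead of replacing one with the other.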
