DATA-EFFICIENT HIERARCHICAL REINFORCEMENT LEARNING

    Publication Number: US20210187733A1

    Publication Date: 2021-06-24

    Application Number: US17050546

    Filing Date: 2019-05-17

    Applicant: Google LLC

    Abstract: Training and/or utilizing a hierarchical reinforcement learning (HRL) model for robotic control. The HRL model can include at least a higher-level policy model and a lower-level policy model. Some implementations relate to technique(s) that enable more efficient off-policy training to be utilized in training of the higher-level policy model and/or the lower-level policy model. Some of those implementations utilize off-policy correction, which re-labels higher-level actions of experience data, generated in the past utilizing a previously trained version of the HRL model, with modified higher-level actions. The modified higher-level actions are then utilized to off-policy train the higher-level policy model. This can enable effective off-policy training despite the lower-level policy model being a different version at training time (relative to the version when the experience data was collected).
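The off-policy correction described in the abstract can be illustrated with a minimal sketch: among a set of candidate higher-level actions (goals), pick the one that best explains the lower-level actions actually observed, scored under the current lower-level policy. The class and function names below are hypothetical, and the Gaussian lower-level policy is a toy stand-in, not the patent's implementation:

```python
import math

class GaussianLowPolicy:
    """Toy lower-level policy: its mean action moves the state toward the
    goal (mean = goal - state). This is an illustrative assumption only."""
    def __init__(self, sigma=1.0):
        self.sigma = sigma

    def log_prob(self, state, goal, action):
        # Unnormalized Gaussian log-density of the action given (state, goal).
        mean = goal - state
        return -0.5 * ((action - mean) / self.sigma) ** 2

def relabel_goal(states, low_actions, low_policy, candidate_goals):
    """Off-policy correction sketch: re-label a past higher-level action
    with the candidate goal that maximizes the likelihood of the
    lower-level actions actually taken, under the CURRENT lower-level
    policy. The returned goal is then used to off-policy train the
    higher-level policy."""
    best_goal, best_logp = None, -math.inf
    for g in candidate_goals:
        logp = sum(low_policy.log_prob(s, g, a)
                   for s, a in zip(states, low_actions))
        if logp > best_logp:
            best_goal, best_logp = g, logp
    return best_goal
```

For example, if the observed lower-level actions `[3, 2, 1]` at states `[0, 1, 2]` are exactly what the current lower-level policy would do for goal `3`, then `relabel_goal([0, 1, 2], [3, 2, 1], GaussianLowPolicy(), [1, 3, 5])` selects `3` from the candidates.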

    DEEP REINFORCEMENT LEARNING FOR ROBOTIC MANIPULATION

    Publication Number: US20250153352A1

    Publication Date: 2025-05-15

    Application Number: US19025551

    Filing Date: 2025-01-16

    Applicant: GOOGLE LLC

    Abstract: Implementations utilize deep reinforcement learning to train a policy neural network that parameterizes a policy for determining a robotic action based on a current state. Some of those implementations collect experience data from multiple robots that operate simultaneously. Each robot generates instances of experience data during iterative performance of episodes that are each explorations of performing a task, and that are each guided based on the policy network and the current policy parameters for the policy network during the episode. The collected experience data is generated during the episodes and is used to train the policy network by iteratively updating policy parameters of the policy network based on a batch of collected experience data. Further, prior to performance of each of a plurality of episodes performed by the robots, the current updated policy parameters can be provided (or retrieved) for utilization in performance of the episode.
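The collection loop described above can be sketched as a simple single-process simulation: multiple robots each run episodes guided by the latest policy parameters, experience accumulates in a shared buffer, and a learner updates the parameters from batches of that experience. All names, the scalar "parameters", and the update rule are illustrative assumptions, not the patent's method:

```python
import random
from collections import deque

class PolicyStore:
    """Shared store for the current policy parameters (sketch: a scalar)."""
    def __init__(self, params):
        self.params = params
    def get(self):
        return self.params
    def set(self, params):
        self.params = params

def run_episode(robot_id, params, steps=5):
    """One robot's exploration episode, guided by the given parameters.
    Each step yields one instance of experience data (hypothetical tuple)."""
    return [(robot_id, params, random.random()) for _ in range(steps)]

def train(num_robots=3, num_rounds=4, batch_size=8):
    store = PolicyStore(params=0.0)
    buffer = deque(maxlen=100)  # replay buffer of collected experience
    for _ in range(num_rounds):
        # Before each episode, every robot retrieves the latest parameters.
        for rid in range(num_robots):
            buffer.extend(run_episode(rid, store.get()))
        # The learner updates parameters from a batch of collected experience
        # (a placeholder increment stands in for a gradient step).
        batch = random.sample(buffer, min(batch_size, len(buffer)))
        store.set(store.get() + 0.1 * len(batch) / batch_size)
    return store.get(), len(buffer)
```

The key property the sketch preserves is the asynchrony-friendly handshake: robots only read parameters at episode boundaries, so the learner can update them continuously in between.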

    DATA-EFFICIENT HIERARCHICAL REINFORCEMENT LEARNING

    Publication Number: US20240308068A1

    Publication Date: 2024-09-19

    Application Number: US18673510

    Filing Date: 2024-05-24

    Applicant: GOOGLE LLC

    CPC Classification Number: B25J9/163

    Abstract: Training and/or utilizing a hierarchical reinforcement learning (HRL) model for robotic control. The HRL model can include at least a higher-level policy model and a lower-level policy model. Some implementations relate to technique(s) that enable more efficient off-policy training to be utilized in training of the higher-level policy model and/or the lower-level policy model. Some of those implementations utilize off-policy correction, which re-labels higher-level actions of experience data, generated in the past utilizing a previously trained version of the HRL model, with modified higher-level actions. The modified higher-level actions are then utilized to off-policy train the higher-level policy model. This can enable effective off-policy training despite the lower-level policy model being a different version at training time (relative to the version when the experience data was collected).
