Deep reinforcement learning for robotic manipulation

    公开(公告)号:US12240113B2

    公开(公告)日:2025-03-04

    申请号:US18526443

    申请日:2023-12-01

    Applicant: GOOGLE LLC

    Abstract: Implementations utilize deep reinforcement learning to train a policy neural network that parameterizes a policy for determining a robotic action based on a current state. Some of those implementations collect experience data from multiple robots that operate simultaneously. Each robot generates instances of experience data during iterative performance of episodes that are each explorations of performing a task, and that are each guided based on the policy network and the current policy parameters for the policy network during the episode. The collected experience data is generated during the episodes and is used to train the policy network by iteratively updating policy parameters of the policy network based on a batch of collected experience data. Further, prior to performance of each of a plurality of episodes performed by the robots, the current updated policy parameters can be provided (or retrieved) for utilization in performance of the episode.

    DEEP REINFORCEMENT LEARNING FOR ROBOTIC MANIPULATION

    公开(公告)号:US20220388159A1

    公开(公告)日:2022-12-08

    申请号:US17878186

    申请日:2022-08-01

    Applicant: Google LLC

    Abstract: Implementations utilize deep reinforcement learning to train a policy neural network that parameterizes a policy for determining a robotic action based on a current state. Some of those implementations collect experience data from multiple robots that operate simultaneously. Each robot generates instances of experience data during iterative performance of episodes that are each explorations of performing a task, and that are each guided based on the policy network and the current policy parameters for the policy network during the episode. The collected experience data is generated during the episodes and is used to train the policy network by iteratively updating policy parameters of the policy network based on a batch of collected experience data. Further, prior to performance of each of a plurality of episodes performed by the robots, the current updated policy parameters can be provided (or retrieved) for utilization in performance of the episode.

    DEEP REINFORCEMENT LEARNING FOR ROBOTIC MANIPULATION

    公开(公告)号:US20250153352A1

    公开(公告)日:2025-05-15

    申请号:US19025551

    申请日:2025-01-16

    Applicant: GOOGLE LLC

    Abstract: Implementations utilize deep reinforcement learning to train a policy neural network that parameterizes a policy for determining a robotic action based on a current state. Some of those implementations collect experience data from multiple robots that operate simultaneously. Each robot generates instances of experience data during iterative performance of episodes that are each explorations of performing a task, and that are each guided based on the policy network and the current policy parameters for the policy network during the episode. The collected experience data is generated during the episodes and is used to train the policy network by iteratively updating policy parameters of the policy network based on a batch of collected experience data. Further, prior to performance of each of a plurality of episodes performed by the robots, the current updated policy parameters can be provided (or retrieved) for utilization in performance of the episode.

    DEEP REINFORCEMENT LEARNING FOR ROBOTIC MANIPULATION

    公开(公告)号:US20210237266A1

    公开(公告)日:2021-08-05

    申请号:US17052679

    申请日:2019-06-14

    Applicant: Google LLC

    Abstract: Using large-scale reinforcement learning to train a policy model that can be utilized by a robot in performing a robotic task in which the robot interacts with one or more environmental objects. In various implementations, off-policy deep reinforcement learning is used to train the policy model, and the off-policy deep reinforcement learning is based on self-supervised data collection. The policy model can be a neural network model. Implementations of the reinforcement learning utilized in training the neural network model utilize a continuous-action variant of Q-learning. Through techniques disclosed herein, implementations can learn policies that generalize effectively to previously unseen objects, previously unseen environments, etc.

Patent Agency Ranking