-
公开(公告)号:US11845183B2
公开(公告)日:2023-12-19
申请号:US17878186
申请日:2022-08-01
Applicant: Google LLC
Inventor: Sergey Levine , Ethan Holly , Shixiang Gu , Timothy Lillicrap
CPC classification number: B25J9/161 , B25J9/163 , B25J9/1664 , G05B13/027 , G05B19/042 , G06N3/008 , G06N3/045 , G06N3/08 , G05B2219/32335 , G05B2219/33033 , G05B2219/33034 , G05B2219/39001 , G05B2219/39298 , G05B2219/40499
Abstract: Implementations utilize deep reinforcement learning to train a policy neural network that parameterizes a policy for determining a robotic action based on a current state. Some of those implementations collect experience data from multiple robots that operate simultaneously. Each robot generates instances of experience data during iterative performance of episodes that are each explorations of performing a task, and that are each guided based on the policy network and the current policy parameters for the policy network during the episode. The collected experience data is generated during the episodes and is used to train the policy network by iteratively updating policy parameters of the policy network based on a batch of collected experience data. Further, prior to performance of each of a plurality of episodes performed by the robots, the current updated policy parameters can be provided (or retrieved) for utilization in performance of the episode.
-
公开(公告)号:US20240131695A1
公开(公告)日:2024-04-25
申请号:US18526443
申请日:2023-12-01
Applicant: GOOGLE LLC
Inventor: Sergey Levine , Ethan Holly , Shixiang Gu , Timothy Lillicrap
CPC classification number: B25J9/161 , B25J9/163 , B25J9/1664 , G05B13/027 , G05B19/042 , G06N3/008 , G06N3/045 , G06N3/08 , G05B2219/32335 , G05B2219/33033 , G05B2219/33034 , G05B2219/39001 , G05B2219/39298 , G05B2219/40499
Abstract: Implementations utilize deep reinforcement learning to train a policy neural network that parameterizes a policy for determining a robotic action based on a current state. Some of those implementations collect experience data from multiple robots that operate simultaneously. Each robot generates instances of experience data during iterative performance of episodes that are each explorations of performing a task, and that are each guided based on the policy network and the current policy parameters for the policy network during the episode. The collected experience data is generated during the episodes and is used to train the policy network by iteratively updating policy parameters of the policy network based on a batch of collected experience data. Further, prior to performance of each of a plurality of episodes performed by the robots, the current updated policy parameters can be provided (or retrieved) for utilization in performance of the episode.
-
公开(公告)号:US12240113B2
公开(公告)日:2025-03-04
申请号:US18526443
申请日:2023-12-01
Applicant: GOOGLE LLC
Inventor: Sergey Levine , Ethan Holly , Shixiang Gu , Timothy Lillicrap
Abstract: Implementations utilize deep reinforcement learning to train a policy neural network that parameterizes a policy for determining a robotic action based on a current state. Some of those implementations collect experience data from multiple robots that operate simultaneously. Each robot generates instances of experience data during iterative performance of episodes that are each explorations of performing a task, and that are each guided based on the policy network and the current policy parameters for the policy network during the episode. The collected experience data is generated during the episodes and is used to train the policy network by iteratively updating policy parameters of the policy network based on a batch of collected experience data. Further, prior to performance of each of a plurality of episodes performed by the robots, the current updated policy parameters can be provided (or retrieved) for utilization in performance of the episode.
-
公开(公告)号:US20220388159A1
公开(公告)日:2022-12-08
申请号:US17878186
申请日:2022-08-01
Applicant: Google LLC
Inventor: Sergey Levine , Ethan Holly , Shixiang Gu , Timothy Lillicrap
Abstract: Implementations utilize deep reinforcement learning to train a policy neural network that parameterizes a policy for determining a robotic action based on a current state. Some of those implementations collect experience data from multiple robots that operate simultaneously. Each robot generates instances of experience data during iterative performance of episodes that are each explorations of performing a task, and that are each guided based on the policy network and the current policy parameters for the policy network during the episode. The collected experience data is generated during the episodes and is used to train the policy network by iteratively updating policy parameters of the policy network based on a batch of collected experience data. Further, prior to performance of each of a plurality of episodes performed by the robots, the current updated policy parameters can be provided (or retrieved) for utilization in performance of the episode.
-
公开(公告)号:US20190232488A1
公开(公告)日:2019-08-01
申请号:US16333482
申请日:2017-09-14
Applicant: Google LLC
Inventor: Sergey Levine , Ethan Holly , Shixiang Gu , Timothy Lillicrap
IPC: B25J9/16 , G05B13/02 , G05B19/042
CPC classification number: B25J9/161 , B25J9/163 , B25J9/1664 , G05B13/027 , G05B19/042 , G05B2219/32335 , G05B2219/33033 , G05B2219/33034 , G05B2219/39001 , G05B2219/39298 , G05B2219/40499 , G06N3/008 , G06N3/0454 , G06N3/08
Abstract: Implementations utilize deep reinforcement learning to train a policy neural network that parameterizes a policy for determining a robotic action based on a current state. Some of those implementations collect experience data from multiple robots that operate simultaneously. Each robot generates instances of experience data during iterative performance of episodes that are each explorations of performing a task, and that are each guided based on the policy network and the current policy parameters for the policy network during the episode. The collected experience data is generated during the episodes and is used to train the policy network by iteratively updating policy parameters of the policy network based on a batch of collected experience data. Further, prior to performance of each of a plurality of episodes performed by the robots, the current updated policy parameters can be provided (or retrieved) for utilization in performance of the episode.
-
公开(公告)号:US20250153352A1
公开(公告)日:2025-05-15
申请号:US19025551
申请日:2025-01-16
Applicant: GOOGLE LLC
Inventor: Sergey Levine , Ethan Holly , Shixiang Gu , Timothy Lillicrap
Abstract: Implementations utilize deep reinforcement learning to train a policy neural network that parameterizes a policy for determining a robotic action based on a current state. Some of those implementations collect experience data from multiple robots that operate simultaneously. Each robot generates instances of experience data during iterative performance of episodes that are each explorations of performing a task, and that are each guided based on the policy network and the current policy parameters for the policy network during the episode. The collected experience data is generated during the episodes and is used to train the policy network by iteratively updating policy parameters of the policy network based on a batch of collected experience data. Further, prior to performance of each of a plurality of episodes performed by the robots, the current updated policy parameters can be provided (or retrieved) for utilization in performance of the episode.
-
公开(公告)号:US11897133B2
公开(公告)日:2024-02-13
申请号:US17878186
申请日:2022-08-01
Applicant: Google LLC
Inventor: Sergey Levine , Ethan Holly , Shixiang Gu , Timothy Lillicrap
CPC classification number: B25J9/161 , B25J9/163 , B25J9/1664 , G05B13/027 , G05B19/042 , G06N3/008 , G06N3/045 , G06N3/08 , G05B2219/32335 , G05B2219/33033 , G05B2219/33034 , G05B2219/39001 , G05B2219/39298 , G05B2219/40499
Abstract: Implementations utilize deep reinforcement learning to train a policy neural network that parameterizes a policy for determining a robotic action based on a current state. Some of those implementations collect experience data from multiple robots that operate simultaneously. Each robot generates instances of experience data during iterative performance of episodes that are each explorations of performing a task, and that are each guided based on the policy network and the current policy parameters for the policy network during the episode. The collected experience data is generated during the episodes and is used to train the policy network by iteratively updating policy parameters of the policy network based on a batch of collected experience data. Further, prior to performance of each of a plurality of episodes performed by the robots, the current updated policy parameters can be provided (or retrieved) for utilization in performance of the episode.
-
-
-
-
-
-