-
Publication No.: US11845183B2
Publication Date: 2023-12-19
Application No.: US17878186
Filing Date: 2022-08-01
Applicant: Google LLC
Inventor: Sergey Levine, Ethan Holly, Shixiang Gu, Timothy Lillicrap
CPC classification number: B25J9/161, B25J9/163, B25J9/1664, G05B13/027, G05B19/042, G06N3/008, G06N3/045, G06N3/08, G05B2219/32335, G05B2219/33033, G05B2219/33034, G05B2219/39001, G05B2219/39298, G05B2219/40499
Abstract: Implementations utilize deep reinforcement learning to train a policy neural network that parameterizes a policy for determining a robotic action based on a current state. Some of those implementations collect experience data from multiple robots that operate simultaneously. Each robot generates instances of experience data during iterative performance of episodes that are each explorations of performing a task, and that are each guided based on the policy network and the current policy parameters for the policy network during the episode. The collected experience data is generated during the episodes and is used to train the policy network by iteratively updating policy parameters of the policy network based on a batch of collected experience data. Further, prior to performance of each of a plurality of episodes performed by the robots, the current updated policy parameters can be provided (or retrieved) for utilization in performance of the episode.
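The collection scheme this abstract describes lends itself to a short sketch. The following is a minimal illustration, not the patented implementation: `ParameterServer`, `ReplayBuffer`, `robot_env`, and `policy` are hypothetical stand-ins for the parameter store, the pooled experience data, a robot's environment interface, and the policy network's forward pass.

```python
import random
from collections import deque


class ParameterServer:
    """Holds the latest policy parameters; the trainer writes, robots read."""

    def __init__(self, params):
        self.params = params

    def push(self, params):
        self.params = params

    def pull(self):
        return self.params


class ReplayBuffer:
    """Pooled experience data collected by all robots."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def __len__(self):
        return len(self.buffer)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        return random.sample(list(self.buffer), batch_size)


def run_episode(robot_env, policy, params, buffer, max_steps=200):
    """One episode: act under the current parameters, log every transition."""
    state = robot_env.reset()
    for _ in range(max_steps):
        action = policy(state, params)  # policy network forward pass
        next_state, reward, done = robot_env.step(action)
        buffer.add((state, action, reward, next_state, done))
        state = next_state
        if done:
            break


def collection_loop(robot_env, policy, server, buffer, episodes):
    """Each robot retrieves the freshest parameters before every episode."""
    for _ in range(episodes):
        params = server.pull()  # current updated policy parameters
        run_episode(robot_env, policy, params, buffer)
```

Each robot runs `collection_loop` against the same shared `server` and `buffer`, which is one way to realize the simultaneous, per-episode parameter refresh the abstract describes.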
-
Publication No.: US20220284266A1
Publication Date: 2022-09-08
Application No.: US17704721
Filing Date: 2022-03-25
Applicant: Google LLC
Inventor: Shixiang Gu, Timothy Paul Lillicrap, Ilya Sutskever, Sergey Vladimir Levine
IPC: G06N3/04
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for computing Q values for actions to be performed by an agent interacting with an environment from a continuous action space of actions. In one aspect, a system includes a value subnetwork configured to receive an observation characterizing a current state of the environment and process the observation to generate a value estimate; a policy subnetwork configured to receive the observation and process the observation to generate an ideal point in the continuous action space; and a subsystem configured to receive a particular point in the continuous action space representing a particular action; generate an advantage estimate for the particular action; and generate a Q value for the particular action that is an estimate of an expected return resulting from the agent performing the particular action when the environment is in the current state.
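A minimal sketch of the Q-value decomposition this abstract describes, assuming a quadratic advantage around the policy's ideal point (a common choice for such a decomposition, not stated in the abstract); `value_net`, `policy_net`, and `precision_net` are hypothetical callables standing in for the subnetworks.

```python
import numpy as np


def q_value(value_net, policy_net, precision_net, obs, action):
    """Q(s, a) = V(s) + A(s, a), with the advantage taken as a quadratic
    penalty for deviating from the policy's ideal point. The quadratic
    form and the precision subnetwork are assumptions; the abstract only
    specifies the value/policy/advantage decomposition."""
    v = value_net(obs)      # value estimate V(s)
    mu = policy_net(obs)    # ideal point mu(s) in the continuous action space
    p = precision_net(obs)  # positive-definite matrix P(s) shaping the penalty
    diff = np.asarray(action) - mu
    advantage = -0.5 * diff @ p @ diff  # A(s, a) <= 0, equal to 0 at a = mu
    return v + advantage    # expected-return estimate for taking `action`
```

A useful property of this form is that the Q-maximizing action is the ideal point itself, so acting greedily over a continuous action space needs no search.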
-
Publication No.: US20240131695A1
Publication Date: 2024-04-25
Application No.: US18526443
Filing Date: 2023-12-01
Applicant: Google LLC
Inventor: Sergey Levine, Ethan Holly, Shixiang Gu, Timothy Lillicrap
CPC classification number: B25J9/161, B25J9/163, B25J9/1664, G05B13/027, G05B19/042, G06N3/008, G06N3/045, G06N3/08, G05B2219/32335, G05B2219/33033, G05B2219/33034, G05B2219/39001, G05B2219/39298, G05B2219/40499
Abstract: Implementations utilize deep reinforcement learning to train a policy neural network that parameterizes a policy for determining a robotic action based on a current state. Some of those implementations collect experience data from multiple robots that operate simultaneously. Each robot generates instances of experience data during iterative performance of episodes that are each explorations of performing a task, and that are each guided based on the policy network and the current policy parameters for the policy network during the episode. The collected experience data is generated during the episodes and is used to train the policy network by iteratively updating policy parameters of the policy network based on a batch of collected experience data. Further, prior to performance of each of a plurality of episodes performed by the robots, the current updated policy parameters can be provided (or retrieved) for utilization in performance of the episode.
-
Publication No.: US11288568B2
Publication Date: 2022-03-29
Application No.: US15429088
Filing Date: 2017-02-09
Applicant: Google LLC
Inventor: Shixiang Gu, Timothy Paul Lillicrap, Ilya Sutskever, Sergey Vladimir Levine
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for computing Q values for actions to be performed by an agent interacting with an environment from a continuous action space of actions. In one aspect, a system includes a value subnetwork configured to receive an observation characterizing a current state of the environment and process the observation to generate a value estimate; a policy subnetwork configured to receive the observation and process the observation to generate an ideal point in the continuous action space; and a subsystem configured to receive a particular point in the continuous action space representing a particular action; generate an advantage estimate for the particular action; and generate a Q value for the particular action that is an estimate of an expected return resulting from the agent performing the particular action when the environment is in the current state.
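Since this publication shares its abstract with US20220284266A1 above, a brief usage note on the `q_value` sketch given there may help make the decomposition concrete; the toy subnetworks and values below are entirely hypothetical.

```python
import numpy as np

# Toy stand-ins for the trained subnetworks (hypothetical values).
value_net = lambda obs: 1.5                     # V(s)
policy_net = lambda obs: np.array([0.2, -0.1])  # ideal point mu(s)
precision_net = lambda obs: np.eye(2)           # P(s), identity here

obs = np.zeros(3)
ideal = policy_net(obs)
print(q_value(value_net, policy_net, precision_net, obs, ideal))      # 1.5: A = 0 at mu
print(q_value(value_net, policy_net, precision_net, obs, ideal + 1))  # 0.5: quadratic penalty
```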
-
Publication No.: US20240308068A1
Publication Date: 2024-09-19
Application No.: US18673510
Filing Date: 2024-05-24
Applicant: Google LLC
Inventor: Honglak Lee, Shixiang Gu, Sergey Levine
IPC: B25J9/16
CPC classification number: B25J9/163
Abstract: Training and/or utilizing a hierarchical reinforcement learning (HRL) model for robotic control. The HRL model can include at least a higher-level policy model and a lower-level policy model. Some implementations relate to technique(s) that enable more efficient off-policy training to be utilized in training of the higher-level policy model and/or the lower-level policy model. Some of those implementations utilize off-policy correction, which re-labels higher-level actions of experience data, generated in the past utilizing a previously trained version of the HRL model, with modified higher-level actions. The modified higher-level actions are then utilized to off-policy train the higher-level policy model. This can enable effective off-policy training despite the lower-level policy model being a different version at training time (relative to the version when the experience data was collected).
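A minimal sketch of the re-labeling step this abstract describes, under the assumption that candidate higher-level actions are scored by how closely the current lower-level policy, conditioned on each candidate, reproduces the logged low-level actions; `lower_policy` and `candidate_goals` are hypothetical names.

```python
import numpy as np


def relabel_goal(lower_policy, states, actions, candidate_goals):
    """Pick the candidate higher-level action (goal) under which the
    *current* lower-level policy best reproduces the logged low-level
    actions. Scoring by squared action error is an assumption; the
    abstract only says past higher-level actions are re-labeled."""
    best_goal, best_score = None, float("inf")
    for goal in candidate_goals:
        # Deviation between logged actions and what the current
        # lower-level policy would do when conditioned on this goal.
        score = sum(
            np.sum((np.asarray(a) - lower_policy(s, goal)) ** 2)
            for s, a in zip(states, actions)
        )
        if score < best_score:
            best_goal, best_score = goal, score
    return best_goal  # replaces the original higher-level action for training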
-
Publication No.: US11897133B2
Publication Date: 2024-02-13
Application No.: US17878186
Filing Date: 2022-08-01
Applicant: Google LLC
Inventor: Sergey Levine, Ethan Holly, Shixiang Gu, Timothy Lillicrap
CPC classification number: B25J9/161, B25J9/163, B25J9/1664, G05B13/027, G05B19/042, G06N3/008, G06N3/045, G06N3/08, G05B2219/32335, G05B2219/33033, G05B2219/33034, G05B2219/39001, G05B2219/39298, G05B2219/40499
Abstract: Implementations utilize deep reinforcement learning to train a policy neural network that parameterizes a policy for determining a robotic action based on a current state. Some of those implementations collect experience data from multiple robots that operate simultaneously. Each robot generates instances of experience data during iterative performance of episodes that are each explorations of performing a task, and that are each guided based on the policy network and the current policy parameters for the policy network during the episode. The collected experience data is generated during the episodes and is used to train the policy network by iteratively updating policy parameters of the policy network based on a batch of collected experience data. Further, prior to performance of each of a plurality of episodes performed by the robots, the current updated policy parameters can be provided (or retrieved) for utilization in performance of the episode.
-
Publication No.: US20220388159A1
Publication Date: 2022-12-08
Application No.: US17878186
Filing Date: 2022-08-01
Applicant: Google LLC
Inventor: Sergey Levine, Ethan Holly, Shixiang Gu, Timothy Lillicrap
Abstract: Implementations utilize deep reinforcement learning to train a policy neural network that parameterizes a policy for determining a robotic action based on a current state. Some of those implementations collect experience data from multiple robots that operate simultaneously. Each robot generates instances of experience data during iterative performance of episodes that are each explorations of performing a task, and that are each guided based on the policy network and the current policy parameters for the policy network during the episode. The collected experience data is generated during the episodes and is used to train the policy network by iteratively updating policy parameters of the policy network based on a batch of collected experience data. Further, prior to performance of each of a plurality of episodes performed by the robots, the current updated policy parameters can be provided (or retrieved) for utilization in performance of the episode.
-
Publication No.: US20190232488A1
Publication Date: 2019-08-01
Application No.: US16333482
Filing Date: 2017-09-14
Applicant: Google LLC
Inventor: Sergey Levine, Ethan Holly, Shixiang Gu, Timothy Lillicrap
IPC: B25J9/16, G05B13/02, G05B19/042
CPC classification number: B25J9/161, B25J9/163, B25J9/1664, G05B13/027, G05B19/042, G05B2219/32335, G05B2219/33033, G05B2219/33034, G05B2219/39001, G05B2219/39298, G05B2219/40499, G06N3/008, G06N3/0454, G06N3/08
Abstract: Implementations utilize deep reinforcement learning to train a policy neural network that parameterizes a policy for determining a robotic action based on a current state. Some of those implementations collect experience data from multiple robots that operate simultaneously. Each robot generates instances of experience data during iterative performance of episodes that are each explorations of performing a task, and that are each guided based on the policy network and the current policy parameters for the policy network during the episode. The collected experience data is generated during the episodes and is used to train the policy network by iteratively updating policy parameters of the policy network based on a batch of collected experience data. Further, prior to performance of each of a plurality of episodes performed by the robots, the current updated policy parameters can be provided (or retrieved) for utilization in performance of the episode.
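This application shares its abstract with the later records above; to complement the collection sketch given after the first of them, the training side can be sketched under the same assumptions, with `update_step` a hypothetical stand-in for one gradient update of the policy network.

```python
def training_loop(buffer, server, update_step, params,
                  batch_size=64, min_buffer=1_000, iterations=10_000):
    """Iteratively update the policy parameters from batches of the
    robots' pooled experience, publishing each update so subsequent
    episodes run with the freshest policy. Reuses the ReplayBuffer and
    ParameterServer sketched earlier; all names are hypothetical."""
    for _ in range(iterations):
        if len(buffer) < min_buffer:
            continue  # wait until enough experience has been collected
        batch = buffer.sample(batch_size)
        params = update_step(params, batch)  # one policy-network update
        server.push(params)  # make the update visible to the robots
    return params
```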
-
Publication No.: US12240113B2
Publication Date: 2025-03-04
Application No.: US18526443
Filing Date: 2023-12-01
Applicant: Google LLC
Inventor: Sergey Levine, Ethan Holly, Shixiang Gu, Timothy Lillicrap
Abstract: Implementations utilize deep reinforcement learning to train a policy neural network that parameterizes a policy for determining a robotic action based on a current state. Some of those implementations collect experience data from multiple robots that operate simultaneously. Each robot generates instances of experience data during iterative performance of episodes that are each explorations of performing a task, and that are each guided based on the policy network and the current policy parameters for the policy network during the episode. The collected experience data is generated during the episodes and is used to train the policy network by iteratively updating policy parameters of the policy network based on a batch of collected experience data. Further, prior to performance of each of a plurality of episodes performed by the robots, the current updated policy parameters can be provided (or retrieved) for utilization in performance of the episode.
-
Publication No.: US11992944B2
Publication Date: 2024-05-28
Application No.: US17050546
Filing Date: 2019-05-17
Applicant: Google LLC
Inventor: Honglak Lee, Shixiang Gu, Sergey Levine
IPC: B25J9/16
CPC classification number: B25J9/163
Abstract: Training and/or utilizing a hierarchical reinforcement learning (HRL) model for robotic control. The HRL model can include at least a higher-level policy model and a lower-level policy model. Some implementations relate to technique(s) that enable more efficient off-policy training to be utilized in training of the higher-level policy model and/or the lower-level policy model. Some of those implementations utilize off-policy correction, which re-labels higher-level actions of experience data, generated in the past utilizing a previously trained version of the HRL model, with modified higher-level actions. The modified higher-level actions are then utilized to off-policy train the higher-level policy model. This can enable effective off-policy training despite the lower-level policy model being a different version at training time (relative to the version when the experience data was collected).
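The abstract leaves open how the lower-level policy model is rewarded. One common goal-conditioned scheme, shown purely as an assumption and not taken from the abstract, treats the higher-level action as a desired state offset:

```python
import numpy as np


def intrinsic_reward(state, goal, next_state):
    """Reward for the lower-level policy: negative distance between the
    state actually reached and the state the higher-level action (a
    desired state offset) pointed at. This parameterization is an
    assumption."""
    return -np.linalg.norm(np.asarray(state) + goal - next_state)


def goal_transition(state, goal, next_state):
    """Re-express the goal relative to the new state so it keeps pointing
    at the same absolute target between higher-level decisions."""
    return np.asarray(state) + goal - next_state
```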