-
Publication number: US11992944B2
Publication date: 2024-05-28
Application number: US17050546
Filing date: 2019-05-17
Applicant: Google LLC
Inventor: Honglak Lee , Shixiang Gu , Sergey Levine
IPC: B25J9/16
CPC classification number: B25J9/163
Abstract: Training and/or utilizing a hierarchical reinforcement learning (HRL) model for robotic control. The HRL model can include at least a higher-level policy model and a lower-level policy model. Some implementations relate to technique(s) that enable more efficient off-policy training to be utilized in training of the higher-level policy model and/or the lower-level policy model. Some of those implementations utilize off-policy correction, which re-labels higher-level actions of experience data, generated in the past utilizing a previously trained version of the HRL model, with modified higher-level actions. The modified higher-level actions are then utilized to off-policy train the higher-level policy model. This can enable effective off-policy training despite the lower-level policy model being a different version at training time (relative to the version when the experience data was collected).
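The off-policy correction described in the abstract can be sketched as follows: given a stored segment of experience, re-label the higher-level action (a goal passed to the lower-level policy) with the candidate goal that best explains the stored lower-level actions under the *current* lower-level policy. This is a minimal illustrative sketch only, assuming a goal-conditioned lower-level policy and using a squared-error surrogate for the action log-likelihood; the names `off_policy_correct` and `lower_policy_mean`, the candidate-sampling scheme, and the goal-transition rule are assumptions for illustration, not details taken from the patent.

```python
import numpy as np

def off_policy_correct(states, low_actions, orig_goal, lower_policy_mean,
                       n_candidates=8, rng=None):
    """Relabel a stored higher-level action (a goal for the lower-level
    policy) with the candidate goal that best explains the stored
    lower-level actions under the CURRENT lower-level policy.
    Illustrative sketch; not the patent's exact procedure."""
    rng = rng if rng is not None else np.random.default_rng(0)
    # Candidate goals: the original goal, the state change actually
    # achieved over the segment, and perturbations of that change.
    achieved = states[-1] - states[0]
    candidates = [orig_goal, achieved]
    candidates += [achieved + rng.normal(scale=0.5, size=achieved.shape)
                   for _ in range(n_candidates)]

    def log_prob(goal):
        # Squared-error surrogate for the log-likelihood of the stored
        # lower-level actions under the current lower-level policy.
        lp, g = 0.0, goal
        for s, s_next, a in zip(states[:-1], states[1:], low_actions):
            mu = lower_policy_mean(s, g)   # current policy's mean action
            lp += -0.5 * float(np.sum((a - mu) ** 2))
            g = s + g - s_next             # assumed goal transition between steps
        return lp

    # The maximizing candidate becomes the relabeled higher-level action
    # used when off-policy training the higher-level policy model.
    return max(candidates, key=log_prob)
```

In use, the relabeled goal replaces the originally stored higher-level action in the experience tuple, so the higher-level policy is trained against actions consistent with the current (rather than the data-collection-time) lower-level policy.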
-
Publication number: US20210187733A1
Publication date: 2021-06-24
Application number: US17050546
Filing date: 2019-05-17
Applicant: Google LLC
Inventor: Honglak Lee , Shixiang Gu , Sergey Levine
IPC: B25J9/16
-