Publication number: US20220410380A1
Publication date: 2022-12-29
Application number: US17843288
Application date: 2022-06-17
Applicant: X Development LLC
Inventors: Yao Lu, Mengyuan Yan, Seyed Mohammad Khansari Zadeh, Alexander Herzog, Eric Jang, Karol Hausman, Yevgen Chebotar, Sergey Levine, Alexander Irpan
IPC: B25J9/16
Abstract: Utilizing an initial set of offline, positive-only robotic demonstration data to pre-train an actor network and a critic network for robotic control, followed by further training of the networks based on online robotic episodes that utilize the network(s). Implementations enable the actor network to be effectively pre-trained while mitigating the occurrence and/or the extent of forgetting when it is further trained on episode data. Implementations additionally or alternatively enable the actor network to be trained to a given degree of effectiveness in fewer training steps. In various implementations, one or more adaptation techniques are utilized in performing the robotic episodes and/or in performing the robotic training. Each adaptation technique can individually yield one or more corresponding advantages, and when the techniques are used in any combination, those advantages can accumulate. The adaptation techniques include Positive Sample Filtering, Adaptive Exploration, Using Max Q Values, and Using the Actor in CEM.
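The abstract names the adaptation techniques without defining them. The following Python sketch illustrates one plausible reading of three of them; every interpretation here is an assumption, not the claimed method. Positive Sample Filtering is modeled as keeping only transitions from successful episodes (reward > 0) for actor updates. Using the Actor in CEM is modeled as seeding a cross-entropy method (CEM) action search with the actor's proposed action and keeping that action as a candidate on every iteration. Using Max Q Values is modeled as bootstrapping the Bellman target from the larger of the critic's scores for the actor's and CEM's next-state actions. The function names (`filter_positive`, `cem_with_actor`, `td_target`) and all hyperparameters are hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Action = List[float]
QFn = Callable[[object, Sequence[float]], float]


def filter_positive(transitions: List[Dict], reward_key: str = "reward") -> List[Dict]:
    """Hypothetical positive sample filtering: keep only transitions whose
    episode was successful, modeled here as reward > 0."""
    return [t for t in transitions if t[reward_key] > 0]


def cem_with_actor(q_fn: QFn, state: object, actor_action: Sequence[float],
                   iters: int = 3, pop: int = 64, n_elite: int = 6,
                   init_std: float = 0.5, seed: int = 0) -> Tuple[Action, float]:
    """Hypothetical actor-seeded CEM: the sampling distribution starts at the
    actor's action, and that action is re-scored as a candidate every
    iteration, so the result never scores below the actor under q_fn."""
    rng = random.Random(seed)
    dim = len(actor_action)
    mean = list(actor_action)
    std = [init_std] * dim
    best_a = list(actor_action)
    best_q = q_fn(state, best_a)
    for _ in range(iters):
        # Draw pop - 1 Gaussian samples, then append the actor's own action.
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(pop - 1)]
        samples.append(list(actor_action))
        # Score every candidate with the critic, highest Q first.
        scored = sorted(((q_fn(state, a), a) for a in samples),
                        key=lambda qa: qa[0], reverse=True)
        elite = [a for _, a in scored[:n_elite]]
        # Refit the Gaussian to the elite set (std floored so sampling continues).
        mean = [sum(a[i] for a in elite) / n_elite for i in range(dim)]
        std = [max(1e-3, (sum((a[i] - mean[i]) ** 2 for a in elite) / n_elite) ** 0.5)
               for i in range(dim)]
        # Track the best-scoring action seen across all iterations.
        if scored[0][0] > best_q:
            best_q, best_a = scored[0]
    return best_a, best_q


def td_target(reward: float, gamma: float, next_q_actor: float,
              next_q_cem: float, done: bool) -> float:
    """Hypothetical 'max Q' target: bootstrap from whichever next-state
    action (actor's or CEM's) the critic scores higher."""
    if done:
        return reward
    return reward + gamma * max(next_q_actor, next_q_cem)


if __name__ == "__main__":
    # Toy critic peaked at action 0.3; the actor proposes 0.0.
    toy_q = lambda s, a: -(a[0] - 0.3) ** 2
    action, value = cem_with_actor(toy_q, None, [0.0])
    print(action, value)
    print(len(filter_positive([{"reward": 1.0}, {"reward": 0.0}])))  # 1
    print(td_target(1.0, 0.9, 2.0, 3.0, done=False))
```

Under these assumptions, the actor's action is always in the candidate set and the best-so-far action is replaced only by a higher-scoring one, so the CEM result can never score lower under the critic than the actor's proposal alone. That guarantee is the intuition for pairing the two searches in this sketch.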