LEARNING ROBOTIC SKILLS WITH IMITATION AND REINFORCEMENT AT SCALE

    公开(公告)号:US20220410380A1

    公开(公告)日:2022-12-29

    申请号:US17843288

    申请日:2022-06-17

    Abstract: Utilizing an initial set of offline positive-only robotic demonstration data for pre-training an actor network and a critic network for robotic control, followed by further training of the networks based on online robotic episodes that utilize the network(s). Implementations enable the actor network to be effectively pre-trained, while mitigating occurrences of and/or the extent of forgetting when further trained based on episode data. Implementations additionally or alternatively enable the actor network to be trained to a given degree of effectiveness in fewer training steps. In various implementations, one or more adaptation techniques are utilized in performing the robotic episodes and/or in performing the robotic training. The adaptation techniques can each, individually, result in one or more corresponding advantages and, when used in any combination, the corresponding advantages can accumulate. The adaptation techniques include Positive Sample Filtering, Adaptive Exploration, Using Max Q Values, and Using the Actor in CEM.

    Asynchronous robotic control using most recently selected robotic action data

    公开(公告)号:US11685045B1

    公开(公告)日:2023-06-27

    申请号:US16948187

    申请日:2020-09-08

    CPC classification number: B25J9/161 B25J9/163 B25J9/1661 B25J9/1669 B25J9/1697

    Abstract: Asynchronous robotic control utilizing a trained critic network. During performance of a robotic task based on a sequence of robotic actions determined utilizing the critic network, a corresponding next robotic action of the sequence is determined while a corresponding previous robotic action of the sequence is still being implemented. Optionally, the next robotic action can be fully determined and/or can begin to be implemented before implementation of the previous robotic action is completed. In determining the next robotic action, most recently selected robotic action data is processed using the critic network, where such data conveys information about the previous robotic action that is still being implemented. Some implementations additionally or alternatively relate to determining when to implement a robotic action that is determined in an asynchronous manner.

    Generating reinforcement learning data that is compatible with reinforcement learning for a robotic task

    公开(公告)号:US11610153B1

    公开(公告)日:2023-03-21

    申请号:US16729712

    申请日:2019-12-30

    Abstract: Utilizing at least one existing policy (e.g. a manually engineered policy) for a robotic task, in generating reinforcement learning (RL) data that can be used in training an RL policy for an instance of RL of the robotic task. The existing policy can be one that, standing alone, will not generate data that is compatible with the instance of RL for the robotic task. In contrast, the generated RL data is compatible with RL for the robotic task at least by virtue of it including state data that is in a state space of the RL for the robotic task, and including actions that are in the action space of the RL for the robotic task. The generated RL data can be used in at least some of the initial training for the RL policy using reinforcement learning.

Patent Agency Ranking