Generating reinforcement learning data that is compatible with reinforcement learning for a robotic task

    Publication Number: US11610153B1

    Publication Date: 2023-03-21

    Application Number: US16729712

    Application Date: 2019-12-30

    Abstract: Utilizing at least one existing policy (e.g., a manually engineered policy) for a robotic task in generating reinforcement learning (RL) data that can be used in training an RL policy for an instance of RL of the robotic task. The existing policy can be one that, standing alone, would not generate data compatible with the instance of RL for the robotic task. In contrast, the generated RL data is compatible with RL for the robotic task at least by virtue of including state data that is in the state space of the RL for the robotic task and actions that are in the action space of the RL for the robotic task. The generated RL data can be used in at least some of the initial training of the RL policy.
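A minimal sketch of the idea, with all names here (the toy environment, `collect_rl_data`, and the mapping functions) assumed for illustration rather than taken from the patent: roll out the existing policy, and record each transition only after mapping observations and actions into the RL task's state and action spaces, so the resulting tuples can seed RL training.

```python
class ToyEnv:
    """Stand-in environment: a 1-D position that must reach 3 (assumption)."""
    def reset(self):
        self.pos = 0
        return self.pos
    def step(self, action):
        self.pos += action
        done = self.pos >= 3
        reward = 1.0 if done else 0.0
        return self.pos, reward, done

def collect_rl_data(env, existing_policy, to_rl_state, to_rl_action, episodes=1):
    """Return (state, action, reward, next_state, done) tuples expressed in
    the RL policy's state/action spaces."""
    transitions = []
    for _ in range(episodes):
        obs = env.reset()
        done = False
        while not done:
            action = existing_policy(obs)        # native action of the old policy
            next_obs, reward, done = env.step(action)
            transitions.append((
                to_rl_state(obs),                # mapped into the RL state space
                to_rl_action(action),            # mapped into the RL action space
                reward,
                to_rl_state(next_obs),
                done,
            ))
            obs = next_obs
    return transitions

# Engineered policy: always step forward; the mappings lift scalars into
# (hypothetical) tuple-valued state/action spaces of the RL task.
data = collect_rl_data(ToyEnv(),
                       existing_policy=lambda obs: 1,
                       to_rl_state=lambda s: (float(s),),
                       to_rl_action=lambda a: (float(a),))
```

The transitions in `data` are exactly the shape a replay-based RL trainer would consume, even though the engineered policy itself knows nothing about that format.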

    Action prediction networks for robotic grasping

    Publication Number: US11325252B2

    Publication Date: 2022-05-10

    Application Number: US16570522

    Application Date: 2019-09-13

    Abstract: Deep machine learning methods and apparatus related to the manipulation of an object by an end effector of a robot are described herein. Some implementations relate to training an action prediction network to predict, given an input image, a probability density over candidate actions likely to result in successful grasps by the end effector. Some implementations are directed to utilization of an action prediction network to visually servo a grasping end effector of a robot to achieve a successful grasp of an object by the grasping end effector.
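One way to read the servoing loop, sketched with assumed names and a stand-in scoring function in place of the trained network: repeatedly sample candidate end-effector motions, score each against the current image, and command the best-scoring one.

```python
import random

def servo_step(score_fn, image, n_candidates=64):
    """Sample candidate grasp motions, score each with the (assumed) action
    prediction network, and return the best-scoring one for visual servoing."""
    candidates = [(random.uniform(-1, 1),        # dx
                   random.uniform(-1, 1),        # dy
                   random.uniform(-3.14, 3.14))  # wrist rotation
                  for _ in range(n_candidates)]
    scored = [(score_fn(image, a), a) for a in candidates]
    return max(scored, key=lambda sa: sa[0])

# Toy score standing in for the learned density: prefers small motions.
score, action = servo_step(lambda img, a: -sum(x * x for x in a), image=None)
```

In a real servoing loop this step would run once per camera frame, with the chosen motion executed before the next image is captured.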

    ACTION PREDICTION NETWORKS FOR ROBOTIC GRASPING

    Publication Number: US20200086483A1

    Publication Date: 2020-03-19

    Application Number: US16570522

    Application Date: 2019-09-13

    Abstract: Deep machine learning methods and apparatus related to the manipulation of an object by an end effector of a robot are described herein. Some implementations relate to training an action prediction network to predict, given an input image, a probability density over candidate actions likely to result in successful grasps by the end effector. Some implementations are directed to utilization of an action prediction network to visually servo a grasping end effector of a robot to achieve a successful grasp of an object by the grasping end effector.

    Control policies for collective robot learning

    Publication Number: US11188821B1

    Publication Date: 2021-11-30

    Application Number: US15705601

    Application Date: 2017-09-15

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a global policy neural network. One of the methods includes initializing an instance of the robotic task for each of multiple local workers, generating a trajectory of state-action pairs by selecting actions to be performed by the robotic agent while performing the instance of the robotic task, optimizing a local policy controller on the trajectory, generating an optimized trajectory using the optimized local controller, and storing the optimized trajectory in a replay memory associated with the local worker. The method also includes sampling, for each of multiple global workers, an optimized trajectory from one of the one or more replay memories associated with the global worker, and training the replica of the global policy neural network maintained by the global worker on the sampled optimized trajectory to determine delta values for the parameters of the global policy neural network.
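A toy, single-process sketch of the local/global worker scheme; every detail here (the linear policy, the delta-update rule, and all names) is an assumption for illustration, not the patented method. Local workers fill replay memories with optimized state-action trajectories; global workers then sample trajectories and accumulate parameter deltas for a shared global policy.

```python
import random

def train_global_policy(n_local=4, n_global=2, rounds=20, lr=0.5):
    """Sketch: local workers write optimized trajectories into replay
    memories; global workers sample them and apply delta values to the
    single parameter w of a linear global policy a = w * s."""
    target_w = 2.0                               # behavior the local controllers reach
    # Local phase: each worker stores one optimized trajectory in its replay memory.
    replays = []
    for _ in range(n_local):
        traj = [(s, target_w * s) for s in (random.uniform(0, 1) for _ in range(10))]
        replays.append([traj])                   # one replay memory per local worker
    # Global phase: workers repeatedly sample a trajectory and compute a delta.
    w = 0.0
    for _ in range(rounds):
        for _ in range(n_global):
            traj = random.choice(random.choice(replays))
            delta = sum((a - w * s) * s for s, a in traj) / len(traj)
            w += lr * delta                      # apply delta to global parameters
    return w

w = train_global_policy()
```

Because every global worker's delta pulls `w` toward the behavior stored by the local workers, the shared parameter converges to their common solution.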

    MACHINE LEARNING METHODS AND APPARATUS FOR AUTOMATED ROBOTIC PLACEMENT OF SECURED OBJECT IN APPROPRIATE LOCATION

    Publication Number: US20210229276A1

    Publication Date: 2021-07-29

    Application Number: US17230628

    Application Date: 2021-04-14

    Abstract: Training and/or use of a machine learning model for placement of an object secured by an end effector of a robot. A trained machine learning model can be used to process: (1) a current image, captured by a vision component of a robot, that captures an end effector securing an object; (2) a candidate end effector action that defines a candidate motion of the end effector; and (3) a target placement input that indicates a target placement location for the object. Based on the processing, a prediction can be generated that indicates the likelihood of successful placement of the object in the target placement location with application of the motion defined by the candidate end effector action. At many iterations, the candidate end effector action with the highest probability is selected and control commands are provided to cause the end effector to move in conformance with the corresponding end effector action. When at least one release criterion is satisfied, control commands can be provided to cause the end effector to release the object, thereby placing it in the target placement location.
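A sketch of one control iteration, with the trained model replaced by a toy distance-based success predictor and all names assumed: score each candidate motion against the image and target, then either command the best motion or release once a (hypothetical) release criterion is met.

```python
def placement_step(predict, image, target, candidates, release_prob=0.9):
    """Score candidate end-effector motions and either command the best one
    or release the object. The release criterion used here -- the no-op
    action already predicts success -- is one assumed possibility."""
    scored = [(predict(image, a, target), a) for a in candidates]
    best_prob, best_action = max(scored, key=lambda pa: pa[0])
    if predict(image, (0.0, 0.0, 0.0), target) >= release_prob:
        return ("release", None)
    return ("move", best_action)

# Toy stand-in for the trained model: "image" is just the object's 1-D
# position; predicted success falls off with distance to the target after
# applying the candidate motion.
predict = lambda pos, act, tgt: max(0.0, 1.0 - abs(pos + act[0] - tgt))

cmd_far = placement_step(predict, image=0.0, target=1.0,
                         candidates=[(0.5, 0, 0), (1.0, 0, 0), (-0.5, 0, 0)])
cmd_at = placement_step(predict, image=1.0, target=1.0,
                        candidates=[(0.5, 0, 0), (1.0, 0, 0), (-0.5, 0, 0)])
```

Far from the target the step commands the motion that moves the object onto it; once the object is already at the target, the release criterion fires instead.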
