-
Publication Number: US20220410380A1
Publication Date: 2022-12-29
Application Number: US17843288
Application Date: 2022-06-17
Applicant: X Development LLC
Inventor: Yao Lu, Mengyuan Yan, Seyed Mohammad Khansari Zadeh, Alexander Herzog, Eric Jang, Karol Hausman, Yevgen Chebotar, Sergey Levine, Alexander Irpan
IPC: B25J9/16
Abstract: Utilizing an initial set of offline positive-only robotic demonstration data for pre-training an actor network and a critic network for robotic control, followed by further training of the networks based on online robotic episodes that utilize the network(s). Implementations enable the actor network to be effectively pre-trained, while mitigating occurrences of and/or the extent of forgetting when further trained based on episode data. Implementations additionally or alternatively enable the actor network to be trained to a given degree of effectiveness in fewer training steps. In various implementations, one or more adaptation techniques are utilized in performing the robotic episodes and/or in performing the robotic training. The adaptation techniques can each, individually, result in one or more corresponding advantages and, when used in any combination, the corresponding advantages can accumulate. The adaptation techniques include Positive Sample Filtering, Adaptive Exploration, Using Max Q Values, and Using the Actor in CEM.
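A minimal sketch of the "Using the Actor in CEM" adaptation technique the abstract names, assuming a cross-entropy-method action optimizer over a learned Q-function. The q_value callable, the injection of the actor's proposal, and all hyperparameters are illustrative assumptions, not the patent's actual implementation.

import numpy as np

def cem_select_action(q_value, actor_action, action_dim, iters=3,
                      pop_size=64, elite_frac=0.1, sigma=0.3):
    """Cross-entropy method over actions, with the actor's proposal injected
    into every candidate population so it is never discarded."""
    mean = np.array(actor_action, dtype=np.float64)
    std = np.full(action_dim, sigma)
    n_elite = max(1, int(pop_size * elite_frac))
    for _ in range(iters):
        candidates = np.random.normal(mean, std, size=(pop_size, action_dim))
        candidates[0] = actor_action  # hypothetical: always keep the actor's action
        scores = np.array([q_value(a) for a in candidates])
        elites = candidates[np.argsort(scores)[-n_elite:]]
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mean

# Toy usage: Q peaks at the origin; the actor proposes a nearby action.
q = lambda a: -np.sum(a ** 2)
print(cem_select_action(q, np.array([0.5, -0.2]), action_dim=2))

Seeding the sampler at the actor's output, and re-injecting it each iteration, is one plausible reading of the technique; it keeps CEM from discarding a pre-trained actor's proposal in early iterations.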
-
Publication Number: US11685045B1
Publication Date: 2023-06-27
Application Number: US16948187
Application Date: 2020-09-08
Applicant: X Development LLC
Inventor: Alexander Herzog, Dmitry Kalashnikov, Julian Ibarz
IPC: B25J9/16
CPC classification number: B25J9/161, B25J9/163, B25J9/1661, B25J9/1669, B25J9/1697
Abstract: Asynchronous robotic control utilizing a trained critic network. During performance of a robotic task based on a sequence of robotic actions determined utilizing the critic network, a corresponding next robotic action of the sequence is determined while a corresponding previous robotic action of the sequence is still being implemented. Optionally, the next robotic action can be fully determined and/or can begin to be implemented before implementation of the previous robotic action is completed. In determining the next robotic action, most recently selected robotic action data is processed using the critic network, where such data conveys information about the previous robotic action that is still being implemented. Some implementations additionally or alternatively relate to determining when to implement a robotic action that is determined in an asynchronous manner.
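A minimal sketch of the asynchronous pattern described above: the next action is selected while the previous one is still being implemented, and the in-progress action is part of the selector's input. DummyRobot and select_action are hypothetical stand-ins; a real system would run the trained critic network inside select_action.

import threading
import time

class DummyRobot:
    """Stand-in robot interface (assumption for illustration)."""
    def rest_action(self): return 0.0
    def observe(self): return 0.0
    def execute(self, action): time.sleep(0.01)  # pretend the motion takes time

def select_action(obs, action_in_progress):
    # Placeholder for critic-network inference conditioned on the most
    # recently selected (still executing) robotic action data.
    return obs + 0.1 * action_in_progress

def control_loop(robot, steps=5):
    prev_action = robot.rest_action()
    for _ in range(steps):
        obs = robot.observe()
        worker = threading.Thread(target=robot.execute, args=(prev_action,))
        worker.start()                                 # previous action executing...
        next_action = select_action(obs, prev_action)  # ...while the next is chosen
        worker.join()
        prev_action = next_action

control_loop(DummyRobot())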
-
Publication Number: US20220134546A1
Publication Date: 2022-05-05
Application Number: US17515490
Application Date: 2021-10-31
Applicant: X Development LLC
Inventor: Zhuo Xu, Wenhao Yu, Alexander Herzog, Wenlong Lu, Chuyuan Fu, Yunfei Bai, C. Karen Liu, Daniel Ho
Abstract: Utilization of past dynamics sample(s) that reflect past contact physics information, in training and/or utilizing a neural network model. The neural network model represents a learned value function (e.g., a Q-value function) that, when trained, can be used in selecting a sequence of robotic actions to implement in robotic manipulation (e.g., pushing) of an object by a robot. In various implementations, a past dynamics sample for an episode of robotic manipulation can include at least two past images from the episode, as well as one or more past force sensor readings that temporally correspond to the past images from the episode.
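A minimal sketch of how a past dynamics sample might be assembled as a data structure, assuming two past frames paired with temporally corresponding force readings; all names and shapes are illustrative, not the patent's.

from dataclasses import dataclass
import numpy as np

@dataclass
class PastDynamicsSample:
    image_prev: np.ndarray   # earlier frame, shape (H, W, C)
    image_curr: np.ndarray   # later frame, shape (H, W, C)
    forces: np.ndarray       # force readings aligned with the two frames

def make_sample(episode_images, episode_forces, t):
    """Pair frames t-1 and t with the force readings recorded alongside them."""
    return PastDynamicsSample(
        image_prev=episode_images[t - 1],
        image_curr=episode_images[t],
        forces=episode_forces[t - 1:t + 1],
    )

# Toy episode: 4 frames of 8x8 RGB and one 6-axis force/torque reading per frame.
images = np.zeros((4, 8, 8, 3))
forces = np.random.randn(4, 6)
sample = make_sample(images, forces, t=2)
print(sample.forces.shape)  # (2, 6)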
-
Publication Number: US11610153B1
Publication Date: 2023-03-21
Application Number: US16729712
Application Date: 2019-12-30
Applicant: X Development LLC
Inventor: Alexander Herzog, Adrian Li, Mrinal Kalakrishnan, Benjamin Holson
Abstract: Utilizing at least one existing policy (e.g. a manually engineered policy) for a robotic task, in generating reinforcement learning (RL) data that can be used in training an RL policy for an instance of RL of the robotic task. The existing policy can be one that, standing alone, will not generate data that is compatible with the instance of RL for the robotic task. In contrast, the generated RL data is compatible with RL for the robotic task at least by virtue of it including state data that is in a state space of the RL for the robotic task, and including actions that are in the action space of the RL for the robotic task. The generated RL data can be used in at least some of the initial training for the RL policy using reinforcement learning.
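A minimal sketch of generating RL-compatible data by replaying an existing policy, assuming projection functions that map the policy's native observations and actions into the RL task's state and action spaces. ToyEnv, the projections, and the transition layout are hypothetical stand-ins, not the patent's interfaces.

import random

class ToyEnv:
    """Stand-in environment (assumption for illustration)."""
    def reset(self):
        self.t = 0
        return [0.0]
    def step(self, action):
        self.t += 1
        return [float(self.t)], random.random(), self.t >= 3

def generate_rl_data(env, engineered_policy, to_rl_state, to_rl_action, episodes=10):
    transitions = []
    for _ in range(episodes):
        raw_obs, done = env.reset(), False
        while not done:
            raw_action = engineered_policy(raw_obs)        # native policy output
            next_raw_obs, reward, done = env.step(raw_action)
            transitions.append((
                to_rl_state(raw_obs),       # project into the RL state space
                to_rl_action(raw_action),   # project into the RL action space
                reward, to_rl_state(next_raw_obs), done,
            ))
            raw_obs = next_raw_obs
    return transitions

data = generate_rl_data(ToyEnv(), lambda obs: obs[0] + 1.0,
                        to_rl_state=tuple, to_rl_action=lambda a: (a,))
print(len(data))  # 30 transitions from 10 three-step episodes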
-
Publication Number: US20220245503A1
Publication Date: 2022-08-04
Application Number: US17161845
Application Date: 2021-01-29
Applicant: X Development LLC
Inventor: Adrian Li, Benjamin Holson, Alexander Herzog, Mrinal Kalakrishnan
Abstract: Implementations disclosed herein relate to utilizing at least one existing manually engineered policy, for a robotic task, in training an RL policy model that can be used to at least selectively replace a portion of the engineered policy. The RL policy model can be trained for replacing a portion of a robotic task and can be trained based on data from episodes of attempting performance of the robotic task, including episodes in which the portion is performed based on the engineered policy and/or other portion(s) are performed based on the engineered policy. Once trained, the RL policy model can be used, at least selectively and in lieu of utilization of the engineered policy, to perform the portion of the robotic task, while other portion(s) of the robotic task are performed utilizing the engineered policy and/or other similarly trained (but distinct) RL policy model(s).
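A minimal sketch of the selective-replacement idea, assuming the task decomposes into named phases and the trained RL policy handles only the phase(s) it replaces; phase names and the gating test are illustrative assumptions.

def hybrid_step(obs, phase, rl_policy, engineered_policy, rl_phases=("grasp",)):
    """Route control to the RL policy only for the portion(s) it was trained on."""
    if phase in rl_phases:
        return rl_policy(obs)
    return engineered_policy(obs, phase)

# Toy usage: the RL model covers "grasp"; "approach" and "retract" stay scripted.
rl_policy = lambda obs: ("rl_action", obs)
engineered = lambda obs, phase: ("scripted_" + phase, obs)
for phase in ("approach", "grasp", "retract"):
    print(phase, "->", hybrid_step(1.0, phase, rl_policy, engineered))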