-
Publication number: US11709462B2
Publication date: 2023-07-25
Application number: US15894688
Application date: 2018-02-12
Applicant: ADOBE INC.
Inventor: Haoxiang Li, Yinan Zhang
CPC classification number: G05B13/027, G05B17/02, G06N3/08, G06N20/00
Abstract: The training of a learning agent to provide real-time control of an object is disclosed. Training of the learning agent and training of a corresponding pioneer agent are iteratively alternated. The training of the learning and pioneer agents is under the supervision of a supervisor agent. The training of the learning agent provides feedback for subsequent training of the pioneer agent. The training of the pioneer agent provides feedback for subsequent training of the learning agent. During the training, a supervisor coefficient modulates the influence of the supervisor agent. As agents are trained, the influence of the supervisor agent is decayed. The training of the learning agent, under a first level of supervisor influence, includes real-time control of the object. The subsequent training of the pioneer agent, under a reduced level of supervisor influence, includes replay of training data accumulated during the real-time control of the object.
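The alternating scheme described in the abstract can be illustrated with a toy sketch. This is not the patented implementation. It rests on several assumptions that do not appear in the record: a one-dimensional simulated object, a linear policy (`LinearAgent`), a hand-written `supervisor_action`, a supervisor coefficient `k` that linearly blends supervisor and agent actions, and a geometric decay of `k`. All names and numeric values are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class LinearAgent:
    """Toy policy (assumption): action = gain * state."""
    gain: float = 0.0

    def act(self, state: float) -> float:
        return self.gain * state

    def fit(self, samples, k: float, lr: float = 0.1) -> None:
        # Gradient step toward a target that blends the supervisor's action
        # (weight k) with the action that would zero the state (weight 1 - k).
        for state, sup_action in samples:
            target = k * sup_action + (1.0 - k) * (-state)
            error = self.act(state) - target
            self.gain -= lr * error * state


def supervisor_action(state: float) -> float:
    """Hypothetical fixed controller standing in for the supervisor agent."""
    return -0.5 * state


def alternate_training(rounds=5, k0=1.0, decay=0.5, steps=50, seed=0):
    rng = random.Random(seed)
    learner, pioneer = LinearAgent(), LinearAgent()
    k = k0
    history = []
    for _ in range(rounds):
        # Phase 1: the learning agent controls the simulated object under
        # supervisor influence k; the transitions are accumulated for replay.
        replay = []
        state = rng.uniform(-1.0, 1.0)
        for _ in range(steps):
            sup = supervisor_action(state)
            action = k * sup + (1.0 - k) * learner.act(state)
            replay.append((state, sup))
            state = state + action + rng.uniform(-0.1, 0.1)
        learner.fit(replay, k)

        # Phase 2: the pioneer agent starts from the learner's parameters and
        # replays the same data under a reduced supervisor influence.
        k_pioneer = k * decay
        pioneer.gain = learner.gain
        pioneer.fit(replay, k_pioneer)

        # Feedback: the learner adopts the pioneer's parameters, and the
        # reduced coefficient becomes the learner's level for the next round.
        learner.gain = pioneer.gain
        history.append((k, k_pioneer, learner.gain))
        k = k_pioneer
    return history


history = alternate_training()
for k_learn, k_pion, gain in history:
    print(f"learner k={k_learn:.4f}  pioneer k={k_pion:.4f}  gain={gain:.4f}")
```

In this sketch the supervisor's influence halves each round, so the learned gain drifts away from the supervisor's behavior toward the agent's own objective. How the coefficient actually enters the training objective and how it is decayed are not stated in the abstract.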
-
Publication number: US20190250568A1
Publication date: 2019-08-15
Application number: US15894688
Application date: 2018-02-12
Applicant: ADOBE INC.
Inventor: Haoxiang Li, Yinan Zhang
CPC classification number: G05B13/027, G05B17/02, G06N3/08, G06N20/00
Abstract: The training of a learning agent to provide real-time control of an object is disclosed. Training of the learning agent and training of a corresponding pioneer agent are iteratively alternated. The training of the learning and pioneer agents is under the supervision of a supervisor agent. The training of the learning agent provides feedback for subsequent training of the pioneer agent. The training of the pioneer agent provides feedback for subsequent training of the learning agent. During the training, a supervisor coefficient modulates the influence of the supervisor agent. As agents are trained, the influence of the supervisor agent is decayed. The training of the learning agent, under a first level of supervisor influence, includes real-time control of the object. The subsequent training of the pioneer agent, under a reduced level of supervisor influence, includes replay of training data accumulated during the real-time control of the object.
-