Patent search ap:("ADOBE INC.") AND inv:"Yinan Zhang" Page 1

1.

Granted invention patent
Safe and efficient training of a control agent (In force)

Publication number: US11709462B2

Publication date: 2023-07-25

Application number: US15894688

Filing date: 2018-02-12

Applicant: ADOBE INC.

Inventor: Haoxiang Li, Yinan Zhang

IPC: G05B13/02, G05B17/02, G06N3/08, G06N20/00

CPC classification number: G05B13/027, G05B17/02, G06N3/08, G06N20/00

Abstract: The training of a learning agent to provide real-time control of an object is disclosed. Training of the learning agent and training of a corresponding pioneer agent are iteratively alternated. The training of the learning and pioneer agents is under the supervision of a supervisor agent. The training of the learning agent provides feedback for subsequent training of the pioneer agent. The training of the pioneer agent provides feedback for subsequent training of the learning agent. During the training, a supervisor coefficient modulates the influence of the supervisor agent. As agents are trained, the influence of the supervisor agent is decayed. The training of the learning agent, under a first level of supervisor influence, includes real-time control of the object. The subsequent training of the pioneer agent, under a reduced level of supervisor influence, includes replay of training data accumulated during the real-time control of the object.
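
Illustrative sketch (not taken from the patent): the alternating scheme in the abstract can be pictured with a toy one-dimensional control problem. Everything concrete here is an assumption made for illustration: the proportional supervisor, the linear agent policy with gain `w`, the gradient update, the multiplicative decay of the supervisor coefficient `k`, and the choice to pass feedback between the two agents by copying parameters. The abstract specifies none of these details.

```python
import random

def supervisor_action(x):
    # Hypothetical hand-written safe controller: push the state toward 0.
    return -0.5 * x

def blended_action(w, x, k):
    # k = 1: the supervisor is fully in control; k = 0: the agent is.
    return k * supervisor_action(x) + (1.0 - k) * w * x

def update(w, x, k, lr=0.1):
    # One gradient step on the cost (x + action)^2 with respect to the gain w.
    next_x = x + blended_action(w, x, k)
    grad = 2.0 * next_x * (1.0 - k) * x
    return w - lr * grad

def train(rounds=20, steps=50, k=1.0, decay=0.8, seed=0):
    rng = random.Random(seed)
    learner_w = 0.0
    history = []
    for _ in range(rounds):
        # Learning agent: real-time control under the current coefficient k.
        replay = []
        x = rng.uniform(-1.0, 1.0)
        for _ in range(steps):
            replay.append(x)
            learner_w = update(learner_w, x, k)
            x = x + blended_action(learner_w, x, k) + rng.gauss(0.0, 0.05)
        # Pioneer agent: replay of the recorded states under a reduced coefficient.
        k_reduced = k * decay
        pioneer_w = learner_w
        for x_rec in replay:
            pioneer_w = update(pioneer_w, x_rec, k_reduced)
        # Feedback (assumed form): the next learner round starts from the pioneer.
        learner_w = pioneer_w
        history.append((k, k_reduced))
        k = k_reduced
    return learner_w, history

if __name__ == "__main__":
    final_gain, schedule = train()
    print("final agent gain:", final_gain)
    print("last (learner, pioneer) coefficients:", schedule[-1])
```

In this sketch the pioneer always trains one decay step ahead of the learning agent, and only on recorded states, so the less-supervised policy is never tried on the live object before the learner inherits it.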

2.

Invention application
SAFE AND EFFICIENT TRAINING OF A CONTROL AGENT (Under examination, published)

Publication number: US20190250568A1

Publication date: 2019-08-15

Application number: US15894688

Filing date: 2018-02-12

Applicant: ADOBE INC.

Inventor: Haoxiang Li, Yinan Zhang

IPC: G05B13/02, G06N3/08, G06N99/00, G05B17/02

CPC classification number: G05B13/027, G05B17/02, G06N3/08, G06N20/00

Abstract: The training of a learning agent to provide real-time control of an object is disclosed. Training of the learning agent and training of a corresponding pioneer agent are iteratively alternated. The training of the learning and pioneer agents is under the supervision of a supervisor agent. The training of the learning agent provides feedback for subsequent training of the pioneer agent. The training of the pioneer agent provides feedback for subsequent training of the learning agent. During the training, a supervisor coefficient modulates the influence of the supervisor agent. As agents are trained, the influence of the supervisor agent is decayed. The training of the learning agent, under a first level of supervisor influence, includes real-time control of the object. The subsequent training of the pioneer agent, under a reduced level of supervisor influence, includes replay of training data accumulated during the real-time control of the object.
