Safe and efficient training of a control agent

    Publication number: US11709462B2

    Publication date: 2023-07-25

    Application number: US15894688

    Filing date: 2018-02-12

    Applicant: ADOBE INC.

    CPC classification number: G05B13/027 G05B17/02 G06N3/08 G06N20/00

    Abstract: The training of a learning agent to provide real-time control of an object is disclosed. Training of the learning agent and training of a corresponding pioneer agent are iteratively alternated. The training of the learning and pioneer agents is under the supervision of a supervisor agent. The training of the learning agent provides feedback for subsequent training of the pioneer agent. The training of the pioneer agent provides feedback for subsequent training of the learning agent. During the training, a supervisor coefficient modulates the influence of the supervisor agent. As agents are trained, the influence of the supervisor agent is decayed. The training of the learning agent, under a first level of supervisor influence, includes real-time control of the object. The subsequent training of the pioneer agent, under a reduced level of supervisor influence, includes replay of training data accumulated during the real-time control of the object.
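
The alternation described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state how the supervisor coefficient enters the control law, how the agents are updated, or how the coefficient is decayed, so everything below is an assumption made for illustration: the coefficient k forms a convex blend of supervisor and agent actions, the decay is geometric, the controlled object is a one-dimensional toy system, and the names LinearAgent, supervisor_act, blended_action, control_episode, fit_gain and train are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LinearAgent:
    """Toy policy (assumption): action = gain * state."""
    gain: float = 0.0

    def act(self, state: float) -> float:
        return self.gain * state


def supervisor_act(state: float) -> float:
    """Hypothetical safe controller that pulls the state toward zero."""
    return -0.5 * state


def blended_action(agent: LinearAgent, state: float, k: float) -> float:
    """Assumed role of the supervisor coefficient k:
    k = 1 means the supervisor acts alone, k = 0 means the agent acts alone."""
    return k * supervisor_act(state) + (1.0 - k) * agent.act(state)


def control_episode(agent, k, steps=20, x0=1.0):
    """Real-time control of the toy object (x_next = x + action) under
    coefficient k. Returns the recorded (state, action, next_state) tuples."""
    x, data = x0, []
    for _ in range(steps):
        a = blended_action(agent, x, k)
        x_next = x + a
        data.append((x, a, x_next))
        x = x_next
    return data


def fit_gain(agent, data, k, lr=0.1):
    """Toy update: one gradient step per recorded state on the squared next
    state, recomputed under coefficient k. Only the recorded states are
    reused, which is how this sketch models replay at a different k."""
    for x, _, _ in data:
        x_next = x + blended_action(agent, x, k)
        agent.gain -= lr * 2.0 * x_next * (1.0 - k) * x


def train(iterations=5, k0=1.0, decay=0.5):
    learner, pioneer = LinearAgent(), LinearAgent()
    k, history = k0, []
    for _ in range(iterations):
        # Learning agent: real-time control at the current supervisor level.
        data = control_episode(learner, k)
        fit_gain(learner, data, k)
        # Pioneer agent: starts from the learner and replays the accumulated
        # data at a reduced supervisor level.
        k_reduced = k * decay
        pioneer.gain = learner.gain
        fit_gain(pioneer, data, k_reduced)
        # Feedback: the learner continues from the pioneer's parameters.
        learner.gain = pioneer.gain
        history.append((k, k_reduced, learner.gain))
        k = k_reduced
    return learner, history
```

In this sketch only the learning agent ever acts on the object, and always at the higher supervisor level of the current iteration; the pioneer agent explores the lower level purely on recorded data before that level is used for live control in the next iteration.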

    SAFE AND EFFICIENT TRAINING OF A CONTROL AGENT

    Publication number: US20190250568A1

    Publication date: 2019-08-15

    Application number: US15894688

    Filing date: 2018-02-12

    Applicant: ADOBE INC.

    CPC classification number: G05B13/027 G05B17/02 G06N3/08 G06N20/00

