APPARATUS AND METHOD FOR TRAINING PARAMETRIC POLICY

    Publication number: US20230385611A1

    Publication date: 2023-11-30

    Application number: US18364601

    Filing date: 2023-08-03

    CPC classification number: G06N3/047

    Abstract: An apparatus for training a parametric policy in dependence on a proposal distribution, the apparatus comprising one or more processors configured to repeatedly perform the steps of: forming, in dependence on the proposal distribution, a proposal; inputting the proposal to the policy so as to form an output state from the policy responsive to the proposal; estimating a loss between the output state and a preferred state responsive to the proposal; forming, by means of an adaptation algorithm and in dependence on the loss, a policy adaptation; applying the policy adaptation to the policy to form an adapted policy; forming, by means of the adapted policy, an estimate of variance in the policy adaptation; and adapting the proposal distribution in dependence on the estimate of variance so as to reduce the variance of policy adaptations formed on subsequent iterations of the steps.
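
The loop recited in the abstract can be illustrated with a small toy sketch. This is not the implementation claimed in the patent; every concrete choice here is an assumption made for illustration: a finite set of candidate inputs as the proposals, a one-parameter linear policy, a squared-error loss, a stochastic gradient step as the adaptation algorithm, importance weighting of each step, and a proposal update that moves the sampling probabilities toward being proportional to the gradient magnitude (the choice that minimises the second moment of an importance-weighted gradient estimate). The names `train`, `second_moment`, `adapt_proposal`, `mix` and `lr` are hypothetical.

```python
import random


def second_moment(mags, probs):
    """Second moment of the importance-weighted gradient under `probs`.

    Assumes a uniform target weighting (1/n) over the candidate inputs.
    """
    n = len(mags)
    return sum((m / n) ** 2 / p for m, p in zip(mags, probs))


def adapt_proposal(mags, mix):
    """Move the proposal toward probabilities proportional to |gradient|.

    Mixing with the uniform distribution keeps every probability >= mix / n,
    which bounds the importance weights (an illustrative safeguard).
    """
    n = len(mags)
    total = sum(mags)
    if total == 0.0:
        return [1.0 / n] * n
    return [mix / n + (1.0 - mix) * m / total for m in mags]


def train(steps=200, lr=0.05, mix=0.5, seed=0):
    rng = random.Random(seed)
    inputs = [0.5, 1.0, 2.0, 3.0]      # hypothetical candidate proposals
    n = len(inputs)
    theta = 0.0                         # policy: output = theta * x
    probs = [1.0 / n] * n               # proposal distribution, uniform at start

    def preferred(x):                   # hypothetical preferred state
        return 2.0 * x

    for _ in range(steps):
        # Form a proposal in dependence on the proposal distribution.
        i = rng.choices(range(n), weights=probs)[0]
        x = inputs[i]
        # Input the proposal to the policy to form an output state.
        out = theta * x
        # Squared-error loss against the preferred state; its gradient in theta.
        grad = 2.0 * (out - preferred(x)) * x
        # Policy adaptation: importance-weighted gradient step.
        weight = (1.0 / n) / probs[i]
        theta -= lr * weight * grad
        # Use the adapted policy to estimate per-candidate gradient magnitudes,
        # which determine the variance of the next adaptation.
        mags = [abs(2.0 * (theta * xi - preferred(xi)) * xi) for xi in inputs]
        # Adapt the proposal distribution to reduce that variance.
        if sum(mags) > 0.0:
            probs = adapt_proposal(mags, mix)
    return theta, probs
```

Calling `train()` returns the fitted parameter and the final proposal probabilities. In this toy setting, `second_moment` can be used to compare the adapted proposal with the uniform one for the same gradient magnitudes.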
