-
Publication No.: US20220188583A1
Publication Date: 2022-06-16
Application No.: US17686113
Filing Date: 2022-03-03
Applicant: Huawei Technologies Co., Ltd.
Inventor: Yaodong YANG, Rasul TUTUNOV, Phu SAKULWONGTANA, Haitham BOU AMMAR, Jun WANG
IPC: G06K9/62
Abstract: A computer-implemented policy evaluation system for identifying an optimal combination of operational policies for implementation by a plurality of actors, each actor being capable of adopting any of a plurality of operational policies, the policy evaluation system being configured to iteratively perform the following steps: (i) selecting a first combination of operational policies, the first combination defining a policy for each actor; (ii) receiving vectors of values, each representing the benefit of a respective second combination where all but one of the actors adopts the policy defined for it in the first combination and that actor adopts a different policy; and (iii) estimating a ranking of combinations of policies for the actors in dependence on those values and a previously estimated ranking of combinations of policies for the actors.
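The three iterated steps in the abstract can be illustrated with a small sketch. This is a toy reading under stated assumptions, not the method claimed in the application: the names `unilateral_deviations`, `estimate_ranking`, `benefit`, and `step` are hypothetical, each "vector of values" is assumed to hold one benefit per actor and is reduced to its mean, and the "previously estimated ranking" is assumed to be a table of scores blended with new values by an exponential moving average.

```python
import itertools
import random


def unilateral_deviations(profile, n_policies):
    """Yield every combination that differs from `profile` in exactly one actor's policy."""
    for actor, current in enumerate(profile):
        for alt in range(n_policies):
            if alt != current:
                yield profile[:actor] + (alt,) + profile[actor + 1:]


def estimate_ranking(benefit, n_actors, n_policies, iterations=200, step=0.5, seed=0):
    """Rank joint policy combinations from the benefits of unilateral deviations.

    `benefit(profile)` is a hypothetical callback returning one value per actor.
    """
    rng = random.Random(seed)
    profiles = list(itertools.product(range(n_policies), repeat=n_actors))
    scores = {p: 0.0 for p in profiles}  # the previous ranking estimate, kept as scores
    for _ in range(iterations):
        first = rng.choice(profiles)  # (i) select a first combination
        for second in unilateral_deviations(first, n_policies):
            values = benefit(second)  # (ii) receive a vector of values for this deviation
            mean_value = sum(values) / len(values)
            # (iii) blend the new values with the previous estimate
            scores[second] = (1 - step) * scores[second] + step * mean_value
    return sorted(profiles, key=lambda p: scores[p], reverse=True)
```

Called with a `benefit` function that rewards higher policy indices for two actors and three policies, the returned list places the combination `(2, 2)` first. The random selection in step (i) is only a placeholder; the abstract does not say how the first combination is chosen.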
-
Publication No.: US20230385611A1
Publication Date: 2023-11-30
Application No.: US18364601
Filing Date: 2023-08-03
Applicant: HUAWEI TECHNOLOGIES CO., LTD.
Inventor: Vincent MOENS, Hugues VAN ASSEL, Haitham BOU AMMAR
IPC: G06N3/047
CPC classification number: G06N3/047
Abstract: An apparatus for training a parametric policy in dependence on a proposal distribution, the apparatus comprising one or more processors configured to repeatedly perform the steps of: forming, in dependence on the proposal distribution, a proposal; inputting the proposal to the policy so as to form an output state from the policy responsive to the proposal; estimating a loss between the output state and a preferred state responsive to the proposal; forming, by means of an adaptation algorithm and in dependence on the loss, a policy adaption; applying the policy adaption to the policy to form an adapted policy; forming, by means of the adapted policy, an estimate of variance in the policy adaptation and adapting the proposal distribution in dependence on the estimate of variance so as to reduce the variance of policy adaptations formed on subsequent iterations of the steps.
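The training loop in the abstract can be sketched with a swapped-in, well-known technique: importance-sampled stochastic gradient descent with a categorical proposal. This is an illustration under assumptions, not the claimed apparatus. The policy is assumed to be a single-parameter linear map `theta * x`, the loss is assumed to be squared error, and the names `proposal_probs`, `train_with_adaptive_proposal`, `grad_mag`, and `mix` are hypothetical. The sketch relies on the standard result that the variance of an importance-weighted gradient estimate is smallest when each input is sampled in proportion to its gradient magnitude, so the gradient magnitude re-evaluated with the adapted policy serves as the variance signal.

```python
import random


def proposal_probs(grad_mag, mix):
    """Categorical proposal: proportional to gradient magnitude, mixed with uniform."""
    n = len(grad_mag)
    total = sum(grad_mag)
    return [(1 - mix) * g / total + mix / n for g in grad_mag]


def train_with_adaptive_proposal(xs, ys, steps=2000, lr=0.02, mix=0.2, seed=0):
    """Fit theta in `theta * x` while adapting the sampling distribution over inputs."""
    rng = random.Random(seed)
    n = len(xs)
    theta = 0.0            # the single policy parameter
    grad_mag = [1.0] * n   # running per-input gradient magnitude (variance proxy)
    for _ in range(steps):
        q = proposal_probs(grad_mag, mix)
        i = rng.choices(range(n), weights=q)[0]   # form a proposal from the distribution
        output = theta * xs[i]                    # output state of the policy
        grad = 2.0 * (output - ys[i]) * xs[i]     # gradient of the squared loss
        theta -= lr * grad / (n * q[i])           # importance-weighted policy adaptation
        # Re-evaluate the gradient with the adapted policy and use its magnitude
        # to adapt the proposal for subsequent iterations.
        grad_mag[i] = abs(2.0 * (theta * xs[i] - ys[i]) * xs[i]) + 1e-8
    return theta, proposal_probs(grad_mag, mix)
```

With inputs `[0.5, 1.0, 1.5, 2.0]` and targets equal to twice each input, the returned `theta` approaches 2. The uniform mixing weight `mix` keeps every sampling probability bounded away from zero, which bounds the importance weights and keeps each update stable for the chosen learning rate.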
-