Patent search ap:("Huawei Technologies Co., Ltd.") AND inv:"Haitham BOU AMMAR"

1.

Invention application
LARGE-SCALE POLICY EVALUATION IN MULTI-AGENT SYSTEMS (granted)

Publication No.: US20220188583A1

Publication date: 2022-06-16

Application No.: US17686113

Filing date: 2022-03-03

Applicant: Huawei Technologies Co., Ltd.

Inventor: Yaodong YANG, Rasul TUTUNOV, Phu SAKULWONGTANA, Haitham BOU AMMAR, Jun WANG

IPC: G06K9/62

Abstract: A computer-implemented policy evaluation system for identifying an optimal combination of operational policies for implementation by a plurality of actors, each actor being capable of adopting any of a plurality of operational policies, the policy evaluation system being configured to iteratively perform the following steps: (i) selecting a first combination of operational policies, the first combination defining a policy for each actor; (ii) receiving vectors of values, each representing the benefit of a respective second combination where all but one of the actors adopts the policy defined for it in the first combination and that actor adopts a different policy; and (iii) estimating a ranking of combinations of policies for the actors in dependence on those values and a previously estimated ranking of combinations of policies for the actors.
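The three steps in this abstract can be illustrated with a toy loop. The sketch below is not the patented method: the running-average score update, the uniform random choice of the first combination, and the names `unilateral_deviations`, `evaluate`, and `benefit` are all assumptions made for illustration.

```python
import itertools
import random


def unilateral_deviations(profile, n_policies):
    """Yield every combination that differs from `profile` in exactly one actor's policy."""
    for actor, current in enumerate(profile):
        for alt in range(n_policies):
            if alt != current:
                yield profile[:actor] + (alt,) + profile[actor + 1:]


def evaluate(benefit, n_actors, n_policies, iterations=200, step=0.5, seed=0):
    """Return all policy combinations sorted from best to worst estimated score.

    `benefit` is a hypothetical callable mapping a combination to a number.
    """
    rng = random.Random(seed)
    profiles = list(itertools.product(range(n_policies), repeat=n_actors))
    scores = {p: 0.0 for p in profiles}  # the "previously estimated ranking"
    for _ in range(iterations):
        first = rng.choice(profiles)  # step (i): select a first combination
        for second in unilateral_deviations(first, n_policies):
            value = benefit(second)  # step (ii): value of a one-actor deviation
            # step (iii): blend the new value into the previous estimate
            # (assumed running-average rule, chosen only for simplicity)
            scores[second] += step * (value - scores[second])
    return sorted(profiles, key=lambda p: scores[p], reverse=True)


# Toy benefit: higher policy indices are better for every actor.
ranking = evaluate(lambda p: float(sum(p)), n_actors=2, n_policies=3)
best = ranking[0]
```

Only one-actor deviations are queried per iteration, so each step touches `n_actors * (n_policies - 1)` combinations rather than the full joint space.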

2.

Invention publication
APPARATUS AND METHOD FOR TRAINING PARAMETRIC POLICY (under examination, published)

Publication No.: US20230385611A1

Publication date: 2023-11-30

Application No.: US18364601

Filing date: 2023-08-03

Applicant: HUAWEI TECHNOLOGIES CO., LTD.

Inventor: Vincent MOENS, Hugues VAN ASSEL, Haitham BOU AMMAR

IPC: G06N3/047

CPC classification number: G06N3/047

Abstract: An apparatus for training a parametric policy in dependence on a proposal distribution, the apparatus comprising one or more processors configured to repeatedly perform the steps of: forming, in dependence on the proposal distribution, a proposal; inputting the proposal to the policy so as to form an output state from the policy responsive to the proposal; estimating a loss between the output state and a preferred state responsive to the proposal; forming, by means of an adaptation algorithm and in dependence on the loss, a policy adaption; applying the policy adaption to the policy to form an adapted policy; forming, by means of the adapted policy, an estimate of variance in the policy adaptation and adapting the proposal distribution in dependence on the estimate of variance so as to reduce the variance of policy adaptations formed on subsequent iterations of the steps.
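The loop described in this abstract resembles adaptive importance sampling for gradient-based training. The sketch below is a rough illustration under several assumptions that are not from the patent: a one-parameter linear policy, a squared-error loss, a discrete proposal distribution over a fixed set of inputs, and per-proposal gradient magnitude used as a crude stand-in for the variance estimate. The names `train`, `mix`, and `lr` are hypothetical.

```python
import random


def train(inputs, targets, iterations=200, lr=0.05, mix=0.1, seed=0):
    """Fit output = theta * x while adapting the sampling distribution q."""
    rng = random.Random(seed)
    n = len(inputs)
    theta = 0.0
    q = [1.0 / n] * n  # proposal distribution, initially uniform
    for _ in range(iterations):
        # Form a proposal by sampling an input index from q.
        i = rng.choices(range(n), weights=q)[0]
        # Feed the proposal to the policy and compare with the preferred output.
        output = theta * inputs[i]
        grad = 2.0 * (output - targets[i]) * inputs[i]  # d/dtheta of squared error
        # The importance weight 1 / (n * q[i]) keeps the update an unbiased
        # estimate of the mean gradient under uniform sampling.
        theta -= lr * grad / (n * q[i])
        # Using the adapted policy, score each proposal by gradient magnitude
        # (assumed proxy for its contribution to update variance).
        mags = [abs(2.0 * (theta * x - t) * x) for x, t in zip(inputs, targets)]
        total = sum(mags)
        if total > 0.0:
            # Mix with uniform so every proposal keeps nonzero probability.
            q = [mix / n + (1.0 - mix) * m / total for m in mags]
    return theta, q


theta, q = train(inputs=[1.0, 2.0, 3.0], targets=[2.0, 4.0, 6.0])
```

Sampling in proportion to gradient magnitude is a standard variance-reduction heuristic for importance-weighted gradient estimates; the uniform mixing term `mix` bounds the importance weights at `1 / mix`.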
