-
1.
Publication No.: EP4220487A3
Publication date: 2024-02-14
Application No.: EP23163801.6
Filing date: 2023-03-23
Inventors: ZHANG, Weijia; ZHANG, Le; LIU, Hao; HAN, Jindong; QIN, Chuan; ZHU, Hengshu; XIONG, Hui
IPC classification: G06N3/045, G06N3/0895, G06N3/092, B60L53/00, G06Q50/06, G06N3/0464, G06N3/084
Abstract: Provided are a method and apparatus for training an information adjustment model of a charging station, an electronic device, and a storage medium. An implementation comprises: acquiring a battery charging request, and determining environment state information corresponding to each charging station in a charging station set; determining, through an initial policy network and according to the environment state information, target operational information of each charging station in the charging station set for the battery charging request; determining, through an initial value network, a cumulative reward expectation corresponding to the battery charging request according to the environment state information and the target operational information; training the initial policy network and the initial value network by using a deep deterministic policy gradient algorithm; and determining the trained policy network as the information adjustment model corresponding to each charging station.
-
2.
Publication No.: EP4220487A2
Publication date: 2023-08-02
Application No.: EP23163801.6
Filing date: 2023-03-23
Inventors: ZHANG, Weijia; ZHANG, Le; LIU, Hao; HAN, Jindong; QIN, Chuan; ZHU, Hengshu; XIONG, Hui
IPC classification: G06N3/045, G06N3/0895, G06N3/092, B60L53/00, G06Q50/06, G06N3/0464, G06N3/084
Abstract: Provided are a method and apparatus for training an information adjustment model of a charging station, an electronic device, and a storage medium. An implementation comprises: acquiring a battery charging request, and determining environment state information corresponding to each charging station in a charging station set; determining, through an initial policy network and according to the environment state information, target operational information of each charging station in the charging station set for the battery charging request; determining, through an initial value network, a cumulative reward expectation corresponding to the battery charging request according to the environment state information and the target operational information; training the initial policy network and the initial value network by using a deep deterministic policy gradient algorithm; and determining the trained policy network as the information adjustment model corresponding to each charging station.
-
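The training loop named in the abstracts above (a policy network that outputs operational information, a value network that estimates the cumulative reward expectation, and a deep deterministic policy gradient update) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the patent text: the class names `LinearActor` and `LinearCritic`, the function `ddpg_step`, the linear models standing in for the neural networks, the scalar action, and the learning rates. The replay buffer and target networks that deep deterministic policy gradient implementations normally use are omitted for brevity.

```python
def dot(u, v):
    """Inner product of two equal-length lists of floats."""
    return sum(a * b for a, b in zip(u, v))


class LinearActor:
    """Hypothetical stand-in for the policy network: action = theta . state."""

    def __init__(self, dim):
        self.theta = [0.0] * dim

    def act(self, state):
        return dot(self.theta, state)


class LinearCritic:
    """Hypothetical stand-in for the value network: Q(s, a) = w_s . s + w_a * a."""

    def __init__(self, dim):
        self.w_s = [0.0] * dim
        self.w_a = 0.0

    def q(self, state, action):
        return dot(self.w_s, state) + self.w_a * action


def ddpg_step(actor, critic, transition, gamma=0.9, lr_critic=0.05, lr_actor=0.01):
    """One simplified update on a single (state, action, reward, next_state) tuple."""
    s, a, r, s_next = transition

    # Temporal-difference target: reward plus the discounted value of the
    # action the current policy would take in the next state.
    y = r + gamma * critic.q(s_next, actor.act(s_next))
    td_error = critic.q(s, a) - y

    # Critic: gradient descent on 0.5 * td_error ** 2, treating y as fixed.
    critic.w_s = [w - lr_critic * td_error * x for w, x in zip(critic.w_s, s)]
    critic.w_a -= lr_critic * td_error * a

    # Actor: deterministic policy gradient, dQ/da * da/dtheta = w_a * s.
    actor.theta = [t + lr_actor * critic.w_a * x for t, x in zip(actor.theta, s)]
    return td_error


if __name__ == "__main__":
    actor, critic = LinearActor(2), LinearCritic(2)
    # Made-up transition: two state features, one scalar action, reward 1.0.
    td = ddpg_step(actor, critic, ([1.0, 0.5], 2.0, 1.0, [0.0, 0.0]))
    print(td, critic.w_a, actor.theta)
```

In this sketch the state vector plays the role of the environment state information, the scalar action plays the role of the target operational information, and `critic.q` plays the role of the cumulative reward expectation. After training, only the actor would be kept as the adjustment model.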