REINFORCEMENT LEARNING WITH OPTIMIZATION-BASED POLICY

    Publication number: US20230289612A1

    Publication date: 2023-09-14

    Application number: US18181048

    Application date: 2023-03-09

    Inventor: Ming Jin

    CPC classification number: G06N3/092

    Abstract: Concepts for using an optimization-based policy in reinforcement learning (RL) are described. In one example, a method can include implementing an RL agent in a subsystem of a sequential decision-making system. The RL agent can be coupled to a prediction module and an optimization module of the subsystem. The method can also include defining a parameter value of the optimization module based on an observed state of the subsystem and/or a reward provided to the prediction module based on the observed state. The method can also include learning a policy that is defined by the optimization module based on the parameter value and a predicted future state of the subsystem that is predicted by the prediction module based on the reward. The policy can include a suggested action to be performed by the subsystem to achieve a goal. The method can also include implementing the policy to perform the suggested action.
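
The arrangement the abstract describes (a prediction module, an optimization module whose parameter is set by the RL agent, and a policy defined by the optimizer's solution) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the patent: the scalar linear dynamics, the quadratic objective, the reward, the names `predict_next_state`, `OptimizationModule`, `rollout`, and `learn_effort_weight`, and the use of finite-difference gradient ascent as a stand-in for whatever RL algorithm the claims cover. For brevity the learned parameter is a single scalar rather than a function of the observed state, and the prediction model doubles as the simulated subsystem.

```python
from dataclasses import dataclass

ACTION_COST = 0.1  # assumed weight on control effort in the reward


def predict_next_state(state: float, action: float) -> float:
    """Hypothetical prediction module: a toy linear model of the subsystem."""
    return state + action


@dataclass
class OptimizationModule:
    """Hypothetical optimization module with one tunable parameter."""
    target: float = 0.0

    def solve(self, state: float, effort_weight: float) -> float:
        # Suggested action a minimizing
        #   (predict_next_state(state, a) - target)^2 + effort_weight * a^2.
        # For the linear model above this has the closed form below.
        return (self.target - state) / (1.0 + effort_weight)


def rollout(effort_weight: float, optimizer: OptimizationModule,
            start_state: float, steps: int = 5) -> float:
    """Run the optimization-based policy and return the total reward."""
    state, total = start_state, 0.0
    for _ in range(steps):
        action = optimizer.solve(state, effort_weight)
        # Assumption: the prediction model also serves as the environment.
        state = predict_next_state(state, action)
        total += -(state - optimizer.target) ** 2 - ACTION_COST * action ** 2
    return total


def learn_effort_weight(optimizer: OptimizationModule, start_state: float,
                        iterations: int = 100, lr: float = 0.5,
                        eps: float = 1e-3) -> float:
    """Tune the optimizer's parameter by finite-difference gradient ascent."""
    weight = 1.0  # arbitrary initial parameter value
    for _ in range(iterations):
        grad = (rollout(weight + eps, optimizer, start_state)
                - rollout(weight - eps, optimizer, start_state)) / (2 * eps)
        weight = max(0.0, weight + lr * grad)  # keep the weight non-negative
    return weight


if __name__ == "__main__":
    opt = OptimizationModule(target=0.0)
    learned = learn_effort_weight(opt, start_state=1.0)
    print("learned effort weight:", learned)
    print("suggested action at state 1.0:", opt.solve(1.0, learned))
```

In this sketch the agent never outputs an action directly. It only adjusts the parameter of the optimization problem, and the action comes from solving that problem against the predicted next state, which is one plausible reading of "a policy that is defined by the optimization module".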
