Optimal sequential decision making with changing action space
Abstract:
Systems and methods for machine learning are described. Embodiments of the present disclosure receive state information that describes a state of a decision-making agent in an environment; compute an action vector from an action embedding space based on the state information using a policy neural network of the decision-making agent, wherein the policy neural network is trained using reinforcement learning based on a topology loss that constrains changes in a mapping between an action set and the action embedding space; and perform an action that modifies the state of the decision-making agent in the environment based on the action vector, wherein the action is selected based on the mapping.
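The abstract names three pieces without giving their exact form:

- a policy that emits a vector in an action embedding space;
- a mapping from that vector back to a concrete action;
- a topology loss that limits how the mapping changes when the action set changes.

The sketch below illustrates one plausible reading and is not the claimed method. Several choices in it are assumptions:

- The linear `policy` function is a stand-in for the policy neural network.
- Actions are selected by nearest neighbour in the embedding space.
- The topology loss is the mean squared change in pairwise distances between actions present in both the old and new action sets.

All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def policy(state: Sequence[float], weights: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical linear policy (action_vector = W @ state), standing in for the policy neural network."""
    return [sum(w * s for w, s in zip(row, state)) for row in weights]


def select_action(action_vector: Vector, embeddings: Dict[str, Vector]) -> str:
    """Assumed mapping: pick the action whose embedding is nearest (Euclidean) to the action vector."""
    return min(embeddings, key=lambda a: math.dist(action_vector, embeddings[a]))


def topology_loss(old: Dict[str, Vector], new: Dict[str, Vector]) -> float:
    """Assumed topology loss: mean squared change in pairwise distances among actions
    present in both the old and the new action sets (0.0 if fewer than two are shared)."""
    shared = sorted(set(old) & set(new))
    total, pairs = 0.0, 0
    for i, a in enumerate(shared):
        for b in shared[i + 1:]:
            d_old = math.dist(old[a], old[b])
            d_new = math.dist(new[a], new[b])
            total += (d_old - d_new) ** 2
            pairs += 1
    return total / pairs if pairs else 0.0


# Usage: a toy 2-D embedding space and an identity "policy".
embeddings = {"left": [-1.0, 0.0], "right": [1.0, 0.0], "stay": [0.0, 0.0]}
weights = [[1.0, 0.0], [0.0, 1.0]]
action_vector = policy([0.9, 0.1], weights)
chosen = select_action(action_vector, embeddings)  # "right" is the nearest embedding

# The action set grows by one action; existing embeddings are untouched, so the loss is zero.
grown = dict(embeddings, jump=[0.0, 1.0])
loss_unchanged = topology_loss(embeddings, grown)

# Moving an existing action's embedding distorts pairwise distances and is penalized.
moved = dict(embeddings, right=[2.0, 0.0])
loss_moved = topology_loss(embeddings, moved)
```

Here adding a new action leaves the loss at zero. Relocating `right` yields a positive loss, because two of its three pairwise distances change by 1. Under these assumptions, such a term lets the embedding absorb new actions without disturbing the geometry the policy has already learned.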