Lifelong learning with a changing action set
Abstract:
Systems and methods are described for a decision-making process that includes an increasing set of actions, compute a policy function for a Markov decision process (MDP) for the decision-making process, wherein the policy function is computed based on a state conditional function mapping states into an embedding space, an inverse dynamics function mapping state transitions into the embedding space, and an action selection function mapping the elements of the embedding space to actions, identify an additional set of actions in the increasing set of actions, update the inverse dynamics function based at least in part on the additional set of actions, update the policy function based on the updated inverse dynamics function and parameters learned during the computing the policy function, and select an action based on the updated policy function.
Public/Granted literature
Information query
Patent Agency Ranking
0/0