Automated reinforcement-learning-based application manager that learns and improves a reward function
Abstract:
The current document is directed to automated reinforcement-learning-based application managers that learn and improve the reward function that steers reinforcement-learning-based systems towards optimal or near-optimal policies. When first installed and launched, the automated reinforcement-learning-based application manager may rely on action inputs from human application managers, and on the resulting state/action trajectories, to accumulate sufficient information to generate an initial reward function. During subsequent operation, when it is determined that the manager is no longer following a policy consistent with the type of management desired by human application managers, the manager may use accumulated trajectories to improve the reward function.
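The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the method claimed in the document: the reward estimate here is a simple smoothed log-frequency of human action choices, used as a stand-in for a real inverse-reinforcement-learning procedure. The names `fit_reward`, `greedy_action`, `agreement`, and `maybe_improve_reward`, the string-valued states and actions, and the 0.8 agreement threshold are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

State = str
Action = str
Trajectory = List[Tuple[State, Action]]       # sequence of (state, action) pairs
Reward = Dict[Tuple[State, Action], float]


def fit_reward(trajectories: List[Trajectory], actions: List[Action],
               smoothing: float = 1.0) -> Reward:
    """Estimate reward(s, a) as the smoothed log-frequency with which
    human managers chose action a in state s (hypothetical stand-in)."""
    pair_counts = Counter(pair for traj in trajectories for pair in traj)
    state_counts = Counter(s for traj in trajectories for s, _ in traj)
    reward: Reward = {}
    for state, n_state in state_counts.items():
        denom = n_state + smoothing * len(actions)
        for action in actions:
            num = pair_counts[(state, action)] + smoothing
            reward[(state, action)] = math.log(num / denom)
    return reward


def greedy_action(reward: Reward, state: State, actions: List[Action]) -> Action:
    """Action with the highest estimated reward in the given state."""
    return max(actions, key=lambda a: reward.get((state, a), float("-inf")))


def agreement(reward: Reward, trajectories: List[Trajectory],
              actions: List[Action]) -> float:
    """Fraction of human (state, action) pairs matched by the greedy policy."""
    pairs = [pair for traj in trajectories for pair in traj]
    if not pairs:
        return 1.0
    matches = sum(greedy_action(reward, s, actions) == a for s, a in pairs)
    return matches / len(pairs)


def maybe_improve_reward(reward: Reward, accumulated: List[Trajectory],
                         recent: List[Trajectory], actions: List[Action],
                         threshold: float = 0.8) -> Reward:
    """Refit the reward from all accumulated trajectories when the current
    policy no longer agrees with recent human-manager behavior."""
    if agreement(reward, recent, actions) < threshold:
        return fit_reward(accumulated + recent, actions)
    return reward
```

In this sketch, an initial call to `fit_reward` on human-driven trajectories plays the role of generating the initial reward function, and `maybe_improve_reward` plays the role of the later improvement step, triggered when agreement with recent human actions falls below the assumed threshold.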