Abstract:
The current document is directed to automated reinforcement-learning-based application managers that learn and improve the reward function that steers reinforcement-learning-based systems towards optimal or near-optimal policies. Initially, when first installed and launched, the automated reinforcement-learning-based application manager may rely on human-application-manager action inputs and the resulting state/action trajectories to accumulate sufficient information to generate an initial reward function. During subsequent operation, when it is determined that the automated reinforcement-learning-based application manager is no longer following a policy consistent with the type of management desired by human application managers, the automated reinforcement-learning-based application manager may use the accumulated trajectories to improve the reward function.
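As a non-authoritative illustration of how an initial reward function might be inferred from human-application-manager demonstrations, the following Python sketch fits a linear reward over (state, action) features so that demonstrated actions score higher than the alternative actions available in the same state. The feature map `featurize`, the candidate-action set, and the perceptron-style update are illustrative assumptions, not the mechanism claimed above.

```python
# Minimal sketch (illustrative only): inferring a linear reward function
# from human-administrator state/action trajectories.
import numpy as np

def featurize(state, action):
    # Hypothetical feature map over a (state, action) pair; a real manager
    # would derive features from metrics, configuration, and the action taken.
    return np.concatenate([state, action])

def fit_reward_weights(trajectories, candidate_actions, dim, epochs=50, lr=0.1):
    """Perceptron-style updates: push the reward of each demonstrated
    (state, action) pair above the reward of every other candidate action
    available in that state."""
    w = np.zeros(dim)
    for _ in range(epochs):
        for states, actions in trajectories:
            for s, a in zip(states, actions):
                phi_demo = featurize(s, a)
                for alt in candidate_actions:
                    phi_alt = featurize(s, alt)
                    # If an alternative currently scores at least as well as
                    # the human's choice, nudge the weights apart.
                    if w @ phi_alt >= w @ phi_demo:
                        w += lr * (phi_demo - phi_alt)
    return w

def reward(w, state, action):
    return float(w @ featurize(state, action))

# Example shape of the inputs (states and actions as numeric feature vectors):
# trajectories = [([s0, s1, ...], [a0, a1, ...]), ...]
```

When accumulated trajectories later indicate that the learned policy has drifted from the management behavior human application managers expect, the same fitting step can be rerun over the larger trajectory set to produce an improved reward function.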
Abstract:
The current document is directed to an administrator-monitored reinforcement-learning-based application manager that can be deployed in various computational environments to manage those environments with respect to one or more reward-specified goals. Certain control actions undertaken by the administrator-monitored reinforcement-learning-based application manager are first proposed to one or more administrators or other users, who can accept or reject the proposed control actions prior to their execution. The reinforcement-learning-based application manager can therefore continue to explore the state/action space, but the exploration can be constrained both parametrically and by human-administrator oversight and intervention.
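A minimal sketch of such a proposal/approval gate, assuming an epsilon-greedy agent in Python, is shown below; the class and callback names (`ProposalGate`, `approve_fn`) are hypothetical, and a real deployment would route proposals through a management interface rather than a synchronous callback.

```python
# Minimal sketch (illustrative only): a proposal/approval gate in front of a
# reinforcement-learning agent's action selection.  Exploratory actions are
# proposed to an administrator, who can accept or reject them before
# execution; the exploration rate is a tunable (parametric) constraint.
import random

class ProposalGate:
    def __init__(self, policy, approve_fn, epsilon=0.1):
        self.policy = policy          # maps state -> current greedy action
        self.approve_fn = approve_fn  # callback: (state, action) -> bool
        self.epsilon = epsilon        # parametric constraint on exploration

    def select_action(self, state, action_space):
        if random.random() < self.epsilon:
            proposed = random.choice(action_space)   # exploratory proposal
            if self.approve_fn(state, proposed):     # administrator accepts?
                return proposed
            # Rejected proposals fall back to the current greedy action.
        return self.policy(state)

# Example use:
# gate = ProposalGate(policy=my_policy, approve_fn=admin_review, epsilon=0.05)
# action = gate.select_action(current_state, available_actions)
```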
Abstract:
This disclosure is directed to data-agnostic computational methods and systems for adjusting hard thresholds based on user feedback. Hard thresholds are used to monitor time-series data generated by a data-generating entity. The time-series data may be metric data that represents usage of the data-generating entity over time. The data is compared with a hard threshold associated with usage of the data-generating entity, and when the data violates the threshold, an alert is typically generated and presented to a user. Methods and systems collect user feedback after a number of alerts to determine the quality and significance of the alerts. Based on the user feedback, methods and systems automatically adjust the hard thresholds to better represent how the user perceives the alerts.
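As one hedged illustration of feedback-driven adjustment, the Python sketch below raises an upper hard threshold when the user labels most recent alerts as insignificant and lowers it slightly when every alert was judged significant; the adjustment rule, the 5% step size, and the assumption that alerts fire when a metric exceeds the threshold are illustrative choices, not the disclosed algorithm.

```python
def adjust_threshold(threshold, alert_values, feedback, step=0.05):
    """Nudge an upper hard threshold after a batch of alerts.

    alert_values: metric values that triggered the alerts.
    feedback:     parallel booleans, True if the user judged the alert
                  significant, False if it was noise.
    """
    noise = [v for v, ok in zip(alert_values, feedback) if not ok]
    significant = [v for v, ok in zip(alert_values, feedback) if ok]
    if noise and len(noise) > len(significant):
        # Mostly nuisance alerts: lift the threshold above the median
        # value the user dismissed, plus a small margin.
        return max(threshold, sorted(noise)[len(noise) // 2] * (1 + step))
    if significant and not noise:
        # Every alert mattered: lower the threshold slightly so similar
        # conditions are reported a little earlier next time.
        return threshold * (1 - step)
    return threshold

# Example: threshold on CPU usage (percent); three of four alerts were noise.
# new_threshold = adjust_threshold(80.0, [82.0, 83.5, 81.2, 95.0],
#                                  [False, False, False, True])
```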