Abstract:
The current document is directed to methods and systems that collect metric data within computing facilities, including large data centers and cloud-computing facilities. In a described implementation, two or more metric-data sets are combined to generate a multidimensional metric-data set. The multidimensional metric-data set is compressed for efficient storage by clustering the multidimensional data points within the multidimensional metric-data set to produce a covering subset of multidimensional data points and by then representing the multidimensional-data-point members of each cluster by a cluster identifier rather than by a set of floating-point values, integer values, or other types of data representations. The covering set is constructed to ensure that the compression does not result in greater than a specified level of distortion of the original data.
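As a rough illustration of the clustering step, the sketch below uses a simple greedy leader-clustering scheme with a Euclidean distortion bound; the function name and the epsilon parameter are illustrative assumptions, not details taken from the abstract.

import math

def compress_multidimensional(points, epsilon):
    """Greedy leader clustering: each point is represented by the ID of the
    first cluster center within distance epsilon, so the reconstruction error
    (distortion) per point never exceeds epsilon."""
    centers = []        # the covering subset of multidimensional data points
    ids = []            # cluster identifier stored in place of each point
    for p in points:
        for i, c in enumerate(centers):
            if math.dist(p, c) <= epsilon:
                ids.append(i)
                break
        else:
            centers.append(p)            # p joins the covering subset
            ids.append(len(centers) - 1)
    return centers, ids

# Two metrics combined into 2-D data points, compressed with distortion <= 0.5.
data = [(1.0, 2.0), (1.1, 2.1), (5.0, 7.0), (5.2, 6.9)]
centers, ids = compress_multidimensional(data, epsilon=0.5)
print(centers)  # covering subset, e.g. [(1.0, 2.0), (5.0, 7.0)]
print(ids)      # cluster identifiers, e.g. [0, 0, 1, 1]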
Abstract:
Automated methods and systems to determine a baseline event-type distribution of an event source and use the baseline event-type distribution to detect changes in the behavior of the event source are described. In one implementation, blocks of event messages generated by the event source are collected and an event-type distribution is computed for each block of event messages. Candidate baseline event-type distributions are determined from the event-type distributions, and the candidate with the largest entropy is selected as the baseline event-type distribution. A normal discrepancy radius of the event-type distributions is computed from the baseline event-type distribution and the event-type distributions. A block of run-time event messages generated by the event source is collected. A run-time event-type distribution is computed from the block of run-time event messages. When the run-time event-type distribution is outside the normal discrepancy radius, an alert is generated indicating abnormal behavior of the event source.
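The sketch below illustrates one way the baseline selection and run-time check could work, assuming event-type distributions are maps from event type to relative frequency and using Jensen-Shannon distance as the discrepancy measure; the abstract does not name a specific measure, so that choice is an assumption.

import math
from collections import Counter

def distribution(events):
    """Relative frequency of each event type in a block of event messages."""
    counts = Counter(events)
    total = sum(counts.values())
    return {etype: n / total for etype, n in counts.items()}

def entropy(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def js_distance(d1, d2):
    """Jensen-Shannon distance between two event-type distributions."""
    keys = set(d1) | set(d2)
    m = {k: 0.5 * (d1.get(k, 0.0) + d2.get(k, 0.0)) for k in keys}
    def kl(a, b):
        return sum(a.get(k, 0.0) * math.log2(a.get(k, 0.0) / b[k])
                   for k in keys if a.get(k, 0.0) > 0)
    return math.sqrt(0.5 * kl(d1, m) + 0.5 * kl(d2, m))

blocks = [["start", "stop", "error"], ["start", "stop", "stop"],
          ["start", "error", "stop"]]
dists = [distribution(b) for b in blocks]

# Baseline: the candidate distribution with the largest entropy.
baseline = max(dists, key=entropy)

# Normal discrepancy radius: largest distance from baseline over normal blocks.
radius = max(js_distance(baseline, d) for d in dists)

runtime = distribution(["error", "error", "error", "stop"])
if js_distance(baseline, runtime) > radius:
    print("ALERT: abnormal behavior of the event source")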
Abstract:
This disclosure is directed to data-agnostic computational methods and systems for adjusting hard thresholds based on user feedback. Hard thresholds are used to monitor time-series data generated by a data-generating entity. The time-series data may be metric data that represents usage of the data-generating entity over time. The data is compared with a hard threshold associated with usage of the resource or process, and when the data violates the threshold, an alert is typically generated and presented to a user. Methods and systems collect user feedback after a number of alerts to determine the quality and significance of the alerts. Based on the user feedback, methods and systems automatically adjust the hard thresholds to better represent how the user perceives the alerts.
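A minimal sketch of one possible adjustment rule follows, assuming feedback is a list of per-alert significant/noise judgments; the step size and the cutoff fractions are illustrative assumptions, not values from the abstract.

def adjust_threshold(threshold, feedback, step=0.05):
    """Nudge a hard threshold using user feedback on recent alerts.
    feedback is a list of booleans: True if the user judged the alert
    significant, False if the user dismissed it as noise.  If most alerts
    were noise, raise the threshold so fewer alerts fire; if nearly all
    were significant, lower it so fewer real events are missed."""
    if not feedback:
        return threshold
    useful = sum(feedback) / len(feedback)
    if useful < 0.25:                 # mostly noise: alert less often
        return threshold * (1 + step)
    if useful > 0.75:                 # mostly real: alert more readily
        return threshold * (1 - step)
    return threshold

# CPU-usage threshold of 80% after ten alerts, only two judged significant.
print(adjust_threshold(80.0, [True, False, False, False, False,
                              False, True, False, False, False]))  # -> 84.0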
Abstract:
Automated methods and systems to reduce the size of time series data while maintaining outlier data points are described. The time series data may be read from a data-storage device of a physical data center. Clusters of data points of the time series data are determined. A normalcy domain and outlier data points of the time series data are determined. The normalcy domain comprises ranges of values associated with each cluster of data points. The outlier data points are located outside the ranges. Quantized time series data are computed from the normalcy domain. When the loss of information due to quantization is less than a limit, the quantized time series data are compressed. The time series data in the data-storage device are replaced with the compressed time series data and outlier data points.
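One possible reading of this pipeline is sketched below, assuming a gap-based one-dimensional clustering, quantization to cluster means, and run-length encoding; the gap heuristic and the mean-absolute-error loss measure are assumptions for illustration.

import statistics

def compress_with_outliers(series, gap=1.0, max_loss=0.5):
    """Cluster a 1-D time series by splitting sorted values at gaps larger
    than `gap`, quantize in-range points to their cluster mean, keep outlier
    points (singleton clusters) verbatim, and run-length encode the result."""
    values = sorted(set(series))
    clusters, current = [], [values[0]]
    for v in values[1:]:
        if v - current[-1] <= gap:
            current.append(v)
        else:
            clusters.append(current)
            current = [v]
    clusters.append(current)

    # Normalcy domain: value ranges of the multi-point clusters.
    normal = [(c[0], c[-1], statistics.mean(c)) for c in clusters if len(c) > 1]

    def quantize(v):
        for lo, hi, mean in normal:
            if lo <= v <= hi:
                return mean
        return v                      # outlier data point, kept as-is

    quantized = [quantize(v) for v in series]
    loss = statistics.mean(abs(q - v) for q, v in zip(quantized, series))
    if loss >= max_loss:
        return series                 # too much distortion: keep original

    # Compress by collapsing runs of identical quantized points.
    compressed = [(quantized[0], 1)]
    for q in quantized[1:]:
        if q == compressed[-1][0]:
            compressed[-1] = (q, compressed[-1][1] + 1)
        else:
            compressed.append((q, 1))
    return compressed

print(compress_with_outliers([10.1, 10.2, 10.0, 10.3, 42.0, 10.1, 10.2]))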
Abstract:
Methods and systems quantize and compress time series data generated by a resource of a distributed computing system. The time series data is partitioned according to a set of quantiles. Quantized time series data is generated from the time series data and the quantiles. The quantized time series data is compressed by deleting sequential duplicate quantized data points from the quantized time series data to obtain compressed time series data. Quantization and compression are performed for different combinations of quantiles. The user may choose to minimize the loss of information due to quantization while selecting a lower bound for the compression rate. Alternatively, the user may choose to maximize the compression rate while placing an upper limit on the loss of information due to quantization. The compressed time series data that satisfies the user-selected optimization conditions may be used to replace the original time series data in the data-storage device.
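A sketch of the quantile-based quantization and duplicate-deletion steps follows; the use of statistics.quantiles and the mean-absolute-error loss measure are illustrative assumptions.

import statistics

def quantize_by_quantiles(series, n_quantiles):
    """Map each data point to the nearest of a set of quantile values,
    then delete sequential duplicates.  Returns the compressed series,
    the compression rate, and a mean-absolute-error measure of the
    information loss due to quantization."""
    qs = statistics.quantiles(series, n=n_quantiles)
    quantized = [min(qs, key=lambda q: abs(q - v)) for v in series]
    compressed = [quantized[0]]
    for q in quantized[1:]:
        if q != compressed[-1]:       # drop sequential duplicate points
            compressed.append(q)
    rate = len(series) / len(compressed)
    loss = statistics.mean(abs(q - v) for q, v in zip(quantized, series))
    return compressed, rate, loss

series = [1.0, 1.1, 1.0, 5.0, 5.1, 5.0, 9.0, 9.2]
# Try different quantile counts; keep the one that maximizes the compression
# rate while the loss stays under an upper limit chosen by the user.
best = max((quantize_by_quantiles(series, n) for n in (2, 4, 8)),
           key=lambda r: r[1] if r[2] < 0.5 else -1)
print(best)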
Abstract:
Methods determine a capacity-forecast model based on historical capacity metric data and historical business metric data. The capacity-forecast model may be used to estimate capacity requirements with respect to changes in demand for the data center customer's application program. The capacity-forecast model provides an analytical “what-if” approach to reallocating data center resources in order to satisfy projected business-level expectations of a data center customer and to calculate estimated capacities for different business scenarios.
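As a hedged illustration, the sketch below fits a least-squares line relating a business metric to a capacity metric and uses it for a what-if estimate; the abstract does not fix a model form, so the linear model, the metric names, and the example values are assumptions.

def fit_capacity_model(business, capacity):
    """Least-squares line capacity = a * business_metric + b, fitted from
    historical business metric data (e.g. transactions/sec) and historical
    capacity metric data (e.g. CPU cores used)."""
    n = len(business)
    mx = sum(business) / n
    my = sum(capacity) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(business, capacity))
         / sum((x - mx) ** 2 for x in business))
    b = my - a * mx
    return lambda demand: a * demand + b

model = fit_capacity_model(business=[100, 200, 300, 400],
                           capacity=[4, 7, 10, 13])
# "What-if" scenario: estimated capacity if demand doubles to 800.
print(model(800))   # -> 25.0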
Abstract:
A problem in a cloud infrastructure may be identified when a server computer deviates from a normal level of operation based on anomaly scores; an alert is generated along with an alert time that indicates when the alert was generated. Methods then determine which virtual machines (“VMs”) and other IT objects/resources or their pools contribute to the problem within a time window surrounding the estimated problem start time and calculate which objects show similar, related anomalous behavior. Methods also generate ranked remediation recommendations at the object level and the server-computer-to-object level. The methods produce results that enable a system administrator to identify the start time of the problem and the objects that are responsible for the problem.
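The sketch below illustrates one way objects might be ranked by related anomalous behavior within a window around the alert time; Pearson correlation of anomaly scores is an assumed similarity measure, and the object names are hypothetical.

import statistics  # statistics.correlation requires Python 3.10+

def rank_objects(server_scores, object_scores, alert_index, window=3):
    """Rank IT objects (VMs, datastores, ...) by how closely their anomaly
    scores track the server computer's scores within a time window around
    the alert; the top entries are the remediation-recommendation targets."""
    lo = max(0, alert_index - window)
    hi = alert_index + window + 1
    ref = server_scores[lo:hi]
    ranked = sorted(((statistics.correlation(ref, scores[lo:hi]), name)
                     for name, scores in object_scores.items()),
                    reverse=True)
    return [(name, round(r, 2)) for r, name in ranked]

server = [0.1, 0.1, 0.2, 0.9, 0.8, 0.9, 0.2]
objects = {"vm-12": [0.1, 0.2, 0.2, 0.8, 0.9, 0.8, 0.1],   # tracks the problem
           "vm-07": [0.5, 0.4, 0.5, 0.4, 0.5, 0.4, 0.5]}   # unrelated
print(rank_objects(server, objects, alert_index=3))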
Abstract:
This disclosure is directed to computational, closed-loop user feedback systems and methods for ranking or updating beliefs for a user based on user feedback. The systems and methods are based on a data-agnostic user feedback formulation that uses user feedback to automatically rank beliefs for a user or update the beliefs. The methods and systems are based on a general statistical inference model, which, in turn, is based on an assumption of convergence in user opinion. The closed-loop user feedback methods and systems may be used to rank or update beliefs prior to inputting the beliefs to a recommender engine. As a result, the recommender engine is expected to be more responsive to customer environments, more efficient at deployment, and to generate fewer unnecessary recommendations for users.
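As one concrete, assumed instance of the general statistical inference model, the sketch below maintains a Beta-Bernoulli weight per belief, updates it from user feedback, and re-ranks the beliefs by posterior mean before they would be fed to a recommender engine; the belief names are hypothetical.

def update_beliefs(beliefs, feedback):
    """Closed-loop update: each belief carries a Beta(alpha, beta) weight;
    positive feedback increments alpha, negative feedback increments beta,
    and beliefs are re-ranked by their posterior mean alpha / (alpha + beta)."""
    for belief, useful in feedback:
        a, b = beliefs[belief]
        beliefs[belief] = (a + 1, b) if useful else (a, b + 1)
    return sorted(beliefs, key=lambda k: beliefs[k][0] / sum(beliefs[k]),
                  reverse=True)

beliefs = {"cpu-contention": (1, 1), "noisy-neighbor": (1, 1)}
feedback = [("cpu-contention", True), ("cpu-contention", True),
            ("noisy-neighbor", False)]
print(update_beliefs(beliefs, feedback))  # -> ['cpu-contention', 'noisy-neighbor']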