Abstract:
Automated methods and systems to reduce the size of time series data while maintaining outlier data points are described. The time series data may be read from a data-storage device of a physical data center. Clusters of data points of the time series data are determined. A normalcy domain of the time series data and outlier data points of the time series data are determined. The normalcy domain of the time series data comprises ranges of values associated with each cluster of data points. The outlier data points are located outside the ranges. Quantized time series data are computed from the normalcy domain. When the loss of information due to quantization is less than a limit, the quantized time series data are compressed. The time series data in the data-storage device are replaced with the compressed time series data and outlier data points.
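The quantize-then-compress flow in this abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not in the abstract: a single normalcy range `[lo, hi]` is supplied by the caller instead of being derived from clusters, information loss is measured as mean absolute quantization error relative to the range width, and `zlib` stands in for whatever compressor is actually used. The names `reduce_series` and `restore` are hypothetical.

```python
import json
import zlib


def reduce_series(values, lo, hi, levels=16, max_loss=0.05):
    """Quantize in-range points, keep outliers verbatim, compress if loss is small.

    Assumption: one normalcy range [lo, hi] is given; the abstract derives
    ranges from clusters of data points instead.
    """
    step = (hi - lo) / (levels - 1)
    # Points outside the normalcy range are outliers and are kept unchanged.
    outliers = {i: v for i, v in enumerate(values) if v < lo or v > hi}
    # Quantization level for each in-range point; None marks an outlier slot.
    codes = [None if i in outliers else round((v - lo) / step)
             for i, v in enumerate(values)]
    # Assumed loss measure: mean absolute error relative to the range width.
    errs = [abs(v - (lo + c * step)) for v, c in zip(values, codes) if c is not None]
    loss = (sum(errs) / len(errs)) / (hi - lo) if errs else 0.0
    if loss >= max_loss:
        return None  # too lossy: leave the original series in place
    payload = zlib.compress(json.dumps(codes).encode())
    return {"lo": lo, "step": step, "codes": payload,
            "outliers": outliers, "loss": loss}


def restore(reduced):
    """Rebuild an approximate series from the reduced representation."""
    codes = json.loads(zlib.decompress(reduced["codes"]))
    return [reduced["outliers"][i] if c is None
            else reduced["lo"] + c * reduced["step"]
            for i, c in enumerate(codes)]
```

In this sketch the outlier at any index survives the round trip exactly, while in-range points are recovered only to the nearest quantization level.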
Abstract:
Methods and systems are directed to detecting and classifying changes in a distributed computing system. Divergence values are computed from distributions of different types of event messages generated in time intervals of a sliding time window. Each divergence value is a measure of change in types of events generated in each time interval. When a divergence value, or a rate of change in divergence values, exceeds a threshold, the time interval associated with the threshold violation is used to determine a change point in the operation of the distributed computing system. Based on the change point, a start time of the change is determined. The change is classified based on various previously classified change points in the distributed computing system. A recommendation may be generated to address the change based on the classification of the change.
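As a rough illustration of the divergence step, the sketch below compares the event-type distributions of consecutive time intervals and flags intervals whose divergence exceeds a threshold. The abstract does not name a divergence measure; the Jensen-Shannon divergence is used here purely as an assumption, and the function names are hypothetical. Start-time estimation and classification are not covered.

```python
import math
from collections import Counter


def distribution(events, types):
    """Relative frequency of each event type within one time interval."""
    counts = Counter(events)
    total = sum(counts[t] for t in types)
    return [counts[t] / total if total else 0.0 for t in types]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x == 0 contribute nothing; y > 0 wherever x > 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def change_intervals(intervals, threshold):
    """Indices of intervals whose distribution diverges from the previous one."""
    types = sorted({e for interval in intervals for e in interval})
    dists = [distribution(interval, types) for interval in intervals]
    divs = [js_divergence(a, b) for a, b in zip(dists, dists[1:])]
    return [i + 1 for i, d in enumerate(divs) if d > threshold]
```

Each entry of `intervals` is assumed to be a list of event-type labels observed in one interval of the sliding window.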
Abstract:
This disclosure is directed to automated methods and systems for calculating hard thresholds used to monitor time-series data generated by a data-generating entity. The methods are based on determining a cumulative distribution that characterizes the probability that data values of time-series data generated by the data-generating entity violate a hard threshold. The hard threshold is calculated as an inverse of the cumulative distribution based on a user-defined risk confidence level. The hard threshold may then be used to generate alerts when time-series data generated later by the data-generating entity violate the hard threshold.
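The idea of a threshold obtained by inverting a cumulative distribution can be sketched with an empirical distribution. This is an assumption made for illustration: the abstract does not say how the cumulative distribution is estimated, and the names `hard_threshold` and `alerts` are hypothetical.

```python
import math


def hard_threshold(values, confidence):
    """Smallest observed value v whose empirical CDF F(v) is >= confidence.

    Assumption: the cumulative distribution is the empirical one built
    from historical samples.
    """
    if not values:
        raise ValueError("values must be non-empty")
    if not 0.0 < confidence <= 1.0:
        raise ValueError("confidence must be in (0, 1]")
    ordered = sorted(values)
    k = math.ceil(confidence * len(ordered))
    return ordered[k - 1]


def alerts(new_values, threshold):
    """Indices of later data points that violate (exceed) the threshold."""
    return [i for i, v in enumerate(new_values) if v > threshold]
```

With a confidence level of 0.95, roughly 5% of historical values lie above the returned threshold, so later values that exceed it are treated as violations.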
Abstract:
The current document is directed to methods and subsystems within computing systems, including distributed computing systems, that collect, store, process, and analyze population metrics for types and classes of system components, including components of distributed applications executing within containers, virtual machines, and other execution environments. In a described implementation, a graph-like representation of the configuration and state of a computer system includes aggregation nodes that collect metric data for a set of multiple object nodes and that collect metric data that represents the members of the set over a monitoring time interval. Population metrics are monitored, in certain implementations, to detect outlier members of an aggregation.
Abstract:
Methods recommend to data center customers those attributes of a data center infrastructure and application program that are associated with service-level objective (“SLO”) metric degradation and may be recorded in problem definitions. In other words, a data center customer is offered the opportunity to “codify” problems primarily with atomic abnormality conditions on indicated attributes that decrease the SLO by some degree of which the data center customer would like to be aware. As a result, the data center customer is warned of potentially significant SLO decline in order to prevent unwanted loss and take any necessary actions to prevent active anomalies. Methods also generate patterns of attributes that constitute core structures highly associated with degradation of the SLO metric.
Abstract:
This disclosure is directed to data-agnostic computational methods and systems for adjusting hard thresholds based on user feedback. Hard thresholds are used to monitor time-series data generated by a data-generating entity. The time-series data may be metric data that represents usage of the data-generating entity over time. The data is compared with a hard threshold associated with usage of the resource or process and when the data violates the threshold, an alert is typically generated and presented to a user. Methods and systems collect user feedback after a number of alerts to determine the quality and significance of the alerts. Based on the user feedback, methods and systems automatically adjust the hard thresholds to better represent how the user perceives the alerts.
Abstract:
The current document is directed to a multi-stage metric-data compression method and subsystem for compressing metric data collected and stored within distributed computing systems to facilitate computer-system management and administration. In a described implementation, metric data is partitioned into constant metric data, low-variability metric data, and high-variability metric data. High-variability metric data is compressed by identifying a set of basis metrics, or independent metrics, with respect to which a remaining set of dependent metrics can be expressed using coefficient multipliers. The high-variability metric data can then be stored as a set of independent metrics and set of coefficients, along with a small amount of additional data.
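The basis-metric idea can be illustrated with a deliberately simplified sketch. It assumes that each dependent metric is a scalar multiple of a single basis metric, whereas the abstract describes dependent metrics expressed with coefficient multipliers over a set of basis metrics, which in general would require a linear-combination fit. The names `compress_metrics` and `reconstruct` and the tolerance `tol` are hypothetical, and all series are assumed to have the same length.

```python
def compress_metrics(metrics, tol=1e-6):
    """Split metrics into stored basis metrics and (basis, coefficient) pairs.

    Simplification: a metric is treated as dependent only if it equals
    coef * basis_metric for one already-stored basis metric, within tol.
    """
    basis = {}        # name -> stored samples
    dependents = {}   # name -> (basis name, coefficient)
    for name, series in metrics.items():
        for bname, b in basis.items():
            denom = sum(x * x for x in b)
            if denom == 0:
                continue
            # Least-squares coefficient for series ~ coef * b.
            coef = sum(x * y for x, y in zip(b, series)) / denom
            resid = max(abs(y - coef * x) for x, y in zip(b, series))
            if resid <= tol:
                dependents[name] = (bname, coef)
                break
        else:
            # No existing basis metric explains this series: store it.
            basis[name] = list(series)
    return basis, dependents


def reconstruct(basis, dependents, name):
    """Recover a metric from the compressed representation."""
    if name in basis:
        return list(basis[name])
    bname, coef = dependents[name]
    return [coef * x for x in basis[bname]]
```

In this form, a dependent metric costs one name and one coefficient to store instead of a full series.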
Abstract:
Methods and systems that manage large volumes of metric data generated by cloud-computing infrastructures are described. The cloud-computing infrastructure generates sets of metric data. Each set of metric data may represent usage or performance of an application or application module run by the cloud-computing infrastructure or may represent use or performance of cloud-computing resources used by the applications. The metric data management methods and systems are composed of separate modules that sequentially apply metric data reduction techniques on different levels of data abstraction in order to reduce the volume of metric data collected. In particular, the modules determine normalcy bounds, delete highly correlated metric data, and delete metric data with highly correlated normalcy bound violations.
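One of the reduction steps, deleting highly correlated metric data, can be sketched as follows. The abstract does not name a correlation measure or a cutoff; the Pearson coefficient, the 0.95 threshold, and the greedy keep-first strategy are assumptions made for illustration, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series is treated as uncorrelated
    return cov / (sx * sy)


def drop_correlated(metrics, threshold=0.95):
    """Keep a metric only if it is not highly correlated with one already kept."""
    kept = {}
    for name, series in metrics.items():
        if all(abs(pearson(series, k)) < threshold for k in kept.values()):
            kept[name] = series
    return kept
```

The result depends on iteration order: the first metric of each correlated group is the one retained.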