Abstract:
The disclosed embodiments provide a system that analyzes telemetry data from a computer system. During operation, the system obtains the telemetry data, which includes first information containing telemetric signals gathered using sensors in the computer system and second information that indicates one or more transaction latencies of software running on the computer system. Upon detecting an upward trend in the one or more transaction latencies, the system analyzes the telemetry data for a correlation between the one or more transaction latencies and one or more environmental factors represented by a subset of the telemetric signals. Upon identifying the correlation between the one or more transaction latencies and an environmental factor, the system stores an indication that the environmental factor may be contributing to the upward trend in the one or more transaction latencies.
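The two-stage check described above (trend first, correlation second) can be sketched as follows. This is a minimal illustration, not the claimed method: the abstract does not say how the upward trend or the correlation is computed, so a least-squares slope and a Pearson coefficient are assumed here, and the names `slope`, `pearson`, `suspect_factors`, `min_slope`, and `min_corr` are hypothetical.

```python
from math import sqrt


def slope(ys):
    """Least-squares slope of ys against the sample index 0..n-1 (needs n >= 2)."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(ys))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def suspect_factors(latencies, signals, min_slope=0.0, min_corr=0.8):
    """Names of environmental signals correlated with a rising latency series.

    `signals` maps a signal name to a series sampled at the same instants as
    `latencies`. Both thresholds are assumptions chosen for illustration.
    """
    if slope(latencies) <= min_slope:
        return []  # no upward trend, so the correlation stage is skipped
    return [
        name
        for name, values in signals.items()
        if abs(pearson(latencies, values)) >= min_corr
    ]
```

A caller would store the returned names as the "indication" that those factors may be contributing to the trend. A high correlation here only flags a candidate, which matches the abstract's wording that the factor "may be contributing".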
Abstract:
The disclosed embodiments provide a system that detects anomalous events in a virtual machine. During operation, the system obtains time-series garbage-collection (GC) data collected during execution of a virtual machine in a computer system. Next, the system generates one or more seasonal features from the time-series GC data. The system then uses a sequential-analysis technique to analyze the time-series GC data and the one or more seasonal features for an anomaly in the GC activity of the virtual machine. Finally, the system stores an indication of a potential out-of-memory (OOM) event for the virtual machine based at least in part on identifying the anomaly in the GC activity of the virtual machine.
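As a rough sketch of the pipeline above, the code below derives a seasonal feature from GC data and then runs a sequential test on the deseasonalized values. The abstract does not name the seasonal features or the sequential-analysis technique; a per-phase mean baseline and a one-sided CUSUM are assumed here as stand-ins, and `seasonal_baseline`, `cusum_alarm`, `slack`, and `threshold` are hypothetical names.

```python
def seasonal_baseline(values, period):
    """Mean of the samples observed at each phase of a cycle of length `period`."""
    sums = [0.0] * period
    counts = [0] * period
    for i, v in enumerate(values):
        sums[i % period] += v
        counts[i % period] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def cusum_alarm(values, baseline, slack, threshold):
    """Index of the first sample at which a one-sided CUSUM exceeds `threshold`.

    Each sample is compared with the seasonal baseline for its phase, and
    `slack` is subtracted so small excursions do not accumulate. Returns None
    if no alarm is raised.
    """
    period = len(baseline)
    s = 0.0
    for i, v in enumerate(values):
        residual = v - baseline[i % period]
        s = max(0.0, s + residual - slack)
        if s > threshold:
            return i
    return None


# Usage: GC time per interval alternates between quiet and busy phases.
history = [1, 5, 1, 5, 1, 5, 1, 5]
baseline = seasonal_baseline(history, period=2)
alarm_at = cusum_alarm([1, 5, 3, 7, 3, 7], baseline, slack=0.5, threshold=3.0)
```

An alarm index would then be recorded as an indication of a potential OOM event. Removing the seasonal component first keeps routine busy phases from triggering the sequential test.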
Abstract:
The disclosed embodiments provide a system that detects anomalous events. During operation, the system obtains machine-generated time-series performance data collected during execution of a software program in a computer system. Next, the system removes a subset of the machine-generated time-series performance data within an interval around one or more known anomalous events of the software program to generate filtered time-series performance data. The system uses the filtered time-series performance data to build a statistical model of normal behavior in the software program and obtains a number of unique patterns learned by the statistical model. When the number of unique patterns satisfies a complexity threshold, the system applies the statistical model to subsequent machine-generated time-series performance data from the software program to identify an anomaly in an activity of the software program and stores an indication of the anomaly for the software program upon identifying the anomaly.
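The steps above (filter around known anomalies, learn normal patterns, gate on model complexity, then score new data) might look like the sketch below. The abstract does not identify the statistical model, so a set of quantized sliding windows is used as a deliberately simple stand-in. Reading "satisfies a complexity threshold" as "at least `min_patterns` unique patterns" is also an assumption, and all function and parameter names are hypothetical.

```python
def split_around_anomalies(samples, anomaly_times, margin):
    """Drop (time, value) samples within `margin` of a known anomaly.

    Returns the surviving samples as contiguous segments, so that learned
    windows never straddle a removed interval.
    """
    segments, current = [], []
    for t, v in samples:
        if any(abs(t - a) <= margin for a in anomaly_times):
            if current:
                segments.append(current)
                current = []
        else:
            current.append((t, v))
    if current:
        segments.append(current)
    return segments


def quantized_windows(values, width, bin_size):
    """Sliding windows of `width` values, each value reduced to a bin index."""
    bins = [int(v // bin_size) for v in values]
    return [tuple(bins[i:i + width]) for i in range(len(bins) - width + 1)]


def build_model(segments, width, bin_size):
    """The 'model' is the set of unique patterns seen in the filtered data."""
    patterns = set()
    for seg in segments:
        patterns.update(quantized_windows([v for _, v in seg], width, bin_size))
    return patterns


def find_anomalies(model, new_values, width, bin_size, min_patterns):
    """Window start indices whose pattern was never seen during training.

    Returns None when the model has fewer than `min_patterns` unique
    patterns, i.e. it is judged too simple to be applied.
    """
    if len(model) < min_patterns:
        return None
    windows = quantized_windows(new_values, width, bin_size)
    return [i for i, w in enumerate(windows) if w not in model]
```

Filtering before training matters because samples near a known incident would otherwise be learned as normal behavior and mask a recurrence.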
Abstract:
The disclosed embodiments relate to a system that gathers telemetry data while testing a computer system. During operation, the system obtains a test script that generates a load profile to exercise the computer system, wherein a running time of the test script is designed to be relatively prime in comparison to a sampling interval for telemetry data in the computer system. Next, the system gathers telemetry data during multiple successive executions of the test script on the computer system. The system merges the telemetry data gathered during the multiple successive executions of the test script, wherein the relatively prime relationship between the running time of the test script and the sampling interval for the telemetry data causes a sampling point for the telemetry data to precess through different points in the test script during the multiple successive executions of the test script, thereby densifying sampled telemetry data points gathered for the test script. Finally, the system outputs the densified telemetry data.
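The precession effect described above follows from modular arithmetic: if the script runs back to back with running time R and samples arrive every S time units, the k-th sample lands at offset (k * S) mod R inside the script, and those offsets cover every residue 0..R-1 exactly when gcd(R, S) = 1. A small sketch, assuming integer time units and using the hypothetical names `sampled_phases` and `merge_runs`:

```python
from math import gcd


def sampled_phases(run_time, interval, executions):
    """Distinct offsets into the script that periodic samples land on.

    Assumes the script is executed `executions` times back to back, that
    sampling starts at time 0, and that all times are integers.
    """
    total = run_time * executions
    return sorted({t % run_time for t in range(0, total, interval)})


def merge_runs(samples, run_time):
    """Fold (time, value) samples from successive runs onto one script timeline."""
    return sorted((t % run_time, v) for t, v in samples)


# Coprime case: a 7-unit script sampled every 5 units.
assert gcd(7, 5) == 1
dense = sampled_phases(run_time=7, interval=5, executions=5)

# Non-coprime case: a 10-unit script sampled every 5 units.
sparse = sampled_phases(run_time=10, interval=5, executions=5)
```

In the coprime case the merged data contains a sample at every one of the 7 offsets, while in the non-coprime case the samples keep landing on the same 2 offsets no matter how many times the script is repeated.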
Abstract:
The disclosed embodiments provide a system that detects anomalous events in a virtual machine. During operation, the system obtains time-series virtual machine (VM) data including garbage-collection (GC) data collected during execution of a virtual machine in a computer system. Next, the system computes, by a service processor, a time window for analyzing the time-series VM data based at least in part on a working time scale of high-activity patterns in the time-series GC data. The system then uses a trend-estimation technique to analyze the time-series VM data within the time window to determine an out-of-memory (OOM) risk in the virtual machine. Finally, the system stores an indication of the OOM risk for the virtual machine based at least in part on determining the OOM risk in the virtual machine.
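One way to picture the window selection and trend estimation above is sketched below. The abstract does not define the "working time scale" or the trend-estimation technique, so this sketch assumes the time scale is the median spacing between high-activity GC samples and the trend is a least-squares line through heap usage. The names `analysis_window`, `oom_risk`, `busy_ms`, `cycles`, `heap_max`, and `horizon` are hypothetical, and the service-processor aspect is not modeled.

```python
from statistics import median


def analysis_window(gc_times, gc_pause_ms, busy_ms, cycles=3):
    """Window length: `cycles` times the median gap between busy GC samples.

    A sample counts as busy when its pause is at least `busy_ms`. Returns
    None if fewer than two busy samples exist.
    """
    busy = [t for t, p in zip(gc_times, gc_pause_ms) if p >= busy_ms]
    gaps = [b - a for a, b in zip(busy, busy[1:])]
    if not gaps:
        return None
    return cycles * median(gaps)


def fit_slope(ts, ys):
    """Least-squares slope of ys against ts (assumes at least two distinct ts)."""
    n = len(ts)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    den = sum((t - t_mean) ** 2 for t in ts)
    return num / den


def oom_risk(times, heap_used, window, heap_max, horizon):
    """True if heap usage is projected to reach `heap_max` within `horizon`.

    The slope is fitted over the samples in the last `window` time units and
    extrapolated from the most recent heap reading.
    """
    start = times[-1] - window
    pts = [(t, h) for t, h in zip(times, heap_used) if t >= start]
    if len(pts) < 2:
        return False
    ts, hs = zip(*pts)
    m = fit_slope(ts, hs)
    if m <= 0:
        return False
    return hs[-1] + m * horizon >= heap_max
```

Tying the window to the GC activity time scale is meant to keep the fit from being dominated by a single collection cycle (window too short) or by stale history (window too long).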