Abstract:
One embodiment provides a system that analyzes telemetry data from a computer system. During operation, the system periodically obtains the telemetry data from the computer system. Next, the system preprocesses the telemetry data using a sequential-analysis technique. If a statistical deviation is found in the telemetry data using the sequential-analysis technique, the system identifies a subset of the telemetry data associated with the statistical deviation and applies a root-cause-analysis technique to the subset of the telemetry data to determine a source of the statistical deviation. Finally, the system uses the source of the statistical deviation to perform a remedial action for the computer system, which involves correcting a fault in the computer system corresponding to the source of the statistical deviation.
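The flow described above (sequential analysis first, root-cause analysis only on the flagged subset) might be sketched as follows. This is a minimal illustration under stated assumptions, not the method claimed in the abstract: the abstract does not name a specific sequential-analysis technique, so Wald's sequential probability ratio test (SPRT) for a Gaussian mean shift is used here as one well-known example, and the root-cause step is a simple stand-in that picks the signal whose alarm fired earliest. The names `sprt_mean_shift`, `analyze`, and the signal names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple


def sprt_mean_shift(samples: List[float], mean0: float, shift: float,
                    sigma: float, alpha: float = 0.01,
                    beta: float = 0.01) -> Optional[int]:
    """Run an SPRT for H0: mean = mean0 against H1: mean = mean0 + shift.

    Returns the index of the sample at which H1 is accepted, or None if
    no statistical deviation is found. Assumes Gaussian noise with known
    sigma (an assumption made for this sketch).
    """
    upper = math.log((1 - beta) / alpha)   # accept H1 at or above this
    lower = math.log(beta / (1 - alpha))   # accept H0 at or below this
    llr = 0.0
    for i, x in enumerate(samples):
        # Log-likelihood ratio increment for two Gaussians that differ
        # only in their mean.
        llr += (shift / sigma ** 2) * (x - mean0 - shift / 2.0)
        if llr >= upper:
            return i
        if llr <= lower:
            llr = 0.0  # H0 accepted; restart the test on later samples
    return None


def analyze(telemetry: Dict[str, List[float]],
            baselines: Dict[str, Tuple[float, float]]) -> Optional[str]:
    """Preprocess every signal with the SPRT, then examine only the flagged ones."""
    flagged: Dict[str, int] = {}
    for name, series in telemetry.items():
        mean0, sigma = baselines[name]
        idx = sprt_mean_shift(series, mean0, shift=2 * sigma, sigma=sigma)
        if idx is not None:
            flagged[name] = idx
    if not flagged:
        return None  # no deviation, so root-cause analysis is skipped
    # Stand-in root-cause step: the signal that alarmed first is treated
    # as the likely source of the deviation.
    return min(flagged, key=lambda name: flagged[name])


telemetry = {
    "fan_rpm": [3000.0] * 40,                  # stays at its baseline
    "cpu_temp": [50.0] * 20 + [54.0] * 20,     # shifts upward halfway through
}
baselines = {"fan_rpm": (3000.0, 10.0), "cpu_temp": (50.0, 1.0)}
source = analyze(telemetry, baselines)
print(source)
```

Running the inexpensive sequential test on every signal and reserving the costlier root-cause step for the flagged subset is the point of the preprocessing stage; the remedial action would then be selected from whatever `analyze` returns.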
Abstract:
One embodiment of the present invention provides a system that trains a pattern-recognition model for electronic prognostication for a computer system. First, the system monitors a performance parameter from a set of at least two computer systems. This monitoring includes systematically monitoring and recording a set of performance parameters from the computer systems in the set, wherein the recording process keeps track of the temporal relationships between events in different performance parameters. Next, the system generates a training data set based on the monitored performance parameter, which includes concatenating two or more time-series of the performance parameter from computer systems in the set. Then, the system trains the pattern-recognition model using the training data set. Next, the system uses the pattern-recognition model to look for anomalies in performance parameters gathered during operation of a monitored computer system. The system then generates an alarm when the pattern-recognition model detects an anomaly in the performance parameters from the monitored computer system.
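The training steps above can be illustrated with a short sketch. It assumes each recorded row carries a timestamp alongside all parameter values, so that the temporal relationship between parameters is preserved, and it uses a per-parameter mean and standard deviation as a deliberately simple stand-in for the pattern-recognition model, which the abstract does not specify. All names (`build_training_set`, `train`, `detect`, `cpu_temp`, `load`) and the z-score limit are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple

# One recorded observation: (timestamp, {parameter name: value}).
# Keeping all parameters in one timestamped row preserves their
# temporal relationship.
Row = Tuple[float, Dict[str, float]]


def build_training_set(per_system: List[List[Row]]) -> List[Row]:
    """Concatenate the time-series recorded from two or more systems."""
    training: List[Row] = []
    for rows in per_system:
        # Keep each system's rows in time order before appending them.
        training.extend(sorted(rows, key=lambda r: r[0]))
    return training


def train(training: List[Row]) -> Dict[str, Tuple[float, float]]:
    """Fit the stand-in model: mean and standard deviation per parameter."""
    model: Dict[str, Tuple[float, float]] = {}
    for param in training[0][1]:
        values = [row[1][param] for row in training]
        # Guard against a zero standard deviation for constant signals.
        model[param] = (mean(values), pstdev(values) or 1e-9)
    return model


def detect(model: Dict[str, Tuple[float, float]], row: Row,
           z_limit: float = 4.0) -> List[str]:
    """Return the parameters in this row that look anomalous."""
    _, values = row
    return [p for p, (mu, sd) in model.items()
            if abs(values[p] - mu) / sd > z_limit]


system_a = [(0.0, {"cpu_temp": 50.0, "load": 0.50}),
            (1.0, {"cpu_temp": 51.0, "load": 0.55}),
            (2.0, {"cpu_temp": 49.0, "load": 0.45}),
            (3.0, {"cpu_temp": 50.0, "load": 0.50})]
system_b = [(0.0, {"cpu_temp": 50.0, "load": 0.50}),
            (1.0, {"cpu_temp": 52.0, "load": 0.60}),
            (2.0, {"cpu_temp": 48.0, "load": 0.40}),
            (3.0, {"cpu_temp": 50.0, "load": 0.50})]

model = train(build_training_set([system_a, system_b]))

# Monitoring phase: raise an alarm when any parameter is anomalous.
for row in [(10.0, {"cpu_temp": 51.0, "load": 0.52}),
            (11.0, {"cpu_temp": 70.0, "load": 0.52})]:
    anomalies = detect(model, row)
    if anomalies:
        print(f"ALARM at t={row[0]}: {anomalies}")
```

Concatenating series from several machines gives the model a wider sample of normal behavior than any single machine would; a production model would more likely learn correlations between parameters instead of treating each one independently, as this sketch does.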