Abstract:
Computational methods and systems for detecting and troubleshooting anomalous behavior in distributed applications executing in a distributed computing system are described herein. The methods and systems discover the nodes that make up the application. Anomaly detection monitors the metrics associated with the nodes in order to identify an approximate point in time when anomalous behavior begins to adversely impact performance of the application. Anomaly detection also monitors log messages associated with the nodes to detect anomalous behavior recorded in the log messages. When anomalous behavior is detected in the metrics, the log messages, or both, an alert identifying the anomalous behavior is generated. Troubleshooting guides an administrator and/or application owner in investigating the root cause of the anomalous behavior. Appropriate remedial measures may be determined based on the root cause and executed automatically or manually to correct the problem.
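As a rough illustration of how an approximate onset time might be located in a node metric, the following minimal sketch flags the first sample that departs sharply from its recent history. The trailing-window z-score test, the function name `first_anomaly_index`, and the default `window` and `threshold` values are assumptions made for this example; the abstract does not specify the detection technique.

```python
from statistics import mean, stdev

def first_anomaly_index(values, window=5, threshold=3.0):
    """Return the index of the first sample that deviates from the mean of
    the preceding `window` samples by more than `threshold` standard
    deviations, or None when no such sample exists."""
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        # Floor sigma so that a perfectly flat history does not divide by zero.
        sigma = max(stdev(history), 1e-9)
        if abs(values[i] - mu) > threshold * sigma:
            return i
    return None

# Hypothetical response-time samples for one node, in milliseconds.
latency_ms = [10, 11, 10, 12, 11, 10, 11, 50, 52, 51]
onset = first_anomaly_index(latency_ms)
```

The returned index marks the sample at which an alert could be raised; mapping it back to a timestamp is left to the caller.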
Abstract:
The current document is directed to systems, and methods incorporated within the systems, that carry out probability-distribution-based analysis of log-file entries. A monitoring subsystem within a distributed computer system uses probability-distribution-based analysis of log-file entries to detect changes in the state of the distributed computer system. A log-file-analysis subsystem within a distributed computer system uses probability-distribution-based analysis of log-file entries to identify subsets of log-file entries that predict anomalies and impending problems in the distributed computer system. In many implementations, a numerical comparison of probability distributions of log-file-entry types is used to detect state changes in the distributed computer system.
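One way to picture the numerical comparison of event-type distributions is sketched below. It assumes the Jensen-Shannon divergence as the comparison measure and a fixed threshold of 0.1 for declaring a state change; both are choices made for this example, not details taken from the abstract, and the function names are hypothetical.

```python
from collections import Counter
from math import log2

def type_distribution(event_types):
    """Relative frequency of each event type in a non-empty window of entries."""
    counts = Counter(event_types)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions.
    The result lies in [0, 1]: 0 for identical, 1 for disjoint distributions."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mixture(a):
        return sum(a[k] * log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)

def state_changed(before, after, threshold=0.1):
    """Compare the event-type distributions of two time windows."""
    return js_divergence(type_distribution(before),
                         type_distribution(after)) > threshold

# Hypothetical event types observed in two consecutive time windows.
earlier = ["info"] * 8 + ["warn"] * 2
later = ["info"] * 2 + ["error"] * 8
changed = state_changed(earlier, later)
```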
Abstract:
The current document is directed to methods and systems for processing, classifying, and efficiently storing large volumes of event messages generated in modern computing systems. In a disclosed implementation, received event messages are normalized to identify non-parameter tokens within the event messages. The non-parameter event tokens are used to compute a metric for each event message. The metrics are used, in turn, to identify a type-associated cluster to which to assign each received event message. The type-associated clusters are created dynamically as streams of event messages are processed. The type-associated clusters may be dynamically split and merged to refine event-message typing.
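The normalize-then-cluster flow can be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: tokens containing a digit are treated as parameters, Jaccard similarity over the remaining tokens stands in for the per-message metric, and the `Clusterer` class with its 0.7 threshold is hypothetical. Splitting and merging of clusters is omitted.

```python
import re

# Assumption: any whitespace-separated token containing a digit is a parameter.
_HAS_DIGIT = re.compile(r"\d")

def non_parameter_tokens(message):
    """Normalize a message down to its non-parameter tokens."""
    return [tok for tok in message.split() if not _HAS_DIGIT.search(tok)]

def jaccard(a, b):
    """Jaccard similarity of two token collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 1.0

class Clusterer:
    """Assigns each message to the first sufficiently similar cluster and
    creates a new cluster on the fly when none matches."""

    def __init__(self, threshold=0.7):
        self.threshold = threshold
        self.representatives = []  # one token list per cluster

    def assign(self, message):
        tokens = non_parameter_tokens(message)
        for cluster_id, rep in enumerate(self.representatives):
            if jaccard(tokens, rep) >= self.threshold:
                return cluster_id
        self.representatives.append(tokens)
        return len(self.representatives) - 1

clusterer = Clusterer()
first = clusterer.assign("Connection from 10.0.0.1 closed after 35 ms")
second = clusterer.assign("Connection from 10.0.0.7 closed after 120 ms")
third = clusterer.assign("Disk /dev/sda1 usage at 91%")
```

The two connection messages differ only in their parameters, so they land in the same cluster, while the disk message starts a new one.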
Abstract:
Various examples are disclosed for forecasting resource usage and computing capacity utilizing an exponential decay. In some examples, a computing environment can obtain usage measurements from a data stream over a time interval, where the usage measurements describe utilization of computing resources. The computing environment can generate a weight function for individual ones of the usage measurements, where the weight function exponentially decays the usage measurements based on a respective time period at which the usage measurements were obtained. The computing environment can forecast a future capacity of the computing resources based on the usage measurements and the weight function assigned to the individual ones of the usage measurements. The computing environment can further upgrade a forecast engine to use the exponential decay without resetting the forecast engine or its memory.
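A minimal sketch of exponentially decayed weighting is given below. The half-life parameterization and the use of a weighted least-squares line for the forecast are assumptions chosen for the example; the abstract states only that older measurements are decayed exponentially and that the forecast uses the measurements together with their weights. The names `decay_weights` and `forecast` are hypothetical.

```python
from math import exp, log

def decay_weights(timestamps, now, half_life):
    """Weight each measurement by exp(-rate * age), so that a measurement
    one half-life old counts half as much as a current one."""
    rate = log(2) / half_life
    return [exp(-rate * (now - t)) for t in timestamps]

def forecast(timestamps, usage, at_time, half_life):
    """Fit a weighted least-squares line through (timestamp, usage) pairs
    and evaluate it at `at_time`. Recent samples dominate the fit."""
    now = max(timestamps)
    w = decay_weights(timestamps, now, half_life)
    total = sum(w)
    mean_t = sum(wi * t for wi, t in zip(w, timestamps)) / total
    mean_u = sum(wi * u for wi, u in zip(w, usage)) / total
    var_t = sum(wi * (t - mean_t) ** 2 for wi, t in zip(w, timestamps))
    if var_t == 0:
        return mean_u  # all samples share one timestamp: no trend to fit
    cov = sum(wi * (t - mean_t) * (u - mean_u)
              for wi, t, u in zip(w, timestamps, usage))
    return mean_u + (cov / var_t) * (at_time - mean_t)

# Hypothetical usage samples that grow linearly with time.
times = list(range(10))
used = [2 * t + 1 for t in times]
predicted = forecast(times, used, at_time=12, half_life=3)
```

Because the weights depend only on each measurement's age, changing the half-life does not require discarding previously collected measurements.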
Abstract:
The current document is directed to systems, and methods incorporated within the systems, that execute queries against log-file entries. A monitoring subsystem within a distributed computer system uses query results during analysis of log-file entries in order to detect changes in the state of the distributed computer system, identify problems or potential problems, and predict and forecast system characteristics. Because of the large numbers of log-file-entry containers that may need to be opened and processed in order to execute a single query, and because opening and reading through the entries in a log-file-entry container is a computationally expensive and time-consuming operation, the currently disclosed systems employ event-type metadata associated with log-file-entry containers to avoid opening and reading through the log-file entries of log-file-entry containers that do not contain log-file entries with event types relevant to the query.
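The container-skipping idea can be sketched as follows. The `Container` class, its `opened` counter, and the `query` function are hypothetical names introduced for illustration; a real system would keep the event-type metadata outside the container so that it can be consulted without opening the container at all.

```python
from dataclasses import dataclass, field

@dataclass
class Container:
    """A log-file-entry container holding (event_type, text) tuples, with
    metadata recording which event types it contains."""
    entries: list
    event_types: frozenset = field(init=False)
    opened: int = 0  # counts how often the container is actually read

    def __post_init__(self):
        self.event_types = frozenset(etype for etype, _ in self.entries)

    def read(self):
        """Stand-in for the expensive open-and-scan operation."""
        self.opened += 1
        return iter(self.entries)

def query(containers, wanted_types):
    """Return entries whose event type is in `wanted_types`, opening only
    containers whose metadata shows at least one relevant event type."""
    wanted = frozenset(wanted_types)
    results = []
    for container in containers:
        if container.event_types.isdisjoint(wanted):
            continue  # metadata rules this container out; skip the scan
        results.extend(e for e in container.read() if e[0] in wanted)
    return results

auth_logs = Container([("login", "user a"), ("logout", "user a")])
disk_logs = Container([("disk_full", "volume c"), ("login", "user b")])
hits = query([auth_logs, disk_logs], {"disk_full"})
```

In this example only `disk_logs` is opened, because the metadata for `auth_logs` shows no entries of the requested type.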
Abstract:
Methods and systems to identify log write instructions of a source code as potential sources of an event message of interest are described. Methods identify non-parametric tokens, such as text strings and natural language words and phrases, of an event message of interest. Candidate log write instructions and associated line numbers in a source code are identified. Non-parametric tokens of each event message of the one or more candidate log write instructions are determined. A confidence score is calculated for each candidate log write instruction based on the number of non-parametric tokens that the event message of interest and the event message of the candidate log write instruction have in common. The candidate log write instructions are rank-ordered based on the corresponding confidence scores, and the rank-ordered candidate log write instructions and associated line numbers of the source code may be displayed in a graphical user interface.
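The scoring and ranking step might look like the sketch below. Several details are assumptions made for the example: purely alphabetic words are treated as non-parametric tokens, the score is the shared-token count divided by the number of tokens in the candidate's message template, and candidates are supplied as hypothetical (line number, template) pairs rather than extracted from real source code.

```python
def non_parametric_tokens(text):
    """Assumption: a token is non-parametric when it is purely alphabetic
    after trimming punctuation. Numbers, identifiers such as 'db01', and
    placeholders such as '%s' are treated as parameters."""
    tokens = (tok.strip(".,:;!?\"'()") for tok in text.split())
    return {tok.lower() for tok in tokens if tok.isalpha()}

def confidence(event_message, template):
    """Fraction of the template's non-parametric tokens that also occur
    in the event message of interest."""
    template_tokens = non_parametric_tokens(template)
    if not template_tokens:
        return 0.0
    common = template_tokens & non_parametric_tokens(event_message)
    return len(common) / len(template_tokens)

def rank_candidates(event_message, candidates):
    """Return (line_number, score) pairs, highest score first."""
    scored = [(line, confidence(event_message, template))
              for line, template in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical candidate log write instructions: (line number, template).
candidates = [
    (17, "Connected to host %s"),
    (42, "Failed to connect to host %s after %d retries"),
    (88, "Cache miss for key %s"),
]
ranking = rank_candidates(
    "Failed to connect to host db01 after 3 retries", candidates)
```

The top-ranked entry identifies the line most likely to have produced the message; the full ranking is what a user interface would display.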
Abstract:
The current document is directed to methods and systems for processing, classifying, and efficiently storing large volumes of event messages generated in modern computing systems. In a disclosed implementation, received event messages are assigned to clusters based on metrics computed for the event messages. In addition, a significance value is determined for each received event message. When the significance value exceeds a threshold value, one or more actions are taken, including marking an event record corresponding to the event message, storing an event record corresponding to the event message in a significant-event log, and generating a notice or alarm.
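The threshold-driven actions can be sketched as follows. The abstract does not say how the significance value is computed, so the keyword-based scorer here is purely hypothetical, as are the function names and the record layout; only the three actions (marking the record, storing it in a significant-event log, and generating a notice) come from the text.

```python
def keyword_significance(message):
    """Hypothetical scorer: sums fixed weights for alarming keywords and
    caps the result at 1.0."""
    weights = {"error": 0.6, "failed": 0.5, "timeout": 0.4}
    lowered = message.lower()
    return min(1.0, sum(w for word, w in weights.items() if word in lowered))

def process(message, significance, threshold, significant_log, notices):
    """Build an event record and, when its significance exceeds the
    threshold, mark it, store it in the significant-event log, and
    generate a notice."""
    record = {"message": message, "significant": False}
    score = significance(message)
    if score > threshold:
        record["significant"] = True
        significant_log.append(record)
        notices.append(f"significant event (score {score:.2f}): {message}")
    return record

significant_log, notices = [], []
routine = process("Heartbeat ok", keyword_significance, 0.5,
                  significant_log, notices)
urgent = process("ERROR: write failed", keyword_significance, 0.5,
                 significant_log, notices)
```

Passing the scorer in as a parameter keeps the threshold logic independent of whichever significance measure is actually used.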