Abstract:
Various examples are disclosed for forecasting resource usage and computing capacity using an exponential decay. In some examples, a computing environment can obtain usage measurements from a data stream over a time interval, where the usage measurements describe utilization of computing resources. The computing environment can generate a weight function for individual ones of the usage measurements, where the weight function exponentially decays each usage measurement based on the time period in which it was obtained. The computing environment can forecast a future capacity of the computing resources based on the usage measurements and the weights assigned to them. The computing environment can further upgrade a forecast engine to use the exponential decay without resetting the forecast engine or its memory.
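The following is a minimal sketch of exponentially decayed weighting, assuming a half-life parameterization and a decay-weighted mean as the forecast. The names `decay_weight`, `DecayedUsageForecaster`, and `half_life` are hypothetical and are not taken from the disclosure. The sketch keeps only two running sums, so earlier state is rescaled in place rather than recomputed from the raw history.

```python
import math


def decay_weight(age: float, half_life: float) -> float:
    """Weight of a measurement that is `age` time units old: 1.0 when fresh, 0.5 after one half-life."""
    return math.exp(-math.log(2) * age / half_life)


class DecayedUsageForecaster:
    """Hypothetical streaming forecaster: decay-weighted mean of observed usage."""

    def __init__(self, half_life: float) -> None:
        self.half_life = half_life
        self.sum_w = 0.0    # decayed sum of weights
        self.sum_wx = 0.0   # decayed sum of weight * usage
        self.t_last = None  # timestamp of the most recent measurement

    def update(self, t: float, usage: float) -> None:
        if self.t_last is not None:
            if t < self.t_last:
                raise ValueError("measurements must arrive in time order")
            # Age every stored contribution by the elapsed time instead of recomputing from scratch.
            d = decay_weight(t - self.t_last, self.half_life)
            self.sum_w *= d
            self.sum_wx *= d
        self.sum_w += 1.0
        self.sum_wx += usage
        self.t_last = t

    def forecast(self) -> float:
        if self.sum_w == 0.0:
            raise ValueError("no measurements yet")
        return self.sum_wx / self.sum_w
```

For example, with a half-life of 10, a reading of 10 at t=0 followed by a reading of 40 at t=10 gives weights 0.5 and 1.0, so the forecast is (0.5·10 + 40) / 1.5 = 30.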
Abstract:
This disclosure is directed to tagging tokens or sequences of tokens in log messages generated by a logging source. Event types of log messages in a block of log messages are collected. A series of tagging operations is applied to each log message in the block. For each tagging operation, the event types that are qualified to receive the corresponding tag are identified. When a log message is received, its event type is determined and compared with the event types of the block in order to identify a matching event type. The series of tagging operations is then applied to the log message to generate a tagged log message, with the restriction that each tagging operation applies its tag to a token or sequence of tokens only when the event type is qualified to receive that tag. The tagged log message is stored in a data-storage device.
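A minimal sketch of event-type-restricted tagging follows. It assumes that an event type is the message with digit runs masked, and that each tagging operation carries a user-supplied rule deciding which event types qualify; `TagOp`, `event_type`, `build_index`, and `tag_message` are hypothetical names, not the disclosed implementation.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TagOp:
    name: str
    pattern: str                      # regex for the token(s) this operation tags
    qualifies: Callable[[str], bool]  # hypothetical rule: is this event type eligible?


def event_type(msg: str) -> str:
    """Hypothetical event-type key: the message with runs of digits masked."""
    return re.sub(r"\d+", "#", msg)


def build_index(block, ops):
    """Collect the block's event types and, per tag, the event types qualified to receive it."""
    types = {event_type(m) for m in block}
    qualified = {op.name: {t for t in types if op.qualifies(t)} for op in ops}
    return types, qualified


def tag_message(msg, ops, types, qualified):
    """Apply the series of tagging operations, restricted to qualified event types."""
    et = event_type(msg)
    if et not in types:  # no matching event type in the block
        return []
    tags = []
    for op in ops:
        if et in qualified[op.name]:
            tags += [(op.name, m.group(0)) for m in re.finditer(op.pattern, msg)]
    return tags
```

In this sketch a generic numeric tag such as `count` (pattern `\d+`) would match the port number in a connection message, but the qualification rule keeps it from firing there.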
Abstract:
The current document is directed to systems, and methods incorporated within the systems, that execute queries against log-file entries. A monitoring subsystem within a distributed computer system uses query results during analysis of log-file entries in order to detect changes in the state of the distributed computer system, identify problems or potential problems, and predict and forecast system characteristics. Because of the large number of log-file-entry containers that may need to be opened and processed in order to execute a single query, and because opening and reading through the entries in a log-file-entry container is a computationally expensive and time-consuming operation, the currently disclosed systems employ event-type metadata associated with log-file-entry containers to avoid opening and reading through the log-file entries of log-file-entry containers that do not contain log-file entries with event types relevant to the query.
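The pruning idea can be sketched as follows, assuming each container stores the set of event types it holds as metadata. `LogContainer`, `run_query`, and the `opens` counter are hypothetical and only illustrate that a container is read only when its metadata overlaps the query's event types.

```python
from dataclasses import dataclass, field


@dataclass
class LogContainer:
    name: str
    event_types: frozenset                       # metadata: event types present in this container
    entries: list = field(default_factory=list)  # (event_type, text) pairs
    opens: int = 0                               # how many times the container was opened and read

    def read(self) -> list:
        self.opens += 1  # stands in for the expensive open-and-scan step
        return self.entries


def run_query(containers, wanted_types, predicate):
    """Return matching entry texts, skipping containers whose metadata rules them out."""
    wanted = set(wanted_types)
    results = []
    for c in containers:
        if c.event_types.isdisjoint(wanted):
            continue  # no relevant event types: never open this container
        for etype, text in c.read():
            if etype in wanted and predicate(text):
                results.append(text)
    return results
```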
Abstract:
Methods and systems that detect computer system anomalies based on log file sampling are described. Computer systems generate log files that record various types of operating system and software run-time events in event messages. For each computer system, a sample of event messages is collected in a first time interval and another sample is collected in a more recent second time interval. The methods calculate a difference between the event messages collected in the first and second time intervals. When the difference is greater than a threshold, an alert is generated. The process of repeatedly collecting a sample of event messages in a recent time interval, calculating the difference between the event messages collected in the recent and previous time intervals, comparing the difference to the threshold, and generating an alert when the threshold is violated may be executed for each computer system of a cluster of computer systems.
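The abstract does not specify how the difference between samples is measured. The sketch below assumes the total variation distance between event-type frequency distributions; `sample_difference` and `scan_cluster` are hypothetical helpers, and other distances could be substituted.

```python
from collections import Counter


def event_distribution(sample):
    """Relative frequency of each event type in a sample of event messages."""
    counts = Counter(sample)
    n = sum(counts.values())
    return {k: v / n for k, v in counts.items()} if n else {}


def sample_difference(earlier, recent):
    """Total variation distance between two event-type distributions (0 = identical, 1 = disjoint)."""
    p, q = event_distribution(earlier), event_distribution(recent)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in p.keys() | q.keys())


def scan_cluster(samples, threshold):
    """samples maps host -> (earlier_sample, recent_sample); returns hosts that should raise an alert."""
    return [host for host, (earlier, recent) in samples.items()
            if sample_difference(earlier, recent) > threshold]
```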
Abstract:
The present disclosure is related to systems, methods, and non-transitory machine readable media for alerting with duplicate suppression. An example non-transitory machine readable medium can store instructions executable by a processing resource to cause a computing system to receive an alert at a first virtual computing instance (VCI) from a second VCI, compare the alert with at least one previously received alert to determine if the alert is a duplicate alert, and send the alert to an alert notification queue associated with the first VCI in response to a determination that the alert is not a duplicate alert. In some embodiments, the medium can store instructions to confirm that the alert has been sent in response to the determination that the alert is a duplicate alert.
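A minimal sketch of duplicate suppression on the receiving VCI follows. It assumes that two alerts are duplicates when they share a source VCI and alert name and arrive within a fixed time window of the last queued alert. `Alert`, `AlertReceiver`, and `window` are hypothetical; the disclosure does not define the comparison rule.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    source_vci: str
    name: str
    timestamp: float


class AlertReceiver:
    """Hypothetical receiver on the first VCI; dedup key and time window are assumptions."""

    def __init__(self, window: float) -> None:
        self.window = window
        self.last_sent = {}   # (source_vci, name) -> timestamp of the last queued alert
        self.queue = deque()  # alert notification queue for this VCI
        self.confirmed = []   # duplicates acknowledged but not re-queued

    def receive(self, alert: Alert) -> bool:
        key = (alert.source_vci, alert.name)
        prev = self.last_sent.get(key)
        if prev is not None and alert.timestamp - prev <= self.window:
            self.confirmed.append(alert)  # duplicate: confirm instead of re-sending
            return False
        self.last_sent[key] = alert.timestamp
        self.queue.append(alert)
        return True
```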
Abstract:
The current document is directed to methods and systems that process, classify, efficiently store, and display large volumes of event messages generated in modern computing systems. In a disclosed implementation, received event messages are assigned to event-message clusters based on non-parameter tokens identified within the event messages. A parsing function is generated for each cluster; it is used to extract data from incoming event messages and to prepare event records that store event information more efficiently and accessibly than the raw event messages. The parsing functions also provide an alternative basis for assigning event messages to clusters. Event types associated with the clusters are used to gather information from various information sources, and that information is used to automatically annotate event messages displayed to system administrators, maintenance personnel, and other users of event messages.
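The sketch below illustrates clustering by non-parameter tokens and per-cluster parsing functions. It assumes that a token counts as a parameter when it looks numeric or hexadecimal. The `PARAM` heuristic, `template`, `make_parser`, and `EventClusterer` are hypothetical. Existing parsing functions are tried first, which mirrors their use as an alternative basis for cluster assignment.

```python
import re

PARAM = re.compile(r"^(\d+(\.\d+)*|0x[0-9a-fA-F]+)$")  # hypothetical parameter-token heuristic


def template(msg):
    """Cluster key: non-parameter tokens kept, parameter tokens replaced by a placeholder."""
    return tuple("<*>" if PARAM.match(tok) else tok for tok in msg.split())


def make_parser(tmpl):
    """Build a parsing function that extracts the parameter values of one cluster."""
    parts = [r"(\S+)" if tok == "<*>" else re.escape(tok) for tok in tmpl]
    rx = re.compile(r"^" + r"\s+".join(parts) + r"$")

    def parse(msg):
        m = rx.match(msg)
        return list(m.groups()) if m else None

    return parse


class EventClusterer:
    def __init__(self):
        self.parsers = {}  # template -> parsing function
        self.records = {}  # template -> list of extracted parameter lists (event records)

    def add(self, msg):
        for tmpl, parse in self.parsers.items():  # existing parsers as an alternative assignment basis
            values = parse(msg)
            if values is not None:
                self.records[tmpl].append(values)
                return tmpl
        tmpl = template(msg)
        self.parsers[tmpl] = make_parser(tmpl)
        self.records[tmpl] = [self.parsers[tmpl](msg)]
        return tmpl
```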
Abstract:
Examples herein include systems and methods for providing capacity forecasting for high-usage periods of a computing infrastructure. An example method can include segmenting a first portion of a data stream and generating a first core set for a forecasting model that predicts future usage of computing resources. The example method can further include segmenting a second portion of the data stream, generating a second core set, and using both core sets to forecast usage. The first core set can then be phased out after a predetermined time period has elapsed such that forecasting is based only on the second core set. The example method can further include defining at least two clusters of data and performing predictive analysis on each specific cluster. Cluster-specific results can be displayed on a GUI, which can also provide a user with options to increase or decrease computing resources based on the predictions.
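The abstract does not say how a core set is built. The sketch below assumes a simple stand-in: k evenly strided samples from a segment, each weighted to represent len(segment)/k points, with a decay-free weighted mean as the forecast. It illustrates only the phase-out of an older core set after a retention period. `CoreSet`, `build_core_set`, `CoreSetForecaster`, and `retention` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CoreSet:
    created_at: float
    points: list  # (value, weight) pairs


def build_core_set(segment, k, created_at):
    """Hypothetical core set: k evenly strided samples, each weighted to stand for len(segment)/k points."""
    if not segment:
        raise ValueError("empty segment")
    k = min(k, len(segment))
    step = len(segment) / k
    w = len(segment) / k
    return CoreSet(created_at, [(segment[int(i * step)], w) for i in range(k)])


class CoreSetForecaster:
    def __init__(self, retention):
        self.retention = retention  # predetermined time after which a core set is phased out
        self.core_sets = []

    def add_segment(self, segment, k, now):
        self.core_sets.append(build_core_set(segment, k, now))

    def forecast(self, now):
        # Phase out core sets older than the retention period, then forecast from the rest.
        self.core_sets = [c for c in self.core_sets if now - c.created_at < self.retention]
        pts = [p for c in self.core_sets for p in c.points]
        total = sum(w for _, w in pts)
        if total == 0:
            raise ValueError("no active core sets")
        return sum(v * w for v, w in pts) / total
```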
Abstract:
Automated methods and systems described herein are directed to identifying potential root causes of a problem in a data center. Methods and systems receive an alert or other notification of a problem occurring in a data center and a time when the problem was noticed. A search window is created based on the time and a stream of log messages generated in the search window is converted into a time-dependent metric. An anomaly detection technique is applied to the metric to determine a start time of a problem. Logging events and key phrases in the log messages are identified in the search window and presented as potential root causes of the problem. The potential root causes may then be used by system administrators and/or tenants to diagnose the problem and execute remedial measures to correct the problem.
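As one illustration, the sketch below converts log-message timestamps into a per-bin message-rate metric and takes the problem's start time to be the first bin whose rate exceeds a z-score threshold relative to a baseline of earlier bins. The lookback-based window, the z-score detector, and all function names are assumptions standing in for the unspecified anomaly detection technique.

```python
import statistics


def search_window(notice_time, lookback):
    """Hypothetical search window ending at the time the problem was noticed."""
    return notice_time - lookback, notice_time


def log_rate(timestamps, start, end, bin_size):
    """Time-dependent metric: number of log messages per bin within [start, end)."""
    n_bins = int((end - start) // bin_size)
    counts = [0] * n_bins
    for t in timestamps:
        if start <= t < start + n_bins * bin_size:
            counts[int((t - start) // bin_size)] += 1
    return counts


def problem_start(counts, start, bin_size, baseline_bins, z):
    """First bin after the baseline whose z-score exceeds z; None if no bin does."""
    base = counts[:baseline_bins]
    mu = statistics.mean(base)
    sd = statistics.pstdev(base) or 1.0  # avoid dividing by zero on a flat baseline
    for i in range(baseline_bins, len(counts)):
        if (counts[i] - mu) / sd > z:
            return start + i * bin_size
    return None
```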
Abstract:
Computational methods and systems described herein manage alerts generated by event sources that run in a distributed computing system. Methods and systems provide a graphical user interface that enables a user to define a dominant alert and select subsumed alerts generated by the event sources. Methods and systems may also compute, for each pair of alerts, a relative fraction that represents the number of times one alert is triggered with respect to the number of times the other alert is triggered. The relative fractions may be displayed in the graphical user interface to allow a user to select dominant and subsumed alerts based on the relative fractions. Methods and systems identify log messages that correspond to user-identified subsumed alerts, suppress the subsumed alerts, and generate the dominant alert. Methods and systems may also execute remedial actions to correct the problem represented by the dominant alert.
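A minimal sketch follows. It assumes the relative fraction for an ordered pair (a, b) is the trigger count of a divided by the trigger count of b, and that user-defined rules map each dominant alert to its set of subsumed alerts. `relative_fractions` and `consolidate` are hypothetical names.

```python
from collections import Counter
from itertools import permutations


def relative_fractions(history):
    """For each ordered pair (a, b): times a was triggered divided by times b was triggered."""
    counts = Counter(history)
    return {(a, b): counts[a] / counts[b] for a, b in permutations(counts, 2)}


def consolidate(triggered, rules):
    """rules maps dominant alert -> set of subsumed alerts (as chosen by a user in the GUI).

    Subsumed alerts are suppressed and replaced by their dominant alert; others pass through.
    """
    out, suppressed = [], set()
    for dominant, subsumed in rules.items():
        hits = [a for a in triggered if a in subsumed]
        if hits:
            suppressed.update(hits)
            if dominant not in out:
                out.append(dominant)
    for a in triggered:
        if a not in suppressed and a not in out:
            out.append(a)
    return out
```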