Abstract:
This disclosure is directed to tagging tokens or sequences of tokens in log messages generated by a logging source. Event types of log messages in a block of log messages are collected. A series of tagging operations are applied to each log message in the block. For each tagging operation, event types that are qualified to receive the corresponding tag are identified. When a log message is received, the event type is determined and compared with the event types of the block in order to identify a matching event type. The series of tagging operations are applied to the log message to generate a tagged log message with the restriction that each tagging operation only applies a tag to token or sequences of tokens when the event type is qualified to receive the tag. The tagged log message is stored in a data-storage device.
Abstract:
The current document is directed to systems, and methods incorporated within the systems, that execute queries against log-file entries. A monitoring subsystem within a distributed computer system uses query results during analysis of log-file entries in order to detect changes in the state of the distributed computer system, identify problems or potential problems, and predict and forecast system characteristics. Because of the large numbers of log-file-entry containers that may need to be opened and processed in order to execute a single query, and because opening and reading through the entries in a log-file-entry container is a computationally expensive and time-consuming operation, the currently disclosed systems employ event-type metadata associated with log-file-entry containers to avoid opening and reading through the log-file entries of log-file-entry containers that do not contain log-file entries with event types relevant to the query.
Abstract:
The current document is directed to methods and systems for processing, classifying, and efficiently storing large volumes of event messages generated in modern computing systems. In a disclosed implementation, received event messages are assigned to event-message clusters based on non-parameter tokens identified within the event messages. A parsing function is generated for each cluster that is used to extract data from incoming event messages and to prepare event records from event messages that more efficiently and accessible store event information. The parsing functions also provide an alternative basis for assignment of event massages to clusters.
Abstract:
Methods and systems that detect computer system anomalies based on log file sampling are described. Computers systems generate log files that record various types of operating system and software run events in event messages. For each computer system, a sample of event messages are collected in a first time interval and a sample of event messages are collected in a recent second time interval. Methods calculate a difference between the event messages collected in the first and second time intervals. When the difference is greater than a threshold, an alert is generated. The process of repeatedly collecting a sample of event messages in a recent time interval, calculating a difference between the event messages collected in the recent and previous time intervals, comparing the difference to the threshold, and generating an alert when the threshold is violated may be executed for each computer system of a cluster of computer systems.
Abstract:
The current document is directed to methods and systems that process, classify, efficiently store, and display large volumes of event messages generated in modern computing systems. In a disclosed implementation, event messages are assigned types and transformed into event records with well-defined fields that contain field values. Recurring patterns of event messages, referred to as “transactions,” are identified within streams or sequences of time-associated event messages and streams or sequences of time-associated event records.
Abstract:
The current document is directed to methods and systems that process, classify, efficiently store, and display large volumes of event messages generated in modern computing systems. In a disclosed implementation, received event messages are assigned to event-message clusters based on non-parameter tokens identified within the event messages. A parsing function is generated for each cluster that is used to extract data from incoming event messages and to prepare event records from event messages that more efficiently and accessible store event information. The parsing functions also provide an alternative basis for assignment of event messages to clusters. Event types associated with the clusters are used for gathering information from various information sources with which to automatically annotate event messages displayed to system administrators, maintenance personnel, and other users of event messages.
Abstract:
Computational methods and systems for detecting and troubleshooting anomalous behavior in distributed applications executing in a distributed computing system are described herein. Methods and systems discover nodes comprising the application. Anomaly detection monitors the metrics associated with the nodes for anomalous behavior in order to identify an approximate point in time when anomalous behavior begins to adversely impact performance of the application. Anomaly detection also monitors logs messages associated with the nodes to detect anomalous behavior recorded in the log messages. When anomalous behavior is detected in either the metrics and/or the log messages an alert identifying the anomalous behavior is generated. Troubleshooting guides an administrator and/or application owner to investigate the root cause of the anomalous behavior. Appropriate remedial measures may be determined based on the root cause and automatically or manually executed to correct the problem.
Abstract:
Methods and systems are directed to discovering problem incidents in a distributed computing system. Events corresponding to historical problems incidents for the distributed computing system are retrieved from a data base. Sets of representative events of the various historical problem incidents for the distributed computing system are determined. A runtime problem incident in the distributed computing system is characterized by runtime events. The runtime problem incident is classified as corresponding to a historical problem incident of the historical problem incidents based on the runtime events and the sets of representative events. Remedial measures used to correct the historical problem incident may be used to correct the runtime problem.
Abstract:
The current document is directed to systems, and methods incorporated within the systems, that carry out probability-distribution-based analysis of log-file entries. A monitoring subsystem within a distributed computer system uses probability-distribution-based analysis of log-file entries to detect changes in the state of the distributed computer system. A log-file-analysis subsystem within a distributed computer system uses probability-distribution-based analysis of log-file entries to identify subsets of log-file entries that predict anomalies and impending problems in the distributed computer system. In many implementations, a numerical comparison of probability distributions of log-file-entry types is used to detect state changes in the distributed computer system.
Abstract:
The current document is directed to automated methods and systems that employ unsupervised-machine-learning approaches as well as rule-based systems to discover distributed applications within distributed-computing environments. These automated methods and systems provide a basis for higher-level distributed-application administration and management tools and subsystems that provide distributed-application-level user interfaces and operations. In one implementation, the currently disclosed methods and systems employ agents within virtual machines that execute routines and programs and that together comprise a distributed application to continuously furnish information about the virtual machines to a pipeline of stream processors that collect and filter the information to provide for periodic application-discovery. The stream processors generate data representations of the processes currently running on the virtual machines and data representations of the communications connections between the virtual machines. An application-discovery subsystem periodically employs these data representations, and additional data derived from them, to identify the different distributed applications running within a distributed-computing facility and to identify tiers of virtual-machine nodes within each identified distributed application. This, in turn, allows the application-discovery subsystem to generate sets of delta changes for the discovered applications after each periodic application discovery.