Abstract:
Systems and methods are disclosed for detecting errors in a cloud infrastructure by: running a plurality of training tasks on the cloud infrastructure and generating training execution logs; generating a model miner with the training execution logs to represent one or more correct task executions in the cloud infrastructure; after training, running a plurality of tasks on the cloud infrastructure and capturing live execution logs; and, if the live execution logs show that a current task deviates from the correct task execution, indicating an execution error for correction in real time.
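The train-then-monitor flow described above can be illustrated with a minimal sketch. It assumes that each task execution log has already been reduced to an ordered list of event names, and that "correct execution" can be approximated by the set of event-to-event transitions seen during training. The function names, event names, and the transition-set model are all hypothetical choices made for illustration; the abstract does not specify how the model is built.

```python
from collections import defaultdict

def learn_transitions(training_runs):
    """Collect every (event, next_event) pair seen in correct training runs."""
    allowed = defaultdict(set)
    for run in training_runs:
        for current, nxt in zip(run, run[1:]):
            allowed[current].add(nxt)
    return allowed

def first_deviation(allowed, live_run):
    """Return the index of the first event that breaks the learned model, or None."""
    for i, (current, nxt) in enumerate(zip(live_run, live_run[1:]), start=1):
        if nxt not in allowed.get(current, set()):
            return i
    return None

# Hypothetical training logs of correct task executions.
training_runs = [
    ["create_vm", "attach_disk", "boot", "ready"],
    ["create_vm", "boot", "ready"],
]
model = learn_transitions(training_runs)

# A live run that follows the model, and one that skips the "boot" step.
ok = first_deviation(model, ["create_vm", "attach_disk", "boot", "ready"])
bad = first_deviation(model, ["create_vm", "attach_disk", "ready"])
```

Here `ok` is `None` and `bad` points at the position of the unexpected event, which is where an execution error would be raised for correction.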
Abstract:
Methods and systems for system maintenance include identifying patterns in heterogeneous logs. Predictive features are extracted from a set of input logs based on the identified patterns. It is determined that the predictive features indicate a future system failure using a first model. A second model is trained, based on a target sample from the predictive features and based on weights associated with a distance between the target sample and a set of samples from the predictive features, to identify one or more parameters of the second model associated with the future system failure. A system maintenance action is performed in accordance with the identified one or more parameters.
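One way to read the second-model step is as a local surrogate: a simple model fitted around the target sample, with nearby samples weighted more heavily, whose coefficients point to the features driving the predicted failure. The sketch below assumes a linear surrogate, a Gaussian distance kernel, and failure scores already produced by the first model. The kernel, the `width` and `ridge` parameters, and the feature meanings are illustrative assumptions, not details taken from the abstract.

```python
import math

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def local_surrogate(target, samples, scores, width=1.0, ridge=1e-6):
    """Fit a distance-weighted linear model around `target`.

    Returns one coefficient per feature (the intercept is dropped).
    """
    # Samples closer to the target get larger weights (assumed Gaussian kernel).
    weights = [math.exp(-math.dist(target, s) ** 2 / width ** 2) for s in samples]
    rows = [[1.0] + list(s) for s in samples]  # leading 1.0 is the intercept term
    d = len(rows[0])
    # Weighted normal equations, with a small ridge term for numerical stability.
    A = [[sum(w * r[i] * r[j] for w, r in zip(weights, rows)) + (ridge if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(w * r[i] * y for w, r, y in zip(weights, rows, scores)) for i in range(d)]
    return solve(A, b)[1:]

# Hypothetical predictive features: (error_rate, queue_length), both scaled to [0, 1].
samples = [(0.1, 0.5), (0.9, 0.4), (0.5, 0.9), (0.8, 0.1), (0.2, 0.2), (0.6, 0.6)]
# Stand-in for the first model's failure scores; here they depend only on error_rate.
scores = [0.1 + 0.8 * s[0] for s in samples]

coefficients = local_surrogate(target=(0.7, 0.5), samples=samples, scores=scores)
```

In this toy data the first coefficient dominates, which would direct a maintenance action at whatever drives the error rate rather than the queue length.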
Abstract:
A computer-implemented method for automatically analyzing log contents received via a network and detecting content-level anomalies is presented. The computer-implemented method includes building a statistical model based on contents of a set of training logs and detecting, based on the set of training logs, content-level anomalies for a set of testing logs. The method further includes maintaining an index and metadata, generating attributes for fields, editing model capability to incorporate user domain knowledge, detecting anomalies using field attributes, and improving anomaly quality by using user feedback.
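A minimal sketch of field-attribute anomaly detection follows. It assumes logs are already parsed into field/value records, models numeric fields by their observed range and other fields by their set of observed values, and represents user domain knowledge as a direct edit to the model. All function names, field names, and the choice of attributes are hypothetical.

```python
def build_field_model(training_records):
    """Numeric fields keep a (min, max) range; other fields keep the set of seen values."""
    model = {}
    for record in training_records:
        for field, value in record.items():
            if isinstance(value, (int, float)):
                lo, hi = model.get(field, (value, value))
                model[field] = (min(lo, value), max(hi, value))
            else:
                model.setdefault(field, set()).add(value)
    return model

def find_anomalies(model, record):
    """Return (field, reason) pairs for every field that violates its attribute."""
    anomalies = []
    for field, value in record.items():
        attr = model.get(field)
        if attr is None:
            anomalies.append((field, "unknown field"))
        elif isinstance(attr, tuple):
            in_range = isinstance(value, (int, float)) and attr[0] <= value <= attr[1]
            if not in_range:
                anomalies.append((field, "out of range"))
        elif value not in attr:
            anomalies.append((field, "unseen value"))
    return anomalies

# Hypothetical parsed training logs.
training = [
    {"status": "OK", "latency_ms": 120},
    {"status": "WARN", "latency_ms": 300},
]
model = build_field_model(training)

# Domain knowledge supplied by a user: latencies up to 500 ms are acceptable.
model["latency_ms"] = (0, 500)

normal = find_anomalies(model, {"status": "OK", "latency_ms": 450})
suspicious = find_anomalies(model, {"status": "FATAL", "latency_ms": 9000})
```

User feedback could be folded in the same way, for example by adding a value to a field's allowed set after a reported anomaly is marked as benign.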
Abstract:
Systems and methods for system event searching based on heterogeneous logs are provided. A system can include a processor device operatively coupled to a memory device, wherein the processor device is configured to mine a variety of log patterns from a plurality of heterogeneous logs to obtain known-event log patterns and unknown-event log patterns, and to build a weighted vector representation of the log patterns. The processor device is also configured to evaluate a similarity between the vector representations of the unknown-event and known-event log patterns, and to identify the known event that is most similar to an unknown event, in order to troubleshoot system faults based on past actions taken for similar events and thereby improve an operation of a computer system.
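The similarity search can be sketched as follows, assuming each event is summarized by the list of log patterns observed during it. The weighting scheme (a smoothed TF-IDF) and the similarity measure (cosine) are assumptions chosen for illustration, as the abstract only states that the vectors are weighted. Event and pattern names are hypothetical.

```python
import math
from collections import Counter

def idf_weights(events):
    """Smoothed inverse document frequency of each pattern across known events."""
    n = len(events)
    df = Counter(p for patterns in events.values() for p in set(patterns))
    return {p: math.log((1 + n) / (1 + c)) + 1.0 for p, c in df.items()}

def weighted_vector(patterns, idf):
    """Sparse TF-IDF vector; patterns never seen in known events get weight 0."""
    tf = Counter(patterns)
    return {p: count * idf.get(p, 0.0) for p, count in tf.items()}

def cosine(a, b):
    dot = sum(w * b.get(p, 0.0) for p, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0

def most_similar(known, unknown_patterns):
    """Return the name of the closest known event and all similarity scores."""
    idf = idf_weights(known)
    target = weighted_vector(unknown_patterns, idf)
    scores = {name: cosine(target, weighted_vector(p, idf)) for name, p in known.items()}
    return max(scores, key=scores.get), scores

# Hypothetical known events and the log patterns mined for each.
known = {
    "disk_full": ["write_fail", "no_space", "write_fail"],
    "net_down": ["timeout", "conn_reset", "timeout"],
    "auth_outage": ["login_fail", "timeout"],
}
best, scores = most_similar(known, ["no_space", "write_fail", "timeout"])
```

With this data `best` is `"disk_full"`, so the remediation recorded for that past event would be the first candidate to try.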
Abstract:
Systems and methods are disclosed for detecting periodic event behaviors from machine-generated logs by: capturing heterogeneous log messages, each log message including a time stamp and text content with one or more fields; recognizing log formats from the log messages; transforming the text content into a set of time series data, one time series for each log format; during a training phase, analyzing the set of time series data and building a category model for each periodic event type in the heterogeneous logs; and during live operation, applying the category model to a stream of time series data from live heterogeneous log messages, generating a flag on any time series data point that violates the category model, and generating an alarm report for the corresponding log message.
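As a rough sketch of the training and live phases for a single log format, the code below assumes the time series is the list of arrival times of that format's messages, and that the category model is just a median period plus a tolerance. The `tolerance` parameter and the median-based model are illustrative assumptions; in practice one such model would be kept per log format.

```python
from statistics import median

def learn_period(timestamps, tolerance=0.2):
    """Training phase: estimate the period as the median gap between messages."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    period = median(gaps)
    return period, period * tolerance

def flag_violations(model, last_seen, live_timestamps):
    """Live phase: flag every arrival whose gap strays too far from the period."""
    period, slack = model
    flagged = []
    for t in live_timestamps:
        if abs((t - last_seen) - period) > slack:
            flagged.append(t)
        last_seen = t
    return flagged

# Hypothetical arrival times (seconds) of one heartbeat-style log format.
training_times = [0, 60, 120, 181, 240]
model = learn_period(training_times)

# Live stream: the message expected near t=420 is missing, so t=500 arrives late.
flagged = flag_violations(model, last_seen=240, live_timestamps=[300, 360, 500, 560])
```

Each flagged time stamp would then be mapped back to its log message to produce the alarm report.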
Abstract:
Methods and systems for system failure diagnosis and correction include extracting syntactic patterns from a plurality of logs with heterogeneous formats. The syntactic patterns are clustered according to categories of system failure. A single semantically unique pattern is extracted for each category of system failure. The semantically unique patterns are matched to recent log information to detect a corresponding system failure. A corrective action is performed responsive to the detected system failure.
Abstract:
A method and system are provided. The method includes performing, by a logs-to-time-series converter, a logs-to-time-series conversion by transforming a plurality of heterogeneous logs into a set of time series. Each of the heterogeneous logs includes a time stamp and text portion with one or more fields. The method further includes performing, by a time-series-to-sequential-pattern converter, a time-series-to-sequential-pattern conversion by mining invariant relationships between the set of time series, and discovering sequential message patterns and association rules in the plurality of heterogeneous logs using the invariant relationships. The method also includes executing, by a processor, a set of log management applications, based on the sequential message patterns and the association rules.
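The two conversions can be sketched in a few lines, under strong simplifying assumptions: log formats are recognized by replacing digit runs with a wildcard, each time series is a per-window message count, and the only invariant considered is "two formats always occur the same number of times in every window". The helper names, the window size, and this narrow notion of invariant are all hypothetical.

```python
import re
from collections import defaultdict
from itertools import combinations

def to_pattern(text):
    """Crude format recognition: replace digit runs with a wildcard."""
    return re.sub(r"\d+", "<*>", text)

def logs_to_series(logs, window, n_windows):
    """Logs-to-time-series: count messages per pattern in each time window."""
    series = defaultdict(lambda: [0] * n_windows)
    for ts, text in logs:
        series[to_pattern(text)][ts // window] += 1
    return dict(series)

def invariant_pairs(series):
    """Pairs of patterns whose counts match in every window (a simple invariant)."""
    return [(a, b) for a, b in combinations(sorted(series), 2) if series[a] == series[b]]

# Hypothetical (time stamp in seconds, text) log entries.
logs = [
    (0, "open conn 1"), (1, "close conn 1"),
    (12, "open conn 2"), (13, "close conn 2"),
    (14, "open conn 3"), (15, "close conn 3"),
    (25, "disk check 7"),
]
series = logs_to_series(logs, window=10, n_windows=3)
pairs = invariant_pairs(series)
```

The pair found here links the "open" and "close" formats, which is the kind of relationship from which a sequential message pattern or association rule could then be derived.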
Abstract:
Methods for system failure prediction include clustering log files according to structural log patterns. Feature representations of the log files are determined based on the log clusters. A likelihood of a system failure is determined based on the feature representations using a neural network. An automatic system control action is performed if the likelihood of system failure exceeds a threshold.