Abstract:
Described is an approach that provides an adaptive solution to missing data for machine learning systems. A gradient solution is provided that is attentive to imputation needs at each of several missingness levels. This multilevel approach treats data missingness at any of multiple severity levels while utilizing, as much as possible, the actual observed data.
Abstract:
Described is an approach that provides an adaptive solution to missing data for machine learning systems. A gradient solution is provided that is attentive to imputation needs at each of several missingness levels. This multilevel approach treats data missingness at any of multiple severity levels while utilizing, as much as possible, the actual observed data.
Abstract:
A method, system, and computer program product for generating database cluster health alerts using machine learning. A first database cluster known to be operating normally is measured and modeled using machine learning techniques. A second database cluster is measured and compared to the learned model. More specifically, the method collects a first set of empirically-measured variables of a first database cluster, and using the first set of empirically-measured variables a mathematical behavior predictor model is generated. Then, after collecting a second set of empirically-measured variables of a second database cluster over a plurality of second time periods, the mathematical behavior predictor model classifies the observed behavior. The classified behavior might be deemed to be normal behavior, or some form of abnormal behavior. The method forms and report alerts when the classification deemed to be anomalous behavior, or fault behavior. A Bayesian belief network predicts the likelihood of continued anomalous behavior.
Abstract:
Described is an approach for performing context-aware prognoses in machine learning systems. The approach harnesses streams of detailed data collected from a monitored target to create a context, in parallel to ongoing model operations, for the model outcomes. The context is then probed to identify the particular elements associated with the model findings.
Abstract:
A method, system, and computer program product for analyzing performance of a database cluster. Disclosed are techniques for analyzing performance of components of a database cluster by transforming many discrete event measurements into a time series to identify dominant signals. The method embodiment commences by sampling the database cluster to produce a set of timestamped events, then pre-processing the timestamped events by tagging at least some of the timestamped events with a semantic tag drawn from a semantic dictionary and formatting the set of timestamped events into a time series where a time series entry comprises a time indication and a plurality of values corresponding to signal state values. Further techniques are disclosed for identifying certain signals from the time series to which is applied various statistical measurement criteria in order to isolate a set of candidate signals which are then used to identify indicative causes of database cluster behavior.
Abstract:
Described is an improved approach to remove data outliers by filtering out data correlated to detrimental events within a system. One or more detrimental even conditions are defined to identify and handle abnormal transient states from collected data for a monitored system.
Abstract:
Described is an improved approach to implement selection of training data for machine learning, by presenting a designated set of specific data indicators where these data indicators correspond to metrics that end users are familiar with and are easily understood by ordinary users and DBAs within their knowledge domain. Selection of these indicators would correlate automatically to selection of a corresponding set of other metrics/signals that are less understandable to an ordinary user. Additional analysis of the selected data can then be performed to identify and correct any statistical problems with the selected training data.