Abstract:
Techniques for geometric aging data reduction for machine learning applications are disclosed. In some embodiments, an artificial-intelligence powered system receives a first time-series dataset that tracks at least one metric value over time. The system then generates a second time-series dataset that includes a reduced version of a first portion of the time-series dataset and a non-reduced version of a second portion of the time-series dataset. The second portion of the time-series dataset may include metric values that are more recent than the first portion of the time-series dataset. The system further trains a machine learning model using the second time-series dataset that includes the reduced version of the first portion of the time-series dataset and the non-reduced version of the second portion of the time-series dataset. The trained model may be applied to reduced and/or non-reduced data to detect multivariate anomalies and/or provide other analytic insights.
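The abstract does not say how the older portion is reduced. As one possible reading, the sketch below keeps the most recent samples untouched and averages older samples in buckets whose size grows geometrically with age. The function name `geometric_aging_reduce`, the parameters `keep_recent` and `ratio`, and the use of the mean as the reduction are all assumptions made for illustration.

```python
from statistics import fmean


def geometric_aging_reduce(values, keep_recent, ratio=2):
    """Reduce the older part of a series; keep the newest samples as-is.

    Assumed scheme (not taken from the abstract): walking backwards from
    the split point, bucket sizes grow as ratio, ratio**2, ratio**3, ...
    and each bucket is replaced by its mean.
    """
    if keep_recent < 0 or ratio < 2:
        raise ValueError("keep_recent must be >= 0 and ratio must be >= 2")

    split = max(len(values) - keep_recent, 0)
    older, recent = values[:split], values[split:]

    reduced = []
    end = len(older)
    size = ratio
    while end > 0:
        start = max(end - size, 0)
        reduced.append(fmean(older[start:end]))  # one value per bucket
        end = start
        size *= ratio  # older data gets coarser buckets

    reduced.reverse()  # restore chronological order
    return reduced + list(recent)
```

For example, with ten samples and `keep_recent=4`, the six older samples collapse into two bucket means while the last four samples pass through unchanged. The combined list is what would then be handed to model training.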
Abstract:
In one embodiment, a method for auditing the results of a machine learning model includes: retrieving a set of state estimates for original time series data values from a database under audit; reversing the state estimation computation for each of the state estimates to produce reconstituted time series data values for each of the state estimates; retrieving the original time series data values from the database under audit; comparing the original time series data values pairwise with the reconstituted time series data values to determine whether the original time series and reconstituted time series match; and generating a signal that the database under audit (i) has not been modified where the original time series and reconstituted time series match, and (ii) has been modified where the original time series and reconstituted time series do not match.
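The comparison step of this audit can be sketched as follows. The abstract does not define the state estimation computation or its inverse, so the sketch takes the inverse as a caller-supplied function (`reverse_estimate`, a hypothetical name) and demonstrates it with a toy invertible transform that is only a stand-in, not the actual estimation technique. The tolerance parameters are also assumptions.

```python
import math


def database_unmodified(originals, state_estimates, reverse_estimate,
                        rel_tol=1e-9, abs_tol=1e-12):
    """Return True if every original value matches its reconstituted value.

    reverse_estimate is a caller-supplied inverse of the state estimation
    computation (hypothetical; the abstract does not specify it).
    """
    if len(originals) != len(state_estimates):
        return False  # a missing or extra record counts as a mismatch
    for original, estimate in zip(originals, state_estimates):
        reconstituted = reverse_estimate(estimate)
        if not math.isclose(original, reconstituted,
                            rel_tol=rel_tol, abs_tol=abs_tol):
            return False
    return True


# Toy stand-in for an invertible estimation step: estimate = 0.5 * x + 1.
def toy_estimate(x):
    return 0.5 * x + 1.0


def toy_reverse(estimate):
    return (estimate - 1.0) / 0.5
```

A caller would compute or retrieve the estimates, then treat a `True` result as the "has not been modified" signal and a `False` result as the "has been modified" signal.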
Abstract:
Embodiments of the invention provide systems and methods for managing and processing large amounts of complex and high-velocity data by capturing and extracting high-value data from low-value data using big data and related technologies. Illustrative database systems described herein may collect and process data while extracting or generating high-value data. The high-value data may be handled by databases providing functions such as multi-temporality, provenance, flashback, and registered queries. In some examples, computing models and systems may be implemented to combine knowledge and process management aspects with near real-time data processing frameworks in a data-driven, situation-aware computing system.
Abstract:
The disclosed embodiments provide a system that proactively resilvers a disk array when a disk drive in the array is determined to have an elevated risk of failure. The system receives time-series signals associated with the disk array during operation of the disk array. Next, the system analyzes the time-series signals to identify at-risk disk drives that have an elevated risk of failure. If one or more disk drives are identified as being at-risk, the system performs a proactive resilvering operation on the disk array using a background process while the disk array continues to operate using the at-risk disk drives.
Abstract:
The disclosed embodiments relate to a system that certifies provenance of time-series data in a time-series database. During operation, the system retrieves time-series data from the time-series database, wherein the time-series data comprises a sequence of observations comprising sensor readings for each signal in a set of signals. The system also retrieves multivariate state estimation technique (MSET) estimates, which were computed for the time-series data, from the time-series database. Next, the system performs a reverse MSET computation to produce reconstituted time-series data from the MSET estimates. The system then compares the reconstituted time-series data with the time-series data. If the reconstituted time-series data matches the original time-series data, the system certifies provenance for the time-series data.
Abstract:
Data can be categorized into facts, information, hypotheses, and directives. Activities generate certain categories of data from other categories of data through the application of knowledge, which can be categorized into classifications, assessments, resolutions, and enactments. These activities can be driven by a Classification-Assessment-Resolution-Enactment (CARE) control engine. The CARE control engine and these categorizations can be used to enhance a multitude of systems, for example diagnostic systems, such as through historical record keeping, machine learning, and automation. Such a diagnostic system can include a system that forecasts computing system failures based on the application of knowledge to system vital signs such as thread or stack segment intensity and memory heap usage. These vital signs are facts that can be classified to produce information such as memory leaks, convoy effects, or other problems. Classification can involve the automatic generation of classes, states, observations, predictions, norms, and objectives, and the processing of sample intervals having irregular durations.
Abstract:
The disclosed embodiments relate to a system that automatically adapts a prognostic-surveillance system to account for aging phenomena in a monitored system. During operation, the prognostic-surveillance system is operated in a surveillance mode, wherein a trained inferential model is used to analyze time-series signals from the monitored system to detect incipient anomalies. During the surveillance mode, the system periodically calculates a reward/cost metric associated with updating the trained inferential model. When the reward/cost metric exceeds a threshold, the system swaps the trained inferential model with an updated inferential model, which is trained to account for aging phenomena in the monitored system.
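The control flow described here (periodically score the benefit of updating, then swap when a threshold is exceeded) might look like the sketch below. The abstract does not define the reward/cost metric or the retraining procedure, so both are passed in as callables. All names (`run_surveillance`, `reward_cost`, `train_updated`) and the toy drift-based metric are assumptions for illustration only.

```python
from statistics import fmean


def run_surveillance(batches, model, reward_cost, train_updated, threshold):
    """Process batches in order; swap the model when the metric exceeds threshold.

    Returns the final model and the indices of batches that triggered a swap.
    """
    swap_points = []
    for index, batch in enumerate(batches):
        # Anomaly detection with the current model would run here.
        if reward_cost(model, batch) > threshold:
            model = train_updated(batch)  # updated model replaces the old one
            swap_points.append(index)
    return model, swap_points


# Toy stand-ins: the "model" is a baseline level, and the metric is how far
# the current batch mean has drifted from that baseline.
def toy_reward_cost(model, batch):
    return abs(fmean(batch) - model)


def toy_train_updated(batch):
    return fmean(batch)
```

In this toy setup a slow upward shift in the signal eventually pushes the drift past the threshold, at which point the baseline is re-estimated from recent data. A real metric would presumably also weigh the cost of retraining and of false alarms.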
Abstract:
First, the system obtains time-series sensor data. Next, the system identifies missing values in the time-series sensor data, and fills in the missing values through interpolation. The system then divides the time-series sensor data into a training set and an estimation set. Next, the system trains an inferential model on the training set, and uses the inferential model to replace interpolated values in the estimation set with inferential estimates. If there exist interpolated values in the training set, the system switches the training and estimation sets. The system trains a new inferential model on the new training set, and uses the new inferential model to replace interpolated values in the new estimation set with inferential estimates. The system then switches back the training and estimation sets. Finally, the system combines the training and estimation sets to produce preprocessed time-series sensor data, wherein missing values are filled in with imputed values.
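The sequence of steps above can be sketched for a simplified case: one signal `y` with gaps (marked `None`) and one fully observed, correlated signal `x`. The inferential model is replaced here by an ordinary least-squares line, which is only a stand-in and not the model named by the source. The half-and-half split and all function names are assumptions.

```python
def interpolate(y):
    """Fill None gaps by linear interpolation; return (filled, mask)."""
    mask = [v is None for v in y]
    known = [i for i, missing in enumerate(mask) if not missing]
    if not known:
        raise ValueError("signal has no observed values")
    filled = list(y)
    for i, missing in enumerate(mask):
        if not missing:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = y[right]  # leading gap: copy nearest value
        elif right is None:
            filled[i] = y[left]   # trailing gap: copy nearest value
        else:
            w = (i - left) / (right - left)
            filled[i] = y[left] + w * (y[right] - y[left])
    return filled, mask


def fit_line(xs, ys):
    """Least-squares slope and intercept (assumes xs is not constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def impute(x, y):
    """Interpolate, then replace interpolated values with model estimates."""
    filled, mask = interpolate(y)
    half = len(y) // 2
    train, estimate = range(0, half), range(half, len(y))

    def refine(train_idx, est_idx):
        slope, intercept = fit_line([x[i] for i in train_idx],
                                    [filled[i] for i in train_idx])
        for i in est_idx:
            if mask[i]:
                filled[i] = slope * x[i] + intercept

    refine(train, estimate)
    if any(mask[i] for i in train):
        refine(estimate, train)  # swap roles of the two sets
    return filled  # both halves combined, in original order
```

Calling `impute(x, y)` returns a gap-free copy of `y`. Each originally missing position holds an estimate from a model trained on the other half of the data, so no imputed value is produced by a model that was fit on its own interpolated placeholder.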