Abstract:
Anomaly and drift detection in decentralized learning environments is disclosed. The method includes deploying, at a first node, (1) a local unsupervised autoencoder trained at the first node, along with a local training data reference baseline for the first node, and (2) a global unsupervised autoencoder trained across a plurality of nodes, along with a corresponding global training data reference baseline. Production data at the first node is processed with local and global ML models deployed by a user. At least one of local and global anomaly data regarding anomalous production data, or local and global drift data regarding drifting production data, is derived based on the local and global training data reference baselines, respectively. At least one of the local anomaly data is compared with the global anomaly data, or the local drift data is compared with the global drift data, to assess the impact of anomalies or drift on the ML models.
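As a rough illustration of the comparison described above, the following Python sketch contrasts reconstruction errors from a local and a global autoencoder against their respective baselines; the (mean, std) baseline format, the k·std threshold, and all function names are assumptions made for the sketch rather than details taken from the disclosure.

```python
import numpy as np

def reconstruction_error(autoencoder, batch):
    """Mean squared reconstruction error per sample (autoencoder is any callable)."""
    recon = autoencoder(batch)
    return np.mean((batch - recon) ** 2, axis=1)

def assess_batch(batch, local_ae, global_ae, local_baseline, global_baseline, k=3.0):
    """Flag production samples against both the local and global reference baselines.

    Each baseline is assumed to be (mean, std) of reconstruction error on the
    corresponding training data; the 'mean + k * std' threshold is illustrative.
    """
    local_err = reconstruction_error(local_ae, batch)
    global_err = reconstruction_error(global_ae, batch)

    local_flags = local_err > local_baseline[0] + k * local_baseline[1]
    global_flags = global_err > global_baseline[0] + k * global_baseline[1]

    # Comparing local flags against global flags hints at whether an issue is
    # node-specific (local only) or shared across the population (both).
    return {
        "node_specific": int(np.sum(local_flags & ~global_flags)),
        "shared": int(np.sum(local_flags & global_flags)),
    }
```

Under these assumptions, data flagged only by the local model points to a node-specific issue, while data flagged by both models points to behavior affecting the wider population.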
Abstract:
The disclosure relates to technology that implements flow control for machine learning on data such as Internet of Things ("IoT") datasets. The system may route outputs of a data splitter function performed on the IoT datasets to a designated target model based on a user specification for routing the outputs. In this manner, the IoT datasets may be dynamically routed to target models without reprogramming machine-learning pipelines, which enables rapid training, testing, and validation of ML models as well as the ability to concurrently train, validate, and execute ML models.
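A minimal sketch of how a user-editable routing specification might steer splitter outputs to target models without code changes; the dictionary-based spec, the target names, and the function shown are illustrative assumptions, not the disclosed implementation.

```python
from typing import Any, Callable, Dict, Iterable

def route_splits(splits: Dict[str, Iterable[Any]],
                 routing_spec: Dict[str, str],
                 targets: Dict[str, Callable[[Iterable[Any]], None]]) -> None:
    """Send each output of the data splitter to the target named in the spec.

    `routing_spec` maps a split name (e.g. "train", "validate") to a target
    model/sink name; editing the spec re-routes data without changing pipeline code.
    """
    for split_name, records in splits.items():
        target_name = routing_spec.get(split_name)
        if target_name is None:
            continue  # splits without a configured route are simply not delivered
        targets[target_name](records)

# Example: a declarative spec routes the splitter's outputs concurrently to one
# model being trained and another being validated (names are hypothetical).
routing_spec = {"train": "model_a_trainer", "validate": "model_b_validator"}
```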
Abstract:
The present subject matter relates to performing proactive monitoring and diagnostics in storage area networks (SANs). In one implementation, the method comprises depicting the topology of the SAN as a graph, wherein the graph designates the devices as nodes and the connecting elements as edges, and depicts operations associated with at least one component of the nodes and edges. The method further comprises monitoring at least one parameter indicative of performance of the component to ascertain degradation of the at least one component, and identifying a hinge in the data associated with the monitoring, wherein the hinge is indicative of an initiation of degradation of the component. Based on the hinge, proactive diagnostics are performed to compute a remaining lifetime of the at least one component. Thereafter, a notification is generated for an administrator of the SAN based on the remaining lifetime.
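The hinge-and-lifetime idea could look roughly like the sketch below, which picks the breakpoint of a two-segment linear fit as the hinge and extrapolates the post-hinge trend to a failure threshold; the fitting method, the threshold, and the function names are assumptions for illustration only.

```python
import numpy as np

def find_hinge(t, y):
    """Pick the index that best splits the series into two straight-line fits.

    The combined least-squares residual is used as the cost; this is only a
    stand-in for whatever hinge-detection an actual implementation uses.
    Returns None if the series is too short to split.
    """
    best_idx, best_cost = None, np.inf
    for i in range(2, len(t) - 2):
        left = np.polyfit(t[:i], y[:i], 1, full=True)[1]    # residuals of left fit
        right = np.polyfit(t[i:], y[i:], 1, full=True)[1]   # residuals of right fit
        cost = (left[0] if left.size else 0.0) + (right[0] if right.size else 0.0)
        if cost < best_cost:
            best_idx, best_cost = i, cost
    return best_idx

def remaining_lifetime(t, y, hinge_idx, failure_level):
    """Extrapolate the post-hinge trend to an assumed failure threshold."""
    slope, intercept = np.polyfit(t[hinge_idx:], y[hinge_idx:], 1)
    if slope <= 0:
        return np.inf  # parameter is not degrading toward the threshold
    return (failure_level - intercept) / slope - t[-1]
```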
Abstract:
Systems and methods are provided for implementing a Siamese neural network using improved “sub” neural networks and an improved loss function. For example, the system can detect a granular change between images using a Siamese neural network with convolutional autoencoders as the twin sub-networks (e.g., a Siamese AutoEncoder or “SAE”). In some examples, the loss function applied to the SAE network may be an adaptive loss function rather than a contrastive loss function, which can help enable smooth control of the granularity of change detection across the images. In some examples, an image separation distance value may be calculated to quantify the change between the image pairs. The image separation distance value may be determined using a Euclidean distance associated with a latent space of an encoder portion of the autoencoder of the neural networks.
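A small sketch of the image separation distance under the latent-space interpretation described above; the encoder callable, the thresholding rule, and the function names are assumptions rather than details of the disclosed SAE.

```python
import numpy as np

def image_separation_distance(encoder, image_a, image_b):
    """Euclidean distance between the two images' latent-space encodings.

    `encoder` stands in for the shared-weight encoder half of the twin
    convolutional autoencoders; both images pass through the same weights.
    """
    z_a = np.ravel(encoder(image_a))
    z_b = np.ravel(encoder(image_b))
    return float(np.linalg.norm(z_a - z_b))

def changed(encoder, image_a, image_b, threshold):
    """Declare a change when the separation distance exceeds a tunable threshold;
    the threshold is one place where granularity of detection could be controlled."""
    return image_separation_distance(encoder, image_a, image_b) > threshold
```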
Abstract:
Systems and methods are provided for retraining machine learning (ML) models. Examples may automatically identify skewed, anomalous, and/or drift occurrence data in real-world input data. By automatically identifying such data, examples can reduce subjectivity in ML model retraining as well as reduce time spent determining a need to retrain an ML model. Accordingly, a determination can be made objectively by a computing system or device according to computer-implemented instructions. Additionally, examples may automatically isolate and transfer data relevant to the retraining of an ML model to a training environment for retraining the ML model using real-world input data. Examples also synthesize large samples of data for use in retraining an ML model. The synthesized data may be generated based on the isolated and transferred data and can be used in place of actual real-world input data to reduce a corresponding delay.
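The isolate-then-synthesize flow might be sketched as follows; the noise-based synthesis, the flagged-record count used as a retraining trigger, and all names are placeholders standing in for whatever an actual implementation would use.

```python
import numpy as np

def isolate_for_retraining(batch, flags):
    """Keep only the records flagged as skewed, anomalous, or drifting."""
    return batch[flags]

def synthesize(seed_records, n_samples, noise_scale=0.05, rng=None):
    """Generate a larger sample by jittering the isolated records.

    Simple noise-based resampling is only a placeholder for whatever
    synthesis technique an implementation would actually apply.
    """
    rng = np.random.default_rng() if rng is None else rng
    idx = rng.integers(0, len(seed_records), size=n_samples)
    jitter = rng.normal(0.0, noise_scale, size=(n_samples, seed_records.shape[1]))
    return seed_records[idx] + jitter

def maybe_retrain(batch, flags, retrain_fn, min_flagged=100):
    """Trigger retraining objectively once enough flagged data has accumulated."""
    flagged = isolate_for_retraining(batch, flags)
    if len(flagged) >= min_flagged:
        retrain_fn(synthesize(flagged, n_samples=10 * len(flagged)))
```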
Abstract:
Systems and methods are provided for optimal usage of data access interfaces in machine learning pipelines. Examples of the systems and methods disclosed herein include identifying a plurality of data access interfaces comprising at least a first data access interface for a persistent storage distributed across a plurality of storage nodes and at least a second data access interface for an in-memory object store, and receiving, from a compute node, a data operation request as part of a machine learning pipeline. Additionally, performance metrics are obtained for the plurality of data access interfaces, and, based on a type of the data operation request, the data operation is executed using a data access interface selected from the plurality of data access interfaces based on the performance metrics, and an object handle is provided to the compute node.
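One way the metric-driven selection could be sketched, assuming a latency-per-operation-type metrics table and treating the execution result as the object handle returned to the compute node; the data structures and names are illustrative, not the disclosed interface.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

@dataclass
class AccessInterface:
    name: str                        # e.g. "persistent_store" or "in_memory_store"
    execute: Callable[[Any], Any]    # runs an operation and returns an object handle

def select_interface(op_type: str,
                     interfaces: Dict[str, AccessInterface],
                     metrics: Dict[str, Dict[str, float]]) -> AccessInterface:
    """Pick the interface with the best observed latency for this operation type.

    `metrics` maps interface name -> {operation type: observed latency}; the
    simple 'lowest latency wins' rule is an illustrative scoring choice.
    """
    return min(interfaces.values(),
               key=lambda iface: metrics[iface.name].get(op_type, float("inf")))

def execute_operation(request, interfaces, metrics):
    """Run the request on the selected interface and return its object handle."""
    iface = select_interface(request["op_type"], interfaces, metrics)
    return iface.execute(request)
```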
Abstract:
Systems and methods for preventing prediction performance degradation by detecting and extracting skews in data in both training and production environments are described herein. Feature extraction may be performed on training data during the training phase, followed by pattern analysis that assesses similarities across labeled training data sets. A reference pattern may be derived from the pattern analysis and feature extraction of the training data. Feature extraction and pattern analysis may be performed on production data during the serving phase, and a target pattern may be derived from the pattern analysis and feature extraction of the production data. The reference pattern and target pattern may be fed to a discrepancy detection functionality that detects discrepancies by using a sliding window to move the target pattern across the reference pattern and make comparisons between the patterns. The comparison may provide a quantitative skew across the training and production data.
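A compact sketch of the sliding-window comparison, assuming one-dimensional reference and target patterns and mean absolute difference as the per-window metric; both choices are assumptions made for illustration.

```python
import numpy as np

def sliding_skew(reference, target):
    """Slide the target pattern across the reference pattern and report the
    smallest mean absolute difference as a quantitative skew score.

    Assumes the reference pattern is at least as long as the target pattern;
    the distance metric and the best-alignment reduction are illustrative.
    """
    ref = np.asarray(reference, dtype=float)
    tgt = np.asarray(target, dtype=float)
    window = len(tgt)
    scores = [np.mean(np.abs(ref[i:i + window] - tgt))
              for i in range(len(ref) - window + 1)]
    return min(scores)
```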
Abstract:
Systems and methods provide for a federated workflow solution to orchestrate entire machine learning (ML) workflows comprising multiple tasks across silos. In other words, one or more sets/pluralities of tasks making up an ML workflow can be executed across multiple resource partitions or domains. Federated workflow state can be maintained and shared through some form of distributed database/ledger, such as a blockchain. Agents that are deployed locally at the silos may orchestrate an ML workflow at a particular resource domain, each such agent having access, via the blockchain (acting as a globally visible/consistent state store), to the aforementioned workflow state. Such systems are capable of operating regardless of the existence of heterogeneous resources or other heterogeneous aspects across silos.
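As a rough sketch, a locally deployed agent might interact with the shared state store through an interface like the one below; the key scheme, the "done" marker, and the method names are assumptions, with any globally consistent ledger (such as a blockchain) standing behind it.

```python
from typing import Callable, Dict, Protocol

class SharedLedger(Protocol):
    """Minimal interface an agent needs from the distributed ledger/state store."""
    def read(self, key: str) -> str: ...          # returns "" when the key is absent
    def write(self, key: str, value: str) -> None: ...

def run_local_tasks(ledger: SharedLedger, silo: str,
                    tasks: Dict[str, Callable[[], None]]) -> None:
    """Execute the workflow tasks assigned to this silo and publish their state globally.

    The 'workflow/<silo>/<task>' key scheme is a convention invented for the sketch;
    any agent reading the ledger sees the same globally consistent workflow state.
    """
    for name, task in tasks.items():
        key = f"workflow/{silo}/{name}"
        if ledger.read(key) == "done":
            continue  # already completed by a previous run or another agent
        task()
        ledger.write(key, "done")
```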