Unsupervised anomaly detection by self-prediction

    Publication No.: US11928857B2

    Publication Date: 2024-03-12

    Application No.: US16924048

    Application Date: 2020-07-08

    Applicant: VMware, Inc.

    Abstract: Techniques for implementing unsupervised anomaly detection by self-prediction are provided. In one set of embodiments, a computer system can receive an unlabeled training data set comprising a plurality of unlabeled data instances, where each unlabeled data instance includes values for a plurality of features. The computer system can further train, for each feature in the plurality of features, a supervised machine learning (ML) model using a labeled training data set derived from the unlabeled training data set, receive a query data instance, and generate a self-prediction vector using at least a portion of the trained supervised ML models and the query data instance, where the self-prediction vector indicates what the query data instance should look like if it were normal. The computer system can then generate an anomaly score for the query data instance based on the self-prediction vector and the query data instance.
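
The self-prediction idea in this abstract can be illustrated with a minimal sketch. The abstract does not say which supervised model is trained per feature or how the anomaly score is computed, so the k-nearest-neighbour regressor, the Euclidean distance score, and all function names below are assumptions chosen only to keep the example dependency-free.

```python
import math


def _others(row, i):
    """Return the row without feature i (these are the model inputs)."""
    return row[:i] + row[i + 1:]


def train_models(data):
    """For each feature i, derive a labeled set: label = feature i,
    inputs = all other features. A k-NN 'model' is just that stored set
    (an assumed stand-in for any supervised learner)."""
    n_features = len(data[0])
    return [[(_others(row, i), row[i]) for row in data]
            for i in range(n_features)]


def _predict(model, inputs, k=3):
    """Average the labels of the k training points closest to `inputs`."""
    nearest = sorted(model, key=lambda pair: math.dist(pair[0], inputs))[:k]
    return sum(label for _, label in nearest) / len(nearest)


def self_prediction(models, query, k=3):
    """Predict each feature of the query from its remaining features."""
    return [_predict(m, _others(query, i), k) for i, m in enumerate(models)]


def anomaly_score(models, query, k=3):
    """Assumed score: distance between the query and its self-prediction."""
    return math.dist(self_prediction(models, query, k), query)
```

With training rows that follow a consistent pattern (for example, the second feature always twice the first), a query that breaks the pattern would be expected to score higher than one that follows it.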

    ACCELERATING DATA MESSAGE CLASSIFICATION WITH SMART NICS

    Publication No.: US20230409484A1

    Publication Date: 2023-12-21

    Application No.: US17845661

    Application Date: 2022-06-21

    Applicant: VMware, Inc.

    Abstract: Some embodiments provide a method for performing data message processing at a smart NIC of a computer that executes a software forwarding element (SFE). The method determines whether a received data message matches an entry in a data message classification cache stored on the smart NIC based on data message classification results of the SFE. When the data message matches an entry, the method determines whether the matched entry is valid by comparing a timestamp of the entry to a set of rules stored on the smart NIC. When the matched entry is valid, the method processes the data message according to the matched entry without providing the data message to the SFE executing on the computer.
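
The lookup-then-validate flow described above can be sketched as follows. The abstract does not specify how an entry's timestamp is compared against the rules, so this sketch assumes one plausible reading: an entry is valid only if it was cached after the most recent rule change. The class name, method names, and return values are hypothetical.

```python
class NicFlowCache:
    """Toy model of a classification cache held on a smart NIC."""

    def __init__(self):
        self.entries = {}          # flow key -> (action, time the entry was cached)
        self.rule_updated_at = 0   # assumed: time of the most recent rule change

    def install(self, flow_key, action, now):
        """Record a classification result produced by the SFE."""
        self.entries[flow_key] = (action, now)

    def update_rules(self, now):
        """Note that the rule set changed; older entries become stale."""
        self.rule_updated_at = now

    def process(self, flow_key):
        """Return ('fast_path', action) on a valid hit, else ('to_sfe', None)."""
        entry = self.entries.get(flow_key)
        if entry is None:
            return ("to_sfe", None)        # cache miss: hand off to the SFE
        action, cached_at = entry
        if cached_at < self.rule_updated_at:
            del self.entries[flow_key]     # stale entry: evict and hand off
            return ("to_sfe", None)
        return ("fast_path", action)       # valid hit: handled on the NIC
```

Evicting a stale entry on lookup is a design choice of this sketch, not something stated in the abstract.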

    USING LIGHTWEIGHT MACHINE-LEARNING MODEL ON SMART NIC

    Publication No.: US20230342398A1

    Publication Date: 2023-10-26

    Application No.: US17727230

    Application Date: 2022-04-22

    Applicant: VMware, Inc.

    CPC classification number: G06F16/90335

    Abstract: Some embodiments provide a method for using a machine learning (ML) model to respond to a query, at a smart NIC of a computer. The method receives a query including an input. The method applies a first ML model to the input to generate an output and a confidence measure for the output. When the confidence measure for the output is below a threshold, the method discards the output and provides the query to the computer for the computer to apply a second ML model to the input.
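
The confidence-gated fallback in this abstract amounts to a two-stage cascade. The sketch below assumes the models are plain callables and that the threshold is 0.9; both are illustrative choices, and the names `answer_query`, `small_model`, and `large_model` are hypothetical.

```python
def answer_query(query_input, small_model, large_model, threshold=0.9):
    """Try the lightweight model first; fall back to the larger one.

    small_model(query_input) -> (output, confidence)   # runs on the NIC
    large_model(query_input) -> output                 # runs on the host
    Returns (output, where_it_was_answered).
    """
    output, confidence = small_model(query_input)
    if confidence < threshold:
        # Low confidence: discard the NIC result and defer to the host model.
        return large_model(query_input), "host"
    return output, "nic"
```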

    Inter-Feature Influence in Unlabeled Datasets

    Publication No.: US20220180244A1

    Publication Date: 2022-06-09

    Application No.: US17115432

    Application Date: 2020-12-08

    Applicant: VMware, Inc.

    Abstract: In one set of embodiments, a computer system can receive an unlabeled dataset comprising a plurality of unlabeled data instances, each unlabeled data instance including values for a plurality of features. The computer system can train, for each feature, a supervised machine learning (ML) model on a labeled dataset derived from the unlabeled dataset, where the labeled dataset comprises a plurality of labeled data instances, and wherein each labeled data instance includes (1) a label corresponding to a value for the feature in an unlabeled data instance of the unlabeled dataset, and (2) values for other features in the unlabeled data instance. The computer system can then compute, for each pair of first and second features in the plurality of features, an inter-feature influence score using the trained supervised ML model for the second feature, the inter-feature influence score indicating how useful the first feature is in predicting the second feature.

    UNSUPERVISED ANOMALY DETECTION BY SELF-PREDICTION

    Publication No.: US20220012626A1

    Publication Date: 2022-01-13

    Application No.: US16924048

    Application Date: 2020-07-08

    Applicant: VMware, Inc.

    Abstract: Techniques for implementing unsupervised anomaly detection by self-prediction are provided. In one set of embodiments, a computer system can receive an unlabeled training data set comprising a plurality of unlabeled data instances, where each unlabeled data instance includes values for a plurality of features. The computer system can further train, for each feature in the plurality of features, a supervised machine learning (ML) model using a labeled training data set derived from the unlabeled training data set, receive a query data instance, and generate a self-prediction vector using at least a portion of the trained supervised ML models and the query data instance, where the self-prediction vector indicates what the query data instance should look like if it were normal. The computer system can then generate an anomaly score for the query data instance based on the self-prediction vector and the query data instance.

    Internal Load Balancer for Tree-Based Ensemble Classifiers

    Publication No.: US20220012550A1

    Publication Date: 2022-01-13

    Application No.: US16923988

    Application Date: 2020-07-08

    Applicant: VMware, Inc.

    Abstract: Techniques for implementing a tree-based ensemble classifier comprising an internal load balancer are provided. In one set of embodiments, the tree-based ensemble classifier can receive a query data instance and select, via the internal load balancer, a subset of its decision trees for processing the query data instance. The tree-based ensemble classifier can then query each decision tree in the selected subset with the query data instance, combine the per-tree classifications generated by the subset trees to generate a subset classification, and determine whether a confidence level associated with the subset classification is sufficiently high. If the answer is yes, the tree-based ensemble classifier can output the subset classification as a final classification result for the query data instance. If the answer is no, the tree-based ensemble classifier can repeat the foregoing steps until a sufficient confidence level is reached or until all of its decision trees have been selected and queried.
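
The select, query, combine, and check loop described above can be sketched in a few lines. The abstract does not define the selection policy or the confidence measure, so this sketch assumes a random selection order, majority voting, and confidence equal to the leading class's share of the votes cast so far. All names and default values are hypothetical.

```python
import random
from collections import Counter


def classify_with_load_balancer(trees, x, batch_size=3, min_confidence=0.8,
                                rng=None):
    """Query trees in batches until the vote is confident enough.

    `trees` is a list of callables mapping an instance to a class label.
    Returns (label, number_of_trees_queried).
    """
    if not trees:
        raise ValueError("ensemble has no trees")
    rng = rng or random.Random(0)
    order = list(range(len(trees)))
    rng.shuffle(order)                 # assumed selection policy: random order

    votes = Counter()
    queried = 0
    while queried < len(order):
        batch = order[queried:queried + batch_size]
        for idx in batch:
            votes[trees[idx](x)] += 1  # per-tree classification
        queried += len(batch)
        label, count = votes.most_common(1)[0]
        if count / queried >= min_confidence:
            break                      # subset classification is confident
    return label, queried
```

When the trees agree, only the first batch is queried; when they disagree, the loop continues until every tree has voted.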

    Intelligent data partitioning for distributed machine learning systems

    Publication No.: US11687824B2

    Publication Date: 2023-06-27

    Application No.: US16248622

    Application Date: 2019-01-15

    Applicant: VMware, Inc.

    CPC classification number: G06N20/00 G06F16/285 G06N5/045

    Abstract: Techniques for implementing intelligent data partitioning for a distributed machine learning (ML) system are provided. In one set of embodiments, a computer system implementing a data partition module can receive a training data instance for an ML task and identify, using a clustering algorithm, a cluster to which the training data instance belongs, the cluster being one of a plurality of clusters determined via the clustering algorithm that partition a data space of the ML task. The computer system can then transmit the training data instance to an ML worker of the distributed ML system that is assigned to the cluster, where the ML worker is configured to build or update an ML model using the training data instance.
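
The routing step in this abstract can be sketched as a nearest-centroid lookup. The sketch assumes the clusters have already been determined (for instance by k-means) and are represented by centroids, and it models each ML worker as a plain list that collects its instances. These representations and the function names are assumptions for illustration.

```python
import math


def assign_cluster(instance, centroids):
    """Return the index of the centroid closest to the instance."""
    return min(range(len(centroids)),
               key=lambda c: math.dist(instance, centroids[c]))


def route(instance, centroids, workers):
    """Send the instance to the worker assigned to its cluster.

    workers[c] stands in for the ML worker responsible for cluster c;
    here it is simply a list of received training instances.
    """
    cluster = assign_cluster(instance, centroids)
    workers[cluster].append(instance)
    return cluster
```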

    Using Graph Structures to Represent Node State in Deep Reinforcement Learning (RL)-Based Decision Tree Construction

    Publication No.: US20220335300A1

    Publication Date: 2022-10-20

    Application No.: US17231476

    Application Date: 2021-04-15

    Applicant: VMware, Inc.

    Abstract: In one set of embodiments, a deep reinforcement learning (RL) system can train an agent to construct an efficient decision tree for classifying network packets according to a rule set, where the training includes: identifying, by an environment of the deep RL system, a leaf node in a decision tree; computing, by the environment, a graph structure representing a state of the leaf node, the graph structure including information regarding how one or more rules in the rule set that are contained in the leaf node are distributed in a hypercube of the leaf node; communicating, by the environment, the graph structure to the agent; providing, by the agent, the graph structure as input to a graph neural network; and generating, by the graph neural network based on the graph structure, an action to be taken on the leaf node for extending the decision tree.

    TRAINING NEURAL NETWORK CLASSIFIERS USING CLASSIFICATION METADATA FROM OTHER ML CLASSIFIERS

    Publication No.: US20220012567A1

    Publication Date: 2022-01-13

    Application No.: US16924015

    Application Date: 2020-07-08

    Applicant: VMware, Inc.

    Abstract: Techniques for training a neural network classifier using classification metadata from another, non-neural network (non-NN) classifier are provided. In one set of embodiments, a computer system can train the non-NN classifier using a training data set, where the training results in a trained version of the non-NN classifier. The computer system can further classify a data instance in the training data set using the trained non-NN classifier, the classifying generating a first class distribution for the data instance, and provide the data instance's feature set as input to a neural network classifier, the providing causing the neural network classifier to generate a second class distribution for the data instance. The computer system can then compute a loss value indicating a degree of divergence between the first and second class distributions and provide the loss value as feedback to the neural network classifier, which can cause the neural network classifier to adjust one or more internal edge weights in a manner that reduces the degree of divergence.
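
The loss-and-feedback step can be illustrated with a small numeric sketch. The abstract does not name the divergence measure, so KL divergence is assumed here. To stay self-contained, the "network" is reduced to a vector of output logits updated by gradient descent; a real classifier would propagate the same gradient back to its edge weights. All names and the learning rate are hypothetical.

```python
import math


def softmax(logits):
    """Turn logits into a class distribution (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q): assumed measure of divergence between distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def train_step(logits, first_dist, lr=0.5):
    """One gradient step on KL(first_dist || softmax(logits)).

    For a softmax output the gradient with respect to each logit is
    (second_dist - first_dist), so the update moves the network's
    distribution toward the non-NN classifier's distribution.
    """
    second_dist = softmax(logits)
    return [z - lr * (s - t)
            for z, s, t in zip(logits, second_dist, first_dist)]
```

Repeating `train_step` with a fixed first distribution should shrink the loss reported by `kl_divergence`.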
