Augmenting Training Data Sets for ML Classifiers Using Classification Metadata

    Publication Number: US20220012535A1

    Publication Date: 2022-01-13

    Application Number: US16924009

    Application Date: 2020-07-08

    Applicant: VMware, Inc.

    Abstract: Techniques for augmenting training data sets for machine learning (ML) classifiers using classification metadata are provided. In one set of embodiments, a computer system can train a first ML classifier using a training data set, where the training data set comprises a plurality of data instances, where each data instance includes a set of features, and where the training results in a trained version of the first ML classifier. The computer system can further classify each data instance in the plurality of data instances using the trained version of the first ML classifier, the classifications generating classification metadata for each data instance, and augment the training data set with the classification metadata to create an augmented version of the training data set. The computer system can then train a second ML classifier using the augmented version of the training data set.
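    The pipeline described above can be sketched in a few lines of pure Python. The nearest-centroid classifier below is an illustrative stand-in (the patent does not specify a classifier type); the point is the flow: train a first classifier, capture its per-class probabilities as classification metadata, append that metadata to each instance's features, and train a second classifier on the augmented set.

```python
import math

class NearestCentroid:
    """Toy stand-in for the first/second ML classifiers (pure Python)."""
    def fit(self, X, y):
        sums, counts = {}, {}
        for xi, yi in zip(X, y):
            s = sums.setdefault(yi, [0.0] * len(xi))
            for j, v in enumerate(xi):
                s[j] += v
            counts[yi] = counts.get(yi, 0) + 1
        self.centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
        self.classes = sorted(self.centroids)
        return self

    def predict_proba(self, xi):
        # Softmax over negative distances to each class centroid.
        d = {c: -math.dist(xi, self.centroids[c]) for c in self.classes}
        m = max(d.values())
        e = {c: math.exp(v - m) for c, v in d.items()}
        z = sum(e.values())
        return [e[c] / z for c in self.classes]

def augment_with_metadata(X, y):
    """Train the first classifier, then append its per-class probabilities
    (the classification metadata) to each instance's feature set."""
    clf1 = NearestCentroid().fit(X, y)
    return [xi + clf1.predict_proba(xi) for xi in X]

X = [[0.0, 0.1], [0.2, 0.0], [1.0, 1.1], [0.9, 1.0]]
y = [0, 0, 1, 1]
X_aug = augment_with_metadata(X, y)            # each instance gains one column per class
clf2 = NearestCentroid().fit(X_aug, y)         # second classifier trains on augmented data
```

    The second classifier thus sees both the raw features and the first classifier's view of each instance, which is what distinguishes this scheme from ordinary single-model training.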

    Efficient Machine Learning (ML) Model for Classification

    Publication Number: US20210216831A1

    Publication Date: 2021-07-15

    Application Number: US16743865

    Application Date: 2020-01-15

    Applicant: VMware, Inc.

    Inventor: Yaniv Ben-Itzhak

    Abstract: Techniques for implementing an efficient machine learning (ML) model for classification are provided. In one set of embodiments, a computer system can receive a query data instance to be classified. The computer system can then generate a first classification result for the query data instance using a first (i.e., primary) ML model, where the first classification result includes a predicted class for the query data instance and a confidence level indicating a likelihood that the predicted class is correct, and compare the confidence level with a classification confidence threshold. If the confidence level is greater than or equal to the classification confidence threshold, the computer system can output the first classification result as a final classification result for the query data instance. However, if the confidence level is less than the classification confidence threshold, the computer system can forward the query data instance to one of a plurality of second (i.e., secondary) ML models for further classification.
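    The primary/secondary cascade can be sketched as a small routing function. The models here are stubs returning a (class, confidence) pair, and routing secondaries by the primary's predicted class is an assumed policy; the abstract only says the instance is forwarded to "one of a plurality" of secondary models.

```python
def classify_with_cascade(instance, primary, secondaries, threshold):
    """Return the primary result if its confidence meets the threshold;
    otherwise forward the instance to a secondary model (here chosen by
    the primary's predicted class -- an assumed routing policy)."""
    predicted_class, confidence = primary(instance)
    if confidence >= threshold:
        return predicted_class, confidence
    return secondaries[predicted_class](instance)

# Stub models: each maps an instance to (predicted_class, confidence).
primary = lambda x: ("cat", 0.95) if x > 0.5 else ("dog", 0.40)
secondaries = {"dog": lambda x: ("wolf", 0.90), "cat": lambda x: ("cat", 0.99)}

print(classify_with_cascade(0.9, primary, secondaries, 0.8))  # confident: primary answers
print(classify_with_cascade(0.1, primary, secondaries, 0.8))  # low confidence: forwarded
```

    The efficiency win comes from the common case: most instances are resolved by the cheap primary model, and only uncertain ones pay for a secondary pass.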

    Intelligent data partitioning for distributed machine learning systems

    Publication Number: US11687824B2

    Publication Date: 2023-06-27

    Application Number: US16248622

    Application Date: 2019-01-15

    Applicant: VMware, Inc.

    CPC classification number: G06N20/00 G06F16/285 G06N5/045

    Abstract: Techniques for implementing intelligent data partitioning for a distributed machine learning (ML) system are provided. In one set of embodiments, a computer system implementing a data partition module can receive a training data instance for a ML task and identify, using a clustering algorithm, a cluster to which the training data instance belongs, the cluster being one of a plurality of clusters determined via the clustering algorithm that partition a data space of the ML task. The computer system can then transmit the training data instance to a ML worker of the distributed ML system that is assigned to the cluster, where the ML worker is configured to build or update a ML model using the training data instance.
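    A minimal sketch of the partition module, assuming the clustering algorithm has already produced fixed centroids (nearest-centroid assignment stands in for, e.g., a fitted k-means model): each incoming training instance is mapped to its cluster and routed to the worker assigned to that cluster.

```python
import math

def assign_cluster(instance, centroids):
    """Identify the cluster an instance belongs to: index of the nearest
    centroid (stand-in for a fitted clustering algorithm)."""
    return min(range(len(centroids)), key=lambda c: math.dist(instance, centroids[c]))

def route_to_worker(instance, centroids, cluster_to_worker):
    """The data partition module transmits the instance to the ML worker
    assigned to the instance's cluster."""
    return cluster_to_worker[assign_cluster(instance, centroids)]

centroids = [[0.0, 0.0], [5.0, 5.0]]            # assumed pre-computed cluster centers
cluster_to_worker = {0: "worker-a", 1: "worker-b"}  # hypothetical worker names
print(route_to_worker([0.2, 0.1], centroids, cluster_to_worker))  # worker-a
print(route_to_worker([4.8, 5.3], centroids, cluster_to_worker))  # worker-b
```

    Because each worker only ever sees instances from its own region of the data space, its local model specializes on that partition rather than the full distribution.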

    Using Graph Structures to Represent Node State in Deep Reinforcement Learning (RL)-Based Decision Tree Construction

    Publication Number: US20220335300A1

    Publication Date: 2022-10-20

    Application Number: US17231476

    Application Date: 2021-04-15

    Applicant: VMware, Inc.

    Abstract: In one set of embodiments, a deep reinforcement learning (RL) system can train an agent to construct an efficient decision tree for classifying network packets according to a rule set, where the training includes: identifying, by an environment of the deep RL system, a leaf node in a decision tree; computing, by the environment, a graph structure representing a state of the leaf node, the graph structure including information regarding how one or more rules in the rule set that are contained in the leaf node are distributed in a hypercube of the leaf node; communicating, by the environment, the graph structure to the agent; providing, by the agent, the graph structure as input to a graph neural network; and generating, by the graph neural network based on the graph structure, an action to be taken on the leaf node for extending the decision tree.
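    One plausible reading of the leaf-node graph state can be sketched without any RL machinery: rules are axis-aligned boxes, the leaf's hypercube clips the rule set, and edges record which contained rules overlap one another inside the cube. This is an illustrative construction, not the patent's exact graph encoding, and it omits the agent and graph neural network entirely.

```python
def boxes_overlap(a, b):
    """Axis-aligned boxes: each box is a list of (lo, hi) per dimension."""
    return all(lo1 < hi2 and lo2 < hi1 for (lo1, hi1), (lo2, hi2) in zip(a, b))

def leaf_state_graph(rules, hypercube):
    """Sketch of a graph-structured leaf state: nodes are the rules
    contained in the leaf's hypercube, edges connect pairs of rules whose
    boxes overlap inside it -- one way to capture how the rules are
    distributed in the hypercube."""
    contained = [r for r in rules if boxes_overlap(r, hypercube)]
    edges = [(i, j)
             for i in range(len(contained))
             for j in range(i + 1, len(contained))
             if boxes_overlap(contained[i], contained[j])]
    return contained, edges

# Three 2-D rules; the leaf's hypercube covers only the lower-left region.
rules = [[(0, 4), (0, 4)], [(2, 6), (2, 6)], [(8, 9), (8, 9)]]
hypercube = [(0, 5), (0, 5)]
nodes, edges = leaf_state_graph(rules, hypercube)   # rule 2 falls outside the cube
```

    In the patent's framing, a structure like this would be handed to the agent and fed through a graph neural network to pick the next cut that extends the decision tree.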

    Training Neural Network Classifiers Using Classification Metadata from Other ML Classifiers

    Publication Number: US20220012567A1

    Publication Date: 2022-01-13

    Application Number: US16924015

    Application Date: 2020-07-08

    Applicant: VMware, Inc.

    Abstract: Techniques for training a neural network classifier using classification metadata from another, non-neural network (non-NN) classifier are provided. In one set of embodiments, a computer system can train the non-NN classifier using a training data set, where the training results in a trained version of the non-NN classifier. The computer system can further classify a data instance in the training data set using the trained non-NN classifier, the classifying generating a first class distribution for the data instance, and provide the data instance's feature set as input to a neural network classifier, the providing causing the neural network classifier to generate a second class distribution for the data instance. The computer system can then compute a loss value indicating a degree of divergence between the first and second class distributions and provide the loss value as feedback to the neural network classifier, which can cause the neural network classifier to adjust one or more internal edge weights in a manner that reduces the degree of divergence.
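    The loss described here is a divergence between two class distributions; KL divergence is a natural choice for such a measure, though the abstract does not name one. A minimal sketch of that computation:

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """Loss value: degree of divergence between the non-NN classifier's
    class distribution p and the neural network's distribution q.
    eps guards against log(0) for zero-probability classes."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

p = [0.7, 0.2, 0.1]          # first class distribution, from the trained non-NN classifier
q = [0.5, 0.3, 0.2]          # second class distribution, from the neural network
loss = kl_divergence(p, q)   # fed back so the network adjusts its edge weights
```

    Driving this loss toward zero pulls the network's predicted distribution toward the non-NN classifier's, which is the essence of distilling one model's knowledge into another.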

    Containerized Workload Scheduling

    Publication Number: US20200341789A1

    Publication Date: 2020-10-29

    Application Number: US16394663

    Application Date: 2019-04-25

    Applicant: VMware, Inc.

    Abstract: A method for containerized workload scheduling can include monitoring network traffic of a first containerized workload deployed on a node in a virtual computing environment to determine affinities between the first containerized workload and other containerized workloads in the virtual computing environment. The method can further include scheduling, based, at least in part, on the determined affinities between the first containerized workload and the other containerized workloads, execution of a second containerized workload on the node on which the first containerized workload is deployed.
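    The affinity-driven placement can be sketched as picking the node whose resident workloads exchange the most traffic with the workload being scheduled. The traffic table, workload names, and node names below are all hypothetical; a real scheduler would also weigh resource capacity and anti-affinity constraints.

```python
def schedule(workload, traffic, placements, nodes):
    """Pick the node with the highest traffic affinity to `workload`.
    traffic: dict mapping (workload_a, workload_b) -> observed bytes.
    placements: dict mapping already-placed workloads -> their node."""
    def affinity(node):
        # Sum traffic between `workload` and every workload resident on `node`.
        return sum(nbytes for (a, b), nbytes in traffic.items()
                   if workload in (a, b)
                   and placements.get(a if b == workload else b) == node)
    return max(nodes, key=affinity)

# "web" talks heavily to "db" (on node-1) and barely to "cache" (on node-2).
traffic = {("web", "db"): 9000, ("web", "cache"): 100}
placements = {"db": "node-1", "cache": "node-2"}
print(schedule("web", traffic, placements, ["node-1", "node-2"]))  # node-1
```

    Co-locating chatty workloads this way keeps their traffic on-node instead of crossing the network fabric.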

    Intelligent Data Partitioning for Distributed Machine Learning Systems

    Publication Number: US20200226491A1

    Publication Date: 2020-07-16

    Application Number: US16248622

    Application Date: 2019-01-15

    Applicant: VMware, Inc.

    Abstract: Techniques for implementing intelligent data partitioning for a distributed machine learning (ML) system are provided. In one set of embodiments, a computer system implementing a data partition module can receive a training data instance for a ML task and identify, using a clustering algorithm, a cluster to which the training data instance belongs, the cluster being one of a plurality of clusters determined via the clustering algorithm that partition a data space of the ML task. The computer system can then transmit the training data instance to a ML worker of the distributed ML system that is assigned to the cluster, where the ML worker is configured to build or update a ML model using the training data instance.

    Unsupervised anomaly detection by self-prediction

    Publication Number: US11928857B2

    Publication Date: 2024-03-12

    Application Number: US16924048

    Application Date: 2020-07-08

    Applicant: VMware, Inc.

    Abstract: Techniques for implementing unsupervised anomaly detection by self-prediction are provided. In one set of embodiments, a computer system can receive an unlabeled training data set comprising a plurality of unlabeled data instances, where each unlabeled data instance includes values for a plurality of features. The computer system can further train, for each feature in the plurality of features, a supervised machine learning (ML) model using a labeled training data set derived from the unlabeled training data set, receive a query data instance, and generate a self-prediction vector using at least a portion of the trained supervised ML models and the query data instance, where the self-prediction vector indicates what the query data instance should look like if it were normal. The computer system can then generate an anomaly score for the query data instance based on the self-prediction vector and the query data instance.
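    The self-prediction idea can be sketched with a toy per-feature model: for each feature, a 1-nearest-neighbor regressor predicts that feature from the remaining ones (a stand-in for the per-feature supervised ML models trained from the relabeled data). The self-prediction vector is what the instance "should" look like if normal, and the anomaly score is its distance from the actual instance.

```python
import math

def fit_feature_models(train):
    """Per-feature 'supervised model': predict feature j of x as the value
    of feature j in the training row whose *other* features are nearest
    to x's (a 1-NN stand-in for the per-feature trained models)."""
    def predict(j, x):
        others = lambda row: [v for k, v in enumerate(row) if k != j]
        nearest = min(train, key=lambda row: math.dist(others(row), others(x)))
        return nearest[j]
    return predict

def anomaly_score(x, predict, n_features):
    """Build the self-prediction vector, then score x by how far it
    deviates from that 'normal' version of itself."""
    self_pred = [predict(j, x) for j in range(n_features)]
    return math.dist(self_pred, x)

train = [[1.0, 2.0], [1.1, 2.1], [0.9, 1.9]]   # unlabeled 'normal' data
predict = fit_feature_models(train)
normal = anomaly_score([1.0, 2.0], predict, 2)  # matches training data: low score
weird  = anomaly_score([1.0, 9.0], predict, 2)  # second feature is off: high score
```

    No anomaly labels are needed: each feature's model is supervised only by the other features of the same unlabeled instances, which is what makes the overall scheme unsupervised.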

    Accelerating Data Message Classification with Smart NICs

    Publication Number: US20230409484A1

    Publication Date: 2023-12-21

    Application Number: US17845661

    Application Date: 2022-06-21

    Applicant: VMware, Inc.

    Abstract: Some embodiments provide a method for performing data message processing at a smart NIC of a computer that executes a software forwarding element (SFE). The method determines whether a received data message matches an entry in a data message classification cache stored on the smart NIC based on data message classification results of the SFE. When the data message matches an entry, the method determines whether the matched entry is valid by comparing a timestamp of the entry to a set of rules stored on the smart NIC. When the matched entry is valid, the method processes the data message according to the matched entry without providing the data message to the SFE executing on the computer.
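    The cache-validity logic can be sketched as a small flow table whose entries carry a timestamp that is checked against the time of the last rule update. The flow keys and API below are hypothetical; real smart-NIC datapaths implement this in hardware or firmware rather than Python.

```python
class SmartNicCache:
    """Sketch of the smart-NIC classification cache: entries record the
    SFE's classification result plus a timestamp, and an entry is valid
    only if it post-dates the latest rule change on the NIC."""
    def __init__(self):
        self.entries = {}          # flow key -> (action, timestamp)
        self.rules_updated_at = 0  # time of the most recent rule-set change

    def install(self, key, action, now):
        """Record the SFE's classification result for this flow."""
        self.entries[key] = (action, now)

    def process(self, key):
        """Return the cached action, or None to fall back to the SFE."""
        entry = self.entries.get(key)
        if entry is None:
            return None            # miss: hand the message to the SFE
        action, ts = entry
        if ts < self.rules_updated_at:
            return None            # stale: rules changed after this entry was cached
        return action              # hit: process on the NIC, skip the SFE

cache = SmartNicCache()
cache.install(("10.0.0.1", "10.0.0.2", 80), "allow", now=100)
print(cache.process(("10.0.0.1", "10.0.0.2", 80)))  # allow
cache.rules_updated_at = 200                        # rule set updated on the NIC
print(cache.process(("10.0.0.1", "10.0.0.2", 80)))  # None -> reclassify via the SFE
```

    Invalidating by timestamp comparison avoids walking the cache on every rule change: stale entries are simply rejected lazily the next time they are matched.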
