Abstract:
In one embodiment, a computing device provides a feature vector as input to a random decision forest comprising a plurality of decision trees trained using a training dataset, each decision tree being configured to output a classification label prediction for the input feature vector. For each of the decision trees, the computing device determines a conditional probability of the decision tree based on a true classification label and the classification label prediction from the decision tree for the input feature vector. The computing device generates weightings for the classification label predictions from the decision trees based on the determined conditional probabilities. The computing device applies a final classification label to the feature vector based on the weightings for the classification label predictions from the decision trees.
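A minimal sketch of how such conditional-probability weighting might look, assuming Python with scikit-learn and synthetic data; the held-out estimation of P(true label | tree's prediction) and all names here are illustrative assumptions, not the claimed implementation:

```python
import numpy as np
from collections import defaultdict
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; real feature vectors come from the application.
X, y = make_classification(n_samples=1000, n_features=20, n_classes=3,
                           n_informative=5, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

# For each tree, estimate P(true label = c | tree predicts c) on held-out data;
# this conditional probability becomes the weight of that tree's vote for c.
tree_weights = []
for tree in forest.estimators_:
    preds = forest.classes_[tree.predict(X_val).astype(int)]
    cond = {}
    for c in forest.classes_:
        mask = preds == c
        cond[c] = float((y_val[mask] == c).mean()) if mask.any() else 0.0
    tree_weights.append(cond)

def weighted_predict(x):
    """Combine the trees' label predictions, weighting each vote by the
    tree's estimated conditional probability for the label it predicts."""
    scores = defaultdict(float)
    for tree, cond in zip(forest.estimators_, tree_weights):
        label = forest.classes_[int(tree.predict(x.reshape(1, -1))[0])]
        scores[label] += cond[label]
    return max(scores, key=scores.get)

print(weighted_predict(X_val[0]), y_val[0])
```

The final label is the one with the highest weighted vote total, so a tree whose prediction of a given label has historically been unreliable contributes little to that label.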
Abstract:
In one embodiment, a method includes accessing a trained classifier, the trained classifier trained based at least on a first data item and including both decision determination information of the first data item and decision explanation information of at least one second data item, the second data item being distinct from the first data item; receiving an item for classification; using the trained classifier to classify the item for classification; and providing item decision information regarding a reason for classifying the item for classification, the item decision information being based on at least a part of the decision explanation information. Other embodiments are also described.
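The explanation mechanism can be illustrated with a small sketch, assuming Python with scikit-learn; the nearest-neighbour lookup, the TrainingItem structure, and the example reasons are hypothetical stand-ins for the decision determination and decision explanation information described above:

```python
from dataclasses import dataclass
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical structure: each training item carries decision-determination
# features plus human-readable explanation text, which may originate from a
# distinct (second) data item describing the same behaviour.
@dataclass
class TrainingItem:
    features: np.ndarray
    label: str
    explanation: str   # decision explanation information

items = [
    TrainingItem(np.array([0.9, 0.1]), "malicious", "Beacons to a known C2 domain"),
    TrainingItem(np.array([0.8, 0.2]), "malicious", "Downloads an unsigned binary"),
    TrainingItem(np.array([0.1, 0.9]), "benign", "Matches routine software update traffic"),
]

X = np.stack([it.features for it in items])
y = [it.label for it in items]

clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)

def classify_with_reason(x):
    """Return the predicted label together with item decision information
    drawn from the explanation attached to the closest training item."""
    label = clf.predict(x.reshape(1, -1))[0]
    idx = clf.kneighbors(x.reshape(1, -1), n_neighbors=1, return_distance=False)[0][0]
    return label, items[idx].explanation

print(classify_with_reason(np.array([0.85, 0.15])))
# ('malicious', 'Beacons to a known C2 domain')
```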
Abstract:
In one embodiment, a computing device trains a multi-class classifier (having a plurality of classes) on a training dataset, and evaluates the multi-class classifier on a testing dataset to determine a performance of each of the plurality of classes. The plurality of classes may then be partitioned into either learnable or unlearnable based on whether the performance of each particular class surpasses a particular threshold, and then a predicting classifier can be trained on the training dataset, where data of the training dataset is labelled as either learnable or unlearnable based on the particular class to which the data corresponds. Accordingly, the computing device may then use the predicting classifier on a new class to predict whether samples associated with the new class are learnable or unlearnable, and may retrain the multi-class classifier with the samples associated with the new class in response to predicting that the samples are learnable.
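One way such a pipeline could be assembled is sketched below in Python with scikit-learn, under assumed choices (per-class F1 as the performance measure, a 0.7 threshold, random forests for both classifiers) that the embodiment itself does not specify:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic data as a placeholder for the training/testing datasets.
X, y = make_classification(n_samples=2000, n_features=20, n_classes=5,
                           n_informative=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Train the multi-class classifier and measure per-class performance.
multi = RandomForestClassifier(random_state=0).fit(X_train, y_train)
per_class_f1 = f1_score(y_test, multi.predict(X_test),
                        average=None, labels=multi.classes_)

THRESHOLD = 0.7  # assumed learnability threshold
learnable_classes = {c for c, score in zip(multi.classes_, per_class_f1)
                     if score >= THRESHOLD}

# Relabel the training data as learnable (1) / unlearnable (0) by class,
# then train the predicting classifier on that binary labelling.
y_binary = np.array([1 if c in learnable_classes else 0 for c in y_train])
predictor = RandomForestClassifier(random_state=0).fit(X_train, y_binary)

def new_class_is_learnable(samples, min_fraction=0.5):
    """Predict whether a batch of samples from a previously unseen class is
    learnable; if so, the multi-class model could be retrained with them."""
    return predictor.predict(samples).mean() >= min_fraction

new_samples = X_test[:50]  # stand-in for samples of a new class
print(new_class_is_learnable(new_samples))
```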
Abstract:
A user behavior activity detection method is provided in which network traffic relating to user behavior activities in a network is monitored. Data is stored representing network traffic within a plurality of time periods, each of the time periods serving as a transaction. Subsets of the network traffic in the transactions are identified as traffic suspected of relating to certain user behavior activities. The subsets of the network traffic in the transactions are assigned into one or more groups. A determination is made of one or more detection rules for each of the one or more groups based on identifying, for each of the groups, a number of user behavior activities common to each of the subsets of the network traffic. The one or more detection rules are used to monitor future network traffic in the network to detect occurrence of the certain user behavior activities.
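The rule-derivation step can be sketched as follows, assuming Python and a simplified representation in which each transaction's suspected traffic is reduced to a set of indicator strings; the grouping key, indicator names, and activities are hypothetical:

```python
from collections import defaultdict

# Hypothetical transactions: each time window's suspected traffic is reduced
# to a set of observed indicators, tagged with the activity it is suspected of.
transactions = [
    ("file_upload", {"POST /upload", "large_body", "cloud_storage_domain"}),
    ("file_upload", {"POST /upload", "large_body", "chunked_encoding"}),
    ("video_stream", {"GET /segment", "range_header", "cdn_domain"}),
    ("video_stream", {"GET /segment", "range_header", "udp_burst"}),
]

# Assign the subsets of traffic into groups, here simply by suspected activity.
groups = defaultdict(list)
for activity, indicators in transactions:
    groups[activity].append(indicators)

# A detection rule per group: the indicators common to every subset in it.
rules = {activity: set.intersection(*subsets)
         for activity, subsets in groups.items()}

def detect(indicators):
    """Apply the derived rules to a new time window's traffic indicators."""
    return [activity for activity, rule in rules.items() if rule <= indicators]

print(rules)
print(detect({"POST /upload", "large_body", "user_agent_x"}))  # ['file_upload']
```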
Abstract:
Techniques are presented to identify malware communication with domains generated by a domain generation algorithm (DGA). Sample domain names are obtained and labeled as DGA domains, non-DGA domains, or suspicious domains. A classifier is trained in a first stage based on the sample domain names. Sample proxy logs, including proxy logs of DGA domains and proxy logs of non-DGA domains, are obtained, and the classifier is trained in a second stage based on the sample domain names and the sample proxy logs. Live traffic proxy logs are obtained, and the classifier is tested by classifying the live traffic proxy logs as DGA proxy logs. The trained and tested classifier is then forwarded to a second computing device, which uses it to identify network communication of a third computing device, via a network interface unit of the third computing device, as malware network communication with DGA domains.
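A two-stage training flow of this shape might be sketched as below, assuming Python with scikit-learn, toy lexical domain features, and toy proxy-log features; the actual features and classifier used in the embodiment are not specified here:

```python
import math
from collections import Counter
from sklearn.linear_model import LogisticRegression

def domain_features(domain):
    """Simple lexical features of a domain name: length, character entropy,
    and digit ratio. Real systems use richer features; these are illustrative."""
    name = domain.split(".")[0]
    counts = Counter(name)
    entropy = -sum((c / len(name)) * math.log2(c / len(name))
                   for c in counts.values())
    digits = sum(ch.isdigit() for ch in name) / len(name)
    return [len(name), entropy, digits]

# Stage one: labeled sample domain names (1 = DGA, 0 = non-DGA); tiny toy set.
domains = ["xkq7f9z2mvp1.com", "a9b8c7d6e5f4.net", "example.com", "cisco.com"]
labels = [1, 1, 0, 0]
stage1 = LogisticRegression().fit([domain_features(d) for d in domains], labels)

# Stage two: combine the stage-one score with proxy-log features
# (e.g. bytes transferred, HTTP status) for each logged request.
proxy_logs = [
    ("xkq7f9z2mvp1.com", [120, 404]),
    ("example.com",      [5200, 200]),
]
proxy_labels = [1, 0]
X2 = [[stage1.predict_proba([domain_features(d)])[0, 1]] + feats
      for d, feats in proxy_logs]
stage2 = LogisticRegression().fit(X2, proxy_labels)

# Classify a live-traffic proxy log entry with the two-stage classifier.
live_domain, live_feats = "q1w2e3r4t5y6.org", [300, 503]
x_live = [[stage1.predict_proba([domain_features(live_domain)])[0, 1]] + live_feats]
print("DGA" if stage2.predict(x_live)[0] == 1 else "non-DGA")
```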