Scalable training of random forests for high precise malware detection
Abstract:
In one embodiment, a device trains a machine learning-based malware classifier using a first randomly selected subset of samples from a training dataset. The classifier comprises a random decision forest. The device identifies, using at least a portion of the training dataset as input to the malware classifier, a set of misclassified samples from the training dataset that the malware classifier misclassifies. The device retrains the malware classifier using a second randomly selected subset of samples from the training dataset and the identified set of misclassified samples. The device adjusts prediction labels of individual leaves of the random decision forest of the retrained malware classifier based in part on decision changes in the forest that result from assessing the entire training dataset with the classifier. The device sends the malware classifier with the adjusted prediction labels for deployment into a network.
Information query
Patent Agency Ranking
0/0