Malware families identification based upon hierarchical clustering

    公开(公告)号:US12254089B1

    公开(公告)日:2025-03-18

    申请号:US18535386

    申请日:2023-12-11

    Abstract: Behavior report generation monitors the behavior of unknown sample files executing in a sandbox. Behaviors are encoded and feature vectors created based upon a q-gram for each sample. Prototypes extraction includes extracting prototypes from the training set of feature vectors using a clustering algorithm. Once prototypes are identified in this training process, the prototypes with unknown labels are reviewed by domain experts who add a label to each prototype. A K-Nearest Neighbor Graph is used to merge prototypes into fewer prototypes without using a fixed distance threshold and then assigning a malware family name to each remaining prototype. An input unknown sample can be classified using the remaining prototypes and using a fixed distance. For the case that no such prototype is close enough, the behavior report of a sample is rejected and tagged as an unknown sample or that of an emerging malware family.

    Malware families identification based upon hierarchical clustering

    公开(公告)号:US11886586B1

    公开(公告)日:2024-01-30

    申请号:US16811651

    申请日:2020-03-06

    CPC classification number: G06F21/566 G06F9/54 G06F18/23213 G06F21/568

    Abstract: Behavior report generation monitors the behavior of unknown sample files executing in a sandbox. Behaviors are encoded and feature vectors created based upon a q-gram for each sample. Prototypes extraction includes extracting prototypes from the training set of feature vectors using a clustering algorithm. Once prototypes are identified in this training process, the prototypes with unknown labels are reviewed by domain experts who add a label to each prototype. A K-Nearest Neighbor Graph is used to merge prototypes into fewer prototypes without using a fixed distance threshold and then assigning a malware family name to each remaining prototype. An input unknown sample can be classified using the remaining prototypes and using a fixed distance. For the case that no such prototype is close enough, the behavior report of a sample is rejected and tagged as an unknown sample or that of an emerging malware family.

Patent Agency Ranking