Abstract:
Identifying malicious network traffic based on distributed, collaborative sampling includes, at a computing device having connectivity to a network, obtaining a first set of data flows, based on sampling criteria, that represents network traffic between one or more nodes in the network and one or more domains outside of the network, each data flow in the first set of data flows including a plurality of data packets. The first set of data flows is forwarded for correlation with a plurality of other sets of data flows from other networks to generate global intelligence data. Adjusted sampling criteria is generated based on the global intelligence data and a second set of data flows is obtained based on the adjusted sampling criteria.
Abstract:
Techniques are presented to identify malware communication with domain generation algorithm (DGA) generated domains. Sample domain names are obtained and labeled as DGA domains, non-DGA domains or suspicious domains. A classifier is trained in a first stage based on the sample domain names. Sample proxy logs including proxy logs of DGA domains and proxy logs of non-DGA domains are obtained to train the classifier in a second stage based on the plurality of sample domain names and the plurality of sample proxy logs. Live traffic proxy logs are obtained and the classifier is tested by classifying the live traffic proxy logs as DGA proxy logs, and the classifier is forwarded to a second computing device to identify network communication of a third computing device as malware network communication with DGA domains via a network interface unit of the third computing device based on the trained and tested classifier.
Abstract:
In one embodiment, a security device in a computer network detects potential domain generation algorithm (DGA) searching activity using a domain name service (DNS) model to detect abnormally high DNS requests made by a host attempting to locate a command and control (C&C) server in the computer network. The server device also detects potential DGA communications activity based on applying a hostname-based classifier for DGA domains associated with any server internet protocol (IP) address in a data stream from the host. The security device may then correlate the potential DGA searching activity with the potential DGA communications activity, and identifies DGA performing malware based on the correlating, accordingly.
Abstract:
Techniques are presented that identify malware network communications between a computing device and a server based on a cumulative feature vector generated from a group of network traffic records associated with communications between computing devices and servers. Feature vectors are generated, each vector including features extracted from the network traffic records in the group. A self-similarity matrix is computed for each feature which is a representation of the feature that is invariant to an increase or a decrease of feature values across all feature vectors in the group. Each self-similarity matrix is transformed into corresponding histograms to be invariant to a number of network traffic records in the group. The cumulative feature vector is a cumulative representation of the predefined set of features of all network traffic records included in the at least one group of network traffic records and is generated based on the corresponding histograms.
Abstract:
In one embodiment, a device generates one or more time series of characteristics of client-server communications observed in a network for a particular client in the network. The device partitions the one or more time series into sets of time windows based on patterns present in the characteristics of the client-server communications. The device compares the characteristics of the client-server communications from the partitioned time windows to determine measures of behavioral similarity between the compared time windows. The device provides the measures of behavioral similarity between the compared time windows as input to a machine learning-based malware detector. The device causes performance of a mitigation action in the network when the machine learning-based malware detector determines that the particular client in the network is infected with malware.
Abstract:
A system and a method are disclosed for identifying network threats based on hierarchical classification. The system receives packet flows from a data network and determines flow features for the received packet flows based on data from the packet flows. The system also classifies each packet flow into a flow class based on flow features of the packet flow. Based on a criterion, the system selects packet flows from the received packet flows and places the selected packet flows into an event set that represents an event on the network. The system determines event set features for the event set based on the flow features of the selected packet flows. The system then classifies the event set into a set class based on the determined event set features. Based on the set class, the computer system may report a threat incident on an internetworking device that originated the selected packet flows.
Abstract:
Identifying malicious communications by generating data representative of network traffic based on adaptive sampling includes, at a computing device having connectivity to a network, obtaining a set of data flows representing network traffic between one or more nodes in the network and one or more domains outside of the network, wherein each data flow in the set of data flows includes a plurality of data packets. One or more features are extracted from the set of data flows based on statistical measurements of the set of data flows. The set of data flows are adaptively sampled based on at least the one or more features. Then, data representative of the network traffic is generated based on the adaptively sampling to identify malicious communication channels in the network traffic.
Abstract:
Techniques are presented that identify malware network communications between a computing device and a server based on a cumulative feature vector generated from a group of network traffic records associated with communications between computing devices and servers. Feature vectors are generated, each vector including features extracted from the network traffic records in the group. A self-similarity matrix is computed for each feature which is a representation of the feature that is invariant to an increase or a decrease of feature values across all feature vectors in the group. Each self-similarity matrix is transformed into corresponding histograms to be invariant to a number of network traffic records in the group. The cumulative feature vector is a cumulative representation of the predefined set of features of all network traffic records included in the at least one group of network traffic records and is generated based on the corresponding histograms.
Abstract:
In one embodiment, a system includes a processor to receive network flows, for each of one of a plurality of event-types, compare each one of the network flows to a flow-specific criteria of the one event-type to determine if the one network flow satisfies the flow-specific criteria, for each one of the event-types, for each one of the network flows satisfying the flow-specific criteria of the one event-type, assign the one network flow to a proto-event of the one-event type, test different combinations of the network flows assigned to the proto-event of the one event-type against aggregation criteria of the one event-type to determine if one combination of the network flows assigned to the proto-event of the one event-type satisfies the aggregation criteria for the one event-type and identifies an event of the one event-type from among the network flows of the proto-event. Related apparatus and methods are also described.
Abstract:
In one embodiment, a security device in a computer network determines a plurality of values for a plurality of features from samples of known malware, and computes one or more significant values out of the plurality of values, where each of the one or more significant values occurs across greater than a significance threshold of the samples. The security device may then determine feature values for samples of unlabeled traffic, and declares one or more particular samples of unlabeled traffic as synthetic malicious flow samples in response to all feature values for each synthetic malicious flow sample matching a respective one of the significant values for each corresponding respective feature. The security device may then use the samples of known malware and the synthetic malicious flow samples for model-based malware detection.