Abstract:
In one embodiment, a device obtains input features for a neural network-based model. The device pre-defines a set of neurons of the model to represent known behaviors associated with the input features. The device constrains weights for a plurality of outputs of the model. The device trains the neural network-based model using the constrained weights for the plurality of outputs of the model and by excluding the pre-defined set of neurons from updates during the training.
Abstract:
In one embodiment, a device in a network maintains a plurality of machine learning-based detectors for an intrusion detection system. Each detector is associated with a different portion of a feature space of traffic characteristics assessed by the intrusion detection system. The device provides data regarding the plurality of detectors to a user interface. The device receives an adjustment instruction from the user interface based on the data provided to the user interface regarding the plurality of detectors. The device adjusts the portions of the feature space associated with the plurality of detectors based on the adjustment instruction received from the user interface.
Abstract:
Data is collected from a database arrangement about behavior of observed entities, wherein the collected data includes one or more features associated with the observed entities. A probabilistic model is determined that correlates the one or more features with malicious and/or benign behavior of the observed entities. Data is collected from the database arrangement for unobserved entities that have at least one common feature with at least one of the observed entities. One of the unobserved entities is determined to be a malicious entity based on the at least one common feature and the probabilistic model. Network policies are applied to packets sent from the malicious entity.
Abstract:
In one embodiment, a device obtains input features for a neural network-based model. The device pre-defines a set of neurons of the model to represent known behaviors associated with the input features. The device constrains weights for a plurality of outputs of the model. The device trains the neural network-based model using the constrained weights for the plurality of outputs of the model and by excluding the pre-defined set of neurons from updates during the training.
Abstract:
In one embodiment, a device groups feature vectors representing network traffic flows into bags. The device forms a bag representation of a particular one of the bags by aggregating the feature vectors in the particular bag. The device extends one or more feature vectors in the particular bag with the bag representation. The extended one or more feature vectors are positive examples of a classification label for the network traffic. The device trains a network traffic classifier using training data that comprises the one or more feature vectors extended with the bag representation.
Abstract:
In one embodiment, a device disassembles an executable file into assembly instructions. The device maps each of the assembly instructions to a fixed length instruction vector using one-hot encoding and an instruction vocabulary and forms vector representations of blocks of a control flow graph for corresponding functions of the executable file by embedding and aggregating bags of the instruction vectors. The device generates, based on the vector representations of the blocks of the control flow graph, a call graph model of the functions in the executable file. The device forms a vector representation of the executable file based in part on the call graph model. The device determines, based on the vector representation of the executable file, whether the executable file is malware.
Abstract:
Identifying malicious executables by analyzing proxy logs includes, at a server having connectivity to the Internet, retrieving sets of proxy logs from a plurality of proxy servers. Each proxy server of the plurality of proxy servers is associated with a network and generates network traffic logs for one or more nodes included in the network. Then, a set of executables hosted by each of the one or more nodes associated with each of the plurality of proxy servers is determined. Each set of executables is analyzed to detect a specific executable and portions of each of the network traffic logs that are associated with the specific executable are identified. An alert is generated indicating the portions of each of the network traffic logs as likely to be associated with the specific executable.
Abstract:
Presented herein are techniques for classifying devices as being infected with malware based on learned indicators of compromise. A method includes receiving at a security analysis device, traffic flows from a plurality of entities destined for a plurality of users, aggregating the traffic flows into discrete bags of traffic, wherein the bags of traffic comprise a plurality of flows of traffic for a given user over a predetermined period of time, extracting features from the bags of traffic and aggregating the features into per-flow feature vectors, aggregating the per-flow feature vectors into per-destination domain aggregated vectors, combining the per-destination-domain aggregated vectors into a per-user aggregated vector, and classifying a computing device used by a given user as infected with malware when indicators of compromise detected in the bags of traffic indicate that the per-user aggregated vector for the given user includes suspicious features among the extracted features.
Abstract:
Identifying malicious executables by analyzing proxy logs includes, at a server having connectivity to the Internet, retrieving sets of proxy logs from a plurality of proxy servers. Each proxy server of the plurality of proxy servers is associated with a network and generates network traffic logs for one or more nodes included in the network. Then, a set of executables hosted by each of the one or more nodes associated with each of the plurality of proxy servers is determined. Each set of executables is analyzed to detect a specific executable and portions of each of the network traffic logs that are associated with the specific executable are identified. An alert is generated indicating the portions of each of the network traffic logs as likely to be associated with the specific executable.
Abstract:
Data is collected from a database arrangement about behavior of observed entities, wherein the collected data includes one or more features associated with the observed entities. A probabilistic model is determined that correlates the one or more features with malicious and/or benign behavior of the observed entities. Data is collected from the database arrangement for unobserved entities that have at least one common feature with at least one of the observed entities. One of the unobserved entities is determined to be a malicious entity based on the at least one common feature and the probabilistic model. Network policies are applied to packets sent from the malicious entity.