DETECTING ENCRYPTED MALWARE WITH SPLT-BASED DEEP NETWORKS

    Publication No.: US20200186547A1

    Publication date: 2020-06-11

    Application No.: US16216361

    Filing date: 2018-12-11

    Abstract: In one embodiment, a device obtains telemetry data for a plurality of encrypted traffic flows observed in a network. The device clusters the flows into observed flow clusters, based on one or more flow-level features of the obtained telemetry data, and likewise clusters malware-related traffic telemetry data into malware-related flow clusters. The observed and malware-related telemetry data are indicative of sequence of packet lengths and times (SPLT) information for the traffic flows. The device samples sets of flows from the observed and malware-related flow clusters, with each set including at least one flow from an observed flow cluster and at least one flow from a malware-related flow cluster. The device trains a deep learning neural network to determine whether a particular encrypted traffic flow is malware-related, by using the SPLT information for the sampled sets of traffic flows as input to an input layer of neurons of the deep network.
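
    The sampling step in this abstract can be illustrated with a minimal Python sketch. It assumes the flows are already clustered and that each flow is stored as an SPLT sequence of (packet length, inter-arrival time) pairs. The function names, the fixed-length padding, and the choice of exactly one flow per cluster type are illustrative assumptions, not details taken from the patent.

```python
import random

# Assumption: a flow is an SPLT sequence of (packet_length_bytes, inter_arrival_ms) pairs,
# a cluster is a list of flows, and cluster assignments already exist.

def sample_flow_sets(observed_clusters, malware_clusters, n_sets, rng):
    """Draw n_sets pairs: one flow from an observed cluster, one from a malware-related cluster."""
    sets = []
    for _ in range(n_sets):
        obs_cluster = rng.choice(observed_clusters)
        mal_cluster = rng.choice(malware_clusters)
        sets.append((rng.choice(obs_cluster), rng.choice(mal_cluster)))
    return sets

def to_input_vector(splt, max_packets=4):
    """Flatten an SPLT sequence into a fixed-length vector (truncate, then zero-pad)."""
    head = list(splt[:max_packets])
    padded = head + [(0, 0)] * (max_packets - len(head))
    return [value for pair in padded for value in pair]
```

    Each sampled pair could then be flattened with to_input_vector and fed to the input layer of a network; the network itself is outside the scope of this sketch.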

    REFINED LEARNING DATA REPRESENTATION FOR CLASSIFIERS

    Publication No.: US20170316342A1

    Publication date: 2017-11-02

    Application No.: US15143792

    Filing date: 2016-05-02

    CPC classification number: G06N20/00 G06F17/11 G06F21/552 G06N20/10 H04L63/1425

    Abstract: In one embodiment, a learning machine device initializes thresholds of a data representation of one or more data features, the thresholds specifying a first number of pre-defined bins (e.g., uniform and equidistant bins). Next, adjacent bins of the pre-defined bins having substantially similar weights may be reciprocally merged, the merging resulting in a second number of refined bins that is less than the first number. Notably, while merging, the device also learns weights of a linear decision rule associated with the one or more data features. Accordingly, a data-driven representation for a data-driven classifier may be established based on the refined bins and learned weights.
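
    The bin-merging idea can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the weights are taken as already given, whereas the abstract describes learning them while merging, and the tolerance `tol` and the use of the group mean as the merged weight are hypothetical choices.

```python
def merge_similar_bins(edges, weights, tol):
    """Merge adjacent bins whose weights differ from the running group mean by at most tol.

    edges   -- bin boundaries, one more entry than weights
    weights -- one weight per bin (assumed given; not learned here)
    Returns (new_edges, new_weights) with fewer or equally many bins.
    """
    if len(edges) != len(weights) + 1:
        raise ValueError("edges must have exactly one more entry than weights")
    if not weights:
        return list(edges), []
    new_edges = [edges[0]]
    new_weights = []
    group = [weights[0]]
    for i in range(1, len(weights)):
        mean = sum(group) / len(group)
        if abs(weights[i] - mean) <= tol:
            group.append(weights[i])      # similar weight: absorb bin i into the group
        else:
            new_edges.append(edges[i])    # boundary between bin i-1 and bin i is kept
            new_weights.append(mean)
            group = [weights[i]]
    new_edges.append(edges[-1])
    new_weights.append(sum(group) / len(group))
    return new_edges, new_weights
```

    For example, four uniform bins over [0, 4] with weights 0.5, 0.5, 2.0, 2.0 and tol 0.1 collapse into two refined bins with edges 0, 2, 4.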

    Identifying Malware Communications with DGA Generated Domains by Discriminative Learning

    Publication No.: US20170026390A1

    Publication date: 2017-01-26

    Application No.: US14806236

    Filing date: 2015-07-22

    Abstract: Techniques are presented to identify malware communication with domain generation algorithm (DGA) generated domains. Sample domain names are obtained and labeled as DGA domains, non-DGA domains or suspicious domains. A classifier is trained in a first stage based on the sample domain names. Sample proxy logs including proxy logs of DGA domains and proxy logs of non-DGA domains are obtained to train the classifier in a second stage based on the plurality of sample domain names and the plurality of sample proxy logs. Live traffic proxy logs are obtained and the classifier is tested by classifying the live traffic proxy logs as DGA proxy logs, and the classifier is forwarded to a second computing device to identify network communication of a third computing device as malware network communication with DGA domains via a network interface unit of the third computing device based on the trained and tested classifier.

    Detecting encrypted malware with SPLT-based deep networks

    Publication No.: US11201877B2

    Publication date: 2021-12-14

    Application No.: US16216361

    Filing date: 2018-12-11

    Abstract: In one embodiment, a device obtains telemetry data for a plurality of encrypted traffic flows observed in a network. The device clusters the flows into observed flow clusters, based on one or more flow-level features of the obtained telemetry data, and likewise clusters malware-related traffic telemetry data into malware-related flow clusters. The observed and malware-related telemetry data are indicative of sequence of packet lengths and times (SPLT) information for the traffic flows. The device samples sets of flows from the observed and malware-related flow clusters, with each set including at least one flow from an observed flow cluster and at least one flow from a malware-related flow cluster. The device trains a deep learning neural network to determine whether a particular encrypted traffic flow is malware-related, by using the SPLT information for the sampled sets of traffic flows as input to an input layer of neurons of the deep network.

    PROTECTING ENDPOINTS WITH PATTERNS FROM ENCRYPTED TRAFFIC ANALYTICS

    Publication No.: US20200236131A1

    Publication date: 2020-07-23

    Application No.: US16251322

    Filing date: 2019-01-18

    Abstract: In one embodiment, an encrypted traffic analytics service captures telemetry data regarding encrypted network traffic associated with a first endpoint device in a network. The encrypted traffic analytics service receives, from the first endpoint device, an indication that a security agent executed on the first endpoint device has detected malware on the first endpoint device. The encrypted traffic analytics service constructs one or more patterns of encrypted traffic using the captured telemetry data from a time period associated with the received indication. The encrypted traffic analytics service uses the one or more patterns of encrypted traffic to detect malware on a second endpoint device by comparing the one or more patterns of encrypted traffic to telemetry data regarding encrypted network traffic associated with the second endpoint device.

    Identifying malicious communication channels in network traffic by generating data based on adaptive sampling

    Publication No.: US10440035B2

    Publication date: 2019-10-08

    Application No.: US14955480

    Filing date: 2015-12-01

    Abstract: Identifying malicious communications by generating data representative of network traffic based on adaptive sampling includes, at a computing device having connectivity to a network, obtaining a set of data flows representing network traffic between one or more nodes in the network and one or more domains outside of the network, wherein each data flow in the set of data flows includes a plurality of data packets. One or more features are extracted from the set of data flows based on statistical measurements of the set of data flows. The set of data flows is adaptively sampled based on at least the one or more features. Then, data representative of the network traffic is generated based on the adaptive sampling to identify malicious communication channels in the network traffic.

    REFINING SYNTHETIC MALICIOUS SAMPLES WITH UNLABELED DATA

    Publication No.: US20190260775A1

    Publication date: 2019-08-22

    Application No.: US15898789

    Filing date: 2018-02-19

    Abstract: In one embodiment, a security device in a computer network determines a plurality of values for a plurality of features from samples of known malware, and computes one or more significant values out of the plurality of values, where each of the one or more significant values occurs across greater than a significance threshold of the samples. The security device may then determine feature values for samples of unlabeled traffic, and declare one or more particular samples of unlabeled traffic as synthetic malicious flow samples in response to all feature values for each synthetic malicious flow sample matching a respective one of the significant values for each corresponding respective feature. The security device may then use the samples of known malware and the synthetic malicious flow samples for model-based malware detection.
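
    A minimal Python sketch of the two steps described in this abstract follows. It assumes each sample is a dictionary of categorical feature values and that the significance threshold is a fraction of the known-malware samples. The function names and the feature names in the usage note are hypothetical and chosen only for illustration.

```python
from collections import Counter

def significant_values(malware_samples, threshold):
    """For each feature, collect the values seen in more than `threshold` (a fraction)
    of the known-malware samples."""
    n = len(malware_samples)
    features = {name for sample in malware_samples for name in sample}
    result = {}
    for name in features:
        counts = Counter(s[name] for s in malware_samples if name in s)
        result[name] = {value for value, c in counts.items() if c / n > threshold}
    return result

def synthetic_malicious(unlabeled_samples, significant):
    """Keep unlabeled samples whose value for every feature is one of that
    feature's significant values."""
    return [
        sample for sample in unlabeled_samples
        if all(sample.get(name) in values for name, values in significant.items())
    ]
```

    With three malware samples of which two use port 443 and all use TLS 1.2, and a threshold of 0.5, an unlabeled flow with port 443 and TLS 1.2 would be declared a synthetic malicious sample, while one with port 80 would not.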

    Identifying malicious network traffic based on collaborative sampling

    Publication No.: US10264005B2

    Publication date: 2019-04-16

    Application No.: US15403365

    Filing date: 2017-01-11

    Abstract: Identifying malicious network traffic based on distributed, collaborative sampling includes, at a computing device having connectivity to a network, obtaining a first set of data flows, based on sampling criteria, that represents network traffic between one or more nodes in the network and one or more domains outside of the network, each data flow in the first set of data flows including a plurality of data packets. The first set of data flows is forwarded for correlation with a plurality of other sets of data flows from other networks to generate global intelligence data. Adjusted sampling criteria are generated based on the global intelligence data and a second set of data flows is obtained based on the adjusted sampling criteria.
