Clustering botnet behavior using parameterized models
    1.
    Granted invention patent
    Status: in force

    Publication No.: US08745731B2

    Publication date: 2014-06-03

    Application No.: US12061664

    Filing date: 2008-04-03

    CPC classification: H04L63/1441 H04L2463/144

    Abstract: Identification and prevention of email spam that originates from botnets may be performed by finding similarity in their host property and behavior patterns using a set of labeled data. Clustering models of host properties pertaining to previously identified and appropriately tagged botnet hosts may be learned. Given labeled data, each botnet may be examined individually and a clustering model learned to reflect upon a set of selected host properties. Once a model has been learned for every botnet, clustering behavior may be used to look for host properties that fit into a profile. Such traffic can be either discarded or tagged for subsequent analysis and can also be used to profile botnets preventing them from launching other attacks. In addition, models of individual botnets can be further clustered to form superclusters, which can help understand botnet behavior and detect future attacks.
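
The abstract does not say which parameterized model is learned per botnet. As a rough sketch only, the following assumes an independent Gaussian per host property; the function names, the numeric feature vectors, and the log-likelihood threshold are all hypothetical and are not taken from the patent.

```python
import math
from statistics import mean, pvariance


def fit_botnet_model(hosts):
    """Fit one (mean, variance) pair per host property for a single labeled botnet.

    hosts: list of equal-length numeric feature vectors (assumed representation).
    """
    columns = list(zip(*hosts))
    # A small variance floor avoids division by zero for constant properties.
    return [(mean(col), max(pvariance(col), 1e-6)) for col in columns]


def log_likelihood(model, host):
    """Log-likelihood of one host under an independent-Gaussian botnet model."""
    return sum(
        -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
        for (mu, var), x in zip(model, host)
    )


def best_matching_botnet(models, host, min_log_likelihood):
    """Return the label of the best-fitting botnet profile, or None if none fits.

    models: dict mapping a botnet label to a model from fit_botnet_model.
    """
    label, score = max(
        ((name, log_likelihood(model, host)) for name, model in models.items()),
        key=lambda pair: pair[1],
    )
    return label if score >= min_log_likelihood else None
```

A host that matches a profile could then be discarded or tagged, as the abstract describes; choosing the threshold is left open here.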

    Automatically identifying dynamic internet protocol addresses
    2.
    Granted invention patent
    Status: in force

    Publication No.: US08856360B2

    Publication date: 2014-10-07

    Application No.: US11821211

    Filing date: 2007-06-22

    IPC classification: G06F15/16 H04L29/06 H04L29/12

    Abstract: Dynamic IP addresses may be automatically identified and their dynamics patterns may be analyzed. Multi-user IP address blocks are determined as candidates for further analysis. An entropy score is determined for each IP address in every candidate block to distinguish between a dynamic IP and a static IP shared by multiple users. IP addresses with high entropy scores are grouped, and then analyzed, and may be used in various applications, such as spam filtering.
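
The abstract does not define how the entropy score is computed. The sketch below shows only the basic building block, assuming the score is the Shannon entropy of the users observed at each IP address; the `(ip, user_id)` event format, the function names, and the threshold are hypothetical, and the patented score may well use additional signals.

```python
import math
from collections import Counter


def usage_entropy(user_ids):
    """Shannon entropy, in bits, of the distribution of users seen at one IP."""
    counts = Counter(user_ids)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def high_entropy_ips(events, threshold):
    """Group events by IP and return, sorted, the IPs whose entropy meets the threshold.

    events: iterable of (ip, user_id) pairs (assumed log format).
    """
    users_by_ip = {}
    for ip, user in events:
        users_by_ip.setdefault(ip, []).append(user)
    return sorted(
        ip for ip, users in users_by_ip.items() if usage_entropy(users) >= threshold
    )
```

An IP used by a single account scores zero, while one whose activity is spread evenly over many accounts scores high, which is what makes the grouping step in the abstract possible.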

    CLUSTERING BOTNET BEHAVIOR USING PARAMETERIZED MODELS
    3.
    Invention patent application
    Status: in force

    Publication No.: US20090254989A1

    Publication date: 2009-10-08

    Application No.: US12061664

    Filing date: 2008-04-03

    IPC classification: G06F11/00 G06F9/455

    CPC classification: H04L63/1441 H04L2463/144

    Abstract: Identification and prevention of email spam that originates from botnets may be performed by finding similarity in their host property and behavior patterns using a set of labeled data. Clustering models of host properties pertaining to previously identified and appropriately tagged botnet hosts may be learned. Given labeled data, each botnet may be examined individually and a clustering model learned to reflect upon a set of selected host properties. Once a model has been learned for every botnet, clustering behavior may be used to look for host properties that fit into a profile. Such traffic can be either discarded or tagged for subsequent analysis and can also be used to profile botnets preventing them from launching other attacks. In addition, models of individual botnets can be further clustered to form superclusters, which can help understand botnet behavior and detect future attacks.

    AUTOMATIC BOTNET SPAM SIGNATURE GENERATION
    4.
    Invention patent application
    Status: under examination (published)

    Publication No.: US20090265786A1

    Publication date: 2009-10-22

    Application No.: US12104441

    Filing date: 2008-04-17

    IPC classification: G06F21/00

    Abstract: A framework may be used for generating URL signatures to identify botnet spam and membership. The framework may take a set of unlabeled emails as input that are grouped based on URLs contained within the emails. The framework may return a set of spam URL signatures and a list of corresponding botnet host IP addresses by analyzing the URLs within the emails that are contained within the groups. Each URL signature may be in the form of either a complete URL string or a URL regular expression. The signatures may be used to identify spam emails launched from botnets, while the knowledge of botnet host identities can help filter other spam emails also sent by them.
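
The grouping step and the simpler of the two signature forms (a complete URL string) can be illustrated as follows. This is a sketch under stated assumptions, not the framework's actual procedure: emails are assumed to arrive as `(sender_ip, body)` pairs, URLs are grouped by host name, and a URL becomes a signature once it is sent from at least `min_ips` distinct addresses. The names, the URL pattern, and the `min_ips` rule are hypothetical; regular-expression signatures are not attempted.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

# Deliberately simple URL matcher; trailing punctuation is not stripped.
URL_PATTERN = re.compile(r"https?://[^\s<>'\"]+")


def group_by_domain(emails):
    """Group URLs by host name, recording which sender IPs used each URL.

    emails: iterable of (sender_ip, body) pairs (assumed input format).
    Returns {domain: {url: set_of_sender_ips}}.
    """
    groups = defaultdict(lambda: defaultdict(set))
    for sender_ip, body in emails:
        for url in URL_PATTERN.findall(body):
            domain = urlsplit(url).hostname or ""
            groups[domain][url].add(sender_ip)
    return groups


def complete_url_signatures(emails, min_ips=2):
    """Return {url: sorted_sender_ips} for URLs sent from at least min_ips hosts."""
    signatures = {}
    for urls in group_by_domain(emails).values():
        for url, ips in urls.items():
            if len(ips) >= min_ips:
                signatures[url] = sorted(ips)
    return signatures
```

Each returned entry pairs a candidate signature with the host IP addresses that sent it, mirroring the two outputs named in the abstract.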

    Automatically identifying dynamic Internet protocol addresses
    5.
    Invention patent application
    Status: in force

    Publication No.: US20080320119A1

    Publication date: 2008-12-25

    Application No.: US11821211

    Filing date: 2007-06-22

    IPC classification: G06F15/177

    Abstract: Dynamic IP addresses may be automatically identified and their dynamics patterns may be analyzed. Multi-user IP address blocks are determined as candidates for further analysis. An entropy score is determined for each IP address in every candidate block to distinguish between a dynamic IP and a static IP shared by multiple users. IP addresses with high entropy scores are grouped, and then analyzed, and may be used in various applications, such as spam filtering.

    Adding prototype information into probabilistic models
    6.
    Granted invention patent
    Status: in force

    Publication No.: US08010341B2

    Publication date: 2011-08-30

    Application No.: US11855099

    Filing date: 2007-09-13

    Abstract: Mechanisms are disclosed for incorporating prototype information into probabilistic models for automated information processing, mining, and knowledge discovery. Examples of these models include Hidden Markov Models (HMMs), Latent Dirichlet Allocation (LDA) models, and the like. The prototype information injects prior knowledge to such models, thereby rendering them more accurate, effective, and efficient. For instance, in the context of automated word labeling, additional knowledge is encoded into the models by providing a small set of prototypical words for each possible label. The net result is that words in a given corpus are labeled and are therefore in condition to be summarized, identified, classified, clustered, and the like.

    Using linear and log-linear model combinations for estimating probabilities of events
    7.
    Granted invention patent
    Status: in force

    Publication No.: US08484077B2

    Publication date: 2013-07-09

    Application No.: US12893939

    Filing date: 2010-09-29

    IPC classification: G06Q30/00

    Abstract: A method for combining multiple probability of click models in an online advertising system into a combined predictive model, the method commencing by receiving a feature set slice (e.g. corresponding to demographics or taxonomies or clusters), and using the sliced data for training multiple slice-wise predictive models. The trained slice-wise predictive models are combined by overlaying a weighted distribution model over the trained slice-wise predictive models. The combined predictive model then is used in predicting the probability of a click given a query-advertisement pair in online advertising. The method can flexibly receive slice specifications, and can overlay any one or more of a variety of distribution models, such as a linear combination or a log-linear combination. Using an appropriate weighted distribution model, the combined predictive model reliably yields predictive estimates of occurrence of click events that are at least as good as the best predictive model in the slice-wise predictive model set.
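
The two combination forms named in the abstract can be written down generically. The sketch below assumes each slice-wise model has already produced a click probability strictly between 0 and 1 for the same query-advertisement pair, and that the weights are given; how the patent learns those weights is not stated in the abstract, and the function names are hypothetical.

```python
import math


def linear_combination(probs, weights):
    """Weighted average of per-slice click probabilities (weights are normalized)."""
    total = sum(weights)
    return sum(w * p for p, w in zip(probs, weights)) / total


def log_linear_combination(probs, weights):
    """Weighted geometric combination, renormalized over click / no click.

    Requires every probability to lie strictly between 0 and 1.
    """
    log_click = sum(w * math.log(p) for p, w in zip(probs, weights))
    log_no_click = sum(w * math.log(1.0 - p) for p, w in zip(probs, weights))
    # Subtract the larger term before exponentiating for numerical stability.
    shift = max(log_click, log_no_click)
    click = math.exp(log_click - shift)
    no_click = math.exp(log_no_click - shift)
    return click / (click + no_click)
```

With all weight on a single slice, either form returns that slice's own estimate, so a combined model of this shape can always fall back to the best single model.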

    Using Clicked Slate Driven Click-Through Rate Estimates in Sponsored Search
    8.
    Invention patent application
    Status: in force

    Publication No.: US20120136722A1

    Publication date: 2012-05-31

    Application No.: US12956496

    Filing date: 2010-11-30

    IPC classification: G06Q30/00

    Abstract: A computer-implemented method and system for selecting a subject advertisement in a sponsored search system based on a user's commercial intent (pertaining to the subject advertisement), using techniques for determining intent-driven clicks from a historical database. The method includes steps for aggregating a training model dataset wherein the training model dataset contains a selected history of clicks. Then, selecting from the training model dataset, a clicked slate (further selection of clicks), the clicked slate comprising a set of clicked ads, and calculating an intent-driven click feedback value for the subject advertisement. The method includes techniques for selecting a clicked slate using features corresponding to clicks received within a particular time period (the time period determined statically or dynamically). A system for implementing the method includes aggregating data from a historical database using selectors such as a position selector, a click feature selector, an impression-advertiser-campaign-creative selector, and a commercial intent selector.

    Using Linear and Log-Linear Model Combinations for Estimating Probabilities of Events
    9.
    Invention patent application
    Status: in force

    Publication No.: US20120022952A1

    Publication date: 2012-01-26

    Application No.: US12893939

    Filing date: 2010-09-29

    IPC classification: G06Q30/00

    Abstract: A method for combining multiple probability of click models in an online advertising system into a combined predictive model, the method commencing by receiving a feature set slice (e.g. corresponding to demographics or taxonomies or clusters), and using the sliced data for training multiple slice-wise predictive models. The trained slice-wise predictive models are combined by overlaying a weighted distribution model over the trained slice-wise predictive models. The combined predictive model then is used in predicting the probability of a click given a query-advertisement pair in online advertising. The method can flexibly receive slice specifications, and can overlay any one or more of a variety of distribution models, such as a linear combination or a log-linear combination. Using an appropriate weighted distribution model, the combined predictive model reliably yields predictive estimates of occurrence of click events that are at least as good as the best predictive model in the slice-wise predictive model set.

    Using clicked slate driven click-through rate estimates in sponsored search
    10.
    Granted invention patent
    Status: in force

    Publication No.: US08364525B2

    Publication date: 2013-01-29

    Application No.: US12956496

    Filing date: 2010-11-30

    IPC classification: G06Q30/00

    Abstract: A computer-implemented method and system for selecting a subject advertisement in a sponsored search system based on a user's commercial intent (pertaining to the subject advertisement), using techniques for determining intent-driven clicks from a historical database. The method includes steps for aggregating a training model dataset wherein the training model dataset contains a selected history of clicks. Then, selecting from the training model dataset, a clicked slate (further selection of clicks), the clicked slate comprising a set of clicked ads, and calculating an intent-driven click feedback value for the subject advertisement. The method includes techniques for selecting a clicked slate using features corresponding to clicks received within a particular time period (the time period determined statically or dynamically). A system for implementing the method includes aggregating data from a historical database using selectors such as a position selector, a click feature selector, an impression-advertiser-campaign-creative selector, and a commercial intent selector.
