IDENTIFYING IP ADDRESSES FOR SPAMMERS
    1.
    Patent application
    IDENTIFYING IP ADDRESSES FOR SPAMMERS (in force)

    Publication No.: US20090216841A1

    Publication date: 2009-08-27

    Application No.: US12035371

    Filing date: 2008-02-21

    IPC classification: G06F15/16

    CPC classification: H04L51/12

    Abstract: Detecting and blocking spam messages using statistical analysis of the distribution of message sizes for a given IP address. Mail volumes are examined to model a distribution of volumes to cluster IP addresses. The message sizes may be distributed across ranges of message sizes, and that distribution is then used to determine an entropy of message sizes for the given IP address. The entropy of the given IP address may be compared to entropies of known good IP addresses, and if the difference between the entropies is statistically significant, the given IP address may be determined to be an IP spammer. User feedback may also be employed to further characterize an IP address. For example, a number of messages from the IP address may be sent to intended recipients. User feedback may then be monitored to determine whether the IP address should be reclassified.
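
    The entropy comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented method: the size ranges in SIZE_EDGES, the z-score test, and the function names are all assumptions, since the abstract does not specify the ranges or the significance test.

```python
import math
from collections import Counter

# Hypothetical size ranges in bytes; the abstract does not give the actual ranges.
SIZE_EDGES = [1_000, 5_000, 20_000, 100_000]


def bucket(size):
    """Return the index of the size range that a message of `size` bytes falls into."""
    for i, edge in enumerate(SIZE_EDGES):
        if size < edge:
            return i
    return len(SIZE_EDGES)


def size_entropy(sizes):
    """Shannon entropy (in bits) of the message sizes distributed over the ranges."""
    counts = Counter(bucket(s) for s in sizes)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def looks_like_spammer(sizes, good_entropies, z_threshold=3.0):
    """Flag an IP whose entropy differs markedly from those of known good IPs.

    A z-score against the good-IP entropies stands in for the unspecified
    "statistically significant" test (an assumption of this sketch).
    Requires at least two known good entropies.
    """
    mean = sum(good_entropies) / len(good_entropies)
    var = sum((e - mean) ** 2 for e in good_entropies) / (len(good_entropies) - 1)
    std = math.sqrt(var)
    entropy = size_entropy(sizes)
    if std == 0:
        return entropy != mean
    return abs(entropy - mean) / std > z_threshold
```

    An IP address that sends many messages of nearly identical size has an entropy close to zero, which stands out against senders whose messages spread over several size ranges.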

    Identifying IP addresses for spammers
    2.
    Granted patent
    Identifying IP addresses for spammers (in force)

    Publication No.: US07849146B2

    Publication date: 2010-12-07

    Application No.: US12035371

    Filing date: 2008-02-21

    IPC classification: G06F15/16

    CPC classification: H04L51/12

    Abstract: Detecting and blocking spam messages using statistical analysis of the distribution of message sizes for a given IP address. Mail volumes are examined to model a distribution of volumes to cluster IP addresses. The message sizes may be distributed across ranges of message sizes, and that distribution is then used to determine an entropy of message sizes for the given IP address. The entropy of the given IP address may be compared to entropies of known good IP addresses, and if the difference between the entropies is statistically significant, the given IP address may be determined to be an IP spammer. User feedback may also be employed to further characterize an IP address. For example, a number of messages from the IP address may be sent to intended recipients. User feedback may then be monitored to determine whether the IP address should be reclassified.

    Filter for blocking image-based spam
    3.
    Granted patent
    Filter for blocking image-based spam (in force)

    Publication No.: US08055078B2

    Publication date: 2011-11-08

    Application No.: US12039310

    Filing date: 2008-02-28

    IPC classification: G06K9/62; G06F15/16

    Abstract: A network device and method are directed towards detecting and blocking image spam within a message by employing a weighted min-hash to perform near-duplicate detection (NDD) of determined features within an image as compared to known spam images. The weighting for the min-hash is determined by employing a machine learning algorithm, such as a perceptron, to identify the importance of each bit in a signature vector of the image. The signature vector is generated by extracting a shape of text in the image using a Discrete Cosine Transform, extracting low-frequency characteristics using a high-pass filter, and then performing various morphological operations to emphasize the shape of the text and reduce noise. Selected feature bits are extracted from the lowest frequency and intensity bits of the resulting signal to generate the signature vector used in the weighted min-hash NDD.
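
    One way to picture weighted min-hash near-duplicate detection over a binary signature vector is the sketch below. It is an illustration under stated assumptions, not the patented implementation: weights are taken as given positive integers (the abstract has them learned by a perceptron), weighting is done by replicating each set bit according to its weight, and the hash function, sketch length, and 0.8 threshold are hypothetical choices.

```python
import hashlib


def _hash(seed, bit_index, copy):
    """Deterministic 64-bit hash of one replicated signature bit."""
    data = f"{seed}:{bit_index}:{copy}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def weighted_minhash(signature, weights, num_hashes=64):
    """Min-hash sketch of a 0/1 signature vector with integer per-bit weights.

    A set bit with weight w contributes w distinct elements, so the fraction
    of matching sketch entries estimates the weighted Jaccard similarity.
    """
    sketch = []
    for seed in range(num_hashes):
        values = [
            _hash(seed, i, k)
            for i, bit in enumerate(signature) if bit
            for k in range(weights[i])
        ]
        sketch.append(min(values) if values else None)
    return sketch


def similarity(sketch_a, sketch_b):
    """Fraction of sketch positions on which the two sketches agree."""
    return sum(a == b for a, b in zip(sketch_a, sketch_b)) / len(sketch_a)


def is_near_duplicate(signature, known_spam_sketches, weights, threshold=0.8):
    """True if the signature is close to any known spam sketch (threshold is assumed)."""
    sketch = weighted_minhash(signature, weights)
    return any(similarity(sketch, s) >= threshold for s in known_spam_sketches)
```

    With this scheme, a difference in a low-weight bit barely moves the similarity estimate, while a difference in a high-weight bit changes many replicated elements and lowers it noticeably.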

    Learning framework for online applications
    4.
    Patent application
    Learning framework for online applications (in force)

    Publication No.: US20090187987A1

    Publication date: 2009-07-23

    Application No.: US12011114

    Filing date: 2008-01-23

    IPC classification: G06F21/00; G06F15/18

    CPC classification: H04L51/12

    Abstract: Learning to detect, and detecting, spam messages using a multi-stage combination of probability calculations based on individual and aggregate training sets of previously identified messages. During a preliminary phase, classifiers are trained, and lower and upper limit probabilities and a combined probability threshold are iteratively determined using a multi-stage combination of probability calculations based on minor and major subsets of messages previously categorized as valid or spam. During a live phase, a first stage classifier uses only a particular subset, and a second stage classifier uses a master set of previously categorized messages. If a newly received message cannot be categorized with certainty by the first stage classifier, and a computed first stage probability is within the previously determined lower and upper limits, first and second stage probabilities are combined. If the combined probability is greater than the previously determined combined probability threshold, the received message is marked as spam.
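
    The live-phase decision flow in the abstract might be sketched as below. The abstract does not say how the two probabilities are combined, so the weighted average, the default limits and threshold, and the names classify, second_stage, and w1 are assumptions made for illustration only.

```python
def classify(p1, second_stage, lower=0.2, upper=0.8,
             combined_threshold=0.5, w1=0.5):
    """Two-stage spam decision.

    p1           -- first-stage spam probability for the message
    second_stage -- callable returning the second-stage probability; it is
                    only invoked when the first stage is not certain
    lower, upper -- limits determined in the preliminary phase (values here
                    are placeholders)
    w1           -- hypothetical weight of the first stage in the combination
    """
    if p1 < lower:
        return "valid"   # first stage is certain the message is valid
    if p1 > upper:
        return "spam"    # first stage is certain the message is spam

    # First-stage probability lies within the limits: combine both stages.
    # A weighted average is assumed; the abstract leaves the rule unspecified.
    p2 = second_stage()
    combined = w1 * p1 + (1 - w1) * p2
    return "spam" if combined > combined_threshold else "valid"
```

    Passing the second stage as a callable keeps the more expensive master-set classifier from running on messages the first stage can already decide.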

    Learning framework for online applications
    5.
    Granted patent
    Learning framework for online applications (in force)

    Publication No.: US07996897B2

    Publication date: 2011-08-09

    Application No.: US12011114

    Filing date: 2008-01-23

    IPC classification: G06N7/02

    CPC classification: H04L51/12

    Abstract: Learning to detect, and detecting, spam messages using a multi-stage combination of probability calculations based on individual and aggregate training sets of previously identified messages. During a preliminary phase, classifiers are trained, and lower and upper limit probabilities and a combined probability threshold are iteratively determined using a multi-stage combination of probability calculations based on minor and major subsets of messages previously categorized as valid or spam. During a live phase, a first stage classifier uses only a particular subset, and a second stage classifier uses a master set of previously categorized messages. If a newly received message cannot be categorized with certainty by the first stage classifier, and a computed first stage probability is within the previously determined lower and upper limits, first and second stage probabilities are combined. If the combined probability is greater than the previously determined combined probability threshold, the received message is marked as spam.
