-
Publication number: US20090187987A1
Publication date: 2009-07-23
Application number: US12011114
Filing date: 2008-01-23
Applicant: Vishwanth Tumkur Ramarao, Abhishek Kumar Pandey, Raghav Jeyaraman
Inventor: Vishwanth Tumkur Ramarao, Abhishek Kumar Pandey, Raghav Jeyaraman
CPC classification number: H04L51/12
Abstract: Learning to detect, and detecting, spam messages using a multi-stage combination of probability calculations based on individual and aggregate training sets of previously identified messages. During a preliminary phase, classifiers are trained, and lower and upper limit probabilities and a combined probability threshold are iteratively determined using a multi-stage combination of probability calculations based on minor and major subsets of messages previously categorized as valid or spam. During a live phase, a first stage classifier uses only a particular subset, and a second stage classifier uses a master set of previously categorized messages. If a newly received message cannot be categorized with certainty by the first stage classifier, and a computed first stage probability is within the previously determined lower and upper limits, the first and second stage probabilities are combined. If the combined probability is greater than the previously determined combined probability threshold, the received message is marked as spam.
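The live-phase decision described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the patented implementation: each stage classifier is assumed to be any function returning P(spam) for a message, "categorized with certainty" is assumed to mean a first-stage probability at or beyond fixed cutoffs (0.01 and 0.99 here), the two stage probabilities are assumed to be combined by a plain average, and an uncertain message whose first-stage probability falls outside the limits is assumed to fall back to a 0.5 cut. The names `TwoStageSpamFilter`, `is_spam`, `certain_valid`, and `certain_spam` are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable

# A stage classifier maps a message to an estimated P(spam) in [0, 1].
# (Hypothetical interface; the abstract does not specify one.)
StageClassifier = Callable[[str], float]


@dataclass
class TwoStageSpamFilter:
    first_stage: StageClassifier   # trained only on a particular subset
    second_stage: StageClassifier  # trained on the master set
    lower: float                   # lower limit from the preliminary phase
    upper: float                   # upper limit from the preliminary phase
    combined_threshold: float      # combined threshold from the preliminary phase
    # Assumed cutoffs for "categorized with certainty" by the first stage.
    certain_valid: float = 0.01
    certain_spam: float = 0.99

    def is_spam(self, message: str) -> bool:
        p1 = self.first_stage(message)
        # The first stage decides alone when it is certain.
        if p1 >= self.certain_spam:
            return True
        if p1 <= self.certain_valid:
            return False
        # Uncertain but outside the limits: assumed fallback to a 0.5 cut.
        if not (self.lower <= p1 <= self.upper):
            return p1 > 0.5
        # Uncertain and within the limits: consult the second stage and
        # combine the two probabilities (plain average is an assumption).
        p2 = self.second_stage(message)
        combined = (p1 + p2) / 2
        return combined > self.combined_threshold


if __name__ == "__main__":
    demo = TwoStageSpamFilter(
        first_stage=lambda m: 0.6,   # stand-in first-stage classifier
        second_stage=lambda m: 0.9,  # stand-in second-stage classifier
        lower=0.3,
        upper=0.8,
        combined_threshold=0.7,
    )
    # p1 = 0.6 is uncertain and within [0.3, 0.8]; (0.6 + 0.9) / 2 = 0.75 > 0.7
    print(demo.is_spam("example message"))
```

The second stage is only consulted when the first stage is both uncertain and within the limits, which keeps the more expensive master-set classifier off the common path.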
-
Publication number: US07996897B2
Publication date: 2011-08-09
Application number: US12011114
Filing date: 2008-01-23
Applicant: Vishwanth Tumkur Ramarao, Abhishek Kumar Pandey, Raghav Jeyaraman
Inventor: Vishwanth Tumkur Ramarao, Abhishek Kumar Pandey, Raghav Jeyaraman
IPC: G06N7/02
CPC classification number: H04L51/12
Abstract: Learning to detect, and detecting, spam messages using a multi-stage combination of probability calculations based on individual and aggregate training sets of previously identified messages. During a preliminary phase, classifiers are trained, and lower and upper limit probabilities and a combined probability threshold are iteratively determined using a multi-stage combination of probability calculations based on minor and major subsets of messages previously categorized as valid or spam. During a live phase, a first stage classifier uses only a particular subset, and a second stage classifier uses a master set of previously categorized messages. If a newly received message cannot be categorized with certainty by the first stage classifier, and a computed first stage probability is within the previously determined lower and upper limits, the first and second stage probabilities are combined. If the combined probability is greater than the previously determined combined probability threshold, the received message is marked as spam.
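The abstract does not say how the lower and upper limits and the combined probability threshold are "iteratively determined" during the preliminary phase. One simple stand-in, shown here purely as an assumption, is an exhaustive grid search: try every candidate (lower, upper, threshold) triple and keep the one that classifies the most previously categorized messages correctly. The sketch assumes the first- and second-stage probabilities for those messages are already computed, combines them by a plain average, falls back to a 0.5 cut outside the limits, and ignores any separate certainty cutoffs. The names `predict` and `tune_limits` and the candidate grid are hypothetical, not the patented procedure.

```python
from itertools import product
from typing import Optional, Sequence, Tuple


def predict(p1: float, p2: float, lower: float, upper: float, thr: float) -> bool:
    """Assumed decision rule used while tuning (True = spam)."""
    if lower <= p1 <= upper:
        # Within the limits: combine both stages (plain average is an assumption).
        return (p1 + p2) / 2 > thr
    # Outside the limits: assumed fallback to the first stage alone.
    return p1 > 0.5


def tune_limits(
    p1s: Sequence[float],    # first-stage P(spam) for each categorized message
    p2s: Sequence[float],    # second-stage P(spam) for the same messages
    labels: Sequence[bool],  # True = previously categorized as spam
    grid: Sequence[float],   # candidate values tried for every parameter
) -> Tuple[float, float, float]:
    """Return the (lower, upper, threshold) triple with the most correct labels."""
    if not (len(p1s) == len(p2s) == len(labels)):
        raise ValueError("p1s, p2s and labels must have equal length")
    best: Optional[Tuple[float, float, float]] = None
    best_correct = -1
    for lower, upper, thr in product(grid, repeat=3):
        if lower >= upper:
            continue  # the limits must form a non-empty interval
        correct = sum(
            predict(a, b, lower, upper, thr) == y
            for a, b, y in zip(p1s, p2s, labels)
        )
        if correct > best_correct:  # first triple wins ties
            best, best_correct = (lower, upper, thr), correct
    if best is None:
        raise ValueError("grid must contain at least two distinct values")
    return best


if __name__ == "__main__":
    p1s = [0.6, 0.6, 0.9, 0.1]
    p2s = [0.9, 0.2, 0.9, 0.1]
    labels = [True, False, True, False]
    print(tune_limits(p1s, p2s, labels, [0.2, 0.4, 0.5, 0.7, 0.8]))
```

A grid search costs O(|grid|^3 x N) evaluations, which is acceptable for a small candidate grid; a coarse-to-fine refinement of the grid would be one way to make the search "iterative" in a more literal sense.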
-