Classification using a cascade approach
    3.
    Granted invention patent
    Classification using a cascade approach (status: lapsed)

    Publication No.: US07693806B2

    Publication date: 2010-04-06

    Application No.: US11766434

    Filing date: 2007-06-21

    IPC classification: G06F15/18 G06N3/08

    Abstract: A system and method that facilitates and effectuates optimizing a classifier for greater performance in a specific region of classification that is of interest, such as a low false positive rate or a low false negative rate. A two-stage classification model can be trained and employed, where the first stage classification is optimized over the entire classification region and the second stage classifier is optimized for the specific region of interest. During training the entire set of training data is employed by a first stage classifier. Only data that is classified by the first stage classifier or by cross validation to fall within a region of interest is used to train the second stage classifier. During classification, data that is classified within the region of interest by the first classification is given the first stage classifier's classification value, otherwise the classification value for the instance of data from the second stage classifier is used.
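
The two-stage idea in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented method: the names `select_region_of_interest` and `cascade_score`, the toy scorers, and the definition of the region of interest as "stage-one score at or above a threshold" are all hypothetical. The sketch also assumes that instances inside the region of interest are re-scored by the second stage, which follows from the training description; the abstract's final sentence words the routing the other way round, so treat the direction as an assumption.

```python
from typing import Callable, List, Sequence, Tuple

# A scorer maps a feature vector to an estimated P(positive) in [0, 1].
Scorer = Callable[[Sequence[float]], float]


def select_region_of_interest(
    data: List[Sequence[float]],
    labels: List[int],
    stage1: Scorer,
    threshold: float,
) -> Tuple[List[Sequence[float]], List[int]]:
    """Keep only the training rows that stage one scores at or above `threshold`.

    These rows are what the second-stage classifier would be trained on.
    """
    kept = [(x, y) for x, y in zip(data, labels) if stage1(x) >= threshold]
    return [x for x, _ in kept], [y for _, y in kept]


def cascade_score(
    x: Sequence[float], stage1: Scorer, stage2: Scorer, threshold: float
) -> float:
    """Assumed routing: stage two re-scores only instances in the region of interest."""
    s1 = stage1(x)
    return stage2(x) if s1 >= threshold else s1


# Toy stand-ins for trained models (hypothetical, hand-written for illustration).
def stage1(x: Sequence[float]) -> float:
    return min(1.0, max(0.0, x[0]))


def stage2(x: Sequence[float]) -> float:
    return min(1.0, max(0.0, 0.5 * x[0] + 0.5 * x[1]))


data = [[0.1, 0.9], [0.8, 0.2], [0.95, 0.9]]
labels = [0, 0, 1]
roi_x, roi_y = select_region_of_interest(data, labels, stage1, threshold=0.7)
```

In a real system both scorers would be trained models, and the region-of-interest membership of training rows could come from cross-validated stage-one scores rather than from scores on rows the model has already seen.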

    Web document keyword and phrase extraction
    4.
    Granted invention patent
    Web document keyword and phrase extraction (status: in force)

    Publication No.: US08135728B2

    Publication date: 2012-03-13

    Application No.: US11619230

    Filing date: 2007-01-03

    IPC classification: G06F7/00 G06F17/30 G06F13/14

    Abstract: Extraction analysis techniques biased, in part, by query frequency information from a query log file and/or search engine cache are employed along with machine learning processes to determine candidate keywords and/or phrases of web documents. Web oriented features associated with the candidate keywords and/or phrases are also utilized to analyze the web documents. A keyword and/or phrase extraction mechanism can be utilized to score keywords and/or phrases in a web document and estimate a likelihood that the keywords and/or phrases are relevant, for example, in an advertising system and the like.
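
A rough Python sketch of scoring candidate phrases with a query-frequency feature is given here. Everything specific in it is an assumption made for illustration: the function names, the three features (body frequency, presence in the title, query-log count), and the hand-set weights and bias. The abstract says the model is learned, so real weights would come from training rather than from constants.

```python
import math
import re
from collections import Counter
from typing import Dict


def candidate_phrases(text: str, max_len: int = 2) -> Counter:
    """Count every word n-gram of length 1..max_len in the text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(
        " ".join(words[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(words) - n + 1)
    )


def score_candidates(
    title: str,
    body: str,
    query_counts: Dict[str, int],
    weights=(1.0, 2.0, 1.5),  # assumed weights; a trained model would supply these
    bias: float = -4.0,
) -> Dict[str, float]:
    """Return a logistic relevance estimate in (0, 1) for each body phrase."""
    body_tf = candidate_phrases(body)
    title_phrases = set(candidate_phrases(title))
    scores = {}
    for phrase, tf in body_tf.items():
        features = (
            math.log1p(tf),                              # frequency in the body
            1.0 if phrase in title_phrases else 0.0,     # web-oriented feature: in title
            math.log1p(query_counts.get(phrase, 0)),     # query-log frequency
        )
        z = bias + sum(w * f for w, f in zip(weights, features))
        scores[phrase] = 1.0 / (1.0 + math.exp(-z))
    return scores


scores = score_candidates(
    title="Cheap flights",
    body="cheap flights to paris cheap flights",
    query_counts={"cheap flights": 1000},  # hypothetical query-log counts
)
best = max(scores, key=scores.get)
```

With these toy inputs the phrase that users actually search for ranks above phrases that merely occur in the page, which is the bias the abstract describes.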

    Using IP address and domain for email spam filtering
    5.
    Granted invention patent
    Using IP address and domain for email spam filtering (status: in force)

    Publication No.: US07689652B2

    Publication date: 2010-03-30

    Application No.: US11031672

    Filing date: 2005-01-07

    IPC classification: G06F15/16 G06F15/173

    Abstract: Email spam filtering is performed based on a combination of IP address and domain. When an email message is received, an IP address and a domain associated with the email message are determined. A cross product of the IP address (or portions of the IP address) and the domain (or portions of the domain) is calculated. If the email message is known to be either spam or non-spam, then a spam score based on the known spam status is stored in association with each (IP address, domain) pair element of the cross product. If the spam status of the email message is not known, then the (IP address, domain) pair elements of the cross product are used to look up previously determined spam scores. A combination of the previously determined spam scores is used to determine whether or not to treat the received email message as spam.
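
The cross-product bookkeeping described in this abstract might look roughly like the following Python sketch. The details are assumptions chosen for illustration: IPv4 prefixes stand in for "portions of the IP address", parent domains stand in for "portions of the domain", the stored score is a simple spam fraction, and the combination rule is a plain mean. `PairScoreTable`, `ip_parts`, and `domain_parts` are hypothetical names.

```python
from collections import defaultdict
from itertools import product
from typing import List, Optional


def ip_parts(ip: str) -> List[str]:
    """Full IPv4 address plus its first-three-octet and first-two-octet prefixes."""
    octets = ip.split(".")
    return [".".join(octets[:n]) for n in (4, 3, 2)]


def domain_parts(domain: str) -> List[str]:
    """Full domain plus parent domains down to two labels."""
    labels = domain.lower().split(".")
    return [".".join(labels[i:]) for i in range(max(1, len(labels) - 1))]


class PairScoreTable:
    """Stores [spam_count, total_count] for every (IP part, domain part) pair."""

    def __init__(self) -> None:
        self.counts = defaultdict(lambda: [0, 0])

    def record(self, ip: str, domain: str, is_spam: bool) -> None:
        """Update every pair in the cross product for a message of known status."""
        for pair in product(ip_parts(ip), domain_parts(domain)):
            entry = self.counts[pair]
            entry[0] += int(is_spam)
            entry[1] += 1

    def score(self, ip: str, domain: str) -> Optional[float]:
        """Mean spam fraction over previously seen pairs; None if none match."""
        pairs = product(ip_parts(ip), domain_parts(domain))
        known = [self.counts[p] for p in pairs if p in self.counts]
        if not known:
            return None
        return sum(spam / total for spam, total in known) / len(known)


table = PairScoreTable()
table.record("1.2.3.4", "mail.example.com", is_spam=True)
table.record("1.2.3.9", "example.com", is_spam=False)
unseen_sender_score = table.score("1.2.3.77", "news.example.com")
```

Because partial addresses and parent domains are stored too, a message from an address and subdomain never seen before can still match earlier evidence through the shared prefix and parent domain.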

    CLASSIFICATION USING A CASCADE APPROACH
    6.
    Invention patent application
    CLASSIFICATION USING A CASCADE APPROACH (status: lapsed)

    Publication No.: US20080319932A1

    Publication date: 2008-12-25

    Application No.: US11766434

    Filing date: 2007-06-21

    IPC classification: G06F15/18

    Abstract: A system and method that facilitates and effectuates optimizing a classifier for greater performance in a specific region of classification that is of interest, such as a low false positive rate or a low false negative rate. A two-stage classification model can be trained and employed, where the first stage classification is optimized over the entire classification region and the second stage classifier is optimized for the specific region of interest. During training the entire set of training data is employed by a first stage classifier. Only data that is classified by the first stage classifier or by cross validation to fall within a region of interest is used to train the second stage classifier. During classification, data that is classified within the region of interest by the first classification is given the first stage classifier's classification value, otherwise the classification value for the instance of data from the second stage classifier is used.

    Training filters for detecting spasm based on IP addresses and text-related features
    7.
    Granted invention patent
    Training filters for detecting spasm based on IP addresses and text-related features (status: in force)

    Publication No.: US07464264B2

    Publication date: 2008-12-09

    Application No.: US10809163

    Filing date: 2004-03-25

    IPC classification: H04L9/00 G06F21/00

    CPC classification: H04L51/12 G06Q10/107

    Abstract: The subject invention provides for an intelligent quarantining system and method that facilitates detecting and preventing spam. In particular, the invention employs a machine learning filter specifically trained using origination features such as an IP address as well as destination features such as a URL. Moreover, the system and method involve training a plurality of filters using specific feature data for each filter. The filters are trained independently of each other, thus one feature may not unduly influence another feature in determining whether a message is spam. Because multiple filters are trained and available to scan messages either individually or in combination (at least two filters), the filtering or spam detection process can be generalized to new messages having slightly modified features (e.g., IP address). The invention also involves locating the appropriate IP addresses or URLs in a message as well as guiding filters to weigh origination or destination features more than text-based features.
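
The point about training filters independently on separate feature families can be illustrated with a small Python sketch. It is a toy under stated assumptions, not the patented system: `FeatureFilter`, the two extractor functions, the smoothed per-feature spam rate, and the "take the higher score" combination rule are all hypothetical choices.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Message = Dict[str, str]  # assumed shape: {"ip": ..., "text": ...}


class FeatureFilter:
    """A filter that only ever sees one family of features."""

    def __init__(self, extract: Callable[[Message], Iterable[str]]) -> None:
        self.extract = extract
        self.counts = defaultdict(lambda: [0, 0])  # feature -> [spam, total]

    def train(self, examples: List[Tuple[Message, bool]]) -> None:
        for msg, is_spam in examples:
            for feature in self.extract(msg):
                self.counts[feature][0] += int(is_spam)
                self.counts[feature][1] += 1

    def score(self, msg: Message) -> float:
        """Mean smoothed spam rate of the known features; 0.5 if none are known."""
        rates = []
        for feature in self.extract(msg):
            if feature in self.counts:
                spam, total = self.counts[feature]
                rates.append((spam + 1) / (total + 2))
        return sum(rates) / len(rates) if rates else 0.5


def ip_features(msg: Message) -> List[str]:
    """Origination features: the full address and its first three octets."""
    return [msg["ip"], msg["ip"].rsplit(".", 1)[0]]


def text_features(msg: Message) -> List[str]:
    """Text features: the distinct lowercase words of the body."""
    return sorted(set(msg["text"].lower().split()))


training = [
    ({"ip": "10.0.0.5", "text": "win money now"}, True),
    ({"ip": "192.168.1.2", "text": "meeting notes attached"}, False),
]
ip_filter = FeatureFilter(ip_features)
text_filter = FeatureFilter(text_features)
ip_filter.train(training)    # each filter is trained on its own features only
text_filter.train(training)

# A new message with innocuous text sent from a neighbouring address.
new_msg = {"ip": "10.0.0.77", "text": "meeting notes attached"}
ip_score = ip_filter.score(new_msg)
text_score = text_filter.score(new_msg)
combined = max(ip_score, text_score)  # assumed combination rule
```

Here the text filter alone would pass the message, while the IP filter still flags it through the shared address prefix, so clean-looking text cannot mask a suspicious origin.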

    Storage abuse prevention
    10.
    Granted invention patent
    Storage abuse prevention (status: in force)

    Publication No.: US07848501B2

    Publication date: 2010-12-07

    Application No.: US11042245

    Filing date: 2005-01-25

    IPC classification: H04M3/42 G06F15/16

    Abstract: The subject invention provides a unique system and method that facilitates mitigation of storage abuse in connection with free storage provided by messaging service providers such as email, instant messaging, chat, blogging, and/or web hosting service providers. The system and method involve measuring the outbound volume of stored data. When the volume satisfies a threshold, a cost can be imposed on the account to mitigate the suspicious or abusive activity. Other factors can be considered as well that can modify the cost imposed on the account, such as by increasing the cost. Machine learning can be employed as well to predict a level or degree of suspicion. The various factors or the text of the messages can be used as input for the machine learning system.
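
A minimal Python sketch of the threshold-and-cost logic follows. The specifics are assumptions for illustration only: the `Account` fields, the 50 MB threshold, the abstract "cost units" (which could stand for challenges to solve or a delay), and the choice of account age as the extra factor that raises the cost.

```python
from dataclasses import dataclass


@dataclass
class Account:
    outbound_bytes: int = 0      # volume of stored data served out of the account
    account_age_days: int = 0    # assumed extra factor


def record_download(account: Account, n_bytes: int) -> None:
    """Add one outbound transfer to the account's running volume."""
    account.outbound_bytes += n_bytes


def imposed_cost(
    account: Account,
    threshold_bytes: int = 50_000_000,  # assumed threshold
    base_cost: int = 1,                 # assumed unit cost
    new_account_days: int = 7,          # assumed cutoff for "new" accounts
) -> int:
    """Return 0 below the threshold, otherwise a cost that other factors can raise."""
    if account.outbound_bytes < threshold_bytes:
        return 0
    cost = base_cost
    if account.account_age_days < new_account_days:
        cost *= 2  # a newly created account with heavy outbound traffic costs more
    return cost


quiet = Account(account_age_days=400)
record_download(quiet, 2_000_000)

busy_new = Account(account_age_days=2)
record_download(busy_new, 60_000_000)

busy_old = Account(account_age_days=400)
record_download(busy_old, 60_000_000)
```

The abstract also mentions a learned suspicion level; in a sketch like this it would replace or scale the fixed multiplier.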
