Sensitivity categorization of web pages
    12.
    Granted invention patent
    Status: Active

    Publication No.: US08589231B2

    Publication date: 2013-11-19

    Application No.: US12696006

    Filing date: 2010-01-28

    IPC classification: G06Q30/00

    CPC classification: G06Q30/02 G06Q30/0277

    Abstract: Methods, systems, and computer programs for categorizing the sensitivity of web pages are presented. In one method, a space of sensitive pages is identified based on the sensitivity categorization of a first plurality of web pages and a second plurality of web pages. The first plurality of web pages is obtained by performing search queries using known sensitive words, and the second plurality of web pages includes randomly selected web pages. Additionally, the method identifies a third plurality of web pages that includes web pages on or near the boundary between the space of sensitive pages and the space of non-sensitive pages. The space of sensitive pages is then redefined based on the sensitivity categorization of the first, second, and third pluralities of web pages. Once the space of sensitive pages is defined, the method is used to determine that a given web page is sensitive when the given web page is in the space of sensitive pages. Web pages are included in a marketing operation when the web pages are not sensitive.
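
The loop described in this abstract (seed pages, random pages, then pages near the boundary) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the "space of sensitive pages" is represented, so the sketch swaps in a nearest-centroid model over bag-of-words vectors, and every function name and sample text below is hypothetical.

```python
from collections import Counter
import math


def vectorize(text):
    """Bag-of-words term counts for one page (assumed representation)."""
    return Counter(text.lower().split())


def centroid(vectors):
    """Average term-count vector of a non-empty list of pages."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {term: count / len(vectors) for term, count in total.items()}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fit(labeled):
    """labeled: list of (text, is_sensitive). Returns (sensitive, safe) centroids."""
    sensitive = [vectorize(t) for t, y in labeled if y]
    safe = [vectorize(t) for t, y in labeled if not y]
    return centroid(sensitive), centroid(safe)


def margin(text, model):
    """Positive: inside the assumed sensitive space. Near zero: near the boundary."""
    sensitive_c, safe_c = model
    page = vectorize(text)
    return cosine(page, sensitive_c) - cosine(page, safe_c)


def is_sensitive(text, model):
    return margin(text, model) > 0


def boundary_pages(unlabeled, model, k):
    """The k unlabeled pages closest to the boundary (smallest |margin|)."""
    return sorted(unlabeled, key=lambda t: abs(margin(t, model)))[:k]


# First plurality: pages found via known sensitive words (invented examples).
first = [("violent crash kills driver", True), ("fatal shooting downtown", True)]
# Second plurality: randomly selected pages (invented examples).
second = [("new phone review battery", False), ("recipe for apple pie", False)]
initial = fit(first + second)

# Third plurality: pages on or near the boundary, sent out for labeling.
pool = ["phone battery crash test", "apple pie contest winner", "fatal crash review"]
third = boundary_pages(pool, initial, k=1)

# Assume a reviewer labels the boundary page as non-sensitive, then redefine the space.
refined = fit(first + second + [(third[0], False)])

# Pages that are not sensitive remain eligible for the marketing operation.
eligible = [p for p in pool if not is_sensitive(p, refined)]
```

Selecting by smallest absolute margin is one common way to find boundary cases; the granted claims may define proximity to the boundary differently.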


    SYSTEM FOR TRAINING CLASSIFIERS IN MULTIPLE CATEGORIES THROUGH ACTIVE LEARNING
    14.
    Invention patent application
    Status: Active

    Publication No.: US20120095943A1

    Publication date: 2012-04-19

    Application No.: US12905543

    Filing date: 2010-10-15

    IPC classification: G06F15/18

    CPC classification: G06N99/005

    Abstract: A system for training classifiers in multiple categories through an active learning system, including a computer having a memory and a processor, the processor programmed to: train an initial set of m binary one-versus-all classifiers, one for each category in a taxonomy, on a labeled dataset of examples stored in a database coupled with the computer; uniformly sample up to a predetermined large number of examples from a second, larger dataset of unlabeled examples stored in a database coupled with the computer; order the sampled unlabeled examples in order of informativeness for each classifier; determine a minimum subset of the unlabeled examples that are most informative for a maximum number of the classifiers to form an active set for learning; and use editorially-labeled versions of the examples of the active set to re-train the classifiers, thereby improving the accuracy of at least some of the classifiers.
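
The sampling, ordering, and subset-selection steps in this abstract can be sketched as below. The abstract does not state how the minimum subset is computed, so this sketch substitutes a greedy set-cover heuristic and treats "informativeness" as a small absolute classifier margin; both are assumptions, as are all function names and the toy margins.

```python
import random


def sample_pool(unlabeled_ids, max_size, seed=0):
    """Uniformly sample up to max_size examples from the unlabeled dataset."""
    ids = list(unlabeled_ids)
    if len(ids) <= max_size:
        return ids
    return random.Random(seed).sample(ids, max_size)


def top_informative(margins, k):
    """Order one classifier's examples by informativeness; keep the top k.

    margins: {example_id: signed margin}. A smaller |margin| is assumed to
    mean the classifier is less certain, hence the example is more informative.
    """
    return sorted(margins, key=lambda ex: abs(margins[ex]))[:k]


def greedy_active_set(margins_by_classifier, k):
    """Pick a small set of examples so that every classifier receives at
    least one of its own top-k informative examples (greedy set cover)."""
    wanted = {c: set(top_informative(m, k)) for c, m in margins_by_classifier.items()}
    uncovered = set(wanted)
    active = []
    while uncovered:
        # Count how many still-uncovered classifiers each example would serve.
        gain = {}
        for c in uncovered:
            for ex in wanted[c]:
                gain[ex] = gain.get(ex, 0) + 1
        if not gain:
            break  # the remaining classifiers have no candidate examples
        best = max(sorted(gain), key=gain.get)  # ties broken by example id
        active.append(best)
        uncovered = {c for c in uncovered if best not in wanted[c]}
    return active


# Toy margins from three hypothetical one-versus-all classifiers.
margins = {
    "sports": {"a": 0.05, "b": -0.9, "c": 0.1, "d": 0.8},
    "finance": {"a": -0.02, "b": 0.7, "c": 0.9, "d": -0.6},
    "travel": {"a": 0.95, "b": 0.6, "c": -0.03, "d": 0.2},
}
active_set = greedy_active_set(margins, k=2)
```

In a full pipeline, the examples in `active_set` would go to editors for labeling and each classifier would then be retrained on the enlarged labeled set. A greedy cover is not guaranteed to be minimal, so it only approximates the "minimum subset" the abstract calls for.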
