Sensitivity Categorization of Web Pages
    1.
    Patent application
    Sensitivity Categorization of Web Pages (status: in force)

    Publication No.: US20110184817A1

    Publication date: 2011-07-28

    Application No.: US12696006

    Filing date: 2010-01-28

    CPC classification: G06Q30/02 G06Q30/0277

    Abstract: Methods, systems, and computer programs for categorizing the sensitivity of web pages are presented. In one method, a space of sensitive pages is identified based on the sensitivity categorization of a first plurality of web pages and a second plurality of web pages. The first plurality of web pages is obtained by performing search queries using known sensitive words, and the second plurality of web pages includes randomly selected web pages. Additionally, the method identifies a third plurality of web pages that includes web pages on or near the boundary between the space of sensitive pages and the space of non-sensitive pages. The space of sensitive pages is then redefined based on the sensitivity categorization of the first, second, and third pluralities of web pages. Once the space of sensitive pages is defined, the method is used to determine that a given web page is sensitive when the given web page is in the space of sensitive pages. Web pages are included in a marketing operation when the web pages are not sensitive.
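
    Illustrative sketch (not taken from the patent): the loop described in the abstract resembles boundary-focused active learning. The code below assumes a toy word-weight classifier, toy page texts, and a hypothetical `editor_labels` lookup standing in for human categorization; all names are invented for illustration. The random pages are simply labelled non-sensitive here, whereas the abstract says they are categorized.

```python
from collections import Counter

def tokenize(text):
    return text.lower().split()

def train(labeled):
    """Fit per-word weights from (text, is_sensitive) pairs."""
    pos, neg = Counter(), Counter()
    n_pos = n_neg = 0
    for text, sensitive in labeled:
        if sensitive:
            pos.update(tokenize(text))
            n_pos += 1
        else:
            neg.update(tokenize(text))
            n_neg += 1
    vocab = set(pos) | set(neg)
    return {w: pos[w] / max(n_pos, 1) - neg[w] / max(n_neg, 1) for w in vocab}

def score(weights, text):
    """Positive scores fall in the (assumed) space of sensitive pages."""
    return sum(weights.get(w, 0.0) for w in tokenize(text))

def is_sensitive(weights, text):
    return score(weights, text) > 0

def boundary_pages(weights, pool, k):
    """Return the k pages whose score is closest to the boundary (score 0)."""
    return sorted(pool, key=lambda t: abs(score(weights, t)))[:k]

# First plurality: pages returned for known sensitive query words (toy data).
first = [("casino gambling poker bets", True),
         ("gambling odds casino jackpot", True)]
# Second plurality: randomly selected pages (toy data, labelled non-sensitive).
second = [("garden tomato planting tips", False),
          ("laptop review battery screen", False)]
# Uncategorized pages to be checked.
pool = ["poker night charity garden party",
        "jackpot slots casino bonus",
        "tomato soup recipe"]
# Hypothetical editorial judgments for pages sent to review.
editor_labels = {"poker night charity garden party": False}

# Initial space of sensitive pages from the first and second pluralities.
weights = train(first + second)
# Third plurality: pages on or near the boundary, categorized by the editor.
third = [(p, editor_labels[p]) for p in boundary_pages(weights, pool, k=1)]
# Redefine the space using all three pluralities.
weights = train(first + second + third)

# Only non-sensitive pages are passed on to the marketing operation.
eligible = [p for p in pool if not is_sensitive(weights, p)]
```

    In this toy run the ambiguous page (it mentions both "poker" and "garden") is the one selected for review, and after retraining only the casino page remains classified as sensitive.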

    Annotating HTML segments with functional labels
    3.
    Granted patent
    Annotating HTML segments with functional labels (status: in force)

    Publication No.: US09594730B2

    Publication date: 2017-03-14

    Application No.: US12829265

    Filing date: 2010-07-01

    IPC classification: G06F17/00 G06F17/22

    CPC classification: G06F17/2241

    Abstract: A method and apparatus are described for assigning functional labels to segments of web pages in an application-independent way. In the approach described herein, one of a generic set of functional labels is automatically assigned to each segment of a web page, where the generic functional labels may be topic-independent and application-independent. Applications with different needs can determine which segments of the web page to process based on which functional labels correspond to the types of information needed by each application. Thus, the work of classifying the function of each segment of a web page is separated from the work of selecting which segments satisfy the need of a particular application. The work of classification can be performed in an application-independent way, relieving every application developer of the burden of creating their own classifiers.
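
    Illustrative sketch (not taken from the patent): the separation between labelling segments once and letting each application pick segments by label could look roughly like this. The label names (`navigation`, `footer`, `main_content`), the `Segment` fields, and the rule-based `assign_label` heuristic are all assumptions made for the example; the abstract does not specify the label set or the classifier.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    html_id: str      # hypothetical identifier of the HTML segment
    text: str         # visible text of the segment
    link_count: int   # number of hyperlinks inside the segment

def assign_label(seg):
    """Stand-in, application-independent labeller (toy heuristic)."""
    words = len(seg.text.split())
    if seg.link_count >= 5 and words <= 3 * seg.link_count:
        return "navigation"
    if "copyright" in seg.text.lower():
        return "footer"
    return "main_content"

def annotate(segments):
    """Label every segment once, independently of any application."""
    return {seg.html_id: assign_label(seg) for seg in segments}

def select(segments, labels, wanted):
    """Application side: keep only segments whose label is wanted."""
    return [s for s in segments if labels[s.html_id] in wanted]

page = [
    Segment("nav", "Home News Sports Weather Contact", 5),
    Segment("article", "The city council voted on Tuesday to extend the tram line", 0),
    Segment("foot", "Copyright 2010 Example Corp", 1),
]
labels = annotate(page)

# Two hypothetical applications with different needs reuse the same labels.
for_summarizer = select(page, labels, {"main_content"})
for_crawler = select(page, labels, {"navigation"})
```

    Neither application needs its own classifier; each only states which labels it cares about.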

    Sensitivity categorization of web pages
    4.
    Granted patent
    Sensitivity categorization of web pages (status: in force)

    Publication No.: US08589231B2

    Publication date: 2013-11-19

    Application No.: US12696006

    Filing date: 2010-01-28

    IPC classification: G06Q30/00

    CPC classification: G06Q30/02 G06Q30/0277

    Abstract: Methods, systems, and computer programs for categorizing the sensitivity of web pages are presented. In one method, a space of sensitive pages is identified based on the sensitivity categorization of a first plurality of web pages and a second plurality of web pages. The first plurality of web pages is obtained by performing search queries using known sensitive words, and the second plurality of web pages includes randomly selected web pages. Additionally, the method identifies a third plurality of web pages that includes web pages on or near the boundary between the space of sensitive pages and the space of non-sensitive pages. The space of sensitive pages is then redefined based on the sensitivity categorization of the first, second, and third pluralities of web pages. Once the space of sensitive pages is defined, the method is used to determine that a given web page is sensitive when the given web page is in the space of sensitive pages. Web pages are included in a marketing operation when the web pages are not sensitive.

    Annotating HTML Segments With Functional Labels
    6.
    Patent application
    Annotating HTML Segments With Functional Labels (status: in force)

    Publication No.: US20120005686A1

    Publication date: 2012-01-05

    Application No.: US12829265

    Filing date: 2010-07-01

    IPC classification: G06F9/46

    CPC classification: G06F17/2241

    Abstract: A method and apparatus are described for assigning functional labels to segments of web pages in an application-independent way. In the approach described herein, one of a generic set of functional labels is automatically assigned to each segment of a web page, where the generic functional labels may be topic-independent and application-independent. Applications with different needs can determine which segments of the web page to process based on which functional labels correspond to the types of information needed by each application. Thus, the work of classifying the function of each segment of a web page is separated from the work of selecting which segments satisfy the need of a particular application. The work of classification can be performed in an application-independent way, relieving every application developer of the burden of creating their own classifiers.

    METHOD FOR EFFICIENTLY BUILDING COMPACT MODELS FOR LARGE MULTI-CLASS TEXT CLASSIFICATION
    7.
    Patent application
    METHOD FOR EFFICIENTLY BUILDING COMPACT MODELS FOR LARGE MULTI-CLASS TEXT CLASSIFICATION (status: under examination, published)

    Publication No.: US20090274376A1

    Publication date: 2009-11-05

    Application No.: US12115486

    Filing date: 2008-05-05

    IPC classification: G06K9/62

    CPC classification: G06K9/6269 G06K9/00442

    Abstract: A method of classifying documents includes: specifying multiple documents and classes, wherein each document includes a plurality of features and each document corresponds to one of the classes; determining reduced document vectors for the classes from the documents, wherein the reduced document vectors include features that satisfy threshold conditions corresponding to the classes; determining reduced weight vectors for relating the documents to the classes by comparing combinations of the reduced weight vectors and the reduced document vectors and separating the corresponding classes; and saving one or more values for the reduced weight vectors and the classes. Specific embodiments are directed to formulations for determining the reduced weight vectors including one-versus-rest classifiers, maximum entropy classifiers, and direct multiclass Support Vector Machines.
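
    Illustrative sketch (not taken from the patent): one way to read the abstract is that each class keeps only the features passing a per-class threshold, and a one-versus-rest classifier is then trained over those reduced vectors, so each stored weight vector stays small. The code below assumes a simple count threshold (`min_count`) as the threshold condition and swaps in a plain perceptron as the learner; the patent's own formulations (one-versus-rest, maximum entropy, multiclass SVM) are not reproduced here, and all names and data are invented.

```python
from collections import Counter

def features(text):
    return Counter(text.lower().split())

def select_features(docs, cls, min_count):
    """Keep features whose total count within class `cls` reaches min_count."""
    totals = Counter()
    for text, label in docs:
        if label == cls:
            totals.update(features(text))
    return {f for f, c in totals.items() if c >= min_count}

def reduce_vector(text, kept):
    """Reduced document vector: only the features kept for one class."""
    return {f: v for f, v in features(text).items() if f in kept}

def train_one_vs_rest(docs, classes, min_count=2, epochs=10):
    """Train one perceptron per class over that class's reduced features."""
    model = {}
    for cls in classes:
        kept = select_features(docs, cls, min_count)
        w = dict.fromkeys(kept, 0.0)   # reduced weight vector
        bias = 0.0
        for _ in range(epochs):
            for text, label in docs:
                x = reduce_vector(text, kept)
                target = 1 if label == cls else -1
                s = bias + sum(w[f] * v for f, v in x.items())
                if target * s <= 0:    # misclassified or on the boundary
                    for f, v in x.items():
                        w[f] += target * v
                    bias += target
        model[cls] = (w, bias)         # the values that would be saved
    return model

def classify(model, text):
    def class_score(cls):
        w, bias = model[cls]
        return bias + sum(w.get(f, 0.0) * v for f, v in features(text).items())
    return max(model, key=class_score)

docs = [
    ("goal match team goal", "sports"),
    ("team match score", "sports"),
    ("stock market shares stock", "finance"),
    ("market shares bank", "finance"),
]
model = train_one_vs_rest(docs, ["sports", "finance"])
prediction = classify(model, "team goal replay")
```

    With this toy data each class stores three weights instead of one per vocabulary word, which is the compactness the title refers to, here on a very small scale.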
