Classification of documents
    1.
    Granted invention patent
    Status: in force

    Publication No.: US08805840B1

    Publication date: 2014-08-12

    Application No.: US12772166

    Filing date: 2010-04-30

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30864 G06F17/30707

    Abstract: Some embodiments provide a method for evaluating a content segment for relevancy to several categories. The method retrieves the content segment. For each of the several categories, the method determines the relevancy of the content segment to the category by using a scoring model for the category. The scoring model accounts for (i) the presence of key word sets in the content segment and (ii) the context of the key word sets in the content segment. For each of the several categories, the method tags the content segment when the content segment is determined as relevant to the category.
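    The per-category scoring described in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration and is not taken from the patent: the names Group, CategoryModel, score_segment and tag_segment, the fixed word window used as "context", the additive scores, and the per-category threshold.

```python
from dataclasses import dataclass


@dataclass
class Group:
    key_words: frozenset      # key word set that must appear in the segment
    context_words: frozenset  # words that must appear near a key word
    score: float              # contribution when both conditions hold


@dataclass
class CategoryModel:
    name: str
    groups: list              # list of Group
    threshold: float          # assumed cut-off for "relevant"
    window: int = 5           # assumed context size, in words on each side


def tokenize(text):
    """Lowercase the text and strip simple punctuation from each word."""
    return [w.strip(".,;:!?()\"'").lower() for w in text.split()]


def score_segment(tokens, model):
    """Sum the scores of all groups whose key words and context both match."""
    present = set(tokens)
    total = 0.0
    for group in model.groups:
        # (i) presence: every word of the key word set is in the segment
        if not group.key_words <= present:
            continue
        # (ii) context: gather the words within `window` of any key word
        nearby = set()
        for i, tok in enumerate(tokens):
            if tok in group.key_words:
                nearby.update(tokens[max(0, i - model.window): i + model.window + 1])
        if group.context_words <= nearby:
            total += group.score
    return total


def tag_segment(text, models):
    """Return the names of all categories the segment is tagged with."""
    tokens = tokenize(text)
    return [m.name for m in models if score_segment(tokens, m) >= m.threshold]
```

    In this sketch a segment is scored once per category and may receive zero, one, or several tags, which mirrors the "for each of the several categories" wording of the abstract.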


    Models for classifying documents
    2.
    Granted invention patent

    Publication No.: US09760634B1

    Publication date: 2017-09-12

    Application No.: US12772168

    Filing date: 2010-04-30

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30864 G06F17/30707

    Abstract: Some embodiments provide a method for defining a content relevance model for determining whether a content segment is relevant to a particular category. The method receives a first set of content segments that contain content relevant to the particular category and a second set of content segments that contain content not relevant to the particular category. The method identifies a set of key word sets more likely to appear in the first set of content segments than in the second set of content segments. The method defines a content relevance model that comprises a set of groups of word sets and a score for each group, each of the groups of word sets comprising a key word set from the set of key word sets and at least one word set found in a context of the key word set in at least one of the received content segments.
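    The model-definition steps in this abstract can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the patented procedure: each "word set" is reduced to a single word, "more likely to appear" is approximated by an add-one smoothed ratio of segment frequencies, the context is a fixed word window, and the group score is a log ratio. The names build_model, min_ratio and window are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and strip simple punctuation from each word."""
    return [w.strip(".,;:!?()\"'").lower() for w in text.split()]


def doc_freq(segments):
    """Count, for each word, the number of segments it appears in."""
    counts = Counter()
    for tokens in segments:
        counts.update(set(tokens))
    return counts


def pair_freq(segments, key_words, window):
    """Count segments in which a (key word, context word) pair co-occurs."""
    counts = Counter()
    for tokens in segments:
        pairs = set()
        for i, tok in enumerate(tokens):
            if tok in key_words:
                for ctx in tokens[max(0, i - window): i + window + 1]:
                    if ctx != tok:
                        pairs.add((tok, ctx))
        counts.update(pairs)
    return counts


def smoothed_ratio(pos_count, neg_count, n_pos, n_neg):
    """Add-one smoothed ratio of appearance rates (an assumed measure)."""
    return ((pos_count + 1) / (n_pos + 2)) / ((neg_count + 1) / (n_neg + 2))


def build_model(relevant, irrelevant, min_ratio=2.0, window=3):
    """Return {(key word, context word): score} learned from two segment sets."""
    pos = [tokenize(s) for s in relevant]
    neg = [tokenize(s) for s in irrelevant]
    pos_df, neg_df = doc_freq(pos), doc_freq(neg)

    # Key words: noticeably more frequent in relevant than irrelevant segments.
    key_words = {
        w for w in pos_df
        if smoothed_ratio(pos_df[w], neg_df[w], len(pos), len(neg)) >= min_ratio
    }

    # Groups: each key word paired with a word found in its context,
    # scored by how strongly the pair favors the relevant set.
    pos_pairs = pair_freq(pos, key_words, window)
    neg_pairs = pair_freq(neg, key_words, window)
    return {
        pair: math.log(smoothed_ratio(pos_pairs[pair], neg_pairs[pair],
                                      len(pos), len(neg)))
        for pair in set(pos_pairs) | set(neg_pairs)
    }
```

    Pairing every key word with its context words is what allows an ambiguous term to contribute differently depending on its surroundings; a production system would presumably use multi-word sets and a more careful statistical test than this ratio.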