KEYWORDS EXTRACTION AND ENRICHMENT VIA CATEGORIZATION SYSTEMS
    1.
    Invention application
    KEYWORDS EXTRACTION AND ENRICHMENT VIA CATEGORIZATION SYSTEMS (legal status: in force)

    Publication No.: US20120166441A1

    Publication date: 2012-06-28

    Application No.: US12978169

    Filing date: 2010-12-23

    IPC classification: G06F17/30

    CPC classification: G06F17/3071

    Abstract: Techniques for determining a set of keywords associated with a document are provided. A document is received that may be classified into a taxonomy that includes a plurality of categories. A categorization ranking is determined for each category for the received document. A set of categories of the taxonomy having highest categorization rankings is determined for the received document. Documents representing the set of categories having highest categorization rankings are combined together into a cumulative representative text that includes a plurality of terms. A cumulative term corpus importance score is determined for each term in the cumulative representative text. The cumulative term corpus importance score for a particular term indicates an importance of the particular term in a context of the cumulative representative text. A set of terms of the cumulative representative text having highest cumulative term corpus importance scores is selected to be keywords for the received document.
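
The steps named in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the categorization ranking or the importance score is computed, so the choices here are assumptions made only for illustration: the ranking is cosine similarity between term-frequency vectors, the importance score is a TF-IDF-style weight, and each category is represented by a single text. The names `extract_keywords`, `taxonomy`, `top_categories`, and `top_terms` are hypothetical and do not come from the patent.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic terms."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def extract_keywords(document, taxonomy, top_categories=2, top_terms=5):
    """taxonomy maps a category name to one representative text (assumption)."""
    doc_tf = Counter(tokenize(document))
    cat_tf = {name: Counter(tokenize(text)) for name, text in taxonomy.items()}

    # 1. Categorization ranking for each category (assumed: cosine similarity).
    rankings = {name: cosine(doc_tf, tf) for name, tf in cat_tf.items()}

    # 2. Categories with the highest categorization rankings.
    best = sorted(rankings, key=rankings.get, reverse=True)[:top_categories]

    # 3. Combine their representative texts into one cumulative text.
    cumulative = Counter()
    for name in best:
        cumulative.update(cat_tf[name])

    # 4. Importance score per term (assumed: term frequency in the cumulative
    #    text times inverse document frequency across all category texts).
    n_categories = len(cat_tf)

    def score(term):
        df = sum(1 for tf in cat_tf.values() if term in tf)
        return cumulative[term] * (math.log(n_categories / df) + 1.0)

    # 5. Terms with the highest scores become keywords (ties: alphabetical).
    return sorted(cumulative, key=lambda t: (-score(t), t))[:top_terms]


taxonomy = {
    "sports": "football soccer goal team match team",
    "finance": "bank stock market money bank invest",
    "cooking": "recipe oven bake flour recipe sugar",
}
document = "the team scored a goal in the match"
print(extract_keywords(document, taxonomy, top_categories=1, top_terms=2))
# prints ['team', 'football']
```

In this toy run the keyword "football" never appears in the received document. It comes from the representative text of the best-ranked category, which is one way to read the "enrichment" in the title.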


    Keywords extraction and enrichment via categorization systems
    2.
    Granted patent
    Keywords extraction and enrichment via categorization systems (legal status: in force)

    Publication No.: US09342590B2

    Publication date: 2016-05-17

    Application No.: US12978169

    Filing date: 2010-12-23

    IPC classification: G06F17/30

    CPC classification: G06F17/3071

    Abstract: Techniques for determining a set of keywords associated with a document are provided. A document is received that may be classified into a taxonomy that includes a plurality of categories. A categorization ranking is determined for each category for the received document. A set of categories of the taxonomy having highest categorization rankings is determined for the received document. Documents representing the set of categories having highest categorization rankings are combined together into a cumulative representative text that includes a plurality of terms. A cumulative term corpus importance score is determined for each term in the cumulative representative text. The cumulative term corpus importance score for a particular term indicates an importance of the particular term in a context of the cumulative representative text. A set of terms of the cumulative representative text having highest cumulative term corpus importance scores is selected to be keywords for the received document.
