Rapid automatic keyword extraction for information retrieval and analysis
    1.
    Granted invention patent
    Rapid automatic keyword extraction for information retrieval and analysis (in force)

    Publication No.: US08131735B2

    Publication date: 2012-03-06

    Application No.: US12555916

    Filing date: 2009-09-09

    IPC classification: G06F17/30

    CPC classification: G06F17/30616

    Abstract: Methods and systems for rapid automatic keyword extraction for information retrieval and analysis. Embodiments can include parsing words in an individual document by delimiters, stop words, or both in order to identify candidate keywords. Word scores for each word within the candidate keywords are then calculated based on a function of co-occurrence degree, co-occurrence frequency, or both. Based on a function of the word scores for words within the candidate keyword, a keyword score is calculated for each of the candidate keywords. A portion of the candidate keywords are then extracted as keywords based, at least in part, on the candidate keywords having the highest keyword scores.
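The steps in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract leaves the scoring functions open, so the sketch assumes a word score of degree divided by frequency and a keyword score equal to the sum of its word scores. The stop word list, the `top_fraction` parameter, and all function names are hypothetical.

```python
import re
from collections import defaultdict

# Tiny illustrative stop word list (an assumption, not taken from the patent).
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "or",
              "over", "the", "to", "with"}


def candidate_keywords(text, stop_words=STOP_WORDS):
    """Split text at punctuation delimiters and stop words into candidate keywords."""
    candidates = []
    for fragment in re.split(r'[.,;:!?()\[\]"\n]+', text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9][a-z0-9'-]*", fragment):
            if word in stop_words:
                # A stop word closes the current candidate, if any.
                if current:
                    candidates.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            candidates.append(tuple(current))
    return candidates


def word_scores(candidates):
    """Score each word as co-occurrence degree / frequency (assumed function)."""
    freq = defaultdict(int)
    degree = defaultdict(int)
    for cand in candidates:
        for word in cand:
            freq[word] += 1
            # Degree counts every word in the candidate, including the word itself.
            degree[word] += len(cand)
    return {w: degree[w] / freq[w] for w in freq}


def extract_keywords(text, top_fraction=1 / 3, stop_words=STOP_WORDS):
    """Return the highest-scoring candidates as (keyword, score) pairs."""
    candidates = candidate_keywords(text, stop_words)
    scores = word_scores(candidates)
    # Keyword score: sum of member word scores (assumed function).
    keyword_scores = {" ".join(c): sum(scores[w] for w in c) for c in set(candidates)}
    ranked = sorted(keyword_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    keep = max(1, round(len(ranked) * top_fraction))
    return ranked[:keep]
```

For example, `extract_keywords("Compatibility of systems of linear constraints over the set of natural numbers.")` yields five candidates and keeps the top third of them, `linear constraints` and `natural numbers`, each with a score of 4.0.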

    Automatic generation of stop word lists for information retrieval and analysis
    2.
    Granted invention patent
    Automatic generation of stop word lists for information retrieval and analysis (in force)

    Publication No.: US08352469B2

    Publication date: 2013-01-08

    Application No.: US12555962

    Filing date: 2009-09-09

    Applicant: Stuart J Rose

    Inventor: Stuart J Rose

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/277

    Abstract: Methods and systems for automatically generating lists of stop words for information retrieval and analysis. Generation of the stop words can include providing a corpus of documents and a plurality of keywords. From the corpus of documents, a term list of all terms is constructed and both a keyword adjacency frequency and a keyword frequency are determined. If a ratio of the keyword adjacency frequency to the keyword frequency for a particular term on the term list is less than a predetermined value, then that term is excluded from the term list. The resulting term list is truncated based on predetermined criteria to form a stop word list.
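The procedure in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented method. The abstract does not give the "predetermined value" or the "predetermined criteria", so the sketch assumes a threshold `min_ratio` of 1.0 and truncates by keeping the `max_size` terms seen most often next to a keyword. It also assumes that adjacency means the single token immediately before or after a keyword occurrence. All names and defaults are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9][a-z0-9'-]*", text.lower())


def generate_stop_words(documents, keywords, min_ratio=1.0, max_size=10):
    """Build a stop word list from a corpus and a set of known keywords."""
    keyword_tokens = [toks for toks in (tokenize(k) for k in keywords) if toks]
    term_freq = Counter()       # term list: every term in the corpus
    adjacency_freq = Counter()  # times a term sits next to a keyword
    keyword_freq = Counter()    # times a term occurs inside a keyword

    for doc in documents:
        tokens = tokenize(doc)
        term_freq.update(tokens)
        for kw in keyword_tokens:
            n = len(kw)
            for i in range(len(tokens) - n + 1):
                if tokens[i:i + n] == kw:
                    keyword_freq.update(kw)
                    if i > 0:
                        adjacency_freq[tokens[i - 1]] += 1
                    if i + n < len(tokens):
                        adjacency_freq[tokens[i + n]] += 1

    # Exclude terms whose adjacency/keyword ratio is below the threshold.
    # Terms that never occur inside a keyword have no finite ratio and are kept.
    remaining = [t for t in term_freq
                 if not adjacency_freq[t] < min_ratio * keyword_freq[t]]

    # Assumed truncation criteria: keep only terms seen next to a keyword
    # at least once, ranked by adjacency frequency, up to max_size terms.
    ranked = sorted((t for t in remaining if adjacency_freq[t] > 0),
                    key=lambda t: (-adjacency_freq[t], t))
    return ranked[:max_size]
```

With the documents `"the analysis of linear constraints is hard"` and `"a study of linear constraints and the natural numbers"` and the keywords `linear constraints` and `natural numbers`, the sketch returns `["of", "and", "is", "the"]`. The words inside the keywords are excluded by the ratio test.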
