Keyword extraction apparatus, keyword extraction method, and computer readable recording medium storing keyword extraction program
    Granted invention patent
    Status: no longer in force

    Publication No.: US06173251B2

    Publication date: 2001-01-09

    Application No.: US09123809

    Filing date: 1998-07-28

    IPC classification: G06F17/30

    CPC classification: G06F17/3061 Y10S707/99933

    Abstract: Disclosed is a keyword extraction apparatus and method that overcome a problem in conventional automatic keyword extraction, in which character strings in a sentence to be processed are used as they are to index a document by keywords, so that words having similar meaning but different written expressions cannot be retrieved. The keyword extraction apparatus comprises technical term storage means for storing technical terms with their proper expressions and different expressions, and basic word storage means for storing general basic words of high frequency. Technical-term segmentation point setting means cuts out, from an input sentence, a range of any of the technical terms stored in the technical term storage means. When the cut-out technical term is written in a different expression, proper expression replacing means replaces the different expression with the corresponding proper expression. Character-type segmentation point setting means detects a difference in character type in the input sentence. Basic-word segmentation point setting means cuts out, from the input sentence, a range of any of the basic words stored in the basic word storage means. Partial character string cutting means cuts out, as keywords, all relevant partial character strings based on the segmentation points set by the technical-term segmentation point setting means, the character-type segmentation point setting means, and the basic-word segmentation point setting means.
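
    The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the dictionary contents, the function names (`char_type`, `normalize_terms`, `find_spans`, `extract_keywords`), the code-point ranges used as character types, and the rule for what counts as a "relevant" partial string (here: any string between two segmentation points that neither starts nor ends with a basic word) are all assumptions made for illustration. The sketch also assumes unspaced text such as Japanese, where character-type changes are a useful boundary signal.

```python
# Hypothetical dictionaries; the abstract does not list their contents.
TECHNICAL_TERMS = {              # any expression -> proper expression
    "DB": "データベース",
    "データベース": "データベース",
}
BASIC_WORDS = {"の"}             # high-frequency general words


def char_type(ch):
    """Coarse character class (assumed set of types)."""
    code = ord(ch)
    if 0x3040 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:
        return "kanji"
    if ch.isdigit():
        return "digit"
    if ch.isalpha():
        return "alpha"
    return "other"


def normalize_terms(sentence, terms):
    """Replace different expressions by proper ones; return text and term spans."""
    pieces, spans, i, pos = [], [], 0, 0
    by_length = sorted(terms, key=len, reverse=True)   # prefer the longest match
    while i < len(sentence):
        match = next((t for t in by_length if sentence.startswith(t, i)), None)
        piece = terms[match] if match else sentence[i]
        if match:
            spans.append((pos, pos + len(piece)))      # span in the rewritten text
        pieces.append(piece)
        pos += len(piece)
        i += len(match) if match else 1
    return "".join(pieces), spans


def find_spans(text, words):
    """All (start, end) occurrences of each word, matched as plain substrings."""
    spans = []
    for w in words:
        start = text.find(w)
        while start != -1:
            spans.append((start, start + len(w)))
            start = text.find(w, start + 1)
    return spans


def extract_keywords(sentence, terms=TECHNICAL_TERMS, basic=BASIC_WORDS):
    text, term_spans = normalize_terms(sentence, terms)
    # Assumption: basic words overlapping a technical term are ignored.
    basic_spans = [(s, e) for s, e in find_spans(text, basic)
                   if not any(s < te and ts < e for ts, te in term_spans)]

    points = {0, len(text)}
    for s, e in term_spans + basic_spans:              # term and basic-word points
        points.update((s, e))
    for i in range(1, len(text)):                      # character-type points
        if char_type(text[i - 1]) != char_type(text[i]):
            points.add(i)
    # Assumption: no segmentation point strictly inside a technical term.
    points = {p for p in points if not any(s < p < e for s, e in term_spans)}

    basic_starts = {s for s, _ in basic_spans}
    basic_ends = {e for _, e in basic_spans}
    ordered = sorted(points)
    keywords = []
    for a, i in enumerate(ordered):
        for j in ordered[a + 1:]:
            if i in basic_starts or j in basic_ends:
                continue                               # starts or ends with a basic word
            cand = text[i:j]
            if cand.strip() and cand not in keywords:
                keywords.append(cand)
    return keywords


print(extract_keywords("DBの検索"))
# ['データベース', 'データベースの検索', '検索']
```

    With these assumed dictionaries, the inputs "DBの検索" and "データベースの検索" yield the same keywords, which is the retrieval benefit the abstract attributes to replacing different expressions with the proper one.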
