Keyword extraction apparatus for Japanese texts
    Granted invention patent
    Status: Expired

    Publication No.: US5619410A

    Publication date: 1997-04-08

    Application No.: US219530

    Filing date: 1994-03-29

    IPC classification: G06F17/27 G06F17/30 G06F17/20

    Abstract: Sentence segmentation means performs sentence segmentation on the Japanese text data to be processed. Morpheme analysis means divides the sentence-by-sentence data into morphemes and analyzes the resulting morphemes on the basis of information on morpheme-to-morpheme continuation contained in an analytical dictionary. Morpheme dictionary information development means develops the contents of the morpheme dictionary, including part of speech information, semantic classification information, sentence pattern information and noted term information. Keyword candidate extraction means extracts keyword candidates from the sentence-by-sentence data on the basis of the part of speech information and the like of each morpheme. Case information acquisition means acquires case information from information on the case classes of keyword candidates immediately preceding noted terms, which is stored in a noted term table, and from case class classification information stored in a case class conversion table. Frequency information acquisition means acquires the appearance frequency of each keyword candidate. Importance calculation means calculates the importance of each keyword candidate as a keyword. Keyword finalizing means finalizes as true keywords only those keyword candidates whose degree of importance is above a designated level of importance.
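The last three stages named in the abstract (frequency acquisition, importance calculation, keyword finalizing) can be illustrated with a minimal Python sketch. The abstract does not disclose how importance is computed, so everything specific here is an assumption made for illustration: the `Candidate` type, the `CASE_WEIGHTS` values, the additive scoring rule, and the function names are hypothetical and are not taken from the patent. Sentence segmentation, morpheme analysis and case lookup are assumed to have already produced the candidate list.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical weight per case class; the abstract gives no values.
CASE_WEIGHTS = {"subject": 3.0, "object": 2.0, "other": 1.0}


@dataclass(frozen=True)
class Candidate:
    surface: str     # text of the keyword candidate
    case_class: str  # case class assigned to this occurrence


def score_candidates(candidates, freq_weight=1.0):
    """Return an importance score per distinct candidate surface form.

    Assumed rule (not from the patent): the highest case weight seen
    for the candidate plus freq_weight times its appearance frequency.
    """
    freq = Counter(c.surface for c in candidates)
    best_case = {}
    for c in candidates:
        weight = CASE_WEIGHTS.get(c.case_class, CASE_WEIGHTS["other"])
        best_case[c.surface] = max(best_case.get(c.surface, 0.0), weight)
    return {s: best_case[s] + freq_weight * freq[s] for s in freq}


def finalize_keywords(scores, threshold):
    """Keep only candidates whose importance is above the threshold."""
    return sorted(s for s, v in scores.items() if v > threshold)


if __name__ == "__main__":
    candidates = [
        Candidate("検索", "subject"),
        Candidate("検索", "other"),
        Candidate("装置", "object"),
        Candidate("方法", "other"),
    ]
    scores = score_candidates(candidates)
    print(finalize_keywords(scores, threshold=2.5))
```

In this sketch the threshold plays the role of the "designated level of importance": raising it leaves fewer finalized keywords, lowering it leaves more.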
