Weighting method for use in information extraction and abstracting, based on the frequency of occurrence of keywords and similarity calculations

Granted invention patent

US06240378B1 Weighting method for use in information extraction and abstracting, based on the frequency of occurrence of keywords and similarity calculations (status: lapsed)



Patent title: Weighting method for use in information extraction and abstracting, based on the frequency of occurrence of keywords and similarity calculations
Application number: US09211385

Filing date: 1998-12-14
Publication number: US06240378B1

Publication date: 2001-05-29
Inventors: Takeshi Imanaka, Mitsuteru Kataoka, Satoshi Matsuura
Applicants: Takeshi Imanaka, Mitsuteru Kataoka, Satoshi Matsuura
Priority: JP6-285718 (1994-11-18); JP7-66340 (1995-03-24); JP7-253981 (1995-09-29)
Main classification: G06F 17/27
IPC classification: G06F 17/27


Abstract:

An information abstracting method and apparatus that extract keywords and display them as an information abstract. Given a large number of character string data sets divided into prescribed units, the extracted keywords are significant and effective in describing a topic common to the plurality of units. The information abstracting apparatus comprises an input section, which accepts character string data divided into prescribed units with each character represented by a character code, and an output section, which displays the result of information abstracting. A keyword extracting section extracts the keywords contained in each prescribed unit of the character string data received from the input section. A score calculating section then calculates a score for each keyword, giving a higher score to a keyword extracted from a larger number of units. On the basis of the calculated scores, an abstracting section selects keywords, which the output section outputs as the information abstract.
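The scoring rule described in the abstract, where a keyword scores higher the more units it is extracted from, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented method: the regex tokenizer, the stop-word list, and the helper names `extract_keywords`, `score_keywords`, and `abstract` are hypothetical, the score is taken to be the plain count of units containing the keyword, and the similarity calculations named in the title are not modeled.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the abstract does not say how keywords are extracted.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "and", "to", "is", "for"}


def extract_keywords(unit: str) -> set[str]:
    """Hypothetical keyword extractor: lowercase alphabetic tokens minus stop words.

    Returns a set so that each keyword counts at most once per unit.
    """
    tokens = re.findall(r"[a-z]+", unit.lower())
    return {t for t in tokens if t not in STOP_WORDS}


def score_keywords(units: list[str]) -> Counter:
    """Score each keyword by the number of units it was extracted from."""
    scores: Counter = Counter()
    for unit in units:
        scores.update(extract_keywords(unit))
    return scores


def abstract(units: list[str], top_k: int = 3) -> list[str]:
    """Select the top_k highest-scoring keywords as the information abstract.

    Ties are broken alphabetically so the output is deterministic.
    """
    scores = score_keywords(units)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [kw for kw, _ in ranked[:top_k]]


if __name__ == "__main__":
    # Each string stands in for one prescribed unit (e.g. one news article).
    articles = [
        "Typhoon approaches the coast; trains suspended",
        "Typhoon causes flooding; trains delayed",
        "Election results announced after typhoon",
    ]
    print(abstract(articles, top_k=2))  # → ['typhoon', 'trains']
```

Counting each keyword once per unit, rather than summing raw occurrences, is what makes the score reward keywords shared across many units: "typhoon" appears in all three units and ranks first, while words confined to a single unit fall to the bottom.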

