Text mining method and apparatus allowing a user to analyze contents of a document set from plural analysis axes
    2.
    Granted invention patent (status: no longer in force)

    Publication number: US06757676B1

    Publication date: 2004-06-29

    Application number: US09649961

    Filing date: 2000-08-29

    IPC classification: G06F17/30

    Abstract: A text mining method that allows documents (texts) to be analyzed from a wide variety of viewpoints. The method includes: a distinctive word and/or phrase extraction step of extracting words and/or phrases that characteristically appear in a processing subject document set, obtained by taking all or part of a set of documents registered beforehand; a definition information setting step of setting definition information that includes a specified word or phrase or specified bibliographic information; a coincident word and/or phrase acquisition step of acquiring, from among the words and/or phrases extracted in the distinctive word and/or phrase extraction step, those that are coincident within a predetermined range with a word, phrase, or bibliographic item included in the definition information; and a multiplex coincident word and/or phrase acquisition step of acquiring words and/or phrases that are coincident within a predetermined range with an individual word, phrase, or bibliographic item acquired from each of a plurality of different pieces of definition information.

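The four steps named in the abstract can be illustrated with a minimal Python sketch. It is not the patented procedure: the function names, the frequency-ratio test for "distinctive" words, and the token window used as the "predetermined range" are all assumptions made for illustration, and documents are assumed to be pre-tokenized lists of words.

```python
from collections import Counter
from itertools import product


def distinctive_words(subject_docs, all_docs, min_ratio=1.5):
    """Words relatively more frequent in the subject set than in the full set.

    The ratio test is an assumed stand-in for "characteristically emerging".
    """
    subj = Counter(w for doc in subject_docs for w in doc)
    full = Counter(w for doc in all_docs for w in doc)
    n_subj, n_full = sum(subj.values()), sum(full.values())
    return {w for w, c in subj.items()
            if c / n_subj >= min_ratio * (full[w] / n_full)}


def coincident_words(docs, term, candidates, window=2):
    """Candidate words found within `window` tokens of `term` (assumed range)."""
    found = set()
    for doc in docs:
        for i, tok in enumerate(doc):
            if tok == term:
                nearby = doc[max(0, i - window): i + window + 1]
                found.update(w for w in nearby if w in candidates and w != term)
    return found


def multiplex_coincident(docs, axis_a, axis_b, candidates, window=2):
    """For each pair of terms from two definition sets, the candidates that
    are near both terms (not necessarily in the same document)."""
    return {(a, b): coincident_words(docs, a, candidates, window)
                    & coincident_words(docs, b, candidates, window)
            for a, b in product(axis_a, axis_b)}
```

In this sketch each definition set plays the role of one analysis axis, so crossing two sets gives one cell per pair of terms, which is one plausible reading of analyzing a document set from plural axes.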

    Data display method and apparatus for use in text mining
    4.
    Granted invention patent (status: no longer in force)

    Publication number: US06738786B2

    Publication date: 2004-05-18

    Application number: US09874005

    Filing date: 2001-06-06

    IPC classification: G06F17/30

    Abstract: In text mining, if a system extracts only the characteristic words and phrases that frequently co-occur with each component of an analysis axis (the analysis condition), similar words and phrases are extracted for every component. To make clear which characteristic words and phrases do not appear as co-occurrence words and phrases for the other components of the analysis axis, the features that distinguish the components should be presented to the user appropriately. For this purpose, the frequency of appearance of a plurality of characteristic words and phrases is calculated over the documents satisfying each analysis condition. Multiple co-occurrence words and phrases and component co-occurrence words and phrases are then displayed so that they can be told apart. The user can thus appropriately analyze the contents of a plurality of documents.

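As a rough illustration of the distinction drawn in this abstract, the following Python sketch counts, for each component of an analysis axis, how many of its documents contain each characteristic word, then separates words seen under several components from words seen under exactly one. The function name, the `min_count` threshold, and the document-count measure of frequency are assumptions for illustration only, not details taken from the patent.

```python
from collections import defaultdict


def classify_cooccurrence(docs_by_component, characteristic_words, min_count=1):
    """Split characteristic words by how many axis components they co-occur with.

    docs_by_component maps a component (e.g. a year) to its tokenized documents.
    Returns (counts, multiple, component_only).
    """
    # Number of documents per component that contain each word.
    counts = {
        comp: {w: sum(w in doc for doc in docs) for w in characteristic_words}
        for comp, docs in docs_by_component.items()
    }
    multiple = set()
    component_only = defaultdict(set)
    for w in characteristic_words:
        hits = [comp for comp in counts if counts[comp][w] >= min_count]
        if len(hits) > 1:
            multiple.add(w)                  # shared across components
        elif len(hits) == 1:
            component_only[hits[0]].add(w)   # distinguishes one component
    return counts, multiple, dict(component_only)
```

A display layer could then render the two groups differently, for example by color, so that words specific to one component stand out from words common to several.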

    Document retrieval method and system and computer readable storage medium
    5.
    Granted invention patent (status: no longer in force)

    Publication number: US06665668B1

    Publication date: 2003-12-16

    Application number: US09645561

    Filing date: 2000-08-24

    IPC classification: G06F17/30

    Abstract: A document retrieval system is provided whose document display interface makes the important portions easy to recognize, even when the displayed document was retrieved using a query expression given as a document or a long sentence. When a text is registered, predetermined character strings extracted from the text and their location information are stored in a location information file. A weight for each character string is calculated by a predetermined method and stored in a weight file. To retrieve a document, predetermined character strings are extracted from the designated query expression. A similarity between the query expression and each text in the database is calculated using the location information and the weights obtained from the location information file and the weight file. To display the document, the character strings with high weights are selected from those used for the retrieval, and the display format of each portion containing the selected character strings is changed when the text is displayed.

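The registration, retrieval, and display stages described in this abstract might be sketched in Python as follows. Everything concrete here is an assumption for illustration: the "predetermined character strings" are taken to be character bigrams, the "predetermined method" for weights is replaced by an IDF-style formula, in-memory dictionaries stand in for the location information file and the weight file, and uppercasing stands in for the change of display format.

```python
import math
from collections import defaultdict


def ngrams(text, n=2):
    """Character n-grams with their start offsets."""
    return [(text[i:i + n], i) for i in range(len(text) - n + 1)]


def build_index(texts, n=2):
    """Registration: record where each n-gram occurs and give it a weight."""
    locations = defaultdict(lambda: defaultdict(list))
    for doc_id, text in enumerate(texts):
        for gram, pos in ngrams(text, n):
            locations[gram][doc_id].append(pos)
    # Assumed IDF-style weight: rarer strings weigh more.
    weights = {g: math.log(len(texts) / len(docs)) + 1.0
               for g, docs in locations.items()}
    return {g: dict(d) for g, d in locations.items()}, weights


def similarity(query, doc_id, locations, weights, n=2):
    """Retrieval: weighted count of the query's n-grams found in one text."""
    query_grams = {g for g, _ in ngrams(query, n)}
    return sum(weights[g] * len(locations[g].get(doc_id, []))
               for g in query_grams if g in locations)


def highlight(text, doc_id, query, locations, weights, n=2, top_k=1):
    """Display: uppercase the portions holding the top-weighted query n-grams."""
    hits = [g for g in {g for g, _ in ngrams(query, n)}
            if doc_id in locations.get(g, {})]
    top = sorted(hits, key=lambda g: (-weights[g], g))[:top_k]
    marked = {pos + k for g in top for pos in locations[g][doc_id] for k in range(n)}
    return "".join(ch.upper() if i in marked else ch for i, ch in enumerate(text))
```

Because only the highest-weighted strings are marked, a long query that shares many common strings with a text would still highlight just the portions that are most specific to that text, which is the readability goal the abstract states.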