Method for extracting name entities and jargon terms using a suffix tree data structure
    Granted invention patent (in force)

    Publication number: US07197449B2

    Publication date: 2007-03-27

    Application number: US10017408

    Filing date: 2001-10-30

    IPC classification: G06F17/20 G06F17/21 G06F17/27

    CPC classification: G06F17/2755 G06F17/278

    Abstract: A method for entity name and jargon term recognition and extraction. An embodiment of the present invention uses a suffix tree data structure to determine frequently occurring phrases. In one embodiment, the text to be analyzed is preprocessed. The text is then separated into clauses, and a suffix tree is created for the text. The suffix tree is used to determine repetitious segments. Unrecognized text fragments that occur with high frequency have a comparatively high probability of being a name entity or jargon term. The set of repetitious segments is then filtered to obtain a set of possible entity names and jargon terms.
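
The pipeline the abstract describes (split into clauses, index the suffixes, collect repeated segments, filter) can be illustrated with a minimal Python sketch. It is not the patented implementation. For brevity it uses a naive, uncompressed suffix trie over word tokens, capped at `max_len` tokens, rather than a true compressed suffix tree. The clause delimiters, the `min_count` threshold, and the `known_words` vocabulary filter are all assumptions made for illustration.

```python
import re


class Node:
    """Trie node; count = how many clause suffixes pass through it."""

    def __init__(self):
        self.children = {}
        self.count = 0


def split_clauses(text):
    # Assumption: clauses end at common punctuation marks.
    # Lowercasing is a simplification so that counts ignore case.
    parts = re.split(r"[.,;:!?]+", text.lower())
    return [p.split() for p in parts if p.split()]


def build_suffix_trie(clauses, max_len=5):
    # Insert every suffix of every clause, truncated to max_len tokens.
    # A real suffix tree would compress single-child paths.
    root = Node()
    for tokens in clauses:
        for start in range(len(tokens)):
            node = root
            for tok in tokens[start:start + max_len]:
                node = node.children.setdefault(tok, Node())
                node.count += 1
    return root


def repeated_segments(root, min_count=2):
    # A node's count equals the number of occurrences of the phrase
    # spelled out by the path from the root to that node.
    found = {}
    stack = [(root, ())]
    while stack:
        node, phrase = stack.pop()
        for tok, child in node.children.items():
            if child.count >= min_count:
                new_phrase = phrase + (tok,)
                found[" ".join(new_phrase)] = child.count
                stack.append((child, new_phrase))
    return found


def filter_candidates(segments, known_words):
    # Hypothetical filter: keep segments that contain at least one
    # token missing from a known vocabulary.
    return {
        seg: n for seg, n in segments.items()
        if any(w not in known_words for w in seg.split())
    }


if __name__ == "__main__":
    text = ("The zorblat engine starts. We tuned the zorblat engine, "
            "and the zorblat engine runs.")
    known = {"the", "engine", "we", "tuned", "and", "starts", "runs"}
    trie = build_suffix_trie(split_clauses(text))
    print(filter_candidates(repeated_segments(trie), known))
```

Because a node's count can never exceed its parent's, the traversal stops descending as soon as a phrase falls below `min_count`, so only frequent paths are explored.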
