Automated dictionary creation for scientific terms

    Publication No.: US10176188B2

    Publication date: 2019-01-08

    Application No.: US13752620

    Filing date: 2013-01-29

    IPC classification: G06F17/30 G06F19/28 G06F17/27

    Abstract: Systems and methods for automated creation of a dictionary of scientific terms are described herein. Initially, input data is filtered to obtain a primary file having a plurality of term-ID pairs, with each term-ID pair having a unique term ID and a scientific term. Further, a remove-term file is generated based on one or more term-ID pairs identified from the primary file such that the scientific term of each identified term-ID pair corresponds to one of additional terms, frequent scientific terms, and undesirable terms. At least one term-ID pair from among the one or more term-ID pairs is altered, based on modification rules, to obtain a modified term-ID pair. The modified term-ID pair is added to an add-term file, and a modified file is obtained based on the remove-term file and the add-term file. Duplicate term-ID pairs present in the modified file are removed to obtain the dictionary of scientific terms.
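The flow in the abstract (filter, flag for removal, modify, merge, deduplicate) can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not specify the filter, the frequency criterion, or the modification rules, so `build_dictionary`, `is_undesirable`, `strip_parenthetical`, the `max_occurrences` threshold, and the sample rows are all hypothetical. In-memory lists stand in for the primary, remove-term, and add-term files, and the "additional terms" category is not modeled.

```python
from collections import Counter


def is_undesirable(term):
    # Hypothetical criterion: placeholder terms and terms tagged as obsolete.
    return term.lower() == "unknown" or term.endswith("(obsolete)")


def strip_parenthetical(term):
    # Hypothetical modification rule: "ibuprofen (obsolete)" -> "ibuprofen".
    # Returns None when the rule does not apply.
    if term.endswith(")") and " (" in term:
        return term[: term.rindex(" (")]
    return None


def build_dictionary(input_rows, undesirable, max_occurrences, modify):
    # Step 1: filter the input into a primary list of (term_id, term) pairs.
    # Assumed filter: drop rows with a missing ID or an empty term.
    primary = [
        (tid, term.strip())
        for tid, term in input_rows
        if tid and term and term.strip()
    ]

    # Step 2: remove-term list = pairs whose term is undesirable or too
    # frequent (assumed: the term occurs in more than max_occurrences pairs).
    counts = Counter(term.lower() for _, term in primary)
    remove_terms = [
        (tid, term)
        for tid, term in primary
        if undesirable(term) or counts[term.lower()] > max_occurrences
    ]

    # Step 3: apply the modification rule to flagged pairs; each altered
    # pair goes into the add-term list under its original ID.
    add_terms = []
    for tid, term in remove_terms:
        new_term = modify(term)
        if new_term is not None and new_term != term:
            add_terms.append((tid, new_term))

    # Step 4: modified list = primary minus remove-term pairs plus add-term pairs.
    removed = set(remove_terms)
    modified = [pair for pair in primary if pair not in removed] + add_terms

    # Step 5: drop duplicate pairs while preserving first-seen order.
    return list(dict.fromkeys(modified))


rows = [
    ("D001", "aspirin"),
    ("D001", "aspirin"),               # exact duplicate pair
    ("D002", "cell"),
    ("D003", "cell"),
    ("D004", "cell"),                  # "cell" occurs 3 times: too frequent
    ("D005", "ibuprofen (obsolete)"),  # undesirable, but repairable by the rule
    ("", "orphan"),                    # no ID: filtered out in step 1
    ("D006", "unknown"),               # undesirable, no rule applies
]
print(build_dictionary(rows, is_undesirable, 2, strip_parenthetical))
# -> [('D001', 'aspirin'), ('D005', 'ibuprofen')]
```

Keeping the remove and add lists separate, rather than editing the primary list in place, mirrors the abstract's distinction between the remove-term file and the add-term file and keeps each intermediate result inspectable.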

    Patent application
    AUTOMATED DICTIONARY CREATION FOR SCIENTIFIC TERMS (status: under examination, published)

    Publication No.: US20130218849A1

    Publication date: 2013-08-22

    Application No.: US13752620

    Filing date: 2013-01-29

    IPC classification: G06F17/30

    Abstract: Systems and methods for automated creation of a dictionary of scientific terms are described herein. Initially, input data is filtered to obtain a primary file having a plurality of term-ID pairs, with each term-ID pair having a unique term ID and a scientific term. Further, a remove-term file is generated based on one or more term-ID pairs identified from the primary file such that the scientific term of each identified term-ID pair corresponds to one of additional terms, frequent scientific terms, and undesirable terms. At least one term-ID pair from among the one or more term-ID pairs is altered, based on modification rules, to obtain a modified term-ID pair. The modified term-ID pair is added to an add-term file, and a modified file is obtained based on the remove-term file and the add-term file. Duplicate term-ID pairs present in the modified file are removed to obtain the dictionary of scientific terms.
