METHODS AND SYSTEMS FOR LANGUAGE-AGNOSTIC MACHINE LEARNING IN NATURAL LANGUAGE PROCESSING USING FEATURE EXTRACTION
    Invention patent application
    Status: Under examination (published)

    Publication number: US20160162467A1

    Publication date: 2016-06-09

    Application number: US14964525

    Filing date: 2015-12-09

    IPC classification: G06F17/27

    Abstract: Methods, apparatuses, and systems are presented for generating natural language models using a novel system architecture for feature extraction. A method for extracting features for natural language processing comprises: accessing one or more tokens generated from a document to be processed; receiving one or more feature types defined by a user; receiving a selection of one or more feature types from a plurality of system-defined and user-defined feature types, wherein each feature type comprises one or more rules for generating features; receiving one or more parameters for the selected feature types, wherein the one or more rules for generating features are defined at least in part by the parameters; generating features associated with the document to be processed based on the selected feature types and the received parameters; and outputting the generated features in a format common to all feature types.
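
    The abstract describes feature types built from rules, drawn from a pool of system-defined and user-defined types, parameterised at selection time, and emitted in one shared format. The following is a minimal Python sketch of that flow, not the patent's implementation. It assumes a hypothetical `Feature` record as the common output format and uses illustrative rule functions (`lowercase_rule`, `window_rule`, `suffix_rule`) and parameter names (`window`, `length`) that do not come from the source.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical rule signature: (tokens, position, parameters) -> [(name, value), ...]
Rule = Callable[[List[str], int, Dict[str, Any]], List[Tuple[str, Any]]]


@dataclass(frozen=True)
class Feature:
    """Assumed common output format shared by every feature type."""
    token_index: int
    name: str
    value: Any


@dataclass
class FeatureType:
    """A named bundle of rules; parameters are supplied when it is selected."""
    name: str
    rules: List[Rule]


def lowercase_rule(tokens, i, params):
    return [("lower", tokens[i].lower())]


def window_rule(tokens, i, params):
    # The hypothetical "window" parameter sets how many neighbours per side are emitted.
    size = params.get("window", 1)
    out = []
    for offset in range(-size, size + 1):
        j = i + offset
        if offset != 0 and 0 <= j < len(tokens):
            out.append((f"word[{offset:+d}]", tokens[j]))
    return out


def suffix_rule(tokens, i, params):
    # An illustrative user-defined rule driven by the hypothetical "length" parameter.
    return [("suffix", tokens[i][-params.get("length", 3):])]


SYSTEM_TYPES = {
    "lowercase": FeatureType("lowercase", [lowercase_rule]),
    "context": FeatureType("context", [window_rule]),
}


def extract_features(tokens, user_types, selected, params):
    """Apply each selected feature type, with its own parameters, to every token."""
    registry = {**SYSTEM_TYPES, **user_types}  # system-defined plus user-defined types
    features = []
    for type_name in selected:
        ftype = registry[type_name]
        type_params = params.get(type_name, {})
        for i in range(len(tokens)):
            for rule in ftype.rules:
                for name, value in rule(tokens, i, type_params):
                    features.append(Feature(i, f"{type_name}:{name}", value))
    return features


user_types = {"suffix": FeatureType("suffix", [suffix_rule])}
tokens = ["Feature", "extraction", "works"]
features = extract_features(
    tokens,
    user_types,
    selected=["lowercase", "suffix", "context"],
    params={"suffix": {"length": 2}, "context": {"window": 1}},
)
for feature in features:
    print(feature)
```

    Because every feature type emits the same `Feature` record, downstream model-training code can consume system-defined and user-defined types without special cases.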


    INTELLIGENT SYSTEM THAT DYNAMICALLY IMPROVES ITS KNOWLEDGE AND CODE-BASE FOR NATURAL LANGUAGE UNDERSTANDING
    Invention patent application
    Status: In force

    Publication number: US20160162466A1

    Publication date: 2016-06-09

    Application number: US14964512

    Filing date: 2015-12-09

    IPC classification: G06F17/27

    Abstract: Systems, methods, and apparatuses are presented for a novel natural language tokenizer and tagger. In some embodiments, a method for tokenizing text for natural language processing comprises: generating, from a pool of documents, a set of statistical models comprising one or more entries, each indicating the likelihood that a character/letter sequence appears in the pool of documents; receiving a set of rules comprising rules that identify character/letter sequences as valid tokens; transforming one or more entries in the statistical models into new rules that are added to the set of rules when the entries indicate a high likelihood; receiving a document to be processed; dividing the document to be processed into tokens based on the set of statistical models and the set of rules, wherein the statistical models are applied where the rules fail to tokenize the document unambiguously; and outputting the divided tokens for natural language processing.
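
    The tokenizer abstract combines three steps: build likelihood entries from a document pool, promote high-likelihood entries to rules, and fall back to the statistical model where the rules do not give a single answer. The following is a minimal Python sketch under several assumptions that are not from the source: the statistical model is unigram frequency of whitespace-delimited sequences, a rule is simply a lexicon entry marking a sequence as a valid token, "unambiguous" means exactly one way to split a chunk into rule tokens, and the fallback is a swapped-in unigram Viterbi segmentation. All function names and the `threshold` value are hypothetical.

```python
import math
from collections import Counter


def build_model(documents):
    """Assumed statistical model: relative frequency of each lower-cased,
    whitespace-delimited character sequence in the document pool."""
    counts = Counter(w for doc in documents for w in doc.lower().split())
    total = sum(counts.values())
    return {seq: n / total for seq, n in counts.items()}


def promote_rules(model, rules, threshold):
    """Turn every model entry whose likelihood reaches `threshold` into a rule."""
    return set(rules) | {seq for seq, p in model.items() if p >= threshold}


def rule_segmentations(chunk, rules, limit=2):
    """Return up to `limit` ways to split `chunk` entirely into rule tokens.
    Two results are enough to know the rules are ambiguous."""
    results = []

    def walk(start, path):
        if len(results) >= limit:
            return
        if start == len(chunk):
            results.append(list(path))
            return
        for end in range(start + 1, len(chunk) + 1):
            if chunk[start:end] in rules:
                path.append(chunk[start:end])
                walk(end, path)
                path.pop()

    walk(0, [])
    return results


def statistical_segment(chunk, model, max_len=8, unknown_logp=-20.0):
    """Swapped-in fallback: unigram Viterbi segmentation maximising the sum of
    log-likelihoods; unseen single characters get a fixed penalty."""
    n = len(chunk)
    best = [(-math.inf, 0)] * (n + 1)  # (best score, start of last token)
    best[0] = (0.0, 0)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = chunk[start:end]
            if piece in model:
                logp = math.log(model[piece])
            elif end - start == 1:
                logp = unknown_logp
            else:
                continue
            score = best[start][0] + logp
            if score > best[end][0]:
                best[end] = (score, start)
    tokens, end = [], n
    while end > 0:  # follow back-pointers from the end of the chunk
        start = best[end][1]
        tokens.append(chunk[start:end])
        end = start
    return tokens[::-1]


def tokenize(document, rules, model):
    """Use the rules where they give exactly one split; otherwise fall back."""
    tokens = []
    for chunk in document.lower().split():
        options = rule_segmentations(chunk, rules)
        if len(options) == 1:
            tokens.extend(options[0])
        else:  # rules failed or were ambiguous: apply the statistical model
            tokens.extend(statistical_segment(chunk, model))
    return tokens


pool = ["go into the park", "walk into town", "go to bed", "log in now"]
model = build_model(pool)
rules = promote_rules(model, rules={"in", "to"}, threshold=0.15)
print(sorted(rules))  # ['go', 'in', 'into', 'to']
print(tokenize("goto bed gointo town", rules, model))
# ['go', 'to', 'bed', 'go', 'into', 'town']
```

    Here "go" and "into" (each 2/13 of the pool) are promoted to rules. That promotion makes "gointo" ambiguous under the rules ("go in to" versus "go into"), so the chunk is resolved statistically, while "goto" has exactly one rule split and "bed" and "town" have none.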
