Identification and Rejection of Meaningless Input During Natural Language Classification
    21.
    Patent application
    Legal status: in force

    Publication number: US20070244692A1

    Publication date: 2007-10-18

    Application number: US11279577

    Filing date: 2006-04-13

    CPC classification number: G06F17/2715 G06F17/30707 G10L15/183

    Abstract: A method for identifying data that is meaningless and generating a natural language statistical model which can reject meaningless input. The method can include identifying unigrams that are individually meaningless from a set of training data. At least a portion of the unigrams identified as being meaningless can be assigned to a first n-gram class. The method also can include identifying bigrams that are entirely composed of meaningless unigrams and determining whether the identified bigrams are individually meaningless. At least a portion of the bigrams identified as being individually meaningless can be assigned to the first n-gram class.
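
    The two-stage filter in this abstract (meaningless unigrams first, then bigrams built only from them) can be sketched in Python. The abstract does not say how an n-gram is judged meaningless, so this minimal sketch assumes one possible criterion: an n-gram is meaningless when its occurrences are spread almost evenly across the training labels, i.e. its normalized label entropy is at or above a hypothetical `threshold`. The function names and the toy corpus are illustrative only.

```python
import math
from collections import Counter, defaultdict


def class_entropy(counts):
    """Shannon entropy (bits) of an n-gram's distribution over training labels."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c)


def ngram_class_counts(data, n):
    """Count, for every n-gram (as a tuple), how often it occurs under each label."""
    stats = defaultdict(Counter)
    for sentence, label in data:
        tokens = sentence.lower().split()
        for i in range(len(tokens) - n + 1):
            stats[tuple(tokens[i:i + n])][label] += 1
    return stats


def find_meaningless(data, threshold=0.9):
    """Return the unigrams and bigrams to place in the first (meaningless) n-gram class.

    Assumption: an n-gram is "individually meaningless" when its normalized
    label entropy is at least `threshold`, i.e. it says little about the class.
    """
    labels = {label for _, label in data}
    max_entropy = math.log2(len(labels)) if len(labels) > 1 else 1.0

    def meaningless(counts):
        return class_entropy(counts) / max_entropy >= threshold

    # Step 1: unigrams that are individually meaningless.
    unigrams = ngram_class_counts(data, 1)
    bad_unigrams = {g for g, c in unigrams.items() if meaningless(c)}

    # Step 2: bigrams composed entirely of meaningless unigrams...
    bigrams = ngram_class_counts(data, 2)
    candidates = {g for g in bigrams if all((t,) in bad_unigrams for t in g)}
    # ...kept only if the bigram is itself individually meaningless.
    bad_bigrams = {g for g in candidates if meaningless(bigrams[g])}

    return bad_unigrams | bad_bigrams


# Toy corpus: "um" and "uh" occur evenly under both labels.
toy = [
    ("um uh check balance", "balance"),
    ("um uh pay bill", "payment"),
    ("uh um check balance", "balance"),
    ("uh pay bill", "payment"),
    ("um pay bill", "payment"),
]
print(sorted(find_meaningless(toy)))  # [('uh',), ('um',), ('um', 'uh')]
```

    Note that "uh um" is a candidate bigram (both parts are meaningless) but is rejected, because in this toy corpus it occurs under only one label.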

    Extracting tokens in a natural language understanding application
    22.
    Granted patent
    Legal status: in force

    Publication number: US08285539B2

    Publication date: 2012-10-09

    Application number: US11764285

    Filing date: 2007-06-18

    CPC classification number: G06F17/277 G06F17/278

    Abstract: A method of processing text within a natural language understanding system can include applying a first tokenization technique to a sentence using a statistical tokenization model. A second tokenization technique using a named entity can be applied to the sentence when the first tokenization technique does not extract a needed token according to a class of the sentence. A token determined according to at least one of the tokenization techniques can be output.
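
    A minimal sketch of the two-pass extraction, assuming the statistical tokenization model can be treated as a callable that returns `{token_type: text}` and that the named-entity technique is approximated by regular expressions. `REQUIRED_TOKEN`, `NAMED_ENTITIES`, and `stub_statistical` are hypothetical stand-ins, not part of the patent.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical mapping from sentence class to the token that class needs.
REQUIRED_TOKEN = {"transfer": "AMOUNT", "lookup": "ACCOUNT"}

# Hypothetical named-entity patterns used by the second technique.
NAMED_ENTITIES = {
    "AMOUNT": re.compile(r"\$?\d+(?:\.\d{2})?"),
    "ACCOUNT": re.compile(r"\b(?:checking|savings)\b"),
}

# A statistical tokenizer is modelled as any callable returning {token_type: text}.
StatisticalTokenizer = Callable[[str], Dict[str, str]]


def extract_token(sentence: str, sentence_class: str,
                  statistical: StatisticalTokenizer) -> Optional[str]:
    needed = REQUIRED_TOKEN.get(sentence_class)
    if needed is None:
        return None  # this class needs no token
    # First technique: the statistical tokenization model.
    tokens = statistical(sentence)
    if needed in tokens:
        return tokens[needed]
    # Second technique: named-entity matching for the needed token type.
    match = NAMED_ENTITIES[needed].search(sentence)
    return match.group(0) if match else None


def stub_statistical(sentence: str) -> Dict[str, str]:
    """Stand-in for a trained model: it only ever finds ACCOUNT tokens."""
    found = [w for w in sentence.lower().split() if w in ("checking", "savings")]
    return {"ACCOUNT": found[0]} if found else {}


print(extract_token("move $50 to savings", "transfer", stub_statistical))  # $50
```

    Here the stub model misses the amount, so the named-entity fallback supplies "$50"; for a "lookup" sentence the statistical pass alone already yields the account.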

    Method and system for automatically building natural language understanding models
    23.
    Granted patent
    Legal status: in force

    Publication number: US07835911B2

    Publication date: 2010-11-16

    Application number: US11324057

    Filing date: 2005-12-30

    CPC classification number: G06F17/27

    Abstract: The invention disclosed herein concerns a system (100) and method (600) for building a language model representation of an NLU application. The method 600 can include categorizing an NLU application domain (602), classifying a corpus in view of the categorization (604), and training at least one language model in view of the classification (606). The categorization produces a hierarchical tree of categories, sub-categories and end targets across one or more features for interpreting one or more natural language input requests. During development of an NLU application, a developer assigns sentences of the NLU application to categories, sub-categories or end targets across one or more features to associate each sentence with the desired interpretations. A language model builder (140) iteratively builds multiple language models for this sentence data and iteratively evaluates them against a test corpus, partitioning the data based on the categorization and rebuilding models, so as to produce an optimal configuration of language models to interpret and respond to language input requests for the NLU application.
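
    The partition-and-rebuild loop can be illustrated with a toy sketch. It assumes add-one-smoothed unigram models, category paths written as `domain/category/...`, and that each test sentence is scored by the model for its known category (every test category is assumed to appear in training); the abstract specifies none of these details. The "optimal configuration" here is simply the hierarchy depth whose models give the test corpus the highest log-probability.

```python
import math
from collections import Counter, defaultdict


def train_unigram(sentences):
    """Word counts plus total token count for one partition."""
    counts = Counter(w for s in sentences for w in s.split())
    return counts, sum(counts.values())


def log_prob(model, sentence, vocab_size):
    """Add-one smoothed unigram log-probability of a sentence."""
    counts, total = model
    return sum(math.log((counts[w] + 1) / (total + vocab_size)) for w in sentence.split())


def partition(data, depth):
    """Group training sentences by the first `depth` levels of their category path."""
    groups = defaultdict(list)
    for sentence, path in data:
        groups["/".join(path.split("/")[:depth])].append(sentence)
    return groups


def best_depth(train, test, max_depth):
    """Rebuild models at each hierarchy depth; keep the depth that scores the test corpus best."""
    vocab = {w for s, _ in train + test for w in s.split()}
    scores = {}
    for depth in range(max_depth + 1):
        models = {key: train_unigram(sents) for key, sents in partition(train, depth).items()}
        # Assumption: each test sentence is scored by the model for its own known category.
        scores[depth] = sum(
            log_prob(models["/".join(path.split("/")[:depth])], sentence, len(vocab))
            for sentence, path in test
        )
    return max(scores, key=scores.get), scores
```

    At depth 0 every sentence shares one global model; each extra level splits the data along the category tree and retrains a model per branch.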

    Reclassification of Training Data to Improve Classifier Accuracy
    24.
    Patent application
    Legal status: in force

    Publication number: US20080312906A1

    Publication date: 2008-12-18

    Application number: US11764291

    Filing date: 2007-06-18

    CPC classification number: G06F17/30705

    Abstract: A method of creating a statistical classification model for a classifier within a natural language understanding system can include processing training data using an existing statistical classification model. Sentences of the training data correctly classified into a selected class of the statistical classification model can be selected. The selected sentences of the training data can be assigned to a fringe group or a core group according to confidence score. The training data can be updated by associating the fringe group with a fringe subclass of the selected class and the core group with a core subclass of the selected class. A new statistical classification model can be built from the updated training data. The new statistical classification model can be output.
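
    A minimal sketch of the relabeling step, assuming the existing model is a callable returning `(label, confidence)`, that a single hypothetical confidence threshold separates core from fringe, and that the subclasses are named `<class>.core` and `<class>.fringe`. Training the new model on the returned data is left to whatever classifier trainer is in use.

```python
from typing import Callable, List, Tuple

# An existing statistical model, modelled as a callable returning (label, confidence).
Classifier = Callable[[str], Tuple[str, float]]


def split_into_core_and_fringe(training_data: List[Tuple[str, str]],
                               classify: Classifier,
                               selected_class: str,
                               threshold: float = 0.7) -> List[Tuple[str, str]]:
    """Relabel correctly classified sentences of `selected_class` as core or fringe."""
    updated = []
    for sentence, label in training_data:
        if label == selected_class:
            predicted, confidence = classify(sentence)
            if predicted == selected_class:  # only correctly classified sentences are split
                group = "core" if confidence >= threshold else "fringe"
                updated.append((sentence, f"{selected_class}.{group}"))
                continue
        updated.append((sentence, label))  # everything else keeps its original label
    return updated


# Toy stand-in for the existing model's output on each sentence.
stub = {
    "pay my bill": ("payment", 0.95),
    "settle up": ("payment", 0.55),
    "pay it": ("balance", 0.60),
    "my balance": ("balance", 0.90),
}
data = [("pay my bill", "payment"), ("settle up", "payment"),
        ("pay it", "payment"), ("my balance", "balance")]
print(split_into_core_and_fringe(data, stub.__getitem__, "payment"))
```

    The misclassified "pay it" keeps its plain "payment" label in this sketch, since the abstract only describes splitting correctly classified sentences.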

    Sub-Model Generation to Improve Classification Accuracy
    25.
    Patent application
    Legal status: in force

    Publication number: US20080312904A1

    Publication date: 2008-12-18

    Application number: US11764274

    Filing date: 2007-06-18

    CPC classification number: G06F17/2715

    Abstract: A method of classifying text input for use with a natural language understanding system can include determining classification information including a primary classification and one or more secondary classifications for a received text input using a statistical classification model (statistical model). A statistical classification sub-model (statistical sub-model) can be selectively built according to a model generation criterion applied to the classification information. The method further can include selecting the primary classification or the secondary classification for the text input as a final classification according to the statistical sub-model and outputting the final classification for the text input.
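
    A sketch of the decision flow under stated assumptions: the abstract does not define the model generation criterion or how the sub-model is trained, so this version assumes the criterion is "some secondary classification scores within a hypothetical `margin` of the primary", and it takes the sub-model builder as a caller-supplied function (for example, one that retrains only on the candidate classes). `toy_model` and `toy_builder` are illustrative stand-ins.

```python
from typing import Callable, Dict, List

# Any scoring model: text -> {label: score}.
ScoreModel = Callable[[str], Dict[str, float]]


def classify_with_sub_model(text: str,
                            model: ScoreModel,
                            build_sub_model: Callable[[List[str]], ScoreModel],
                            margin: float = 0.2,
                            max_secondary: int = 2) -> str:
    ranked = sorted(model(text).items(), key=lambda kv: kv[1], reverse=True)
    primary, top_score = ranked[0]
    secondary = ranked[1:1 + max_secondary]
    # Hypothetical model generation criterion: a secondary class is "close"
    # when its score is within `margin` of the primary's.
    close = [label for label, score in secondary if top_score - score < margin]
    if not close:
        return primary  # confident: no sub-model is built
    candidates = [primary] + close
    sub_model = build_sub_model(candidates)  # e.g. trained only on these classes
    sub_scores = sub_model(text)
    return max(candidates, key=lambda label: sub_scores.get(label, float("-inf")))


# Toy stand-ins for a trained model and a sub-model builder.
def toy_model(text):
    return {"balance": 0.45, "payment": 0.40, "other": 0.15}


def toy_builder(labels):
    return lambda text: {"payment": 0.8, "balance": 0.2}


print(classify_with_sub_model("pay the balance", toy_model, toy_builder))  # payment
```

    Building the sub-model lazily keeps the common, confident case cheap: the extra model is only trained when the primary and a secondary classification are close.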

    Relative delta computations for determining the meaning of language inputs
    26.
    Granted patent
    Legal status: in force

    Publication number: US07366666B2

    Publication date: 2008-04-29

    Application number: US10677044

    Filing date: 2003-10-01

    CPC classification number: G06F17/2715 G06F17/2785 G10L15/26

    Abstract: A method for processing language input can include the step of determining at least two possible meanings for a language input. For each possible meaning, a probability that the possible meaning is a correct interpretation of the language input can be determined. At least one relative delta computation can be performed based at least in part upon the probabilities. At least one irregularity within the language input can be detected based upon the relative delta computation. The irregularity can include mumble, ambiguous input, and/or compound input. At least one programmatic action can be performed responsive to the detection of the irregularity.
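
    The relative delta idea can be sketched as follows. The abstract gives no formula or thresholds, so this sketch assumes the relative delta is (p1 - p2) / p1 for the two most probable meanings, treats a low top probability as mumble and a small relative delta as ambiguous input, and maps each to a placeholder action. Compound-input detection is not sketched because the abstract gives no rule for it. `mumble_floor`, `ambiguity_delta`, and `ACTIONS` are hypothetical.

```python
from typing import Dict, Optional


def relative_delta(probabilities: Dict[str, float]) -> float:
    """Gap between the two most probable meanings, relative to the top one.

    Assumed definition: (p1 - p2) / p1.
    """
    if len(probabilities) < 2:
        raise ValueError("need at least two possible meanings")
    p1, p2 = sorted(probabilities.values(), reverse=True)[:2]
    return (p1 - p2) / p1


def detect_irregularity(probabilities: Dict[str, float],
                        mumble_floor: float = 0.3,
                        ambiguity_delta: float = 0.25) -> Optional[str]:
    """Return "mumble", "ambiguous", or None. Thresholds are hypothetical."""
    if max(probabilities.values()) < mumble_floor:
        return "mumble"  # no meaning is a convincing interpretation
    if relative_delta(probabilities) < ambiguity_delta:
        return "ambiguous"  # top two meanings are too close to call
    return None


# Hypothetical programmatic responses to each irregularity.
ACTIONS = {"mumble": "ask the caller to repeat", "ambiguous": "ask which request was meant"}

irregularity = detect_irregularity({"balance": 0.45, "payment": 0.40, "other": 0.15})
print(irregularity, "->", ACTIONS.get(irregularity))  # ambiguous -> ask which request was meant
```

    Using a delta relative to the top probability, rather than the raw difference, means a 0.05 gap counts as close when the best meaning is only 0.45 likely but would count as decisive relative to a much larger gap on a confident input.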

    Relative delta computations for determining the meaning of language inputs
    27.
    Patent application
    Legal status: in force

    Publication number: US20050075874A1

    Publication date: 2005-04-07

    Application number: US10677044

    Filing date: 2003-10-01

    CPC classification number: G06F17/2715 G06F17/2785 G10L15/26

    Abstract: A method for processing language input can include the step of determining at least two possible meanings for a language input. For each possible meaning, a probability that the possible meaning is a correct interpretation of the language input can be determined. At least one relative delta computation can be performed based at least in part upon the probabilities. At least one irregularity within the language input can be detected based upon the relative delta computation. The irregularity can include mumble, ambiguous input, and/or compound input. At least one programmatic action can be performed responsive to the detection of the irregularity.
