Method and apparatus for improved grammar checking using a stochastic parser

    Publication No.: US07003444B2

    Publication date: 2006-02-21

    Application No.: US09904232

    Filing date: 2001-07-12

    Inventor: David Neal Weise

    CPC classification number: G06F17/274

    Abstract: A method and grammar checking system are provided that generate a stochastic score, or a statistical goodness measure, for each of an input string of text and one or more alternative strings of text. An alternative generator generates the alternative strings of text, and a ranking parser produces parse trees and corresponding statistical goodness measures for each of the strings. The string of text having the highest goodness measure is selected for recommendation to a user.
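
The generate, score, and select flow in this abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the confusion sets, the bigram counts, and the function names `generate_alternatives`, `goodness`, and `recommend` are all hypothetical, and a toy bigram score stands in for the ranking parser's statistical goodness measure.

```python
import math

# Hypothetical confusion sets; a real checker would use far richer rules.
CONFUSION_SETS = [{"their", "there"}, {"then", "than"}]

# Toy bigram counts standing in for a ranking parser's statistics (assumed data).
BIGRAM_COUNTS = {("over", "there"): 8, ("over", "their"): 1}


def generate_alternatives(tokens):
    """Yield copies of tokens with one confusable word swapped."""
    for i, tok in enumerate(tokens):
        for group in CONFUSION_SETS:
            if tok in group:
                for other in sorted(group - {tok}):
                    yield tokens[:i] + [other] + tokens[i + 1:]


def goodness(tokens):
    """Stand-in goodness measure: sum of log(1 + bigram count)."""
    return sum(math.log(1 + BIGRAM_COUNTS.get(pair, 0))
               for pair in zip(tokens, tokens[1:]))


def recommend(text):
    """Return the input or the alternative with the highest score."""
    tokens = text.lower().split()
    # The input comes first, so it wins ties and is kept unless an
    # alternative scores strictly higher.
    candidates = [tokens] + list(generate_alternatives(tokens))
    return " ".join(max(candidates, key=goodness))
```

Listing the input string first means a suggestion is only made when an alternative scores strictly higher than the original text.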

    System and method for parsing a natural language input span using a candidate list to generate alternative nodes

    Patent type: Granted invention patent

    Legal status: Lapsed

    Publication No.: US06236959B1

    Publication date: 2001-05-22

    Application No.: US09103057

    Filing date: 1998-06-23

    Inventor: David Neal Weise

    CPC classification number: G06F17/2705

    Abstract: An improved natural language parser uses a directed search template set to identify problematic word sequences, thus reducing processing time while increasing accuracy. The directed search template set is used to identify problematic input spans or portions of input spans. A problematic input span is one that contains at least one word or phrase that can be constructed in alternative ways. Problematic input spans can reduce the efficiency of a natural language parser and can result in the production of an inaccurate parse tree. Once a problematic span has been identified, the improved parser generates alternative parses for the problematic portion of the input span. This on-the-fly alternative parse generation permits the parser to consider the alternatives as early in the parse process as possible, thus reducing the overall time needed to parse a problematic input span.
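
One way to picture the template lookup described in this abstract is a scan that matches fixed word sequences and returns the alternative analyses attached to each match. The sketch below rests on assumptions: the `Template` type, the two sample templates, their analysis labels, and `find_problem_spans` are hypothetical, and the patent's actual template format is not given in the abstract.

```python
from typing import List, NamedTuple, Tuple


class Template(NamedTuple):
    pattern: Tuple[str, ...]       # word sequence that flags a problematic span
    alternatives: Tuple[str, ...]  # hypothetical alternative analyses to try


# Assumed sample templates; the labels are illustrative only.
TEMPLATES = [
    Template(("as", "well", "as"), ("CONJ", "ADV+ADV+PREP")),
    Template(("in", "front", "of"), ("PREP", "PREP+NOUN+PREP")),
]


def find_problem_spans(tokens: List[str], templates=TEMPLATES):
    """Return (start, end, alternatives) for every template match."""
    hits = []
    for start in range(len(tokens)):
        for template in templates:
            end = start + len(template.pattern)
            if tuple(tokens[start:end]) == template.pattern:
                hits.append((start, end, template.alternatives))
    return hits
```

A parser could consult these hits before building nodes for the span, so that each listed alternative is considered early rather than discovered after a failed parse.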

    Method and apparatus for providing improved HMM POS tagger for multi-word entries and factoids

    Publication No.: US06985851B2

    Publication date: 2006-01-10

    Application No.: US09907315

    Filing date: 2001-07-17

    CPC classification number: G06F17/2715

    Abstract: A method of calculating trigram path probabilities for an input string of text containing a multi-word-entry (MWE) or a factoid includes tokenizing the input string to create a plurality of parse leaf units (PLUs). A PosColumn is constructed for each word, MWE, factoid and character in the input string of text which has a unique first (Ft) and last (Lt) token pair. TrigramColumns are constructed which define corresponding TrigramNodes each representing a trigram for three PosColumns. Forward and backward trigram path probabilities are calculated for each separate TrigramNode. The sums of all trigram path probabilities through each PLU are then calculated as a function of the forward and backward trigram path probabilities. Systems and computer-readable medium configured to implement the methods are also provided.
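
The forward and backward computation in this abstract can be illustrated with a small lattice in which a multi-word entry spans several tokens. This is a simplified sketch under stated assumptions: it does not reproduce the patent's PosColumn or TrigramColumn structures, it omits an end-of-sentence transition, and the `PLU` tuple, the `START` sentinel, and the caller-supplied `trans` and `emit` functions are hypothetical.

```python
from collections import namedtuple
from functools import lru_cache

# A parse leaf unit: first and last token index (inclusive) plus a POS tag.
PLU = namedtuple("PLU", "first last tag")
START = PLU(-1, -1, "<s>")  # sentinel placed before the first token


def through_probabilities(plus, n_tokens, trans, emit):
    """Sum the probabilities of all trigram paths through each PLU.

    trans(t1, t2, t3) is the trigram probability of tag t3 after t1, t2;
    emit(plu) is the probability of the PLU's text given its tag.
    """
    by_first, by_last = {}, {-1: [START]}
    for unit in plus:
        by_first.setdefault(unit.first, []).append(unit)
        by_last.setdefault(unit.last, []).append(unit)

    @lru_cache(maxsize=None)
    def forward(prev, cur):
        # Total probability of all partial paths that end in (prev, cur).
        if prev == START:
            return trans("<s>", "<s>", cur.tag) * emit(cur)
        return emit(cur) * sum(
            forward(before, prev) * trans(before.tag, prev.tag, cur.tag)
            for before in by_last.get(prev.first - 1, []))

    @lru_cache(maxsize=None)
    def backward(prev, cur):
        # Total probability of all completions that follow (prev, cur).
        if cur.last == n_tokens - 1:
            return 1.0
        return sum(
            trans(prev.tag, cur.tag, nxt.tag) * emit(nxt) * backward(cur, nxt)
            for nxt in by_first.get(cur.last + 1, []))

    return {
        cur: sum(forward(prev, cur) * backward(prev, cur)
                 for prev in by_last.get(cur.first - 1, []))
        for cur in plus
    }
```

Keying the recursion on a (previous, current) pair of PLUs gives each step the two tags of history a trigram needs, and matching on token indices lets a PLU that covers two tokens compete with the two single-token PLUs it overlaps.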

    Statistically driven sentence realizing method and apparatus

    Patent type: Granted invention patent

    Legal status: Lapsed

    Publication No.: US07266491B2

    Publication date: 2007-09-04

    Application No.: US11152352

    Filing date: 2005-06-14

    CPC classification number: G06F17/2881

    Abstract: A method of, and system for, generating a sentence from a semantic representation maps the semantic representation to an unordered set of syntactic nodes. Simplified generation grammar rules and statistical goodness measure values from a corresponding analysis grammar are then used to create a tree structure to order the syntactic nodes. The sentence is then generated from the tree structure. The generation grammar is a simplified (context free) version of a corresponding full (context sensitive) analysis grammar. In the generation grammar, conditions on each rule are ignored except those directly related to the semantic representation. The statistical goodness measure values, which are calculated through an analysis training phase in which a corpus of example sentences is processed using the full analysis grammar, are used to guide the generation choice to prefer substructures most commonly found in a particular syntactic/semantic context during analysis.
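
The ordering step in this abstract, where statistics gathered during analysis choose among possible arrangements of unordered syntactic nodes, can be sketched as below. The details are assumptions for illustration: the `RULE_SCORES` table and its values, the nested-tuple node format, and the `realize` function are hypothetical, and exhaustive permutation search stands in for whatever search the patent actually uses.

```python
from itertools import permutations

# Hypothetical goodness values keyed by (parent label, ordered child labels),
# imagined as having been collected while parsing a training corpus.
RULE_SCORES = {
    ("S", ("NP", "VP")): 0.90,
    ("S", ("VP", "NP")): 0.05,
    ("NP", ("DET", "NOUN")): 0.95,
    ("NP", ("NOUN", "DET")): 0.01,
    ("VP", ("VERB", "NP")): 0.80,
    ("VP", ("NP", "VERB")): 0.10,
}


def realize(node):
    """Order an unordered tree and return its words.

    A leaf is (label, word); an internal node is (label, [children]),
    where the order of children carries no meaning.
    """
    label, content = node
    if isinstance(content, str):
        return [content]

    def score(order):
        child_labels = tuple(child[0] for child in order)
        return RULE_SCORES.get((label, child_labels), 0.0)

    best_order = max(permutations(content), key=score)
    return [word for child in best_order for word in realize(child)]
```

For example, a tree whose `S` node holds a `VP` and an `NP` in arbitrary order is realized subject-first because the `("S", ("NP", "VP"))` entry has the higher score.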

    Method and apparatus for improved grammar checking using a stochastic parser

    Patent type: Granted invention patent

    Legal status: In force

    Publication No.: US07184950B2

    Publication date: 2007-02-27

    Application No.: US11177129

    Filing date: 2005-07-08

    Inventor: David Neal Weise

    CPC classification number: G06F17/274

    Abstract: A method and grammar checking system are provided that generate a stochastic score, or a statistical goodness measure, for each of an input string of text and one or more alternative strings of text. An alternative generator generates the alternative strings of text, and a ranking parser produces parse trees and corresponding statistical goodness measures for each of the strings. The string of text having the highest goodness measure is selected for recommendation to a user.

    Method and apparatus for providing improved HMM POS tagger for multi-word entries and factoids

    Publication No.: US07124074B2

    Publication date: 2006-10-17

    Application No.: US11151953

    Filing date: 2005-06-14

    CPC classification number: G06F17/2715

    Abstract: A method of calculating trigram path probabilities for an input string of text containing a multi-word-entry (MWE) or a factoid includes tokenizing the input string to create a plurality of parse leaf units (PLUs). A PosColumn is constructed for each word, MWE, factoid and character in the input string of text which has a unique first (Ft) and last (Lt) token pair. TrigramColumns are constructed which define corresponding TrigramNodes each representing a trigram for three PosColumns. Forward and backward trigram path probabilities are calculated for each separate TrigramNode. The sums of all trigram path probabilities through each PLU are then calculated as a function of the forward and backward trigram path probabilities. Systems and computer-readable medium configured to implement the methods are also provided.

    Statistically driven sentence realizing method and apparatus

    Publication No.: US07003445B2

    Publication date: 2006-02-21

    Application No.: US09909530

    Filing date: 2001-07-20

    CPC classification number: G06F17/2881

    Abstract: A method of, and system for, generating a sentence from a semantic representation maps the semantic representation to an unordered set of syntactic nodes. Simplified generation grammar rules and statistical goodness measure values from a corresponding analysis grammar are then used to create a tree structure to order the syntactic nodes. The sentence is then generated from the tree structure. The generation grammar is a simplified (context free) version of a corresponding full (context sensitive) analysis grammar. In the generation grammar, conditions on each rule are ignored except those directly related to the semantic representation. The statistical goodness measure values, which are calculated through an analysis training phase in which a corpus of example sentences is processed using the full analysis grammar, are used to guide the generation choice to prefer substructures most commonly found in a particular syntactic/semantic context during analysis.
