Automatic clustering of tokens from a corpus for grammar acquisition
    3.
    Granted invention patent
    Status: in force

    Publication No.: US07966174B1

    Publication date: 2011-06-21

    Application No.: US12030935

    Filing date: 2008-02-14

    IPC classification: G06F17/27

    CPC classification: G06K9/6282 G06K9/6218

    Abstract: A system for recognizing patterns is disclosed. Grammar learning from a corpus includes generating a frequency vector for each non-context token in the corpus, based upon counted occurrences of a predetermined relationship of the non-context tokens to identified context tokens. Clusters are grown from the frequency vectors according to a lexical correlation or a cluster tree among the non-context tokens. The cluster tree is used for pattern recognition.

    Method for building linguistic models from a corpus
    4.
    Granted invention patent
    Status: in force

    Publication No.: US06415248B1

    Publication date: 2002-07-02

    Application No.: US09443891

    Filing date: 1999-11-19

    IPC classification: G06F17/20

    Abstract: A method iteratively integrates clustering techniques with phrase acquisition techniques to build complex linguistic models from a corpus. A set of features is initialized from the corpus. Thereafter, the method determines, according to a predetermined cost function, whether to process the features by phrase clustering processing or by phrase grammar learning processing. If phrase clustering processing is performed, the method then processes an interstitial set of features, comprising both the old features and the newly established clusters, by phrase grammar learning processing. The features obtained as the output of phrase grammar learning are re-indexed as the set of features for a subsequent iteration. The method may be repeated over several iterations to build a hierarchical linguistic model.

    Automatic clustering of tokens from a corpus for grammar acquisition
    5.
    Granted invention patent
    Status: in force

    Publication No.: US07356462B2

    Publication date: 2008-04-08

    Application No.: US10662730

    Filing date: 2003-09-15

    IPC classification: G06F17/27

    CPC classification: G06K9/6282 G06K9/6218

    Abstract: A method of grammar learning from a corpus comprises generating a frequency vector for each non-context token in the corpus, based upon counted occurrences of a predetermined relationship of the non-context tokens to identified context tokens. Clusters are grown from the frequency vectors according to a lexical correlation among the non-context tokens.

    Automatic clustering of tokens from a corpus for grammar acquisition
    6.
    Granted invention patent
    Status: in force

    Publication No.: US06317707B1

    Publication date: 2001-11-13

    Application No.: US09207326

    Filing date: 1998-12-07

    IPC classification: G06F17/27

    Abstract: In a method of learning grammar from a corpus, context words are identified from the corpus. For the other, non-context words, the method counts the occurrences of predetermined relationships with the context words and maps the counted occurrences to a multidimensional frequency space. Clusters are grown from the resulting frequency vectors. The clusters represent classes of words; words in the same cluster possess the same lexical significance and provide an indicator of grammatical structure.
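
The steps named in this abstract (identify context words, count relationships, map to frequency vectors, grow clusters) can be illustrated with a minimal Python sketch. It is not the patented procedure: the abstract does not say how context words are chosen, which relationships are counted, or how clusters are grown. Here the most frequent words are assumed to be the context words, the "predetermined relationship" is assumed to be adjacency (offsets -1 and +1), and clusters are grown by a greedy merge on Manhattan distance with an arbitrary threshold. All function and parameter names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequency_vectors(sentences, n_context=2, offsets=(-1, 1)):
    """Map each non-context word to a vector of normalized co-occurrence counts."""
    counts = Counter(w for s in sentences for w in s)
    # Assumption: the n_context most frequent words act as context words.
    context = [w for w, _ in counts.most_common(n_context)]
    # One dimension per (context word, relative position) pair.
    dims = [(c, o) for c in context for o in offsets]
    raw = {w: Counter() for w in counts if w not in context}
    for s in sentences:
        for i, w in enumerate(s):
            if w not in raw:
                continue
            for o in offsets:
                j = i + o
                if 0 <= j < len(s) and s[j] in context:
                    raw[w][(s[j], o)] += 1
    # Normalize by how often the word itself occurs.
    vectors = {w: [c[d] / counts[w] for d in dims] for w, c in raw.items()}
    return context, vectors


def manhattan(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


def grow_clusters(vectors, threshold):
    """Greedily merge the two closest clusters until none is within threshold."""
    clusters = [[w] for w in sorted(vectors)]

    def centroid(cluster):
        rows = [vectors[w] for w in cluster]
        return [sum(col) / len(rows) for col in zip(*rows)]

    def gap(pair):
        return manhattan(centroid(clusters[pair[0]]), centroid(clusters[pair[1]]))

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2), key=gap)
        if gap((i, j)) > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return clusters


corpus = [
    "fly from boston to denver".split(),
    "drive from denver to dallas".split(),
    "fly from dallas to boston".split(),
    "drive from austin to denver".split(),
]
context_words, vecs = frequency_vectors(corpus)
print(context_words)
print(grow_clusters(vecs, threshold=2.0))
```

On this toy corpus the verbs and the city names end up in separate clusters, which is the sense in which clusters "represent classes of words". The threshold and distance measure are illustrative choices only.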

    Method and apparatus for providing stochastic finite-state machine translation
    7.
    Granted invention patent
    Status: in force

    Publication No.: US07113903B1

    Publication date: 2006-09-26

    Application No.: US10058995

    Filing date: 2002-01-30

    IPC classification: G10L15/00

    Abstract: A method and apparatus for stochastic finite-state machine translation is provided. The method may include receiving a speech input and translating the speech input in a source language into one or more symbols in a target language based on a stochastic language model. Subsequently, all possible sequences of the translated symbols may be generated. One of the generated sequences may be selected based on a monolingual target language model.
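
The two stages described here (symbol translation, then sequence generation and selection) can be sketched roughly as follows. This is a toy illustration under several assumptions, not the patented apparatus: the speech input is assumed to be already recognized as text, the stochastic model is reduced to a hypothetical per-word probability table, "all possible sequences" is taken to mean all permutations of the translated symbols, and the monolingual target language model is a hand-written bigram table with an assumed floor probability for unseen pairs.

```python
import math
from itertools import permutations

# Hypothetical toy models; a real system would estimate both from data.
LEXICON = {  # P(target symbol | source word)
    "la": {"the": 0.9, "it": 0.1},
    "casa": {"house": 0.8, "home": 0.2},
    "roja": {"red": 1.0},
}
BIGRAMS = {  # P(next word | previous word) in the target language
    ("<s>", "the"): 0.6,
    ("the", "red"): 0.3,
    ("the", "house"): 0.3,
    ("red", "house"): 0.5,
    ("house", "</s>"): 0.4,
}
FLOOR = 1e-4  # assumed probability for bigrams missing from the table


def translate_symbols(source_words):
    """Pick the most probable target symbol for each source word."""
    return [max(LEXICON[w], key=LEXICON[w].get) for w in source_words]


def target_score(sequence):
    """Log-probability of a target word sequence under the bigram model."""
    padded = ["<s>", *sequence, "</s>"]
    return sum(math.log(BIGRAMS.get(pair, FLOOR)) for pair in zip(padded, padded[1:]))


def select_sequence(symbols):
    """Generate every ordering of the symbols and keep the best-scoring one."""
    return max(permutations(symbols), key=target_score)


symbols = translate_symbols("la casa roja".split())
print(symbols)
print(" ".join(select_sequence(symbols)))
```

Enumerating permutations grows factorially with sentence length, so this brute-force search only works for very short inputs; it is used here purely to make the selection step concrete.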

    Automatic clustering of tokens from a corpus for grammar acquisition
    8.
    Granted invention patent
    Status: in force

    Publication No.: US06751584B2

    Publication date: 2004-06-15

    Application No.: US09912461

    Filing date: 2001-07-26

    IPC classification: G06F17/27

    Abstract: In a method of learning grammar from a corpus, context words are identified from the corpus. For the other, non-context words, the method counts the occurrences of predetermined relationships with the context words and maps the counted occurrences to a multidimensional frequency space. Clusters are grown from the resulting frequency vectors. The clusters represent classes of words; words in the same cluster possess the same lexical significance and provide an indicator of grammatical structure.

    System and method for collaborative language translation
    10.
    Granted invention patent
    Status: in force

    Publication No.: US09323746B2

    Publication date: 2016-04-26

    Application No.: US13311836

    Filing date: 2011-12-06

    IPC classification: G06F17/28

    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for presenting a machine translation and alternative translations to a user, where a selection of any particular alternative translation results in the re-ranking of the remaining alternatives. The system then presents these re-ranked alternatives to the user, who can continue proofing the machine translation using the re-ranked alternatives or by typing an improved translation. This process continues until the user indicates that the current portion of the translation is complete, at which point the system moves to the next portion.
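
The abstract does not say how the remaining alternatives are re-ranked after a selection. Purely as an illustration of the interaction loop, the sketch below assumes each alternative carries a base score and boosts alternatives that share more words with the user's selection (Jaccard overlap). The scoring rule, the `weight` parameter, and the function name are all hypothetical.

```python
def rerank(alternatives, selected, weight=1.0):
    """Re-rank the alternatives that remain after the user selects one.

    alternatives: dict mapping a translation string to its base score.
    selected: the translation the user picked.
    Assumption: similarity to the selection is word-set Jaccard overlap.
    """
    chosen = set(selected.split())

    def new_score(item):
        text, base = item
        words = set(text.split())
        overlap = len(words & chosen) / len(words | chosen)
        return base + weight * overlap

    remaining = [(t, s) for t, s in alternatives.items() if t != selected]
    return [t for t, _ in sorted(remaining, key=new_score, reverse=True)]


candidates = {
    "the red house": 0.50,
    "a red home": 0.45,
    "the house red": 0.40,
    "the red home": 0.30,
}
for text in rerank(candidates, "the red house"):
    print(text)
```

In an interactive tool, this function would be called again after every selection, and the loop would end when the user marks the current portion as complete.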
