Automatic clustering of tokens from a corpus for grammar acquisition
    1.
    Granted invention patent
    Automatic clustering of tokens from a corpus for grammar acquisition (in force)

    Publication No.: US07356462B2

    Publication date: 2008-04-08

    Application No.: US10662730

    Filing date: 2003-09-15

    IPC classification: G06F17/27

    CPC classification: G06K9/6282 G06K9/6218

    Abstract: A method of grammar learning from a corpus comprises generating a frequency vector for each non-context token in the corpus, based upon counted occurrences of a predetermined relationship of the non-context tokens to identified context tokens. Clusters are grown from the frequency vectors according to a lexical correlation among the non-context tokens.
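
The abstract above can be illustrated with a minimal sketch. It is not the patented method. It assumes that the most frequent tokens act as context tokens, that the "predetermined relationship" is adjacency (a context token directly before or after the token), and that cosine similarity with a greedy threshold stands in for the lexical correlation. The names `frequency_vectors`, `cosine`, and `grow_clusters` are hypothetical.

```python
from collections import Counter
from math import sqrt


def frequency_vectors(sentences, n_context=2, offsets=(-1, 1)):
    """Build one count vector per non-context token."""
    counts = Counter(tok for s in sentences for tok in s)
    # Assumption: the n_context most frequent tokens are the context tokens.
    context = [tok for tok, _ in counts.most_common(n_context)]
    # One dimension per (context token, relative position) pair.
    dims = [(c, o) for c in context for o in offsets]
    raw = {}
    for s in sentences:
        for i, tok in enumerate(s):
            if tok in context:
                continue
            vec = raw.setdefault(tok, Counter())
            for o in offsets:
                j = i + o
                if 0 <= j < len(s) and s[j] in context:
                    vec[(s[j], o)] += 1
    return context, {t: [v[d] for d in dims] for t, v in raw.items()}


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def grow_clusters(vectors, threshold=0.5):
    """Greedy clustering: join the first cluster whose seed is similar enough."""
    clusters = []
    for tok in sorted(vectors):
        for cluster in clusters:
            if cosine(vectors[tok], vectors[cluster[0]]) >= threshold:
                cluster.append(tok)
                break
        else:
            clusters.append([tok])
    return clusters
```

On a small travel-style corpus in which "to" and "from" are the context tokens, a sketch like this tends to separate city names from verbs, because the two groups sit in different positions relative to the context tokens.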

    Automatic clustering of tokens from a corpus for grammar acquisition
    3.
    Granted invention patent
    Automatic clustering of tokens from a corpus for grammar acquisition (in force)

    Publication No.: US07966174B1

    Publication date: 2011-06-21

    Application No.: US12030935

    Filing date: 2008-02-14

    IPC classification: G06F17/27

    CPC classification: G06K9/6282 G06K9/6218

    Abstract: A system for recognizing patterns is disclosed. Grammar learning from a corpus includes generating a frequency vector for each non-context token in the corpus, based upon counted occurrences of a predetermined relationship of the non-context tokens to identified context tokens. Clusters are grown from the frequency vectors according to a lexical correlation or a cluster tree among the non-context tokens. The cluster tree is used for pattern recognition.

    Method for building linguistic models from a corpus
    4.
    Granted invention patent
    Method for building linguistic models from a corpus (in force)

    Publication No.: US06415248B1

    Publication date: 2002-07-02

    Application No.: US09443891

    Filing date: 1999-11-19

    IPC classification: G06F17/20

    Abstract: A method iteratively integrates clustering techniques with phrase acquisition techniques to build complex linguistic models from a corpus. A set of features is initialized from the corpus. Thereafter, the method determines, according to a predetermined cost function, whether to process the features by phrase clustering or by phrase grammar learning. If phrase clustering is performed, the method then processes an interstitial set of features, comprising both the old features and the newly established clusters, by phrase grammar learning. The features obtained as the output of phrase grammar learning are re-indexed as the set of features for a subsequent iteration. The method may be repeated over several iterations to build a hierarchical linguistic model.

    Automatic clustering of tokens from a corpus for grammar acquisition
    6.
    Granted invention patent
    Automatic clustering of tokens from a corpus for grammar acquisition (in force)

    Publication No.: US06317707B1

    Publication date: 2001-11-13

    Application No.: US09207326

    Filing date: 1998-12-07

    IPC classification: G06F17/27

    Abstract: In a method of learning grammar from a corpus, context words are identified from a corpus. For the other non-context words, the method counts the occurrence of predetermined relationships with the context words, and maps the counted occurrences to a multidimensional frequency space. Clusters are grown from the frequency vectors. The clusters represent classes of words; words in the same cluster possess the same lexical significancy and provide an indicator of grammatical structure.

    Method and apparatus for providing stochastic finite-state machine translation
    7.
    Granted invention patent
    Method and apparatus for providing stochastic finite-state machine translation (in force)

    Publication No.: US07113903B1

    Publication date: 2006-09-26

    Application No.: US10058995

    Filing date: 2002-01-30

    IPC classification: G10L15/00

    Abstract: A method and apparatus for stochastic finite-state machine translation is provided. The method may include receiving a speech input and translating the speech input in a source language into one or more symbols in a target language based on a stochastic language model. Subsequently, all possible sequences of the translated symbols may be generated. One of the generated sequences may be selected based on a monolingual target language model.
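
The final selection step described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: "all possible sequences" is taken to mean all permutations of the translated symbols, and the monolingual target language model is a bigram model with add-one smoothing. The first stage, translating speech into target symbols, is not shown. The names `train_bigrams`, `bigram_logprob`, and `best_ordering` are hypothetical.

```python
from collections import Counter
from itertools import permutations
from math import log


def train_bigrams(sentences):
    """Count unigrams and bigrams over tokenized target-language sentences."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + list(s)  # "<s>" marks the sentence start
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_logprob(seq, unigrams, bigrams):
    """Log-probability of a sequence under an add-one smoothed bigram model."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + list(seq)
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        total += log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total


def best_ordering(symbols, unigrams, bigrams):
    """Score every permutation of the symbols and return the most probable one."""
    return max(
        permutations(symbols),
        key=lambda seq: bigram_logprob(seq, unigrams, bigrams),
    )
```

Enumerating permutations grows factorially with the number of symbols, so a sketch like this is only practical for short outputs. A finite-state implementation would represent the candidate sequences as a lattice instead.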

    Automatic clustering of tokens from a corpus for grammar acquisition
    8.
    Granted invention patent
    Automatic clustering of tokens from a corpus for grammar acquisition (in force)

    Publication No.: US06751584B2

    Publication date: 2004-06-15

    Application No.: US09912461

    Filing date: 2001-07-26

    IPC classification: G06F17/27

    Abstract: In a method of learning grammar from a corpus, context words are identified from a corpus. For the other non-context words, the method counts the occurrence of predetermined relationships with the context words, and maps the counted occurrences to a multidimensional frequency space. Clusters are grown from the frequency vectors. The clusters represent classes of words; words in the same cluster possess the same lexical significancy and provide an indicator of grammatical structure.

    Unsupervised and active learning in automatic speech recognition for call classification
    9.
    Granted invention patent
    Unsupervised and active learning in automatic speech recognition for call classification (in force)

    Publication No.: US08818808B2

    Publication date: 2014-08-26

    Application No.: US11063910

    Filing date: 2005-02-23

    IPC classification: G10L15/06

    Abstract: Utterance data that includes at least a small amount of manually transcribed data is provided. Automatic speech recognition is performed on the utterance data that lack a corresponding manual transcription, to produce automatically transcribed utterances. A model is trained using all of the manually transcribed data and the automatically transcribed utterances. A predetermined number of utterances not having a corresponding manual transcription are intelligently selected and manually transcribed. Items of the automatically transcribed data, as well as items having a corresponding manual transcription, are labeled. In another aspect of the invention, audio data is mined from at least one source, and a language model for call classification is trained from the mined audio data.
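
The selection step in the abstract above can be sketched as follows. The abstract does not say how utterances are "intelligently selected". This sketch assumes a common active-learning heuristic: the utterances with the lowest recognizer confidence go to manual transcription, and the rest keep their automatic transcripts. The function name `select_for_transcription` and the tuple layout are hypothetical.

```python
def select_for_transcription(hypotheses, k):
    """Split recognizer output into utterances to transcribe manually and the rest.

    hypotheses: list of (utterance_id, recognized_text, confidence) tuples.
    k: the predetermined number of utterances to send for manual transcription.
    """
    # Assumption: low confidence marks the utterances most worth transcribing.
    ranked = sorted(hypotheses, key=lambda h: h[2])
    to_transcribe = [utt_id for utt_id, _, _ in ranked[:k]]
    auto_transcribed = [(utt_id, text) for utt_id, text, _ in ranked[k:]]
    return to_transcribe, auto_transcribed
```

In a training loop built on this sketch, the manually transcribed utterances and the automatic transcripts would both be added to the training data before the model is retrained.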

    RECOGNIZING THE NUMERIC LANGUAGE IN NATURAL SPOKEN DIALOGUE
    10.
    Patent application
    RECOGNIZING THE NUMERIC LANGUAGE IN NATURAL SPOKEN DIALOGUE (in force)

    Publication No.: US20120041763A1

    Publication date: 2012-02-16

    Application No.: US13280884

    Filing date: 2011-10-25

    IPC classification: G10L15/14

    CPC classification: G10L15/142

    Abstract: A system and a method are provided. A speech recognition processor receives unconstrained input speech and outputs a string of words. The speech recognition processor is based on a numeric language that represents a subset of a vocabulary. The subset includes a set of words identified as being for interpreting and understanding number strings. A numeric understanding processor contains classes of rules for converting the string of words into a sequence of digits. The speech recognition processor utilizes an acoustic model database. A validation database stores a set of valid sequences of digits. A string validation processor outputs validity information based on a comparison of a sequence of digits output by the numeric understanding processor with valid sequences of digits in the validation database.
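
The numeric understanding and string validation stages in the abstract above can be sketched as follows. The rule classes here (digit words, "oh" for zero, "double" and "triple" as repeat markers) are illustrative assumptions, not the rules of the application. The names `words_to_digits` and `validate` are hypothetical, and a plain Python set stands in for the validation database.

```python
# Assumed rule class 1: words that map directly to a digit.
DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7", "eight": "8",
    "nine": "9",
}
# Assumed rule class 2: words that repeat the next digit.
REPEAT_WORDS = {"double": 2, "triple": 3}


def words_to_digits(words):
    """Convert a recognized word string into a digit sequence."""
    digits = []
    repeat = 1
    for word in words:
        word = word.lower()
        if word in REPEAT_WORDS:
            repeat = REPEAT_WORDS[word]
        elif word in DIGIT_WORDS:
            digits.append(DIGIT_WORDS[word] * repeat)
            repeat = 1
        # Any other word (for example "my", "number", "is") is ignored.
    return "".join(digits)


def validate(digits, valid_sequences):
    """Report whether the digit sequence appears in the set of valid sequences."""
    return digits in valid_sequences
```

For example, the word string "my number is five double oh one" would be converted to the digit sequence 5001 and then checked against the stored valid sequences.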