Representing n-gram language models for compact storage and fast retrieval
    1. Invention Grant (Active)

    Publication No.: US08175878B1

    Publication date: 2012-05-08

    Application No.: US12968108

    Filing date: 2010-12-14

    IPC classes: G10L15/18 G10L15/06 G06F17/27

    Abstract: Systems, methods, and apparatuses, including computer program products, are provided for representing language models. In some implementations, a computer-implemented method is provided. The method includes generating a compact language model, which involves receiving a collection of n-grams from a corpus, each n-gram having a corresponding first probability of occurring in the corpus, and generating a trie representing the collection of n-grams. The method also includes using the language model to identify a second probability of a particular string of words occurring.

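The storage scheme the abstract describes can be sketched as a word-level trie whose nodes hold the stored n-gram probabilities (the "first probability"), with the "second probability" of a word string obtained by chaining those conditionals. This is a minimal illustration with made-up probabilities and a crude unseen-n-gram floor, not the patent's compact encoding:

```python
# Minimal sketch: store n-gram probabilities in a trie, then score a
# word string by multiplying the stored conditional probabilities.
# All probabilities below are invented toy values.

class TrieNode:
    def __init__(self):
        self.children = {}   # word -> TrieNode
        self.prob = None     # P(last word | preceding words), if stored

def insert(root, ngram, prob):
    """Insert an n-gram (tuple of words) with its conditional probability."""
    node = root
    for word in ngram:
        node = node.children.setdefault(word, TrieNode())
    node.prob = prob

def lookup(root, ngram):
    """Return the stored probability of an n-gram, or None if absent."""
    node = root
    for word in ngram:
        node = node.children.get(word)
        if node is None:
            return None
    return node.prob

def string_probability(root, words, order=2):
    """Approximate P(word string) by chaining stored conditional
    probabilities up to the given n-gram order."""
    p = 1.0
    for i, w in enumerate(words):
        context = tuple(words[max(0, i - order + 1):i + 1])
        q = lookup(root, context)
        p *= q if q is not None else 1e-6  # crude floor for unseen n-grams
    return p

root = TrieNode()
insert(root, ("the",), 0.1)
insert(root, ("the", "cat"), 0.3)
insert(root, ("cat", "sat"), 0.25)
print(string_probability(root, ["the", "cat", "sat"]))  # 0.1 * 0.3 * 0.25
```

A production representation would additionally compress the trie (shared arrays, quantized probabilities) to meet the "compact storage" goal; the dictionary-of-children form above only shows the lookup structure.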

Representing n-gram language models for compact storage and fast retrieval
    2. Invention Grant (Active)

    Publication No.: US07877258B1

    Publication date: 2011-01-25

    Application No.: US11693613

    Filing date: 2007-03-29

    IPC classes: G10L15/18 G10L15/06 G06F17/27

    Abstract: Systems, methods, and apparatuses, including computer program products, are provided for representing language models. In some implementations, a computer-implemented method is provided. The method includes generating a compact language model, which involves receiving a collection of n-grams from a corpus, each n-gram having a corresponding first probability of occurring in the corpus, and generating a trie representing the collection of n-grams. The method also includes using the language model to identify a second probability of a particular string of words occurring.


Discriminative training of language models for text and speech classification
    3. Invention Grant (Active)

    Publication No.: US08306818B2

    Publication date: 2012-11-06

    Application No.: US12103035

    Filing date: 2008-04-15

    IPC classes: G10L15/00 G06F17/27

    Abstract: Methods are disclosed for estimating language models such that the conditional likelihood of a class given a word string, which is highly correlated with classification accuracy, is maximized. The methods comprise tuning statistical language model parameters jointly for all classes such that a classifier discriminates between the correct class and the incorrect ones for a given training sentence or utterance. Specific embodiments of the present invention pertain to implementation of the rational function growth transform in the context of a discriminative training technique for n-gram classifiers.

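The training objective in the abstract — tune per-class language model parameters jointly so that P(correct class | word string) is maximized — can be illustrated with a toy class-conditional model. The patent's specific embodiments use the rational function growth transform; this sketch substitutes plain gradient ascent on the same conditional-likelihood objective, with made-up training data, purely to show what "discriminating between the correct class and the incorrect ones" means:

```python
# Toy discriminative training: per-class word weights are tuned jointly
# so the softmax posterior P(class | sentence) of the correct class rises.
# Gradient ascent stands in for the patent's growth-transform updates.
import math
from collections import defaultdict

def cond_log_likelihood(weights, sentence, label, classes):
    """log P(label | sentence) under a linear score per class."""
    scores = {c: sum(weights[c][w] for w in sentence) for c in classes}
    logz = math.log(sum(math.exp(s) for s in scores.values()))
    return scores[label] - logz

def train(data, classes, lr=0.5, epochs=50):
    weights = {c: defaultdict(float) for c in classes}
    for _ in range(epochs):
        for sentence, label in data:
            scores = {c: sum(weights[c][w] for w in sentence) for c in classes}
            z = sum(math.exp(s) for s in scores.values())
            for c in classes:
                post = math.exp(scores[c]) / z          # P(c | sentence)
                grad = (1.0 if c == label else 0.0) - post
                for w in sentence:                      # joint update, all classes
                    weights[c][w] += lr * grad
    return weights

# Invented two-class utterance-classification data.
data = [(["book", "flight"], "travel"), (["play", "song"], "music")]
classes = ["travel", "music"]
weights = train(data, classes)
posterior = math.exp(cond_log_likelihood(weights, ["book", "flight"], "travel", classes))
print(posterior)  # close to 1.0 after training
```

The key point the abstract makes is that the update for every class depends on the posterior of every class, so the parameters are tuned jointly rather than class by class.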

Generic spelling mnemonics
    4. Invention Grant (Expired)

    Publication No.: US07765102B2

    Publication date: 2010-07-27

    Application No.: US12171309

    Filing date: 2008-07-11

    IPC classes: G10L15/00

    CPC classes: G10L15/183

    Abstract: A system and method for creating a mnemonics Language Model for use with a speech recognition software application. The method includes generating an n-gram Language Model containing a predefined large body of characters, wherein the n-gram Language Model includes at least one character from the predefined large body of characters; constructing a new Language Model (LM) token for each of the at least one character; extracting pronunciations for each character, responsive to a predefined pronunciation dictionary, to obtain a character pronunciation representation; creating at least one alternative pronunciation for each character, responsive to the character pronunciation representation, to create an alternative pronunciation dictionary; and compiling the n-gram Language Model for use with the speech recognition software application, wherein compiling the Language Model is responsive to the new Language Model token and the alternative pronunciation dictionary.

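The pipeline in the abstract — per-character LM tokens, a base pronunciation from a dictionary, and derived alternative pronunciations of the "a as in apple" kind — can be sketched as below. The token naming scheme, the phone strings, and all dictionary contents are invented stand-ins, not the patent's data:

```python
# Sketch of building an alternative pronunciation dictionary for spelling
# mnemonics. Dictionaries and ARPAbet-style phone strings are toy values.

base_pronunciations = {"a": "ey", "b": "b iy", "c": "s iy"}   # letter names
mnemonic_words = {"a": ["apple", "adam"], "b": ["boy"], "c": ["charlie"]}
word_pronunciations = {"apple": "ae p ax l", "adam": "ae d ax m",
                       "boy": "b oy", "charlie": "ch aa r l iy"}

def build_alternative_dictionary(chars):
    """Return {LM token: [pronunciations]}: the base letter-name
    pronunciation plus one 'X as in WORD' alternative per mnemonic word."""
    alt = {}
    for ch in chars:
        token = f"spell-{ch}"                      # new LM token per character
        prons = [base_pronunciations[ch]]          # character pronunciation rep.
        for word in mnemonic_words.get(ch, []):
            # "a as in apple" -> letter name + "as in" + the word's phones
            prons.append(f"{base_pronunciations[ch]} ae z ih n "
                         f"{word_pronunciations[word]}")
        alt[token] = prons
    return alt

alt_dict = build_alternative_dictionary("abc")
print(alt_dict["spell-a"])
```

Compiling the n-gram Language Model against these tokens and the alternative dictionary (the abstract's final step) is engine-specific and is not shown here.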

Discriminative training of language models for text and speech classification
    5. Invention Grant (Active)

    Publication No.: US07379867B2

    Publication date: 2008-05-27

    Application No.: US10453349

    Filing date: 2003-06-03

    IPC classes: G06F17/27 G10L15/00

    Abstract: Methods are disclosed for estimating language models such that the conditional likelihood of a class given a word string, which is highly correlated with classification accuracy, is maximized. The methods comprise tuning statistical language model parameters jointly for all classes such that a classifier discriminates between the correct class and the incorrect ones for a given training sentence or utterance. Specific embodiments of the present invention pertain to implementation of the rational function growth transform in the context of a discriminative training technique for n-gram classifiers.


Conditional maximum likelihood estimation of naïve Bayes probability models
    8. Invention Grant (Active)

    Publication No.: US07624006B2

    Publication date: 2009-11-24

    Application No.: US10941399

    Filing date: 2004-09-15

    IPC classes: G06F17/27 G06F17/20 G06F17/30

    Abstract: A statistical classifier is constructed by estimating Naïve Bayes classifiers such that the conditional likelihood of class given word sequence is maximized. The classifier is constructed using a rational function growth transform implemented for Naïve Bayes classifiers. The estimation method tunes the model parameters jointly for all classes such that the classifier discriminates between the correct class and the incorrect ones for a given training sentence or utterance. Optional parameter smoothing and/or convergence speedup can be used to improve model performance. The classifier can be integrated into a speech utterance classification system or other natural language processing system.

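The objective here differs from standard Naïve Bayes training: parameters are judged by the conditional likelihood P(class | words) rather than the joint likelihood. The sketch below computes that posterior and applies one schematic growth-transform-style reestimation (p · (∂F/∂p + C), renormalized, with F the conditional log-likelihood). This is an illustrative approximation of the general technique with invented toy parameters, not the patent's exact procedure, smoothing, or speedup:

```python
# Toy Naïve Bayes with a single schematic growth-transform-style update
# that increases the conditional likelihood of the correct class.
# All priors and word probabilities are invented.
import math

priors = {"travel": 0.5, "music": 0.5}
word_probs = {  # P(word | class); each row sums to 1
    "travel": {"book": 0.5, "flight": 0.3, "song": 0.2},
    "music": {"book": 0.2, "flight": 0.2, "song": 0.6},
}

def posterior(words, cls):
    """Naïve Bayes P(cls | words) under the usual independence assumption."""
    joint = {c: priors[c] * math.prod(word_probs[c][w] for w in words)
             for c in priors}
    return joint[cls] / sum(joint.values())

def growth_transform_step(words, label, C=2.0):
    """Reestimate P(word | label) as p * (dF/dp + C), renormalized, where
    F = log P(label | words). C keeps the update a valid distribution."""
    post = posterior(words, label)
    grads = {}
    for w in word_probs[label]:
        # d/dp log P(label | words) = count(w) * (1 - posterior) / p
        grads[w] = words.count(w) * (1.0 - post) / word_probs[label][w]
    unnorm = {w: p * (grads[w] + C) for w, p in word_probs[label].items()}
    z = sum(unnorm.values())
    word_probs[label] = {w: v / z for w, v in unnorm.items()}

before = posterior(["book", "flight"], "travel")
growth_transform_step(["book", "flight"], "travel")
after = posterior(["book", "flight"], "travel")
print(before, after)  # the correct-class posterior increases
```

A full implementation would iterate such updates jointly over all classes and training utterances, as the abstract describes, rather than over a single example.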

DISCRIMINATIVE TRAINING OF LANGUAGE MODELS FOR TEXT AND SPEECH CLASSIFICATION
    9. Invention Application (Active)

    Publication No.: US20080215311A1

    Publication date: 2008-09-04

    Application No.: US12103035

    Filing date: 2008-04-15

    IPC classes: G06F17/27

    Abstract: Methods are disclosed for estimating language models such that the conditional likelihood of a class given a word string, which is highly correlated with classification accuracy, is maximized. The methods comprise tuning statistical language model parameters jointly for all classes such that a classifier discriminates between the correct class and the incorrect ones for a given training sentence or utterance. Specific embodiments of the present invention pertain to implementation of the rational function growth transform in the context of a discriminative training technique for n-gram classifiers.


Generic spelling mnemonics
    10. Invention Application (Expired)

    Publication No.: US20060111907A1

    Publication date: 2006-05-25

    Application No.: US10996732

    Filing date: 2004-11-24

    IPC classes: G10L15/18

    CPC classes: G10L15/183

    Abstract: A system and method for creating a mnemonics Language Model for use with a speech recognition software application. The method includes generating an n-gram Language Model containing a predefined large body of characters, wherein the n-gram Language Model includes at least one character from the predefined large body of characters; constructing a new Language Model (LM) token for each of the at least one character; extracting pronunciations for each character, responsive to a predefined pronunciation dictionary, to obtain a character pronunciation representation; creating at least one alternative pronunciation for each character, responsive to the character pronunciation representation, to create an alternative pronunciation dictionary; and compiling the n-gram Language Model for use with the speech recognition software application, wherein compiling the Language Model is responsive to the new Language Model token and the alternative pronunciation dictionary.
