Conditional maximum likelihood estimation of naive bayes probability models
    21.
    Patent application
    Status: in force

    Publication No.: US20060074630A1

    Publication date: 2006-04-06

    Application No.: US10941399

    Filing date: 2004-09-15

    IPC classification: G06F17/27

    Abstract: A statistical classifier is constructed by estimating Naïve Bayes classifiers such that the conditional likelihood of class given word sequence is maximized. The classifier is constructed using a rational function growth transform implemented for Naïve Bayes classifiers. The estimation method tunes the model parameters jointly for all classes such that the classifier discriminates between the correct class and the incorrect ones for a given training sentence or utterance. Optional parameter smoothing and/or convergence speedup can be used to improve model performance. The classifier can be integrated into a speech utterance classification system or other natural language processing system.
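
To make the objective in this abstract concrete, the following is a minimal sketch of a Naive Bayes text classifier and of the conditional log-likelihood of class given word sequence. It is not the patent's rational function growth transform: the parameters here are plain add-alpha smoothed maximum-likelihood estimates, and the sketch only evaluates the quantity a conditional estimator would try to increase. The function names, the `alpha` parameter, and `TOY_DATA` are illustrative assumptions.

```python
import math
from collections import Counter


def train_naive_bayes(data, alpha=1.0):
    """Add-alpha smoothed ML estimates; data is a list of (words, label) pairs."""
    vocab = sorted({w for words, _ in data for w in words})
    labels = sorted({y for _, y in data})
    class_counts = Counter(y for _, y in data)
    word_counts = {y: Counter() for y in labels}
    for words, y in data:
        word_counts[y].update(words)
    log_prior = {y: math.log(class_counts[y] / len(data)) for y in labels}
    log_word = {}
    for y in labels:
        total = sum(word_counts[y].values()) + alpha * len(vocab)
        log_word[y] = {w: math.log((word_counts[y][w] + alpha) / total) for w in vocab}
    return log_prior, log_word


def class_posterior(words, log_prior, log_word):
    """P(class | words) via Bayes' rule; out-of-vocabulary words are skipped."""
    scores = {
        y: log_prior[y] + sum(log_word[y][w] for w in words if w in log_word[y])
        for y in log_prior
    }
    top = max(scores.values())
    log_z = top + math.log(sum(math.exp(s - top) for s in scores.values()))
    return {y: math.exp(s - log_z) for y, s in scores.items()}


def conditional_log_likelihood(data, log_prior, log_word):
    """Sum over training pairs of log P(correct class | words)."""
    return sum(
        math.log(class_posterior(words, log_prior, log_word)[y]) for words, y in data
    )


# Hypothetical toy utterances, for illustration only.
TOY_DATA = [
    ("book a flight to boston".split(), "travel"),
    ("cancel my flight".split(), "travel"),
    ("what is my account balance".split(), "banking"),
    ("transfer money to my account".split(), "banking"),
]
```

A discriminative estimator of the kind the abstract describes would adjust `log_prior` and `log_word` jointly for all classes so that `conditional_log_likelihood` rises on the training set, rather than fitting each class independently.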

    Method and apparatus for capitalizing text using maximum entropy
    22.
    Patent application
    Status: under examination (published)

    Publication No.: US20060020448A1

    Publication date: 2006-01-26

    Application No.: US10977870

    Filing date: 2004-10-29

    IPC classification: G06F17/21

    CPC classification: G06F17/273

    Abstract: A method and apparatus are provided for selecting a form of capitalization for a text by determining a probability of a capitalization form for a word using a weighted sum of features. The features are based on the capitalization form and a context for the word.
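
The weighted-sum-of-features idea can be sketched as a small maximum entropy (softmax) model over capitalization forms. Everything specific below is an assumption made for illustration: the three forms in `FORMS`, the context features, and the hand-set values in `WEIGHTS` (a real system would learn the weights from data).

```python
import math

FORMS = ("lower", "first_cap", "all_caps")

# Hypothetical weights for (capitalization form, context feature) pairs.
WEIGHTS = {
    ("first_cap", "sentence_start"): 2.0,
    ("lower", "mid_sentence"): 1.0,
    ("all_caps", "prev_all_caps"): 1.5,
    ("first_cap", "prev_is_title"): 1.2,
}


def context_features(words, i):
    """Illustrative context features for the word at position i."""
    feats = ["sentence_start" if i == 0 else "mid_sentence"]
    if i > 0 and words[i - 1].isupper():
        feats.append("prev_all_caps")
    if i > 0 and words[i - 1].rstrip(".").lower() in {"mr", "mrs", "dr"}:
        feats.append("prev_is_title")
    return feats


def form_probabilities(words, i, weights=WEIGHTS):
    """P(form | context) as a softmax over weighted sums of active features."""
    feats = context_features(words, i)
    scores = {
        form: sum(weights.get((form, f), 0.0) for f in feats) for form in FORMS
    }
    z = sum(math.exp(s) for s in scores.values())
    return {form: math.exp(s) / z for form, s in scores.items()}


def apply_form(word, form):
    if form == "lower":
        return word.lower()
    if form == "first_cap":
        return word[:1].upper() + word[1:].lower()
    return word.upper()


def capitalize(words):
    """Pick the most probable form for each word independently."""
    result = []
    for i, word in enumerate(words):
        probs = form_probabilities(words, i)
        best = max(FORMS, key=lambda form: probs[form])
        result.append(apply_form(word, best))
    return result
```

With these hand-set weights, `capitalize(["hello", "dr.", "smith"])` capitalizes the sentence-initial word and the word following the title, and leaves the title itself in lower case because no feature in this toy set favors capitalizing it.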

    Representing n-gram language models for compact storage and fast retrieval
    23.
    Granted patent
    Status: in force

    Publication No.: US08175878B1

    Publication date: 2012-05-08

    Application No.: US12968108

    Filing date: 2010-12-14

    IPC classification: G10L15/18 G10L15/06 G06F17/27

    Abstract: Systems, methods, and apparatuses, including computer program products, are provided for representing language models. In some implementations, a computer-implemented method is provided. The method includes generating a compact language model, including receiving a collection of n-grams from a corpus, each n-gram of the collection having a corresponding first probability of occurring in the corpus, and generating a trie representing the collection of n-grams. The method also includes using the language model to identify a second probability of a particular string of words occurring.
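
As a rough illustration of storing n-grams in a trie and then scoring a word string, here is a minimal sketch. It uses a plain pointer-based trie rather than any compact layout, and it makes two assumptions not stated in the abstract: each stored value is treated as the conditional probability of the n-gram's last word given the preceding words, and unseen words receive a hypothetical floor probability `unseen` instead of proper back-off weights.

```python
class TrieNode:
    """One trie node per n-gram prefix; prob is set where an n-gram ends."""

    __slots__ = ("children", "prob")

    def __init__(self):
        self.children = {}
        self.prob = None


def build_trie(ngram_probs):
    """Build a trie from a mapping of word tuples to probabilities."""
    root = TrieNode()
    for ngram, p in ngram_probs.items():
        node = root
        for word in ngram:
            node = node.children.setdefault(word, TrieNode())
        node.prob = p
    return root


def lookup(root, ngram):
    """Return the stored probability for ngram, or None if it is absent."""
    node = root
    for word in ngram:
        node = node.children.get(word)
        if node is None:
            return None
    return node.prob


def string_probability(root, words, order=2, unseen=1e-6):
    """Chain-rule score using the longest stored n-gram ending at each word."""
    prob = 1.0
    for i in range(len(words)):
        p = None
        for start in range(max(0, i - order + 1), i + 1):
            p = lookup(root, tuple(words[start : i + 1]))
            if p is not None:
                break
        prob *= p if p is not None else unseen
    return prob


# Hypothetical toy model, for illustration only.
NGRAMS = {
    ("the",): 0.5,
    ("cat",): 0.25,
    ("sat",): 0.25,
    ("the", "cat"): 0.5,
    ("cat", "sat"): 0.5,
}
```

For the toy model, `string_probability(build_trie(NGRAMS), ["the", "cat", "sat"])` multiplies the unigram for "the" by the two stored bigrams; a string such as `["cat", "the"]` falls back to the unigram for "the" because the bigram is not in the trie.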

    Generic spelling mnemonics
    24.
    Granted patent
    Status: lapsed

    Publication No.: US07765102B2

    Publication date: 2010-07-27

    Application No.: US12171309

    Filing date: 2008-07-11

    IPC classification: G10L15/00

    CPC classification: G10L15/183

    Abstract: A system and method for creating a mnemonics Language Model for use with a speech recognition software application, wherein the method includes generating an n-gram Language Model containing a predefined large body of characters, wherein the n-gram Language Model includes at least one character from the predefined large body of characters; constructing a new Language Model (LM) token for each of the at least one character; extracting pronunciations for each of the at least one character responsive to a predefined pronunciation dictionary to obtain a character pronunciation representation; creating at least one alternative pronunciation for each of the at least one character responsive to the character pronunciation representation to create an alternative pronunciation dictionary; and compiling the n-gram Language Model for use with the speech recognition software application, wherein compiling the Language Model is responsive to the new Language Model token and the alternative pronunciation dictionary.

    Generic spelling mnemonics
    25.
    Patent application
    Status: lapsed

    Publication No.: US20060111907A1

    Publication date: 2006-05-25

    Application No.: US10996732

    Filing date: 2004-11-24

    IPC classification: G10L15/18

    CPC classification: G10L15/183

    Abstract: A system and method for creating a mnemonics Language Model for use with a speech recognition software application, wherein the method includes generating an n-gram Language Model containing a predefined large body of characters, wherein the n-gram Language Model includes at least one character from the predefined large body of characters; constructing a new Language Model (LM) token for each of the at least one character; extracting pronunciations for each of the at least one character responsive to a predefined pronunciation dictionary to obtain a character pronunciation representation; creating at least one alternative pronunciation for each of the at least one character responsive to the character pronunciation representation to create an alternative pronunciation dictionary; and compiling the n-gram Language Model for use with the speech recognition software application, wherein compiling the Language Model is responsive to the new Language Model token and the alternative pronunciation dictionary.

    Back-off language model compression
    26.
    Granted patent
    Status: in force

    Publication No.: US08725509B1

    Publication date: 2014-05-13

    Application No.: US12486358

    Filing date: 2009-06-17

    CPC classification: G10L15/183 G06F17/277

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, relating to language models stored for digital language processing. In one aspect, a method includes the actions of generating a language model, including: receiving a collection of n-grams from a corpus, each n-gram of the collection having a corresponding first probability of occurring in the corpus, and generating a trie representing the collection of n-grams, the trie being represented using one or more arrays of integers, and compressing an array representation of the trie using block encoding; and using the language model to identify a second probability of a particular string of words occurring.
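
The abstract does not say which block encoding is used, so the following is only one generic interpretation, sketched under stated assumptions: an integer array (such as one of the arrays backing a trie) is split into fixed-size blocks, and each block stores its minimum value plus the offsets from that minimum packed at the smallest bit width that fits. The names `encode_blocks`, `decode_blocks`, `get`, and the default `block_size` are hypothetical.

```python
def encode_blocks(values, block_size=4):
    """Encode each block as (base, bit width, count, packed offsets)."""
    blocks = []
    for start in range(0, len(values), block_size):
        chunk = values[start : start + block_size]
        base = min(chunk)
        width = max(1, max(v - base for v in chunk).bit_length())
        packed = 0
        for k, v in enumerate(chunk):
            packed |= (v - base) << (k * width)
        blocks.append((base, width, len(chunk), packed))
    return blocks


def decode_blocks(blocks):
    """Recover the original integer array from its blocks."""
    out = []
    for base, width, count, packed in blocks:
        mask = (1 << width) - 1
        for k in range(count):
            out.append(base + ((packed >> (k * width)) & mask))
    return out


def get(blocks, index, block_size=4):
    """Random access to one element without decoding the other blocks."""
    base, width, count, packed = blocks[index // block_size]
    k = index % block_size
    if k >= count:
        raise IndexError(index)
    return base + ((packed >> (k * width)) & ((1 << width) - 1))
```

Because every block has the same number of entries (except possibly the last), the block holding a given index is found by integer division, which keeps lookups cheap while values that cluster within a block are stored in only a few bits each.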

    GENERIC SPELLING MNEMONICS
    27.
    Patent application
    Status: lapsed

    Publication No.: US20080319749A1

    Publication date: 2008-12-25

    Application No.: US12171309

    Filing date: 2008-07-11

    IPC classification: G10L15/04

    CPC classification: G10L15/183

    Abstract: A system and method for creating a mnemonics Language Model for use with a speech recognition software application, wherein the method includes generating an n-gram Language Model containing a predefined large body of characters, wherein the n-gram Language Model includes at least one character from the predefined large body of characters; constructing a new Language Model (LM) token for each of the at least one character; extracting pronunciations for each of the at least one character responsive to a predefined pronunciation dictionary to obtain a character pronunciation representation; creating at least one alternative pronunciation for each of the at least one character responsive to the character pronunciation representation to create an alternative pronunciation dictionary; and compiling the n-gram Language Model for use with the speech recognition software application, wherein compiling the Language Model is responsive to the new Language Model token and the alternative pronunciation dictionary.

    Representing n-gram language models for compact storage and fast retrieval
    28.
    Granted patent
    Status: in force

    Publication No.: US07877258B1

    Publication date: 2011-01-25

    Application No.: US11693613

    Filing date: 2007-03-29

    IPC classification: G10L15/18 G10L15/06 G06F17/27

    Abstract: Systems, methods, and apparatuses, including computer program products, are provided for representing language models. In some implementations, a computer-implemented method is provided. The method includes generating a compact language model, including receiving a collection of n-grams from a corpus, each n-gram of the collection having a corresponding first probability of occurring in the corpus, and generating a trie representing the collection of n-grams. The method also includes using the language model to identify a second probability of a particular string of words occurring.

    Generic spelling mnemonics
    29.
    Granted patent
    Status: lapsed

    Publication No.: US07418387B2

    Publication date: 2008-08-26

    Application No.: US10996732

    Filing date: 2004-11-24

    IPC classification: G10L15/18

    CPC classification: G10L15/183

    Abstract: A system and method for creating a mnemonics Language Model for use with a speech recognition software application, wherein the method includes generating an n-gram Language Model containing a predefined large body of characters, wherein the n-gram Language Model includes at least one character from the predefined large body of characters; constructing a new Language Model (LM) token for each of the at least one character; extracting pronunciations for each of the at least one character responsive to a predefined pronunciation dictionary to obtain a character pronunciation representation; creating at least one alternative pronunciation for each of the at least one character responsive to the character pronunciation representation to create an alternative pronunciation dictionary; and compiling the n-gram Language Model for use with the speech recognition software application, wherein compiling the Language Model is responsive to the new Language Model token and the alternative pronunciation dictionary.
