Patent search: ap:("Ciprian Chelba" OR "Milind Mahajan" OR "Alejandro Acero" OR "Yik-Cheung Tam") AND inv:"Ciprian Chelba" (page 3)

21.

Patent application
Conditional maximum likelihood estimation of naive bayes probability models (status: in force)

Publication number: US20060074630A1

Publication date: 2006-04-06

Application number: US10941399

Filing date: 2004-09-15

Applicants: Ciprian Chelba, Alejandro Acero

Inventors: Ciprian Chelba, Alejandro Acero

IPC classification: G06F17/27

CPC classification: G10L15/1822, G06N7/005, Y10S707/99936

Abstract: A statistical classifier is constructed by estimating Naïve Bayes classifiers such that the conditional likelihood of class given word sequence is maximized. The classifier is constructed using a rational function growth transform implemented for Naïve Bayes classifiers. The estimation method tunes the model parameters jointly for all classes such that the classifier discriminates between the correct class and the incorrect ones for a given training sentence or utterance. Optional parameter smoothing and/or convergence speedup can be used to improve model performance. The classifier can be integrated into a speech utterance classification system or other natural language processing system.

22.

Patent application
Method and apparatus for capitalizing text using maximum entropy (status: under examination, published)

Publication number: US20060020448A1

Publication date: 2006-01-26

Application number: US10977870

Filing date: 2004-10-29

Applicants: Ciprian Chelba, Alejandro Acero

Inventors: Ciprian Chelba, Alejandro Acero

IPC classification: G06F17/21

CPC classification: G06F17/273

Abstract: A method and apparatus are provided for selecting a form of capitalization for a text by determining a probability of a capitalization form for a word using a weighted sum of features. The features are based on the capitalization form and a context for the word.
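As a rough illustration of the idea in this abstract, the sketch below scores each candidate capitalization form with a weighted sum of features and normalizes the scores into probabilities, which is the usual shape of a maximum entropy (log-linear) model. The label set, the feature templates, and the weights are all hypothetical and are not taken from the patent; it is a minimal sketch, not the claimed method.

```python
import math

# Hypothetical label set; the patent's actual capitalization forms may differ.
FORMS = ["lower", "first_cap", "all_caps"]

def features(form, word, prev_word):
    """Illustrative binary features over (capitalization form, context)."""
    return [
        f"form={form}",
        f"form={form}|word={word.lower()}",
        f"form={form}|prev={prev_word.lower()}",
        f"form={form}|sentence_start={prev_word == '<s>'}",
    ]

def form_probabilities(weights, word, prev_word):
    """P(form | context) proportional to exp(weighted sum of active features)."""
    scores = {
        form: sum(weights.get(f, 0.0) for f in features(form, word, prev_word))
        for form in FORMS
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {form: math.exp(s - top) for form, s in scores.items()}
    z = sum(exps.values())
    return {form: e / z for form, e in exps.items()}

def best_form(weights, word, prev_word):
    """Pick the most probable capitalization form for the word."""
    probs = form_probabilities(weights, word, prev_word)
    return max(probs, key=probs.get)

# Toy weights chosen by hand for the example; a real model would learn them.
weights = {
    "form=lower": 1.0,
    "form=first_cap|sentence_start=True": 3.0,
    "form=all_caps|word=ibm": 4.0,
}

print(best_form(weights, "the", "<s>"))  # first_cap
print(best_form(weights, "ibm", "at"))   # all_caps
print(best_form(weights, "cat", "the"))  # lower
```

In this toy setup the sentence-initial word is capitalized, the known acronym is upper-cased, and everything else defaults to lowercase, purely because of the hand-set weights.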

23.

Granted patent
Representing n-gram language models for compact storage and fast retrieval (status: in force)

Publication number: US08175878B1

Publication date: 2012-05-08

Application number: US12968108

Filing date: 2010-12-14

Applicants: Ciprian Chelba, Thorsten Brants

Inventors: Ciprian Chelba, Thorsten Brants

IPC classification: G10L15/18, G10L15/06, G06F17/27

CPC classification: G06F17/2715, G06K9/723, G06K2209/01, G10L15/197

Abstract: Systems, methods, and apparatuses, including computer program products, are provided for representing language models. In some implementations, a computer-implemented method is provided. The method includes generating a compact language model, including receiving a collection of n-grams from a corpus, each n-gram of the collection having a corresponding first probability of occurring in the corpus, and generating a trie representing the collection of n-grams. The method also includes using the language model to identify a second probability of a particular string of words occurring.
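To make the trie idea in this abstract concrete, here is a minimal sketch that stores n-grams with their probabilities in a nested-dictionary trie and then scores a string of words from it. Several things are assumptions for illustration only: the class and function names are hypothetical, each stored probability is treated as the conditional probability of the n-gram's last word given the preceding words, and unseen contexts are handled by simply falling back to shorter n-grams without back-off weights. The compact array-based layout the patent is concerned with is not reproduced here.

```python
import math

class NgramTrie:
    """Nested-dict trie; each node may carry the log-probability of the
    n-gram spelled out by the path from the root (hypothetical layout)."""

    def __init__(self):
        self.children = {}
        self.logprob = None

    def insert(self, ngram, prob):
        node = self
        for word in ngram:
            node = node.children.setdefault(word, NgramTrie())
        node.logprob = math.log(prob)

    def lookup(self, ngram):
        """Return the stored log-probability, or None if the n-gram is absent."""
        node = self
        for word in ngram:
            node = node.children.get(word)
            if node is None:
                return None
        return node.logprob

def string_logprob(trie, words, order=2, floor=1e-6):
    """Score a word string by chaining per-word log-probabilities.

    For each word, try the longest available context first and shorten it
    until an n-gram is found; `floor` is an assumed probability for words
    that are not in the trie at all.
    """
    total = 0.0
    for i in range(len(words)):
        lp = None
        for start in range(max(0, i - order + 1), i + 1):
            lp = trie.lookup(words[start:i + 1])
            if lp is not None:
                break
        total += lp if lp is not None else math.log(floor)
    return total

trie = NgramTrie()
trie.insert(("the",), 0.5)
trie.insert(("cat",), 0.1)
trie.insert(("the", "cat"), 0.2)

# "the" uses the unigram, "cat" uses the bigram ("the", "cat").
print(math.exp(string_logprob(trie, ["the", "cat"])))
```

A production language model would add proper back-off weights and a far more compact node encoding; the nested dictionaries here only show how a shared-prefix structure supports lookup.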

24.

Granted patent
Generic spelling mnemonics (status: lapsed)

Publication number: US07765102B2

Publication date: 2010-07-27

Application number: US12171309

Filing date: 2008-07-11

Applicants: David Mowatt, Robert Chambers, Ciprian Chelba, Qiang Wu

Inventors: David Mowatt, Robert Chambers, Ciprian Chelba, Qiang Wu

IPC classification: G10L15/00

CPC classification: G10L15/183

Abstract: A system and method for creating a mnemonics Language Model for use with a speech recognition software application, wherein the method includes: generating an n-gram Language Model containing a predefined large body of characters, wherein the n-gram Language Model includes at least one character from the predefined large body of characters; constructing a new Language Model (LM) token for each of the at least one character; extracting pronunciations for each of the at least one character responsive to a predefined pronunciation dictionary to obtain a character pronunciation representation; creating at least one alternative pronunciation for each of the at least one character responsive to the character pronunciation representation to create an alternative pronunciation dictionary; and compiling the n-gram Language Model for use with the speech recognition software application, wherein compiling the Language Model is responsive to the new Language Model token and the alternative pronunciation dictionary.

25.

Patent application
Generic spelling mnemonics (status: lapsed)

Publication number: US20060111907A1

Publication date: 2006-05-25

Application number: US10996732

Filing date: 2004-11-24

Applicants: David Mowatt, Robert Chambers, Ciprian Chelba, Qiang Wu

Inventors: David Mowatt, Robert Chambers, Ciprian Chelba, Qiang Wu

IPC classification: G10L15/18

CPC classification: G10L15/183

Abstract: A system and method for creating a mnemonics Language Model for use with a speech recognition software application, wherein the method includes: generating an n-gram Language Model containing a predefined large body of characters, wherein the n-gram Language Model includes at least one character from the predefined large body of characters; constructing a new Language Model (LM) token for each of the at least one character; extracting pronunciations for each of the at least one character responsive to a predefined pronunciation dictionary to obtain a character pronunciation representation; creating at least one alternative pronunciation for each of the at least one character responsive to the character pronunciation representation to create an alternative pronunciation dictionary; and compiling the n-gram Language Model for use with the speech recognition software application, wherein compiling the Language Model is responsive to the new Language Model token and the alternative pronunciation dictionary.

26.

Granted patent
Back-off language model compression (status: in force)

Publication number: US08725509B1

Publication date: 2014-05-13

Application number: US12486358

Filing date: 2009-06-17

Applicants: Boulos Harb, Ciprian Chelba, Jeffrey A. Dean, Sanjay Ghemawat

Inventors: Boulos Harb, Ciprian Chelba, Jeffrey A. Dean, Sanjay Ghemawat

IPC classification: G10L15/00, G10L15/06, G10L15/28, G06F17/21

CPC classification: G10L15/183, G06F17/277

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, relating to language models stored for digital language processing. In one aspect, a method includes the actions of generating a language model, including: receiving a collection of n-grams from a corpus, each n-gram of the collection having a corresponding first probability of occurring in the corpus, and generating a trie representing the collection of n-grams, the trie being represented using one or more arrays of integers, and compressing an array representation of the trie using block encoding; and using the language model to identify a second probability of a particular string of words occurring.

27.

Patent application
Generic spelling mnemonics (status: lapsed)

Publication number: US20080319749A1

Publication date: 2008-12-25

Application number: US12171309

Filing date: 2008-07-11

Applicants: David Mowatt, Robert Chambers, Ciprian Chelba, Qiang Wu

Inventors: David Mowatt, Robert Chambers, Ciprian Chelba, Qiang Wu

IPC classification: G10L15/04

CPC classification: G10L15/183

Abstract: A system and method for creating a mnemonics Language Model for use with a speech recognition software application, wherein the method includes: generating an n-gram Language Model containing a predefined large body of characters, wherein the n-gram Language Model includes at least one character from the predefined large body of characters; constructing a new Language Model (LM) token for each of the at least one character; extracting pronunciations for each of the at least one character responsive to a predefined pronunciation dictionary to obtain a character pronunciation representation; creating at least one alternative pronunciation for each of the at least one character responsive to the character pronunciation representation to create an alternative pronunciation dictionary; and compiling the n-gram Language Model for use with the speech recognition software application, wherein compiling the Language Model is responsive to the new Language Model token and the alternative pronunciation dictionary.

28.

Granted patent
Representing n-gram language models for compact storage and fast retrieval (status: in force)

Publication number: US07877258B1

Publication date: 2011-01-25

Application number: US11693613

Filing date: 2007-03-29

Applicants: Ciprian Chelba, Thorsten Brants

Inventors: Ciprian Chelba, Thorsten Brants

IPC classification: G10L15/18, G10L15/06, G06F17/27

CPC classification: G06F17/2715, G06K9/723, G06K2209/01, G10L15/197

Abstract: Systems, methods, and apparatuses, including computer program products, are provided for representing language models. In some implementations, a computer-implemented method is provided. The method includes generating a compact language model, including receiving a collection of n-grams from a corpus, each n-gram of the collection having a corresponding first probability of occurring in the corpus, and generating a trie representing the collection of n-grams. The method also includes using the language model to identify a second probability of a particular string of words occurring.

29.

Granted patent
Generic spelling mnemonics (status: lapsed)

Publication number: US07418387B2

Publication date: 2008-08-26

Application number: US10996732

Filing date: 2004-11-24

Applicants: David Mowatt, Robert Chambers, Ciprian Chelba, Qiang Wu

Inventors: David Mowatt, Robert Chambers, Ciprian Chelba, Qiang Wu

IPC classification: G10L15/18

CPC classification: G10L15/183

Abstract: A system and method for creating a mnemonics Language Model for use with a speech recognition software application, wherein the method includes: generating an n-gram Language Model containing a predefined large body of characters, wherein the n-gram Language Model includes at least one character from the predefined large body of characters; constructing a new Language Model (LM) token for each of the at least one character; extracting pronunciations for each of the at least one character responsive to a predefined pronunciation dictionary to obtain a character pronunciation representation; creating at least one alternative pronunciation for each of the at least one character responsive to the character pronunciation representation to create an alternative pronunciation dictionary; and compiling the n-gram Language Model for use with the speech recognition software application, wherein compiling the Language Model is responsive to the new Language Model token and the alternative pronunciation dictionary.
