-
Publication number: US08521516B2
Publication date: 2013-08-27
Application number: US12411224
Filing date: 2009-03-25
IPC classification: G06F17/21
CPC classification: G06F17/27
Abstract: Systems, methods, and apparatuses, including computer program products, are provided for training machine learning systems. In some implementations, a method is provided. The method includes receiving a collection of phrases; normalizing a plurality of phrases of the collection, the normalizing being based at least in part on lexicographic normalizing rules; and generating a normalized phrase table including a plurality of key-value pairs, each key-value pair including a key corresponding to a normalized phrase and a value corresponding to one or more un-normalized phrases associated with the normalized key, each un-normalized phrase having one or more parameters.
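The table-building step can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented method: the abstract does not spell out its lexicographic normalizing rules, so the hypothetical `normalize` helper assumes case folding, ASCII punctuation removal, and whitespace collapsing, and each phrase's parameters are modeled as a plain dict of floats (for example, a hypothetical probability `p`).

```python
import string
from collections import defaultdict
from typing import Dict, List, Tuple

Params = Dict[str, float]

# Assumed normalizing rules (hypothetical): case folding, ASCII punctuation
# removal, and whitespace collapsing.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)

def normalize(phrase: str) -> str:
    """Apply the assumed lexicographic normalizing rules to one phrase."""
    return " ".join(phrase.casefold().translate(_STRIP_PUNCT).split())

def build_normalized_phrase_table(
        phrases: Dict[str, Params]) -> Dict[str, List[Tuple[str, Params]]]:
    """Key: normalized phrase. Value: the un-normalized phrases, each with its parameters."""
    table: Dict[str, List[Tuple[str, Params]]] = defaultdict(list)
    for raw, params in phrases.items():
        table[normalize(raw)].append((raw, params))
    return dict(table)
```

Under these assumptions, "New York", "new york" and "New York." all share the key "new york", so one lookup on the normalized key returns every surface variant together with its own parameters.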
-
Publication number: US20130151235A1
Publication date: 2013-06-13
Application number: US12411224
Filing date: 2009-03-25
IPC classification: G06F17/27
CPC classification: G06F17/27
Abstract: Systems, methods, and apparatuses, including computer program products, are provided for training machine learning systems. In some implementations, a method is provided. The method includes receiving a collection of phrases; normalizing a plurality of phrases of the collection, the normalizing being based at least in part on lexicographic normalizing rules; and generating a normalized phrase table including a plurality of key-value pairs, each key-value pair including a key corresponding to a normalized phrase and a value corresponding to one or more un-normalized phrases associated with the normalized key, each un-normalized phrase having one or more parameters.
-
Publication number: US08953885B1
Publication date: 2015-02-10
Application number: US13617710
Filing date: 2012-09-14
IPC classification: G06K9/34
CPC classification: G06K9/723, G06K9/50, G06K2209/01
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing optical character recognition. In one aspect, a method includes receiving a text image I. A set of feature functions of a log-linear model is evaluated to determine respective feature values for the text image I, wherein each feature function h_i maps the text image I to a feature value and is associated with a respective feature weight λ_i. A transcription T̂ that minimizes a cost of the log-linear model is determined.
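The decision rule in this abstract can be illustrated with a small Python sketch. It assumes the conventional log-linear cost, cost(T) = -Σ_i λ_i·h_i(I, T), which the abstract does not spell out, and it assumes each feature function also receives the candidate transcription T, since otherwise the cost could not differ between transcriptions. It further assumes a finite list of candidate transcriptions (for example an n-best list) rather than a full search; `cost`, `decode`, and the toy features mentioned below are hypothetical.

```python
from typing import Any, Callable, Sequence

# Feature function h_i(I, T) -> float. Assumption: it sees the candidate
# transcription T as well as the image I.
Feature = Callable[[Any, str], float]

def cost(image: Any, transcription: str,
         features: Sequence[Feature], weights: Sequence[float]) -> float:
    """Assumed log-linear cost: -sum_i lambda_i * h_i(I, T)."""
    return -sum(w * h(image, transcription) for h, w in zip(features, weights))

def decode(image: Any, candidates: Sequence[str],
           features: Sequence[Feature], weights: Sequence[float]) -> str:
    """Return T-hat, the candidate transcription with the lowest cost."""
    if len(features) != len(weights):
        raise ValueError("expected one weight per feature function")
    return min(candidates, key=lambda t: cost(image, t, features, weights))
```

For example, with an image whose hypothetical glyph count is 5, one feature penalizing length mismatch (weight 1.0) and one rewarding dictionary words (weight 2.0), `decode` prefers "hello" over "he11o" and "helo".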
-
Publication number: US08626486B2
Publication date: 2014-01-07
Application number: US11850623
Filing date: 2007-09-05
Applicants: Franz J. Och, Dmitriy Genzel
Inventors: Franz J. Och, Dmitriy Genzel
IPC classification: G06F17/28
CPC classification: G06F17/28, G06F17/273
Abstract: Methods, systems, and apparatus, including computer program products, for correcting spelling in text. A text input is received for translation. One or more suspect words in the text input are identified. For each suspect word, one or more candidate words are identified. A score for the text input and a score for each of one or more candidate inputs are determined, where each candidate input is the text input with one or more of the suspect words each replaced by a respective candidate word. If there is a candidate input whose score is the highest among the candidate-input scores and exceeds the text-input score by at least a threshold, that candidate input is selected; otherwise, the text input is selected. A translation of the selected candidate input or text input is provided as the translation of the text input.
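The selection rule can be sketched in Python as well. The abstract does not say how suspect words are found, how candidate words are generated, or how inputs are scored, so in this sketch the suspects and their candidates are passed in as an assumed mapping from token position to candidate words, and `score` is any caller-supplied function (for example, a language-model log probability). The names `candidate_inputs` and `select_input` are hypothetical, and the translation step itself is left out.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence

Tokens = Sequence[str]

def candidate_inputs(tokens: Tokens,
                     suspects: Dict[int, List[str]]) -> List[List[str]]:
    """Variants of `tokens` with one or more suspect words replaced by a candidate word.

    `suspects` maps a token position to its candidate words (an assumed input format).
    """
    positions = sorted(suspects)
    # At each suspect position, either keep the original word or try each candidate.
    options = [[tokens[p]] + list(suspects[p]) for p in positions]
    variants: List[List[str]] = []
    for choice in product(*options):
        variant = list(tokens)
        for p, word in zip(positions, choice):
            variant[p] = word
        if variant != list(tokens):  # require at least one actual replacement
            variants.append(variant)
    return variants

def select_input(tokens: Tokens,
                 suspects: Dict[int, List[str]],
                 score: Callable[[Sequence[str]], float],
                 threshold: float) -> List[str]:
    """Pick the best-scoring candidate input if it beats the original by >= threshold."""
    base_score = score(tokens)
    candidates = candidate_inputs(tokens, suspects)
    if candidates:
        best = max(candidates, key=score)
        if score(best) - base_score >= threshold:
            return best
    # Otherwise fall back to the text input as received.
    return list(tokens)
```

Enumerating every combination of replacements grows exponentially with the number of suspect words. That keeps the sketch faithful to "one or more of the suspect words", but a practical system would likely prune the combinations.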
-
Publication number: US20130144592A1
Publication date: 2013-06-06
Application number: US11850623
Filing date: 2007-09-05
Applicants: Franz J. Och, Dmitriy Genzel
Inventors: Franz J. Och, Dmitriy Genzel
IPC classification: G06F17/28
CPC classification: G06F17/28, G06F17/273
Abstract: Methods, systems, and apparatus, including computer program products, for correcting spelling in text. A text input is received for translation. One or more suspect words in the text input are identified. For each suspect word, one or more candidate words are identified. A score for the text input and a score for each of one or more candidate inputs are determined, where each candidate input is the text input with one or more of the suspect words each replaced by a respective candidate word. If there is a candidate input whose score is the highest among the candidate-input scores and exceeds the text-input score by at least a threshold, that candidate input is selected; otherwise, the text input is selected. A translation of the selected candidate input or text input is provided as the translation of the text input.