Method and apparatus for automatically identifying character segments for character recognition
    1.
    Granted invention patent
    Method and apparatus for automatically identifying character segments for character recognition (status: in force)

    Publication No.: US09014477B2

    Publication date: 2015-04-21

    Application No.: US13282465

    Filing date: 2011-10-27

    IPC classification: G06K9/34

    CPC classification: G06K9/34 G06K2209/01

    Abstract: A method and apparatus for automatically identifying character segments for character recognition is provided. The method involves receiving a plurality of words and a ground truth corresponding to each word of the plurality of words. The plurality of words may be received in a cursive script. Each word of the plurality of words is segmented into one or more character segments based on the ground truth corresponding to each word. Thereafter, the segmentation of each word is refined by iteratively re-segmenting each word based on one or more similar character segments.
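
The refinement loop described in this abstract can be illustrated with a toy sketch. It is not the patented method: it assumes each word is reduced to a pixel width plus a non-empty ground-truth string, treats segments with the same ground-truth label as the "similar character segments", and re-splits each word in proportion to the pooled mean width per character. The names `initial_segmentation` and `refine` are hypothetical.

```python
from collections import defaultdict


def initial_segmentation(width, truth):
    """Split a word of the given pixel width into len(truth) equal segments."""
    return [width / len(truth)] * len(truth)


def refine(words, iterations=10):
    """Iteratively re-segment words using pooled per-character widths.

    words: list of (pixel_width, ground_truth_string) pairs (assumed format).
    Returns one list of segment widths per word, in ground-truth order.
    """
    segs = [initial_segmentation(width, truth) for width, truth in words]
    for _ in range(iterations):
        # Pool the current segments that share a ground-truth label.
        pooled = defaultdict(list)
        for (_width, truth), widths in zip(words, segs):
            for ch, seg_width in zip(truth, widths):
                pooled[ch].append(seg_width)
        mean = {ch: sum(v) / len(v) for ch, v in pooled.items()}
        # Re-segment every word in proportion to the pooled mean widths.
        segs = []
        for width, truth in words:
            total = sum(mean[ch] for ch in truth)
            segs.append([width * mean[ch] / total for ch in truth])
    return segs


if __name__ == "__main__":
    for word, widths in zip(["il", "im"], refine([(10, "il"), (20, "im")])):
        print(word, [round(w, 2) for w in widths])
```

Each pass keeps the total width of a word fixed and only moves the boundaries between its characters, so information from one word (a wide "m") gradually changes how other words that share a character are split.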

    METHOD AND APPARATUS FOR AUTOMATICALLY IDENTIFYING CHARACTER SEGMENTS FOR CHARACTER RECOGNITION
    2.
    Invention patent application
    METHOD AND APPARATUS FOR AUTOMATICALLY IDENTIFYING CHARACTER SEGMENTS FOR CHARACTER RECOGNITION (status: under examination, published)

    Publication No.: US20130108159A1

    Publication date: 2013-05-02

    Application No.: US13282465

    Filing date: 2011-10-27

    IPC classification: G06K9/34

    CPC classification: G06K9/34 G06K2209/01

    Abstract: A method and apparatus for automatically identifying character segments for character recognition is provided. The method involves receiving a plurality of words and a ground truth corresponding to each word of the plurality of words. The plurality of words may be received in a cursive script. Each word of the plurality of words is segmented into one or more character segments based on the ground truth corresponding to each word. Thereafter, the segmentation of each word is refined by iteratively re-segmenting each word based on one or more similar character segments.

    Template-based cursive handwriting recognition
    3.
    Invention patent application
    Template-based cursive handwriting recognition (status: lapsed)

    Publication No.: US20050100217A1

    Publication date: 2005-05-12

    Application No.: US10702663

    Filing date: 2003-11-07

    IPC classification: G06K9/18 G06K9/68

    CPC classification: G06K9/6835

    Abstract: Input handwritten characters are classified as print or cursive based upon numerical feature values calculated from the shape of an input character. The feature values are applied to inputs of an artificial neural network which outputs a probability of the input character being print or cursive. If a character is classified as print, it is analyzed by a print character recognizer. If a character is classified as cursive, it is analyzed using a cursive character recognizer. The cursive character recognizer compares the input character to multiple prototype characters using a Dynamic Time Warping (DTW) algorithm.
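
The prototype comparison named in this abstract can be sketched with the textbook DTW recurrence. This is a minimal illustration, assuming each character is reduced to a one-dimensional sequence of feature values and that absolute difference is the local cost; the abstract does not specify the features or cost used, and `dtw_distance` and `nearest_prototype` are hypothetical names.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] also aligned with an earlier b
                cost[i][j - 1],      # b[j-1] also aligned with an earlier a
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


def nearest_prototype(sample, prototypes):
    """Return the label whose prototype has the smallest DTW distance.

    prototypes: dict mapping a label to a list of prototype sequences
    (an assumed layout for this sketch).
    """
    best_label, best_dist = None, float("inf")
    for label, sequences in prototypes.items():
        for seq in sequences:
            dist = dtw_distance(sample, seq)
            if dist < best_dist:
                best_label, best_dist = label, dist
    return best_label
```

Because DTW lets one point align with several points in the other sequence, a stroke drawn more slowly (more samples) can still match its prototype with zero cost, which is why it suits handwriting of varying speed.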

    Efficient identification and correction of optical character recognition errors through learning in a multi-engine environment
    4.
    Granted invention patent
    Efficient identification and correction of optical character recognition errors through learning in a multi-engine environment (status: in force)

    Publication No.: US08331739B1

    Publication date: 2012-12-11

    Application No.: US12357367

    Filing date: 2009-01-21

    Abstract: OCR errors are identified and corrected through learning. An error probability estimator is trained using ground truths to learn error probability estimation. Multiple OCR engines process a text image, and convert it into texts. The error probability estimator compares the outcomes of the multiple OCR engines for mismatches, and determines an error probability for each of the mismatches. If the error probability of a mismatch exceeds an error probability threshold, a suspect is generated and grouped together with similar suspects in a cluster. A question for the cluster is generated and rendered to a human operator for answering. The answer from the human operator is then applied to all suspects in the cluster to correct OCR errors in the resulting text. The answer is also used to further train the error probability estimator.
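
The mismatch, threshold, and cluster steps in this abstract can be sketched as follows. The sketch assumes two engines whose outputs are already aligned token by token, a caller-supplied `error_prob` function standing in for the trained estimator, and clustering by the exact pair of disagreeing tokens; all of these are illustrative assumptions, and `find_suspect_clusters` and `apply_answer` are hypothetical names.

```python
from collections import defaultdict


def find_suspect_clusters(engine_a, engine_b, error_prob, threshold=0.5):
    """Group positions where two OCR outputs disagree with high error probability.

    engine_a, engine_b: equal-length token lists (assumed pre-aligned).
    error_prob: callable (token_a, token_b) -> float in [0, 1] (placeholder
    for a trained estimator).
    Returns {(token_a, token_b): [positions]}.
    """
    clusters = defaultdict(list)
    for pos, (tok_a, tok_b) in enumerate(zip(engine_a, engine_b)):
        if tok_a != tok_b and error_prob(tok_a, tok_b) > threshold:
            clusters[(tok_a, tok_b)].append(pos)
    return dict(clusters)


def apply_answer(tokens, positions, answer):
    """Apply one operator answer to every suspect position in a cluster."""
    corrected = list(tokens)
    for pos in positions:
        corrected[pos] = answer
    return corrected
```

Clustering means the operator answers one question per distinct confusion (for example "c1ock" versus "clock") rather than one per occurrence.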

    Classifying Results of Search Queries
    6.
    Invention patent application
    Classifying Results of Search Queries (status: in force)

    Publication No.: US20120158702A1

    Publication date: 2012-06-21

    Application No.: US12969140

    Filing date: 2010-12-15

    IPC classification: G06F17/30

    CPC classification: G06F17/30303

    Abstract: Computer-readable media, computer systems, and computing methods are provided for classifying search results as either of good quality or of poor quality. Initially, a portion of the search results, such as the highest ranked documents, are selected for evaluation. A level of quality for each of the selected search results is determined using a classification process that includes the following steps: targeting features demonstrated by the selected search results to be evaluated; evaluating the selected features to generate a level-of-quality score for each of the selected search results; comparing the score against a predefined threshold value; and, based on the comparison, assigning each of the selected search results an absolute measurement. The absolute measurement indicates poor quality when the score is less than the threshold value. Upon recognizing that the selected search results are of poor quality, automatically executing a corrective action that reformulates the issued search query.
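
The select, score, and threshold steps listed in this abstract can be sketched briefly. The scoring function is a caller-supplied placeholder, and the rule that a query is reformulated only when every selected result is poor is an assumption made for illustration; the abstract does not state how per-result labels are aggregated. `label_top_results` and `needs_reformulation` are hypothetical names.

```python
def label_top_results(results, score_fn, threshold, top_k=3):
    """Label the top_k ranked results as "good" or "poor".

    results: list already in rank order (highest ranked first).
    score_fn: callable result -> level-of-quality score (placeholder).
    A score below the threshold is labeled "poor", as in the abstract.
    """
    selected = results[:top_k]
    return ["poor" if score_fn(r) < threshold else "good" for r in selected]


def needs_reformulation(labels):
    """Assumed aggregate rule: reformulate when all selected results are poor."""
    return bool(labels) and all(label == "poor" for label in labels)
```

For example, with scores 0.9, 0.2, 0.1 for the top three results and a threshold of 0.5, the labels are good, poor, poor, and under the assumed rule no reformulation is triggered.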

    Recognition of mathematical expressions
    7.
    Granted invention patent
    Recognition of mathematical expressions (status: in force)

    Publication No.: US08009915B2

    Publication date: 2011-08-30

    Application No.: US11788190

    Filing date: 2007-04-19

    IPC classification: G06K9/00

    Abstract: In embodiments consistent with the subject matter of this disclosure, a user may input strokes as digital ink to a processing device. The processing device may partition the input strokes into multiple regions of strokes. A first recognizer and a second recognizer may score grammar objects included in regions and represented by chart entries. The scores may be converted to a converted score, which may have at least a near standard normal distribution. The processing device may present a recognition result based on highest converted scores according to a recurrence formula. The processing device may receive a correction hint with respect to misrecognized strokes and may add a penalty score with respect to chart entries representing grammar objects breaking the correction hint. Incremental recognition may be performed when a pause is detected during inputting of strokes.
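
The abstract does not say how recognizer scores are converted, only that the converted scores are close to standard normal. One simple stand-in, shown here purely as an assumption, is z-score standardization, which puts scores from recognizers with different scales onto a common scale with mean 0 and standard deviation 1. `to_standard_scores` is a hypothetical name.

```python
from statistics import mean, pstdev


def to_standard_scores(scores):
    """Standardize raw recognizer scores to mean 0 and unit variance.

    This is a plain z-score transform, used as an illustrative stand-in
    for the conversion mentioned in the abstract.
    """
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All scores are identical, so none stands out from the others.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]
```

After standardization, a score of 1.5 from either recognizer means "1.5 standard deviations above that recognizer's average", so the two can be compared or combined directly.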

    Allograph based writer adaptation for handwritten character recognition
    8.
    Invention patent application
    Allograph based writer adaptation for handwritten character recognition (status: in force)

    Publication No.: US20070140561A1

    Publication date: 2007-06-21

    Application No.: US11305968

    Filing date: 2005-12-19

    IPC classification: G06K9/00

    Abstract: The claimed subject matter provides a system and/or a method that facilitates analyzing and/or recognizing a handwritten character. An interface component can receive at least one handwritten character. A personalization component can train a classifier based on an allograph related to a handwriting style to provide handwriting recognition for the at least one handwritten character. In addition, the personalization component can employ any suitable combiner to provide optimized recognition.

    User-initiated reporting of handwriting recognition errors over the internet
    9.
    Invention patent application
    User-initiated reporting of handwriting recognition errors over the internet (status: under examination, published)

    Publication No.: US20060285749A1

    Publication date: 2006-12-21

    Application No.: US11154650

    Filing date: 2005-06-17

    IPC classification: G06K9/00 G09G5/00

    Abstract: A user may initiate or confirm a process for reporting errors in handwriting recognition errors in a computer system. A user dialog is provided in which a user may select handwriting recognition errors to report and report the selected handwriting recognition errors via a handwriting recognition error report. The report may include selected handwriting recognition errors including ink samples, recognized text, corrected text and status. The handwriting recognition errors may further be categorized based on multiple parameters. The user may also include comments with the report.

    Classifying results of search queries
    10.
    Granted invention patent
    Classifying results of search queries (status: in force)

    Publication No.: US09251185B2

    Publication date: 2016-02-02

    Application No.: US12969140

    Filing date: 2010-12-15

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30303

    Abstract: Computer-readable media, computer systems, and computing methods are provided for classifying search results as either of good quality or of poor quality. Initially, a portion of the search results, such as the highest ranked documents, are selected for evaluation. A level of quality for each of the selected search results is determined using a classification process that includes the following steps: targeting features demonstrated by the selected search results to be evaluated; evaluating the selected features to generate a level-of-quality score for each of the selected search results; comparing the score against a predefined threshold value; and, based on the comparison, assigning each of the selected search results an absolute measurement. The absolute measurement indicates poor quality when the score is less than the threshold value. Upon recognizing that the selected search results are of poor quality, automatically executing a corrective action that reformulates the issued search query.
