Post-processing system and method for correcting machine recognized text
    1.
    Granted invention patent
    Post-processing system and method for correcting machine recognized text (status: lapsed)

    Publication No.: US07092567B2

    Publication date: 2006-08-15

    Application No.: US10288645

    Filing date: 2002-11-04

    CPC classification: G06K9/723 G06K2209/01

    Abstract: A method of post-processing character data from an optical character recognition (OCR) engine, and an apparatus to perform the method. This exemplary method includes segmenting the character data into a set of initial words. The set of initial words is processed at the word level to determine at least one candidate word corresponding to each initial word. The set of initial words is segmented into a set of sentences. Each sentence in the set includes a plurality of initial words and the candidate words corresponding to those initial words. A sentence is selected from the set of sentences. The selected sentence undergoes word disambiguation processing to determine a plurality of final words, each final word being selected from the at least one candidate word corresponding to a matching initial word. The plurality of final words is then assembled as post-processed OCR data.
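
    The steps named in the abstract can be sketched in a few lines of Python. This is only an illustrative toy, not the patented implementation: the lexicon, the bigram counts, the use of `difflib` string similarity for word-level candidates, and the greedy left-to-right bigram scoring for disambiguation are all assumptions chosen to keep the example small, and every function name is hypothetical.

```python
import difflib
import re

# Hypothetical toy resources; a real system would use a large lexicon
# and a statistical language model.
LEXICON = ["the", "cat", "car", "sat", "on", "mat", "a", "dog", "ran"]
BIGRAMS = {("the", "cat"): 5, ("cat", "sat"): 4, ("sat", "on"): 6,
           ("on", "the"): 8, ("the", "mat"): 3, ("the", "car"): 1}
PUNCT = {".", "!", "?"}


def segment_words(char_data):
    """Split raw OCR character data into initial words and end punctuation."""
    return re.findall(r"[A-Za-z0-9]+|[.!?]", char_data)


def candidates(word):
    """Word-level processing: return at least one candidate per initial word."""
    if word in PUNCT or word in LEXICON:
        return [word]
    close = difflib.get_close_matches(word, LEXICON, n=3, cutoff=0.6)
    return close or [word]  # fall back to the initial word itself


def segment_sentences(pairs):
    """Group (initial word, candidates) pairs into sentences at end punctuation."""
    sentences, current = [], []
    for pair in pairs:
        current.append(pair)
        if pair[0] in PUNCT:
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences


def disambiguate(sentence):
    """Pick one final word per initial word, greedily, using left context only."""
    finals, prev = [], None
    for _initial, cands in sentence:
        # max() keeps the first candidate on ties, i.e. the closest match.
        best = max(cands, key=lambda c: BIGRAMS.get((prev, c), 0))
        finals.append(best)
        prev = best
    return finals


def post_process(char_data):
    """Run the whole pipeline and assemble the final words into text."""
    pairs = [(w, candidates(w)) for w in segment_words(char_data)]
    finals = []
    for sentence in segment_sentences(pairs):
        finals.extend(disambiguate(sentence))
    return re.sub(r"\s+([.!?])", r"\1", " ".join(finals))
```

    With these toy tables, `post_process("the ca1 sat on the mat.")` yields `"the cat sat on the mat."`: the misread word `ca1` has two equally close candidates (`cat` and `car`), and the sentence-level step prefers `cat` because the assumed bigram count for ("the", "cat") is higher. A greedy pass is used here for brevity; a dynamic-programming search over the whole sentence would be a natural alternative.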
