Automatic training of layout parameters in a 2D image model
    1.
    Invention patent (granted); status: expired

    Publication number: US06687404B1

    Publication date: 2004-02-03

    Application number: US08880137

    Filing date: 1997-06-20

    IPC classification: G06K9/00

    CPC classification: G06K9/00442

    Abstract: A two-dimensional (2D) image model models the layout structure of a class of document images as an image grammar and includes production rules having explicit layout parameters as data items that indicate information about the spatial relationships among image constituents occurring in images included in the class. The parameters are explicitly represented in the grammar rules in a manner that permits them to be automatically trained by a training operation that makes use of sample document images from the class of modeled documents. After each sample image is aligned with the 2D grammar, document-specific measurements about the spatial relationships between image constituents are taken from the image. Optimal values for the layout parameters are then computed from the measurement data collected from all samples. An illustrated implementation of the 2D image model takes the form of a stochastic context-free attribute grammar in which synthesized and inherited attributes and synthesis and inheritance functions are associated with each production rule in the grammar. The attributes indicate physical spatial locations of image constituents in the image, and a set of parameterized functions, in which the coefficients are the layout parameters, compute the attributes as a function of a characteristic of an image constituent of the production rule. The measurement data is taken from an annotated parse tree produced for each training image by the grammar. A trained grammar can then be used, for example, for document recognition and layout analysis operations on any document in the class of documents modeled by the grammar.
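
The idea of computing layout parameters from measurements pooled over all training images can be illustrated with a minimal sketch. It assumes, purely for illustration, that one attribute is a linear function of one constituent characteristic (attribute = a * characteristic + b) and that "optimal" means least squares; the abstract does not state the functional form or the criterion. The function name and the sample measurements are hypothetical.

```python
from statistics import fmean


def fit_layout_parameters(measurements):
    """Fit (a, b) in attribute = a * characteristic + b by least squares.

    `measurements` is a list of (characteristic, attribute) pairs pooled
    from all training images. The linear form is an assumption.
    """
    xs = [x for x, _ in measurements]
    ys = [y for _, y in measurements]
    mean_x, mean_y = fmean(xs), fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        # Every sample has the same characteristic, so the slope is
        # undetermined; fall back to a constant offset.
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Hypothetical measurements: (constituent width, observed offset) in pixels.
samples = [(100.0, 12.0), (200.0, 22.0), (300.0, 32.0)]
a, b = fit_layout_parameters(samples)
print(f"a = {a:.3f}, b = {b:.3f}")
```

In this sketch each production rule would own its own (a, b) pair, fitted only from the measurements taken at parse-tree nodes where that rule was applied.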

    Document image decoding using modified branch-and-bound methods
    2.
    Invention patent (granted); status: expired

    Publication number: US5526444A

    Publication date: 1996-06-11

    Application number: US60196

    Filing date: 1993-05-07

    Abstract: An image decoding and recognition system and method comprising a fast heuristic algorithm using hidden Markov models (HMM). The new search algorithm, called an "iterative complete path" (ICP) algorithm, patterned after well-known branch-and-bound (B&B) methods, significantly reduces the complexity and improves the speed of HMM image decoding without sacrificing the optimality of the straightforward procedure. An advantageous form of the heuristic functions which is useful in applying the ICP algorithm to text-like images is described. The ICP algorithm is directly applicable to the separable type of finite-state source models. Also disclosed is a technique for transforming more general source models into such a separable form.

    Method and system for automatic transcription correction
    3.
    Invention patent (granted); status: expired

    Publication number: US5883986A

    Publication date: 1999-03-16

    Application number: US460454

    Filing date: 1995-06-02

    IPC classification: G06K9/72 G06K9/03 G06K9/68

    CPC classification: G06K9/72 G06K2209/01

    Abstract: A method and system for automatically modifying an original transcription produced as the output of a recognition operation produces a second, modified transcription, such as, for example, automatically correcting an errorful transcription produced by an OCR operation. The invention uses information in an input text image of character images and in an original transcription associated with the input text image to modify aspects of a formal image source model that models as a grammar the spatial image structure of a set of text images. A recognition operation is then performed on the input text image using the modified formal image source model to produce a second, modified transcription. When the original transcription is errorful, the second transcription is a corrected transcription. Several aspects of the formal image source model may be modified; in particular, character templates to be used in the recognition operation are trained in the font of the glyphs occurring in the input text image. When errors in the original transcription are caused by matching glyphs against templates that are inadequately specified for the given input text image, the subsequently performed recognition operation on the text image using the trained, font-specific character templates produces a more accurate transcription.

    Image recognition method using finite state networks
    4.
    Invention patent (granted); status: expired

    Publication number: US5321773A

    Publication date: 1994-06-14

    Application number: US805700

    Filing date: 1991-12-10

    Abstract: An image recognition system, in particular for document image recognition, using an imaging model employing a 2-dimensional finite state automaton corresponding to a regular string grammar. This approach is not only less computationally intensive than previous grammar-based approaches to document image recognition, but also can handle a wider variety of image types. Features of the imaging model include a sidebearing model of glyph positioning, an image decoder based on linear scheduling theory for regular iterative algorithms, the combining of overlapping image sub-regions, and a least-squares estimation procedure for measuring character parameters from character samples in the image.

    Automatic training of character templates using a text line image, a text line transcription and a line image source model
    5.
    Invention patent (granted); status: expired

    Publication number: US5594809A

    Publication date: 1997-01-14

    Application number: US431253

    Filing date: 1995-04-28

    IPC classification: G06K9/68 G06K9/62

    CPC classification: G06K9/6297 G06K2209/01

    Abstract: A technique for automatically producing, or training, a set of bitmapped character templates defined according to the sidebearing model of character image positioning uses as input a text line image of unsegmented characters, called glyphs, as the source of training samples. The training process also uses a transcription associated with the text line image, and an explicit, grammar-based text line image source model that describes the structural and functional features of a set of possible text line images that may be used as the source of training samples. The transcription may be a literal transcription of the line image, or it may be nonliteral, for example containing logical structure tags for document formatting and layout, such as found in markup languages. Spatial positioning information modeled by the text line image source model and the labels in the transcription are used to determine labeled image positions identifying the location of glyph samples occurring in the input line image, and the character templates are produced using the labeled image positions. In another aspect of the technique, a set of character templates defined by any character template model, such as a segmentation-based model, is produced using the grammar-based text line image source model and specifically using a tag transcription containing logical structure tags for document formatting and layout. Both aspects of the training technique may represent the text line image source model and the transcription as finite state networks.

    Automatic training of character templates using a transcription and a two-dimensional image source model
    6.
    Invention patent (granted); status: expired

    Publication number: US5689620A

    Publication date: 1997-11-18

    Application number: US431223

    Filing date: 1995-04-28

    CPC classification: G06K9/6256

    Abstract: A technique for automatically training a set of character templates using unsegmented training samples uses as input a two-dimensional (2D) image of characters, called glyphs, as the source of training samples, a transcription associated with the 2D image as a source of labels for the glyph samples, and an explicit, formal 2D image source model that models as a grammar the structural and functional features of a set of 2D images that may be used as the source of training data. The input transcription may be a literal transcription associated with the 2D input image, or it may be nonliteral, for example containing logical structure tags for document formatting, such as found in markup languages. The technique uses spatial positioning information about the 2D image modeled by the 2D image source model and uses labels in the transcription to determine labeled glyph positions in the 2D image that identify locations of glyph samples. The character templates are produced using the input 2D image and the labeled glyph positions without assigning pixels to glyph samples prior to training. In one implementation, the 2D image source model is a regular grammar having the form of a finite state transition network, and the transcription is also represented as a finite state network. The two networks are merged to produce a transcription-image network, which is used to decode the input 2D image to produce labeled glyph positions that identify training data samples in the 2D image. In one implementation of the template construction process, a pixel scoring technique is used to produce character templates contemporaneously from blocks of training data samples aligned at glyph positions.

    Unsupervised training of character templates using unsegmented samples
    7.
    Invention patent (granted); status: expired

    Publication number: US5956419A

    Publication date: 1999-09-21

    Application number: US430635

    Filing date: 1995-04-28

    IPC classification: G06K9/62 G06K9/68

    CPC classification: G06K9/68 G06K9/6256

    Abstract: A method for operating a machine to perform unsupervised training of a set of character templates uses as the source of training samples an image source of character images, called glyphs, that need not be manually or automatically segmented or isolated prior to training. A recognition operation performed on the image source of character images produces a labeled glyph position data structure that includes, for each glyph in the image source, a glyph image position in the image source associating an estimated image location of the glyph in the image source with a character label paired with the glyph image position that indicates the character in the character set being trained. The labeled glyph position data and the image source are then used to determine sample image regions in the image source; each sample image region is large enough to contain at least a single glyph but need not be restricted in size to only contain a single glyph. The template construction process using unsegmented samples is mathematically modeled as an optimization problem that optimizes a function that represents the set of character templates being trained as an ideal image to be reconstructed to match the input image. The method produces all of the character templates substantially contemporaneously by using a novel pixel scoring technique that implements an approximation of a maximum likelihood criterion subject to a constraint on the templates produced which holds that foreground pixels in adjacently positioned character images have substantially nonoverlapping foreground pixels. The character templates produced may be binary templates or arrays of probability values.

    Editing text in an image
    8.
    Invention patent (granted); status: expired

    Publication number: US5548700A

    Publication date: 1996-08-20

    Application number: US39553

    Filing date: 1993-03-29

    IPC classification: G06T1/00 G06T11/60 G06F17/00

    CPC classification: G06T11/60

    Abstract: Character level text editing is performed on an image without recognizing characters, by operating on a character-size array obtained from a two-dimensional array defining an image region. A processor, in response to a request for a text editing operation, accesses an edit data structure that includes the image region array and performs the operation. The character-size array is obtained by dividing the image region array when necessary. An image region array that includes more than one line is divided along interline spaces. An image region array that includes one line is divided along intercharacter spaces. Character-size arrays are divided out of larger arrays by finding connected component bounding boxes, and then determining from the bounding boxes whether the connected components are likely to form a character. If so, the connected components are used to obtain the character-size array and spatial data about position, size, and shape of the character. Smaller arrays and spatial data can replace a larger array in the edit data structure. Smaller arrays are obtained only as necessary to perform a requested text editing operation, and if the edit data structure is not otherwise modified, obtaining a smaller array does not necessitate redrawing of the display. In addition to character level editing, a text editing operation can be performed on a sequence of arrays, such as a word, line, or a sequence that begins on one line and ends on another. The spatial data can be used to position arrays after insertion or deletion, to advance a cursor through the text, and to justify a line of arrays. A character-size array can be assigned to a keyboard key, and the key may then be used to insert that array into the text or to request a search for other arrays matching that array.
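
Dividing a region along interline and intercharacter spaces can be sketched as follows. This is a simplified illustration, not the patented procedure: it assumes a binary image stored as a list of rows (1 = foreground) and treats any fully blank row or column as a gap, whereas the abstract describes using connected-component bounding boxes to decide what forms a character. The function names are hypothetical.

```python
def split_along_blank_rows(region):
    """Return (top, bottom) row ranges, bottom exclusive, of the bands
    of rows that contain at least one foreground pixel."""
    bands = []
    start = None
    for y, row in enumerate(region):
        has_ink = any(row)
        if has_ink and start is None:
            start = y                    # a new band begins
        elif not has_ink and start is not None:
            bands.append((start, y))     # a blank row closes the band
            start = None
    if start is not None:
        bands.append((start, len(region)))
    return bands


def split_along_blank_columns(region):
    """Same split applied to columns, by transposing the region."""
    columns = [list(col) for col in zip(*region)]
    return split_along_blank_rows(columns)


# Tiny hypothetical region: two text lines separated by a blank row.
region = [
    [0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
]
lines = split_along_blank_rows(region)
top, bottom = lines[0]
chars = split_along_blank_columns(region[top:bottom])
print(lines, chars)
```

Splitting lazily, only when an editing operation needs a smaller array, matches the abstract's point that smaller arrays are obtained only as necessary.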

    Method for aligning a text image to a transcription of the image
    9.
    Invention patent (granted); status: expired

    Publication number: US5689585A

    Publication date: 1997-11-18

    Application number: US431004

    Filing date: 1995-04-28

    IPC classification: G06K9/20 G06K9/72

    Abstract: A method for establishing a relationship between a text image and a transcription associated with the text image uses conventional image processing techniques to identify one or more geometric attributes, or image parameters, of each of a sequence of regions of the text image. The transcription labels in the transcription are analyzed to determine a comparable set of parameters in transcription label sequence. A matching operation then matches the respective parameters of the two sequences to identify image regions that match with transcription regions. The result is an output data structure that minimally identifies image locations of interest to a subsequent operation that processes the text image. The output data structure may also pair each of the image locations of interest to a transcription location, in effect producing a set of labeled image locations. In one embodiment, the sequence of locations of words and their observed lengths in the text image are determined. The transcription is analyzed to identify words, and transcription word lengths are computed using an estimated image character width of glyphs in the text image. The sequence of observed image word lengths is then matched to the sequence of computed transcription word lengths using a dynamic programming algorithm that finds a best path through a two-dimensional lattice of nodes and transitions between nodes, where the transitions represent pairs of sequences of zero or more word lengths. An output data structure contains entries, each of which pairs a transcription word with a matching image word location.
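
The word-length matching embodiment can be illustrated with a small dynamic programming sketch. It is a simplification under stated assumptions: only three transitions are allowed (match one image word to one transcription word, skip an image word, skip a transcription word), whereas the abstract allows transitions that pair longer sequences of word lengths. The cost function, the character width, and all names are hypothetical.

```python
def align_word_lengths(image_lengths, transcript_lengths):
    """Align two word-length sequences; return (pairs, total_cost).

    Assumed costs: a match costs the absolute length difference, and
    leaving a word unmatched costs its full length.
    """
    n, m = len(image_lengths), len(transcript_lengths)
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = cost[i - 1][0] + image_lengths[i - 1]
        back[i][0] = "skip_image"
    for j in range(1, m + 1):
        cost[0][j] = cost[0][j - 1] + transcript_lengths[j - 1]
        back[0][j] = "skip_transcript"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            img, tr = image_lengths[i - 1], transcript_lengths[j - 1]
            candidates = [
                (cost[i - 1][j - 1] + abs(img - tr), "match"),
                (cost[i - 1][j] + img, "skip_image"),
                (cost[i][j - 1] + tr, "skip_transcript"),
            ]
            # min() keeps the first minimum, so ties prefer a match.
            cost[i][j], back[i][j] = min(candidates, key=lambda c: c[0])
    # Trace the best path back from the final lattice node.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "match":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move == "skip_image":
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs, cost[n][m]


char_width = 10  # hypothetical estimated character width in pixels
transcript = ["The", "quick", "fox"]
transcript_lengths = [len(word) * char_width for word in transcript]
image_lengths = [31, 8, 49, 29]  # the 8-pixel blob stands in for noise
pairs, total = align_word_lengths(image_lengths, transcript_lengths)
print(pairs, total)
```

Each returned pair is (image word index, transcription word index), which corresponds to the output entries that pair a transcription word with a matching image word location.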

    Method of producing character templates using unsegmented samples
    10.
    Invention patent (granted); status: expired

    Publication number: US5706364A

    Publication date: 1998-01-06

    Application number: US431714

    Filing date: 1995-04-28

    IPC classification: G06K9/62 G06R9/62

    CPC classification: G06K9/6255

    Abstract: A method for producing, or training, a set of character templates uses as the source of training samples an image source of character images, called glyphs, that are not previously segmented or isolated for training. Also used is a labeled glyph position data structure that includes, for each glyph in the image source, a glyph image position in the image source associating an image location of the glyph with a character label paired with the glyph image position that indicates the character in the character set being trained. The labeled glyph position data is used to identify a collection of glyph sample image regions in the image source for each character in the character set; each glyph sample image region is large enough to contain a glyph and typically contains adjacent glyphs for other characters. The invention mathematically characterizes the template construction problem using unsegmented samples as an optimization problem that optimizes a function that represents the set of character templates being trained as an ideal image to be reconstructed to match the input image. The method produces all of the character templates contemporaneously by using a novel pixel scoring technique that implements an approximation of a maximum likelihood criterion subject to a constraint on the templates produced which holds that foreground pixels in adjacently positioned character images have substantially nonoverlapping foreground pixels. The character templates produced may be binary templates or arrays of pixel color probability values, and may also have substantially disjoint supports, such that adjacently imaged templates have substantially no overlapping foreground pixels.
