Character area extracting device, imaging device having character area extracting function, recording medium saving character area extracting programs, and character area extracting method
    11.
    Granted patent
    Legal status: In force

    Publication No.: US08447113B2

    Publication date: 2013-05-21

    Application No.: US13067133

    Application date: 2011-05-11

    IPC classification: G06K9/18

    Abstract: A character area extracting device includes a reflective and non-reflective area separation unit separating image data into reflective and non-reflective areas, and binarizing the image data by changing a first threshold value when it is inappropriate; a reflective area binarizing unit separating the reflective area into character and background areas, and binarizing it by changing a second threshold value when it is inappropriate; a non-reflective area binarizing unit separating the non-reflective area into the character and background areas, and binarizing it by changing a third threshold value when it is inappropriate; a reflective and non-reflective area separation evaluation unit; and a line extracting unit connecting the character areas of the reflective and non-reflective areas and extracting positional information of the connected character areas in the image data.
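The "binarize, then change the threshold when it is inappropriate" loop described in this abstract can be illustrated with a minimal sketch. The abstract does not say how a result is judged inappropriate, so the test used here (the share of foreground pixels must fall inside a band) is purely an assumption, as are the names `binarize_with_retry`, `lo`, `hi`, and `step`. The same loop could in principle be run three times, once per threshold named in the abstract.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale rows, pixel values 0..255
Mask = List[List[int]]   # 1 = pixel at or above the threshold, 0 = below


def binarize(image: Image, threshold: int) -> Mask:
    """Mark every pixel whose value is at or above the threshold."""
    return [[1 if px >= threshold else 0 for px in row] for row in image]


def foreground_ratio(mask: Mask) -> float:
    """Fraction of pixels set to 1 in the mask."""
    total = sum(len(row) for row in mask)
    return sum(map(sum, mask)) / total if total else 0.0


def binarize_with_retry(
    image: Image,
    threshold: int,
    lo: float = 0.05,
    hi: float = 0.60,
    step: int = 16,
    max_tries: int = 8,
) -> Tuple[Mask, int]:
    """Binarize, changing the threshold while the result looks inappropriate.

    ASSUMPTION (not from the patent): a result is "inappropriate" when the
    foreground ratio lies outside the band [lo, hi].
    """
    mask = binarize(image, threshold)
    for _ in range(max_tries):
        ratio = foreground_ratio(mask)
        if lo <= ratio <= hi:
            break  # the separation is accepted
        # Too much foreground: raise the threshold; too little: lower it.
        if ratio > hi:
            threshold = min(255, threshold + step)
        else:
            threshold = max(0, threshold - step)
        mask = binarize(image, threshold)
    return mask, threshold


if __name__ == "__main__":
    image = [[10, 10, 10, 200], [10, 10, 10, 200]]
    mask, used = binarize_with_retry(image, threshold=5)
    print(used, mask)
```

The function returns the threshold that actually produced the returned mask, so a caller can reuse it for later evaluation steps.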

    Character area extracting device, imaging device having character area extracting function, recording medium saving character area extracting programs, and character area extracting method
    12.
    Patent application
    Legal status: In force

    Publication No.: US20110255785A1

    Publication date: 2011-10-20

    Application No.: US13067133

    Application date: 2011-05-11

    IPC classification: G06K9/18

    Abstract: A character area extracting device includes a reflective and non-reflective area separation unit separating image data into reflective and non-reflective areas, and binarizing the image data by changing a first threshold value when it is inappropriate; a reflective area binarizing unit separating the reflective area into character and background areas, and binarizing it by changing a second threshold value when it is inappropriate; a non-reflective area binarizing unit separating the non-reflective area into the character and background areas, and binarizing it by changing a third threshold value when it is inappropriate; a reflective and non-reflective area separation evaluation unit; and a line extracting unit connecting the character areas of the reflective and non-reflective areas and extracting positional information of the connected character areas in the image data.

    Program and apparatus for forms processing
    14.
    Patent application
    Legal status: In force

    Publication No.: US20080273802A1

    Publication date: 2008-11-06

    Application No.: US12216632

    Application date: 2008-07-08

    IPC classification: G06K9/72

    Abstract: A form processing program which is capable of automatically extracting keywords. When the image of a scanned form is entered, a layout recognizer extracts a readout region of the form image, and a character recognizer recognizes characters within the readout region. A form logical definition database stores form logical definitions defining strings as keywords according to logical structures which are common to forms of the same type. A possible string extractor extracts as possible strings combinations of recognized characters each of which satisfies defined relationships of a string. A linking unit links the possible strings according to positional relationships, and determines a combination of possible strings as keywords.

    Ruled line extracting program, ruled line extracting apparatus and ruled line extracting method
    15.
    Patent application
    Legal status: In force

    Publication No.: US20080056576A1

    Publication date: 2008-03-06

    Application No.: US11607758

    Application date: 2006-11-30

    IPC classification: G06K9/34 G06K9/46 G06K9/66

    Abstract: A ruled line extracting apparatus, a ruled line extracting program and a ruled line extracting method re-extract a ruled line by changing the predetermined requirements to be met by ruled lines when a ruled line candidate extracted according to the requirements shows a low reliability. A ruled line extracting program that causes a computer to extract a ruled line in an image of a document comprises an extraction step that extracts a ruled line candidate from the image of a document according to the first requirement predefined to be met by the figures of the elements of the ruled lines, a judgment step that judges if the ruled line candidate is stable or unstable according to the structural stability of the ruled line candidate extracted in the extraction step, a requirement determination step that determines the second requirement to be met by the figures of the elements of the ruled line different from the first requirement according to the ruled line candidate judged as stable in the judgment step and the first requirement, and a re-extraction step that re-extracts a ruled line candidate according to the second requirement determined in the requirement determination step.

    Form processing method, form processing device, and computer product
    16.
    Patent application
    Legal status: In force

    Publication No.: US20080025618A1

    Publication date: 2008-01-31

    Application No.: US11599685

    Application date: 2006-11-15

    IPC classification: G06K9/46 G06K9/72 G06K9/66

    CPC classification: G06K9/00449

    Abstract: A form processing apparatus extracts layout information and character information from a form document. A candidate extracting unit extracts word candidates from the character information. A frequency digitizing unit calculates emission probability of a word candidate from each element. A relation digitizing unit calculates transition probability that relationship between word candidates is established. An evaluating unit calculates an evaluation value indicative of a probability of appearance of word candidates in respective logical elements. A determining unit determines the element and a word candidate thereof as the element and a character string thereof in the form document, based on the evaluation value.
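One way to combine emission and transition probabilities into a single evaluation value is a Viterbi-style dynamic program over a chain of logical elements. The abstract does not state that the patent uses this procedure, so the sketch below is a substituted, illustrative technique. The function name `best_assignment`, the chain ordering of elements, and all probabilities in the example are hypothetical.

```python
from typing import Dict, List, Tuple


def best_assignment(
    elements: List[str],
    emission: Dict[str, Dict[str, float]],
    transition: Dict[Tuple[str, str], float],
) -> Tuple[float, List[str]]:
    """Pick one word candidate per logical element.

    The evaluation value of a chain is the product of the emission
    probability of each chosen candidate and the transition probability
    between consecutive candidates (missing pairs count as 0.0).
    ASSUMPTION: elements form a simple left-to-right chain.
    """
    # best[c] = (score, path) of the best chain that ends in candidate c
    best = {cand: (prob, [cand]) for cand, prob in emission[elements[0]].items()}
    for element in elements[1:]:
        following = {}
        for cand, e_prob in emission[element].items():
            score, path = max(
                (
                    (s * transition.get((prev, cand), 0.0) * e_prob, p)
                    for prev, (s, p) in best.items()
                ),
                key=lambda item: item[0],
            )
            following[cand] = (score, path + [cand])
        best = following
    return max(best.values(), key=lambda item: item[0])


if __name__ == "__main__":
    # Hypothetical numbers: "Data" and "2008-O1-31" stand for OCR misreads.
    elements = ["label", "value"]
    emission = {
        "label": {"Date": 0.9, "Data": 0.4},
        "value": {"2008-01-31": 0.8, "2008-O1-31": 0.3},
    }
    transition = {
        ("Date", "2008-01-31"): 0.9,
        ("Data", "2008-01-31"): 0.2,
        ("Date", "2008-O1-31"): 0.1,
    }
    print(best_assignment(elements, emission, transition))
```

Keeping only the best chain per candidate at each step avoids enumerating every combination, which matters once each element has more than a handful of candidates.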

    Storage medium, apparatus and method for recognizing characters in a document image using document recognition
    17.
    Granted patent
    Legal status: In force

    Publication No.: US08515175B2

    Publication date: 2013-08-20

    Application No.: US12392798

    Application date: 2009-02-25

    IPC classification: G06K9/18

    CPC classification: G06K9/00463

    Abstract: A program causes a computer to function as a document recognition apparatus, having an extraction unit for extracting connected components of pixels from an input image, a generation unit for generating a reference element that is connected components of pixels extracted by the extraction unit and combined elements obtained by combining the reference element and connected components of pixels adjacent to the reference element as an element to be estimated, a calculation unit for calculating a degree of certainty that indicates how much the element to be estimated generated by the generation unit seems to be a character, and a determination unit for identifying elements that seem to be characters among the elements to be estimated based on the degree of certainty calculated by the calculation unit.
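The first two steps in this abstract, extracting connected components of pixels and combining a reference element with a neighbouring component, can be sketched as follows. This is an illustration under assumptions, not the patented procedure: the 4-connectivity rule, the bounding-box representation, and the names `connected_components` and `merge` are all choices made here. The degree-of-certainty calculation is left out because the abstract gives no formula for it.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def connected_components(mask: List[List[int]]) -> List[Box]:
    """Return the bounding box of every 4-connected group of 1-pixels.

    ASSUMPTION: 4-connectivity; the abstract does not specify the rule.
    """
    height = len(mask)
    width = len(mask[0]) if mask else 0
    seen = [[False] * width for _ in range(height)]
    boxes: List[Box] = []
    for r in range(height):
        for c in range(width):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def merge(a: Box, b: Box) -> Box:
    """Combine two components into one candidate element (union of boxes)."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


if __name__ == "__main__":
    mask = [
        [1, 1, 0, 0, 1],
        [0, 1, 0, 0, 1],
        [0, 0, 0, 0, 0],
        [1, 0, 0, 0, 0],
    ]
    boxes = connected_components(mask)
    print(boxes)
    print(merge(boxes[0], boxes[2]))
```

Each single box and each merged box would then be scored as an element to be estimated, with the highest-certainty elements kept as characters.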

    Area extraction program, character recognition program, and character recognition device
    18.
    Granted patent
    Legal status: In force

    Publication No.: US08300942B2

    Publication date: 2012-10-30

    Application No.: US12366004

    Application date: 2009-02-05

    IPC classification: G06K9/46 G06K9/72

    Abstract: An area extraction method including obtaining a character lattice showing a connection relation between unit areas, which are obtained by separating a character string pattern in an image into patterns each recognized as corresponding to a single character, judging whether or not all combinations of each of the unit areas in the obtained character lattice and each of the unit areas in a regular lattice defining a regular connection relation between the unit areas are likely to be established, generating a path coupling between nodes corresponding to the combination of the unit areas which is determined as likely to be established, determining an optimum path from the generated paths based on a degree of coincidence with the regular lattice or the character lattice, and extracting from an image the unit areas in the character lattice corresponding to the determined optimum path.

    RECORDING MEDIUM FOR RECORDING LOGICAL STRUCTURE MODEL CREATION ASSISTANCE PROGRAM, LOGICAL STRUCTURE MODEL CREATION ASSISTANCE DEVICE AND LOGICAL STRUCTURE MODEL CREATION ASSISTANCE METHOD
    19.
    Patent application
    Legal status: In force

    Publication No.: US20090148049A1

    Publication date: 2009-06-11

    Application No.: US12328442

    Application date: 2008-12-04

    IPC classification: G06K9/46

    CPC classification: G06F17/243

    Abstract: A method for assisting in the creation of a logical structure model, which stores, from an image in which character strings associated respectively with a plurality of logical elements constituting a logical structure are described, the logical elements, character strings associated with the logical elements, and the logical structure, wherein character strings in an input image and the logical structure among the character strings in the input image are extracted, a logical element is selected among the plurality of logical elements according to the degrees of similarity between the extracted character strings and the character string associated respectively with the plurality of logical elements stored in the logical structure model, a character string associated with the selected logical element and a character string in the input image associated with the logical element based on the logical structure among the extracted character strings in the input image are extracted.
