Character area extracting device, imaging device having character area extracting function, recording medium saving character area extracting programs, and character area extracting method
    11.
    Type: Patent application
    Legal status: In force

    Publication number: US20110255785A1

    Publication date: 2011-10-20

    Application number: US13067133

    Filing date: 2011-05-11

    IPC classification: G06K9/18

    Abstract: A character area extracting device includes a reflective and non-reflective area separation unit separating image data into reflective and non-reflective areas, and binarizing the image data by changing a first threshold value when it is inappropriate; a reflective area binarizing unit separating the reflective area into character and background areas, and binarizing it by changing a second threshold value when it is inappropriate; a non-reflective area binarizing unit separating the non-reflective area into the character and background areas, and binarizing it by changing a third threshold value when it is inappropriate; a reflective and non-reflective area separation evaluation unit; and a line extracting unit connecting the character areas of the reflective and non-reflective areas and extracting positional information of the connected character areas in the image data.
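
The idea of binarizing reflective and non-reflective areas with separate thresholds can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names `otsu_threshold` and `extract_characters` are hypothetical, Otsu's method stands in for the threshold selection (the abstract does not say how thresholds are chosen), and the step of re-evaluating and changing an inappropriate threshold is not modeled.

```python
def otsu_threshold(values):
    """Return the gray level t (0..255) maximizing between-class variance.

    Pixels with value <= t form the dark class. Otsu's method is an
    assumption here; the abstract does not name a thresholding method.
    """
    hist = [0] * 256
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the dark class so far
    sum0 = 0    # sum of gray levels in the dark class so far
    for t in range(256):
        w0 += hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        sum0 += t * hist[t]
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def extract_characters(image, glare_threshold):
    """Return a 0/1 mask where 1 marks a (dark) character pixel.

    image: list of rows of 8-bit gray levels.
    glare_threshold: hypothetical first threshold; pixels at or above it
    are treated as the reflective area.
    """
    reflective = [v for row in image for v in row if v >= glare_threshold]
    non_reflective = [v for row in image for v in row if v < glare_threshold]
    # Second and third thresholds: one per area.
    t_refl = otsu_threshold(reflective) if reflective else -1
    t_non = otsu_threshold(non_reflective) if non_reflective else -1
    return [
        [1 if v <= (t_refl if v >= glare_threshold else t_non) else 0 for v in row]
        for row in image
    ]


# Left half: normal lighting (text 30 on 120). Right half: glare (text 200 on 250).
image = [
    [120, 30, 120, 250, 200, 250],
    [120, 120, 30, 250, 250, 200],
]
mask = extract_characters(image, glare_threshold=180)
```

In this toy image a single global threshold near mid-gray would miss the washed-out text in the glare region, whereas the per-area thresholds recover it.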

    Method and apparatus for recognizing boundary line in an image information
    12.
    Type: Patent application
    Legal status: In force

    Publication number: US20080199082A1

    Publication date: 2008-08-21

    Application number: US12071050

    Filing date: 2008-02-14

    IPC classification: G06K9/48

    Abstract: According to an aspect of an embodiment, a method of detecting boundary line information contained in image information comprising a plurality of pixels in either one of first and second states, comprising: detecting a first group of pixels in the first state disposed continuously in said image information to determine first line information and detecting a second group of pixels in the first state disposed adjacently with each other and surrounded by pixels in the second state to determine edge information based on the contour of the second group of pixels; and determining the boundary line information on the basis of the information of the relation of relative position of the line information and the edge information and the size of the first and second group of pixels.
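
The first step of the abstract, finding pixels in the first state that are disposed continuously, can be sketched as a run-length scan. This is only an illustration under stated assumptions: the function name `horizontal_runs` and the `min_len` parameter are hypothetical, only horizontal runs are considered, and the contour-based edge information and the final combination of line and edge information are not modeled.

```python
def horizontal_runs(binary, min_len):
    """Find horizontal runs of 1-pixels at least min_len long.

    binary: list of rows (lists) of 0/1 values, where 1 is the "first state".
    Returns a list of (row, first_column, last_column) tuples.
    """
    runs = []
    for y, row in enumerate(binary):
        start = None
        # A trailing 0 acts as a sentinel so a run touching the row end is closed.
        for x, v in enumerate(row + [0]):
            if v and start is None:
                start = x
            elif not v and start is not None:
                if x - start >= min_len:
                    runs.append((y, start, x - 1))
                start = None
    return runs


binary = [
    [0, 1, 1, 1, 1, 0],
    [1, 1, 0, 0, 1, 1],
    [1, 1, 1, 1, 1, 1],
]
line_candidates = horizontal_runs(binary, min_len=4)
```

Short runs such as those in the middle row are discarded, which is one simple way to keep character strokes from being mistaken for ruled lines.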

    Method and apparatus for recognizing boundary line in an image information
    13.
    Type: Granted patent
    Legal status: In force

    Publication number: US08582888B2

    Publication date: 2013-11-12

    Application number: US12071050

    Filing date: 2008-02-14

    IPC classification: G06K9/18

    Abstract: According to an aspect of an embodiment, a method of detecting boundary line information contained in image information comprising a plurality of pixels in either one of first and second states, comprising: detecting a first group of pixels in the first state disposed continuously in said image information to determine first line information and detecting a second group of pixels in the first state disposed adjacently with each other and surrounded by pixels in the second state to determine edge information based on the contour of the second group of pixels; and determining the boundary line information on the basis of the information of the relation of relative position of the line information and the edge information and the size of the first and second group of pixels.

    Form processing method, form processing device, and computer product
    14.
    Type: Patent application
    Legal status: In force

    Publication number: US20080025618A1

    Publication date: 2008-01-31

    Application number: US11599685

    Filing date: 2006-11-15

    IPC classification: G06K9/46 G06K9/72 G06K9/66

    CPC classification: G06K9/00449

    Abstract: A form processing apparatus extracts layout information and character information from a form document. A candidate extracting unit extracts word candidates from the character information. A frequency digitizing unit calculates emission probability of a word candidate from each element. A relation digitizing unit calculates transition probability that relationship between word candidates is established. An evaluating unit calculates an evaluation value indicative of a probability of appearance of word candidates in respective logical elements. A determining unit determines the element and a word candidate thereof as the element and a character string thereof in the form document, based on the evaluation value.
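
The emission and transition probabilities named in the abstract are the ingredients of a hidden-Markov-style model, so one plausible way to combine them into an evaluation value is Viterbi decoding. The sketch below is an assumption, not the patented procedure: the abstract does not say Viterbi is used, and the function name, the two logical elements (`label`, `value`), the words, and all probabilities are hypothetical toy values.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for the observations.

    Works in log space; all probabilities used must be non-zero,
    otherwise math.log raises ValueError.
    """
    # best[s] = (log-probability, path) of the best path ending in state s
    best = {
        s: (math.log(start_p[s]) + math.log(emit_p[s][observations[0]]), [s])
        for s in states
    }
    for obs in observations[1:]:
        nxt = {}
        for s in states:
            # Pick the predecessor that gives the highest score into s.
            score, path = max(
                ((best[p][0] + math.log(trans_p[p][s]), best[p][1]) for p in states),
                key=lambda item: item[0],
            )
            nxt[s] = (score + math.log(emit_p[s][obs]), path + [s])
        best = nxt
    return max(best.values(), key=lambda item: item[0])[1]


# Hypothetical toy model: two logical elements and four word candidates.
states = ["label", "value"]
start_p = {"label": 0.8, "value": 0.2}
trans_p = {
    "label": {"label": 0.1, "value": 0.9},
    "value": {"label": 0.7, "value": 0.3},
}
emit_p = {
    "label": {"Name:": 0.45, "Date:": 0.45, "Alice": 0.05, "2006": 0.05},
    "value": {"Name:": 0.05, "Date:": 0.05, "Alice": 0.45, "2006": 0.45},
}
words = ["Name:", "Alice", "Date:", "2006"]
elements = viterbi(words, states, start_p, trans_p, emit_p)
```

With these toy numbers the decoded sequence alternates between `label` and `value`, assigning each word candidate to a logical element.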

    Document type identifying method and document type identifying apparatus
    15.
    Type: Granted patent
    Legal status: In force

    Publication number: US08275792B2

    Publication date: 2012-09-25

    Application number: US12585155

    Filing date: 2009-09-04

    IPC classification: G06F17/30

    CPC classification: G06K9/2054 G06K2209/01

    Abstract: A document type identifying apparatus includes in advance a database storing therein keywords used as keys that identify document types in association with each document type. The document type identifying apparatus aligns word strings written on a document and generates partial keyword strings for each keyword by using the keywords stored in the database. The partial keyword strings are to be checked for matching with the word strings written on the document. Then, the document type identifying apparatus checks matching of the grouped and aligned word strings with the partial keyword strings and obtains, for each keyword, each number of matched words with the highest matching rates between the grouped word strings that are successfully matched and the partial keyword strings. Then, each number of matched words is used to calculate each evaluation value to determine the document type.
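
A minimal sketch of keyword-based type identification, under simplifying assumptions: a partial keyword string is taken to be any contiguous run of a keyword's words, the matched-word count for a keyword is the length of its longest run found in the document, and the evaluation value is the fraction of keyword words matched. The function names, the scoring formula, and the keyword database are hypothetical; the abstract does not specify how the evaluation value is computed.

```python
def longest_partial_match(keyword_words, doc_words):
    """Length of the longest contiguous part of the keyword found in the document."""
    n = len(keyword_words)
    for length in range(n, 0, -1):            # try the longest parts first
        for i in range(n - length + 1):
            part = keyword_words[i:i + length]
            for j in range(len(doc_words) - length + 1):
                if doc_words[j:j + length] == part:
                    return length
    return 0


def identify_type(doc_words, keyword_db):
    """Return (best_type, scores). keyword_db maps a type to a non-empty keyword list."""
    scores = {}
    for doc_type, keywords in keyword_db.items():
        matched = sum(longest_partial_match(k.split(), doc_words) for k in keywords)
        total = sum(len(k.split()) for k in keywords)
        scores[doc_type] = matched / total    # hypothetical evaluation value
    return max(scores, key=scores.get), scores


# Hypothetical keyword database and recognized word string.
keyword_db = {
    "invoice": ["invoice number", "total amount due"],
    "receipt": ["receipt number", "amount paid"],
}
doc_words = "invoice number 1234 total amount 500".split()
best_type, scores = identify_type(doc_words, keyword_db)
```

Partial matching lets a keyword still contribute when recognition drops or garbles one of its words, as with "total amount due" in the toy document.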

    STORAGE MEDIUM STORING DOCUMENT RECOGNITION PROGRAM, DOCUMENT RECOGNITION APPARATUS AND METHOD THEREOF
    16.
    Type: Patent application
    Legal status: In force

    Publication number: US20090226089A1

    Publication date: 2009-09-10

    Application number: US12392798

    Filing date: 2009-02-25

    IPC classification: G06K9/18

    CPC classification: G06K9/00463

    Abstract: A program causes a computer to function as a document recognition apparatus, having an extraction unit for extracting connected components of pixels from an input image, a generation unit for generating a reference element that is connected components of pixels extracted by the extraction unit and combined elements obtained by combining the reference element and connected components of pixels adjacent to the reference element as an element to be estimated, a calculation unit for calculating a degree of certainty that indicates how much the element to be estimated generated by the generation unit seems to be a character, and a determination unit for identifying elements that seem to be characters among the elements to be estimated based on the degree of certainty calculated by the calculation unit.
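
The extraction step, finding connected components of pixels, can be sketched with a breadth-first flood fill. The function name `connected_components` and the choice of 4-connectivity are assumptions; the abstract does not specify the connectivity, and the later steps (combining adjacent components and scoring how character-like each element is) are not modeled here.

```python
from collections import deque


def connected_components(binary):
    """Return bounding boxes (top, left, bottom, right) of 4-connected components.

    binary: non-empty list of equal-length rows of 0/1 values.
    Boxes are listed in raster-scan order of each component's first pixel.
    """
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue = deque([(y, x)])
                top, left, bottom, right = y, x, y, x
                while queue:
                    cy, cx = queue.popleft()
                    top, bottom = min(top, cy), max(bottom, cy)
                    left, right = min(left, cx), max(right, cx)
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                boxes.append((top, left, bottom, right))
    return boxes


binary = [
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
]
boxes = connected_components(binary)
```

Each bounding box could then serve as a reference element, with neighboring boxes merged to form the combined elements the abstract describes.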

    Form processing method, form processing device, and computer product
    17.
    Type: Granted patent
    Legal status: In force

    Publication number: US07792369B2

    Publication date: 2010-09-07

    Application number: US11599685

    Filing date: 2006-11-15

    IPC classification: G06K9/72

    CPC classification: G06K9/00449

    Abstract: A form processing apparatus extracts layout information and character information from a form document. A candidate extracting unit extracts word candidates from the character information. A frequency digitizing unit calculates emission probability of a word candidate from each element. A relation digitizing unit calculates transition probability that relationship between word candidates is established. An evaluating unit calculates an evaluation value indicative of a probability of appearance of word candidates in respective logical elements. A determining unit determines the element and a word candidate thereof as the element and a character string thereof in the form document, based on the evaluation value.

    Apparatus and method of analyzing layout of document, and computer product
    18.
    Type: Granted patent
    Legal status: Lapsed

    Publication number: US07257253B2

    Publication date: 2007-08-14

    Application number: US10350180

    Filing date: 2003-01-24

    IPC classification: G06K9/34

    CPC classification: G06K9/00463

    Abstract: In an apparatus for analyzing a layout of a document, a character candidate element generator generates character candidate elements from black pixel linkage components of a document image. A horizontally oriented line rectangle generator sets a plurality of character candidate elements as a line candidate rectangle, among character candidate elements aligned in horizontal line orientation, when each amount of displacement of the set character candidate elements in a vertical orientation with respect to the horizontal line orientation, is smaller than or equal to a threshold value. A horizontally oriented paragraph-box generator sets a plurality of line candidate elements having approximately the same length as each other in the vertical orientation, as a paragraph candidate element.
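
Grouping character candidates into line candidate rectangles by vertical displacement can be sketched as a greedy pass over bounding boxes. The function name `group_into_lines`, the use of vertical box centers as the displacement measure, and the greedy left-to-right strategy are all assumptions for illustration; the abstract only states that the vertical displacement must not exceed a threshold, and the paragraph-box step is not modeled.

```python
def group_into_lines(boxes, max_dy):
    """Group character boxes into horizontal line rectangles.

    boxes: list of (top, left, bottom, right) character candidate boxes.
    max_dy: threshold on the vertical displacement between the centers of a
    box and the last box already placed in a line.
    Returns one enclosing rectangle (top, left, bottom, right) per line.
    """
    lines = []  # each entry is the list of boxes forming one line
    for box in sorted(boxes, key=lambda b: b[1]):  # left to right
        cy = (box[0] + box[2]) / 2
        for line in lines:
            last = line[-1]
            if abs(cy - (last[0] + last[2]) / 2) <= max_dy:
                line.append(box)
                break
        else:
            lines.append([box])  # no existing line is close enough vertically
    return [
        (
            min(b[0] for b in line),
            min(b[1] for b in line),
            max(b[2] for b in line),
            max(b[3] for b in line),
        )
        for line in lines
    ]


# Two text lines: three characters near y = 5 and two characters near y = 35.
boxes = [(0, 0, 10, 8), (1, 10, 11, 18), (30, 0, 40, 8), (0, 20, 10, 28), (31, 10, 41, 18)]
line_rects = group_into_lines(boxes, max_dy=3)
```

Comparing each box with the last box in a line, rather than with a fixed baseline, tolerates slight skew across a long line.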

    Storage medium, apparatus and method for recognizing characters in a document image using document recognition
    19.
    Type: Granted patent
    Legal status: In force

    Publication number: US08515175B2

    Publication date: 2013-08-20

    Application number: US12392798

    Filing date: 2009-02-25

    IPC classification: G06K9/18

    CPC classification: G06K9/00463

    Abstract: A program causes a computer to function as a document recognition apparatus, having an extraction unit for extracting connected components of pixels from an input image, a generation unit for generating a reference element that is connected components of pixels extracted by the extraction unit and combined elements obtained by combining the reference element and connected components of pixels adjacent to the reference element as an element to be estimated, a calculation unit for calculating a degree of certainty that indicates how much the element to be estimated generated by the generation unit seems to be a character, and a determination unit for identifying elements that seem to be characters among the elements to be estimated based on the degree of certainty calculated by the calculation unit.

    Program and apparatus for forms processing
    20.
    Type: Patent application
    Legal status: In force

    Publication number: US20080273802A1

    Publication date: 2008-11-06

    Application number: US12216632

    Filing date: 2008-07-08

    IPC classification: G06K9/72

    Abstract: A form processing program which is capable of automatically extracting keywords. When the image of a scanned form is entered, a layout recognizer extracts a readout region of the form image, a character recognizer recognizes characters within the readout region. A form logical definition database stores form logical definitions defining strings as keywords according to logical structures which are common to forms of same type. A possible string extractor extracts as possible strings combinations of recognized characters each of which satisfies defined relationships of a string. A linking unit links the possible strings according to positional relationships, and determines a combination of possible strings as keywords.
