Character area extracting device, imaging device having character area extracting function, recording medium saving character area extracting programs, and character area extracting method
    1.
    Granted patent (status: in force)

    Publication number: US08447113B2

    Publication date: 2013-05-21

    Application number: US13067133

    Filing date: 2011-05-11

    IPC classification: G06K9/18

    Abstract: A character area extracting device includes a reflective and non-reflective area separation unit separating image data into reflective and non-reflective areas, and binarizing the image data by changing a first threshold value when it is inappropriate; a reflective area binarizing unit separating the reflective area into character and background areas, and binarizing it by changing a second threshold value when it is inappropriate; a non-reflective area binarizing unit separating the non-reflective area into the character and background areas, and binarizing it by changing a third threshold value when it is inappropriate; a reflective and non-reflective area separation evaluation unit; and a line extracting unit connecting the character areas of the reflective and non-reflective areas and extracting positional information of the connected character areas in the image data.
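
    The threshold-adjustment idea in this abstract can be illustrated with a minimal sketch. It is not the patented method: the abstract does not say how a threshold is judged "inappropriate", so the foreground-ratio check, the bounds `lo` and `hi`, the `step` size, and every function name below are assumptions made for illustration only.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale pixels in the range 0..255
Mask = List[List[int]]   # 1 = foreground, 0 = background


def binarize(image: Image, threshold: int) -> Mask:
    """Mark every pixel at or above the threshold as foreground."""
    return [[1 if px >= threshold else 0 for px in row] for row in image]


def foreground_ratio(mask: Mask) -> float:
    """Share of pixels marked as foreground."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total if total else 0.0


def binarize_with_retry(
    image: Image,
    threshold: int,
    lo: float = 0.05,
    hi: float = 0.60,
    step: int = 16,
    max_tries: int = 16,
) -> Tuple[Mask, int]:
    """Binarize, changing the threshold while the result looks inappropriate.

    ASSUMPTION: a result is "inappropriate" when the foreground share
    falls outside [lo, hi]. The source abstract does not define this test.
    """
    mask = binarize(image, threshold)
    for _ in range(max_tries):
        ratio = foreground_ratio(mask)
        if ratio > hi:
            threshold = min(255, threshold + step)  # too much foreground: raise
        elif ratio < lo:
            threshold = max(0, threshold - step)    # too little foreground: lower
        else:
            break
        mask = binarize(image, threshold)
    return mask, threshold
```

    Under these assumptions, the same loop could be run once for each of the first, second, and third threshold values that the abstract names.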

    Character area extracting device, imaging device having character area extracting function, recording medium saving character area extracting programs, and character area extracting method
    2.
    Patent application (status: in force)

    Publication number: US20110255785A1

    Publication date: 2011-10-20

    Application number: US13067133

    Filing date: 2011-05-11

    IPC classification: G06K9/18

    Abstract: A character area extracting device includes a reflective and non-reflective area separation unit separating image data into reflective and non-reflective areas, and binarizing the image data by changing a first threshold value when it is inappropriate; a reflective area binarizing unit separating the reflective area into character and background areas, and binarizing it by changing a second threshold value when it is inappropriate; a non-reflective area binarizing unit separating the non-reflective area into the character and background areas, and binarizing it by changing a third threshold value when it is inappropriate; a reflective and non-reflective area separation evaluation unit; and a line extracting unit connecting the character areas of the reflective and non-reflective areas and extracting positional information of the connected character areas in the image data.

    Form processing method, form processing device, and computer product
    4.
    Patent application (status: in force)

    Publication number: US20080025618A1

    Publication date: 2008-01-31

    Application number: US11599685

    Filing date: 2006-11-15

    IPC classification: G06K9/46, G06K9/72, G06K9/66

    CPC classification: G06K9/00449

    Abstract: A form processing apparatus extracts layout information and character information from a form document. A candidate extracting unit extracts word candidates from the character information. A frequency digitizing unit calculates emission probability of a word candidate from each element. A relation digitizing unit calculates transition probability that relationship between word candidates is established. An evaluating unit calculates an evaluation value indicative of a probability of appearance of word candidates in respective logical elements. A determining unit determines the element and a word candidate thereof as the element and a character string thereof in the form document, based on the evaluation value.
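
    The abstract names an emission probability, a transition probability, and an evaluation value, but not how they are combined. The sketch below assumes an HMM-style product of the two probabilities and an exhaustive search over assignments of word candidates to logical elements. The function names, the dictionary layout, and the search strategy are all hypothetical.

```python
from itertools import permutations
from typing import Dict, List, Sequence, Tuple

Emission = Dict[Tuple[str, str], float]    # (element, word) -> probability
Transition = Dict[Tuple[str, str], float]  # (previous word, word) -> probability


def evaluation_value(
    elements: Sequence[str],
    words: Sequence[str],
    emission: Emission,
    transition: Transition,
) -> float:
    """ASSUMPTION: score = product of emission and transition probabilities."""
    score = 1.0
    for i, (element, word) in enumerate(zip(elements, words)):
        score *= emission.get((element, word), 0.0)
        if i > 0:
            score *= transition.get((words[i - 1], word), 0.0)
    return score


def determine(
    elements: List[str],
    candidates: List[str],
    emission: Emission,
    transition: Transition,
) -> Tuple[Dict[str, str], float]:
    """Pick the element-to-word assignment with the highest evaluation value."""
    best_words = None
    best_score = -1.0
    # Exhaustive search; fine for a handful of candidates, not for large forms.
    for words in permutations(candidates, len(elements)):
        score = evaluation_value(elements, words, emission, transition)
        if score > best_score:
            best_words, best_score = words, score
    if best_words is None:  # fewer candidates than elements
        return {}, 0.0
    return dict(zip(elements, best_words)), best_score
```

    For longer forms, a dynamic-programming search such as Viterbi would avoid enumerating every permutation; the abstract does not state which search the apparatus uses.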

    RECORDING MEDIUM FOR RECORDING LOGICAL STRUCTURE MODEL CREATION ASSISTANCE PROGRAM, LOGICAL STRUCTURE MODEL CREATION ASSISTANCE DEVICE AND LOGICAL STRUCTURE MODEL CREATION ASSISTANCE METHOD
    5.
    Patent application (status: in force)

    Publication number: US20090148049A1

    Publication date: 2009-06-11

    Application number: US12328442

    Filing date: 2008-12-04

    IPC classification: G06K9/46

    CPC classification: G06F17/243

    Abstract: A method for assisting in the creation of a logical structure model, which stores, from an image in which character strings associated respectively with a plurality of logical elements constituting a logical structure are described, the logical elements, character strings associated with the logical elements, and the logical structure, wherein character strings in an input image and the logical structure among the character strings in the input image are extracted, a logical element is selected among the plurality of logical elements according to the degrees of similarity between the extracted character strings and the character string associated respectively with the plurality of logical elements stored in the logical structure model, a character string associated with the selected logical element and a character string in the input image associated with the logical element based on the logical structure among the extracted character strings in the input image are extracted.
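
    The selection step in this abstract, choosing a logical element by the degree of similarity between extracted strings and stored strings, can be sketched as follows. The abstract does not name a similarity measure, so the use of Python's `difflib.SequenceMatcher`, the case folding, and the `model` layout are assumptions for illustration.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple


def select_element(
    extracted: List[str],
    model: Dict[str, List[str]],
) -> Tuple[Optional[str], Optional[str], float]:
    """Return (logical element, extracted string, similarity) for the best match.

    `model` maps each logical element to the character strings stored for it.
    ASSUMPTION: similarity is SequenceMatcher.ratio() on lower-cased text.
    """
    best: Tuple[Optional[str], Optional[str], float] = (None, None, 0.0)
    for text in extracted:
        for element, known_strings in model.items():
            for known in known_strings:
                score = SequenceMatcher(None, text.lower(), known.lower()).ratio()
                if score > best[2]:
                    best = (element, text, score)
    return best
```

    For example, with a model holding "Total amount" under a hypothetical `total` element, the OCR-damaged string "Tota1 amount" would still be matched to that element.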

    Method and apparatus for recognizing boundary line in an image information
    7.
    Patent application (status: in force)

    Publication number: US20080199082A1

    Publication date: 2008-08-21

    Application number: US12071050

    Filing date: 2008-02-14

    IPC classification: G06K9/48

    Abstract: According to an aspect of an embodiment, a method of detecting boundary line information contained in image information comprising a plurality of pixels in either one of first and second states, comprising: detecting a first group of pixels in the first state disposed continuously in said image information to determine first line information and detecting a second group of pixels in the first state disposed adjacently with each other and surrounded by pixels in the second state to determine edge information based on the contour of the second group of pixels; and determining the boundary line information on the basis of the information of the relation of relative position of the line information and the edge information and the size of the first and second group of pixels.
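
    Only the first step of this abstract, finding pixels in the first state that are disposed continuously, is sketched here. The code treats 1 as the first state and 0 as the second, looks at horizontal runs only, and uses a hypothetical `min_len` cutoff. The edge-information step and the final combination by relative position and size are not covered, because the abstract does not describe them in enough detail.

```python
from typing import List, Tuple

Run = Tuple[int, int, int]  # (row, first column, last column), inclusive


def horizontal_runs(image: List[List[int]], min_len: int) -> List[Run]:
    """Find horizontal runs of 1-pixels that are at least min_len long."""
    runs: List[Run] = []
    for y, row in enumerate(image):
        start = None
        # A trailing 0 acts as a sentinel so a run touching the edge is closed.
        for x, px in enumerate(list(row) + [0]):
            if px and start is None:
                start = x
            elif not px and start is not None:
                if x - start >= min_len:
                    runs.append((y, start, x - 1))
                start = None
    return runs
```

    Running the same function on the transposed image would give vertical candidates.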

    Document type identifying method and document type identifying apparatus
    8.
    Granted patent (status: in force)

    Publication number: US08275792B2

    Publication date: 2012-09-25

    Application number: US12585155

    Filing date: 2009-09-04

    IPC classification: G06F17/30

    CPC classification: G06K9/2054, G06K2209/01

    Abstract: A document type identifying apparatus includes in advance a database storing therein keywords used as keys that identify document types in association with each document type. The document type identifying apparatus aligns word strings written on a document and generates partial keyword strings for each keyword by using the keywords stored in the database. The partial keyword strings are to be checked for matching with the word strings written on the document. Then, the document type identifying apparatus checks matching of the grouped and aligned word strings with the partial keyword strings and obtains, for each keyword, each number of matched words with the highest matching rates between the grouped word strings that are successfully matched and the partial keyword strings. Then, each number of matched words is used to calculate each evaluation value to determine the document type.
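
    A minimal sketch of keyword-based document type scoring follows. It is a simplification, not the patented procedure: it assumes that a partial keyword string is any contiguous run of a keyword's words, that the number of matched words for a keyword is the length of the longest such run found in the document, and that the evaluation value is matched words divided by total keyword words. The grouping and alignment of word strings mentioned in the abstract is omitted, and all names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def partial_strings(keyword: Sequence[str]) -> List[Tuple[str, ...]]:
    """All contiguous word runs of a keyword (ASSUMED meaning of 'partial')."""
    n = len(keyword)
    return [tuple(keyword[i:j]) for i in range(n) for j in range(i + 1, n + 1)]


def contains(words: Sequence[str], part: Tuple[str, ...]) -> bool:
    """True if `part` occurs as a contiguous run inside `words`."""
    m = len(part)
    return any(tuple(words[k:k + m]) == part for k in range(len(words) - m + 1))


def matched_words(words: Sequence[str], keyword: Sequence[str]) -> int:
    """Length of the longest partial keyword string found in the document."""
    return max(
        (len(p) for p in partial_strings(keyword) if contains(words, p)),
        default=0,
    )


def identify(
    words: Sequence[str],
    database: Dict[str, List[List[str]]],
) -> Tuple[str, Dict[str, float]]:
    """Score every document type and return the best one with all scores."""
    scores: Dict[str, float] = {}
    for doc_type, keywords in database.items():
        total = sum(len(k) for k in keywords)
        matched = sum(matched_words(words, k) for k in keywords)
        scores[doc_type] = matched / total if total else 0.0
    return max(scores, key=scores.get), scores
```

    Scoring on partial matches lets a document still be classified when OCR drops or misreads one word of a multi-word keyword.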

    Method and apparatus for recognizing boundary line in an image information
    10.
    Granted patent (status: in force)

    Publication number: US08582888B2

    Publication date: 2013-11-12

    Application number: US12071050

    Filing date: 2008-02-14

    IPC classification: G06K9/18

    Abstract: According to an aspect of an embodiment, a method of detecting boundary line information contained in image information comprising a plurality of pixels in either one of first and second states, comprising: detecting a first group of pixels in the first state disposed continuously in said image information to determine first line information and detecting a second group of pixels in the first state disposed adjacently with each other and surrounded by pixels in the second state to determine edge information based on the contour of the second group of pixels; and determining the boundary line information on the basis of the information of the relation of relative position of the line information and the edge information and the size of the first and second group of pixels.
