Recording medium for recording logical structure model creation assistance program, logical structure model creation assistance device and logical structure model creation assistance method
    32.
    Granted invention patent
    Legal status: In force

    Publication number: US08249351B2
    Publication date: 2012-08-21
    Application number: US12328442
    Application date: 2008-12-04
    IPC classification: G06K9/00 G06F7/00 G06F17/00
    CPC classification: G06F17/243

    Abstract: A method for assisting in the creation of a logical structure model that stores, from an image in which character strings associated respectively with a plurality of logical elements constituting a logical structure are described, the logical elements, the character strings associated with the logical elements, and the logical structure. Character strings in an input image, and the logical structure among those character strings, are extracted. A logical element is selected from among the plurality of logical elements according to the degrees of similarity between the extracted character strings and the character strings associated respectively with the plurality of logical elements stored in the logical structure model. Then, from among the extracted character strings in the input image, the character string associated with the selected logical element and the character string associated with that logical element through the logical structure are extracted.
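    The abstract does not say how the degree of similarity is computed or how the logical structure links one string to another. As a minimal sketch, assume the stored model maps each logical element to a single label string, use `difflib.SequenceMatcher.ratio()` as a stand-in similarity measure, and represent the extracted logical structure as (label, value) pairs; `MODEL`, `select_element` and `extract` are hypothetical names, not taken from the patent.

```python
from difflib import SequenceMatcher

# Hypothetical logical structure model: logical element -> stored character string.
MODEL = {
    "invoice_number": "Invoice No.",
    "issue_date": "Date of issue",
    "total": "Total amount",
}


def similarity(a: str, b: str) -> float:
    """Degree of similarity in [0, 1]; difflib's ratio is an assumed stand-in."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def select_element(text: str, model: dict, threshold: float = 0.6):
    """Return the logical element whose stored string is most similar to `text`,
    or None if no element reaches `threshold`."""
    best, best_score = None, threshold
    for element, stored in model.items():
        score = similarity(text, stored)
        if score >= best_score:
            best, best_score = element, score
    return best


def extract(pairs: list, model: dict) -> dict:
    """`pairs` stands in for the extracted logical structure: each label string
    found in the input image paired with the value string it governs."""
    result = {}
    for label, value in pairs:
        element = select_element(label, model)
        if element is not None:
            result[element] = value
    return result
```

    With the assumed threshold of 0.6, an OCR-damaged label such as "Totl amount" still selects `total`, while an unrelated string such as "Page" selects no logical element.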

    Program and apparatus for forms processing
    33.
    Granted invention patent
    Legal status: In force

    Publication number: US08131087B2
    Publication date: 2012-03-06
    Application number: US12216632
    Application date: 2008-07-08

    Abstract: A form processing program capable of automatically extracting keywords. When the image of a scanned form is entered, a layout recognizer extracts a readout region of the form image, and a character recognizer recognizes the characters within the readout region. A form logical definition database stores form logical definitions that define strings as keywords according to logical structures common to forms of the same type. A possible string extractor extracts, as possible strings, combinations of recognized characters that each satisfy the relationships defined for a string. A linking unit links the possible strings according to their positional relationships and determines a combination of possible strings as the keywords.
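    The abstract leaves both the string definitions and the positional relationships unspecified. The sketch below assumes that each keyword is defined by a regular expression, that OCR output arrives as tokens with pixel coordinates, and that the positional relationship is "all keywords on one row, left to right in definition order"; `Token`, `DEFINITION`, `possible_strings` and `link` are hypothetical.

```python
import itertools
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    text: str  # recognized characters
    x: int     # left edge, pixels
    y: int     # baseline, pixels


# Hypothetical form logical definition: keyword -> pattern its string must satisfy.
DEFINITION = {
    "account": re.compile(r"\d{7}"),
    "amount": re.compile(r"\d{1,3}(?:,\d{3})*"),
}


def possible_strings(tokens, definition):
    """Possible strings per keyword: tokens whose whole text satisfies its pattern."""
    return {name: [t for t in tokens if pattern.fullmatch(t.text)]
            for name, pattern in definition.items()}


def link(candidates, max_dy=5):
    """Return the first combination (one token per keyword) lying on one row,
    left to right in definition order; None if no combination qualifies."""
    names = list(candidates)
    for combo in itertools.product(*(candidates[n] for n in names)):
        same_row = max(t.y for t in combo) - min(t.y for t in combo) <= max_dy
        ordered = all(a.x < b.x for a, b in zip(combo, combo[1:]))
        if same_row and ordered:
            return {n: t.text for n, t in zip(names, combo)}
    return None
```

    Enumerating combinations with `itertools.product` is exhaustive and only practical for a handful of candidates per keyword; a production linking unit would presumably prune candidates by position first.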

    Form processing method, form processing device, and computer product
    34.
    Granted invention patent
    Legal status: In force

    Publication number: US07792369B2
    Publication date: 2010-09-07
    Application number: US11599685
    Application date: 2006-11-15
    IPC classification: G06K9/72
    CPC classification: G06K9/00449

    Abstract: A form processing apparatus extracts layout information and character information from a form document. A candidate extracting unit extracts word candidates from the character information. A frequency digitizing unit calculates the emission probability of a word candidate from each logical element. A relation digitizing unit calculates the transition probability that a relationship between word candidates is established. An evaluating unit calculates an evaluation value indicating the probability that the word candidates appear in the respective logical elements. Based on the evaluation value, a determining unit determines a logical element and its word candidate to be the element and its character string in the form document.
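    Emission and transition probabilities suggest a hidden-Markov-model reading, in which logical elements are the hidden states and word candidates the observations. Treating the evaluation value as a Viterbi path score is an assumption made for this sketch, not something the abstract states; the probability tables and the `viterbi` function are hypothetical.

```python
import math


def viterbi(words, elements, start, trans, emit, floor=1e-6):
    """Assign one logical element to each word candidate, maximising
    log P(start) + sum of log transition + sum of log emission probabilities.
    Probabilities missing from the tables fall back to `floor`."""

    def lp(p):
        return math.log(max(p, floor))

    # score[e]: best log-score of any assignment of the words so far ending in e
    score = {e: lp(start.get(e, 0.0)) + lp(emit[e].get(words[0], 0.0)) for e in elements}
    backpointers = []
    for word in words[1:]:
        prev, score, ptr = score, {}, {}
        for e in elements:
            best = max(elements, key=lambda p: prev[p] + lp(trans[p].get(e, 0.0)))
            score[e] = prev[best] + lp(trans[best].get(e, 0.0)) + lp(emit[e].get(word, 0.0))
            ptr[e] = best
        backpointers.append(ptr)

    # Trace the best assignment backwards from the highest-scoring final element.
    last = max(elements, key=score.get)
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return path[::-1], score[last]
```

    Working in log space keeps long documents from underflowing, and the `floor` value stands in for whatever smoothing a real system would apply to unseen word candidates.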

    Document type identifying method and document type identifying apparatus
    35.
    Invention patent application
    Legal status: In force

    Publication number: US20100005096A1
    Publication date: 2010-01-07
    Application number: US12585155
    Application date: 2009-09-04
    IPC classification: G06F17/30
    CPC classification: G06K9/2054 G06K2209/01

    Abstract: A document type identifying apparatus includes a database that stores in advance, for each document type, keywords used as keys to identify that document type. The apparatus aligns the word strings written on a document and, using the keywords stored in the database, generates partial keyword strings for each keyword; these partial keyword strings are checked for matches against the word strings written on the document. The apparatus then matches the grouped and aligned word strings against the partial keyword strings and obtains, for each keyword, the number of matched words at the highest matching rate between the successfully matched grouped word strings and the partial keyword strings. These match counts are used to calculate evaluation values, from which the document type is determined.
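    The abstract does not define how partial keyword strings are formed. One plausible reading, assumed here, is that they are the contiguous word sub-sequences of each keyword phrase, so the "number of matched words" is the length of the longest sub-sequence found in the document, and the evaluation value is that count summed and normalised by the total number of keyword words per document type. `KEYWORDS`, `partial_strings`, `matched_words` and `identify` are hypothetical names.

```python
import re

# Hypothetical keyword database: document type -> keyword phrases.
KEYWORDS = {
    "invoice": ["invoice number", "amount due", "payment terms"],
    "receipt": ["thank you for your purchase", "cash tendered"],
}


def partial_strings(keyword):
    """Contiguous word sub-sequences of a keyword, longest first."""
    words = keyword.split()
    n = len(words)
    return [tuple(words[i:i + k]) for k in range(n, 0, -1) for i in range(n - k + 1)]


def occurs(doc_words, part):
    """True if `part` appears as a contiguous run of words in the document."""
    k = len(part)
    return any(tuple(doc_words[i:i + k]) == part for i in range(len(doc_words) - k + 1))


def matched_words(doc_words, keyword):
    """Number of words in the longest partial keyword string found in the document."""
    for part in partial_strings(keyword):
        if occurs(doc_words, part):
            return len(part)
    return 0


def identify(text, keywords=KEYWORDS):
    """Return the best document type and the evaluation value of every type."""
    doc_words = re.findall(r"[a-z0-9]+", text.lower())
    scores = {}
    for doc_type, phrases in keywords.items():
        total = sum(len(p.split()) for p in phrases)
        scores[doc_type] = sum(matched_words(doc_words, p) for p in phrases) / total
    return max(scores, key=scores.get), scores
```

    For the text "Invoice Number 1042 Amount due: 300 Terms of payment 30 days", "invoice number" and "amount due" match in full and "payment terms" matches one word, giving an invoice score of 5/6 and a receipt score of 0.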

    IMAGE RECOGNITION APPARATUS, IMAGE RECOGNITION METHOD, AND STORAGE MEDIUM RECORDING IMAGE RECOGNITION PROGRAM
    37.
    Invention patent application
    Legal status: In force

    Publication number: US20090110282A1
    Publication date: 2009-04-30
    Application number: US12250302
    Application date: 2008-10-13
    IPC classification: G06K9/00

    Abstract: An image recognition apparatus recognizes each logical element in an image by recognizing the correspondence between the character strings described in the image and the logical elements composing its logical structure. The apparatus includes: outputting means for outputting the recognized logical elements when the correspondence is recognized or re-recognized; first determining means for determining a certain logical element to be correct when a determination request for that logical element is received from a user; second determining means for determining, according to confirmation by the user, that all the logical elements output and positioned before the logical element determined by the first determining means are correct; and re-recognizing means for re-recognizing, on the basis of the determination content for each logical element, the correspondence between the character strings and the logical elements that have not been determined to be correct.
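    The confirmation workflow can be illustrated with a greedy recognizer standing in for the real one. In this sketch, which is an assumption rather than the patent's method, confirming one element also freezes every element output before it, and re-recognition only reassigns strings that no frozen element has claimed; `recognize`, `confirm` and the scoring function passed in are hypothetical.

```python
def recognize(elements, strings, score, fixed):
    """Greedy stand-in recognizer: keep the fixed (confirmed) assignments and give
    every other element its best-scoring string not already taken."""
    result = dict(fixed)
    used = set(fixed.values())
    for element in elements:
        if element in fixed:
            continue
        free = [s for s in strings if s not in used]
        if free:
            best = max(free, key=lambda s: score(element, s))
            result[element] = best
            used.add(best)
    return result


def confirm(elements, current, confirmed, value=None, fixed=None):
    """User confirms `confirmed` (optionally correcting it to `value`); every
    element output before it is also treated as correct."""
    fixed = dict(fixed or {})
    position = elements.index(confirmed)
    for element in elements[:position]:
        fixed[element] = current[element]
    fixed[confirmed] = current[confirmed] if value is None else value
    return fixed
```

    If a first pass over the elements `name`, `date` and `amount` wrongly assigns "500" to `date`, a user confirming `date` as "2008-10-13" also freezes `name`, and re-recognition then hands "500" to `amount`.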

    Character recognition apparatus, character recognition method, and computer product
    38.
    Invention patent application
    Legal status: Under examination (published)

    Publication number: US20090041361A1
    Publication date: 2009-02-12
    Application number: US12153015
    Application date: 2008-05-12
    IPC classification: G06K9/62

    Abstract: A character recognition apparatus includes a hash table registering unit and a recognition processing unit. The hash table registering unit creates a hash table indicating a characteristic of each partial character image, each partial character image being an area of a character. The recognition processing unit divides an input image into partial input images and calculates a characteristic of each partial input image. It then searches the hash table for partial character images whose characteristics are similar to that of each partial input image, compares the positional relationship of the partial input images with that of the partial character images to determine whether they match, and recognizes a character in each area of the input image.
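    The abstract does not specify which characteristic is stored in the hash table. This sketch assumes binary images, 2×2 tiles as partial images, the raw tile bits as the hash key, and a vote over (character, displacement) pairs as the positional-relationship check, which is close in spirit to geometric hashing; `TILE`, `TEMPLATES`, `tiles`, `build_table` and `recognize` are hypothetical, and only the single best character is returned.

```python
from collections import Counter, defaultdict

TILE = 2  # side length of a partial image, in pixels (assumed)

# Hypothetical 4x4 binary glyph templates.
TEMPLATES = {
    "L": ["1000",
          "1000",
          "1000",
          "1111"],
    "T": ["1111",
          "0110",
          "0110",
          "0110"],
}


def tiles(img):
    """Yield ((row, col), feature) for each TILE x TILE partial image of a binary
    image given as equal-length '0'/'1' strings; the feature is the raw bits."""
    for r in range(0, len(img) - TILE + 1, TILE):
        for c in range(0, len(img[0]) - TILE + 1, TILE):
            yield (r, c), tuple(img[r + i][c:c + TILE] for i in range(TILE))


def build_table(templates):
    """Hash table: feature -> list of (character, position in its template)."""
    table = defaultdict(list)
    for ch, img in templates.items():
        for pos, feature in tiles(img):
            if any("1" in row for row in feature):  # blank tiles carry no evidence
                table[feature].append((ch, pos))
    return table


def recognize(img, table):
    """Vote for (character, displacement); matches sharing one displacement are
    positionally consistent. Returns (character, displacement, votes) or None."""
    votes = Counter()
    for (r, c), feature in tiles(img):
        for ch, (tr, tc) in table.get(feature, []):
            votes[(ch, r - tr, c - tc)] += 1
    if not votes:
        return None
    (ch, dr, dc), count = votes.most_common(1)[0]
    return ch, (dr, dc), count
```

    The stroke tile shared by `L` and `T` votes for both characters, but only `L` collects all of its votes at a single displacement, which is how this sketch approximates the positional comparison.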

    Apparatus and method for analyzing and determining correlation of information in a document
    39.
    Invention patent application
    Legal status: In force

    Publication number: US20080187240A1
    Publication date: 2008-08-07
    Application number: US12005527
    Application date: 2007-12-27
    IPC classification: G06K9/64
    CPC classification: G06K9/00463

    Abstract: According to an aspect of an embodiment, an apparatus analyzes and determines the correlation of information contained in a given form made up of blocks, at least one of which contains data indicating a header, while the remaining blocks contain data associated with header information. The apparatus comprises: a memory storing templates having nodes, character data associated with the respective nodes, and relative position information between the nodes; and a processor that analyzes and determines the correlation of the information by obtaining the data contained in the blocks of the given form, determining the relative positions of the blocks to produce relative position information, comparing the data obtained from the blocks and their relative position information against the character data and relative position information of the template nodes, and determining the correlation of the data contained in the blocks.
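    A minimal sketch of the template comparison, assuming that each template node pairs a header's character data with the expected offset of its value block, that header text must match exactly, and that the closest block within a tolerance is taken as the correlated value; `Block`, `TEMPLATE` and `correlate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block:
    text: str
    x: float  # left edge
    y: float  # top edge


# Hypothetical template: node -> (header character data, expected offset of the
# value block from the header block, in the same units as Block.x / Block.y).
TEMPLATE = {
    "name": ("Name", (100, 0)),   # value expected to the right of the header
    "phone": ("Phone", (0, 30)),  # value expected below the header
}


def correlate(blocks, template, tol=20.0):
    """Pair each template header with the block whose offset from the matched
    header block is closest to the expected one (within `tol`)."""
    pairs = {}
    for node, (label, (dx, dy)) in template.items():
        header = next((b for b in blocks if b.text == label), None)
        if header is None:
            continue

        def error(b, header=header, dx=dx, dy=dy):
            return abs(b.x - header.x - dx) + abs(b.y - header.y - dy)

        candidates = [b for b in blocks if b is not header and error(b) <= tol]
        if candidates:
            pairs[node] = min(candidates, key=error).text
    return pairs
```

    Matching on relative offsets rather than absolute coordinates lets the same template apply when a whole form is shifted on the scanner bed.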