Flexible Structure Descriptions for Multi-Page Documents
    1.
    Type: Invention application
    Legal status: In force

    Publication number: US20120243055A1

    Publication date: 2012-09-27

    Application number: US13242653

    Filing date: 2011-09-23

    IPC classification: H04N1/40

    Abstract: A method for processing a batch of scanned images is provided. The method comprises processing the scanned images into documents. For documents of multiple pages, the method comprises maintaining a page-based coordinate system to specify a location of structures within a page and joining the pages to form a multi-page sheet having a sheet-based coordinate system to specify a location of structures within the multi-page sheet. The method comprises performing a data extraction operation to extract data from each document, said data extraction operation including a page mode wherein structures are detected on individual pages using the page-based coordinate system and a document mode wherein structures are detected within the entire document using the sheet-based coordinate system.
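
The relationship between the two coordinate systems in this abstract can be illustrated with a minimal sketch. It assumes the pages are joined by stacking them vertically, top to bottom, which the abstract does not specify; the names `Page`, `page_to_sheet`, and `sheet_to_page` are hypothetical and not taken from the patent.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Page:
    width: int
    height: int


def page_offsets(pages: List[Page]) -> List[int]:
    """Vertical offset of each page's top edge within the joined sheet.

    Assumption: pages are stacked top to bottom with no gap.
    """
    offsets, top = [], 0
    for page in pages:
        offsets.append(top)
        top += page.height
    return offsets


def page_to_sheet(pages: List[Page], page_index: int, x: int, y: int) -> Tuple[int, int]:
    """Convert page-based coordinates to sheet-based coordinates."""
    return x, page_offsets(pages)[page_index] + y


def sheet_to_page(pages: List[Page], x: int, y: int) -> Tuple[int, int, int]:
    """Convert sheet-based coordinates to (page index, page x, page y)."""
    offsets = page_offsets(pages)
    total_height = sum(page.height for page in pages)
    if not 0 <= y < total_height:
        raise ValueError("y lies outside the joined sheet")
    # Walk backwards to find the last page whose top edge is at or above y.
    for index in range(len(pages) - 1, -1, -1):
        if y >= offsets[index]:
            return index, x, y - offsets[index]
    raise ValueError("no page contains this point")
```

In this sketch, page mode would work with `(page_index, x, y)` triples, while document mode would search the whole sheet and use `sheet_to_page` only to report which page a match falls on.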

    Method and system for creating flexible structure descriptions
    2.
    Type: Granted invention patent
    Legal status: In force

    Publication number: US08233714B2

    Publication date: 2012-07-31

    Application number: US12364266

    Filing date: 2009-02-02

    IPC classification: G06K9/34

    Abstract: A method related to data capture from forms involving optical character recognition comprises detecting data fields on a scanned image; generating a flexible document description based on the detected data fields, including creating a set of search elements for each data field, each search element having associated search criteria; and training the flexible document description using a search algorithm to detect the data fields on additional training images based on the set of search elements.
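
As a rough illustration of "a set of search elements for each data field, each search element having associated search criteria", the sketch below tries several label-based search elements in turn until one locates a value. Everything here is an assumption made for illustration: the `Word` and `SearchElement` types, the "value sits to the right of its label on the same line" criterion, and the `find_field` helper do not come from the patent, and the training step is not modelled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    text: str
    x: int  # left edge of the recognized word
    y: int  # baseline of the text line it sits on


@dataclass
class SearchElement:
    """One hypothetical way to find a field: a label plus a distance criterion."""
    label: str
    max_gap: int  # largest allowed horizontal distance from label to value


def find_field(words: List[Word], elements: List[SearchElement]) -> Optional[str]:
    """Return the first value located by any search element, or None."""
    for element in elements:
        for anchor in words:
            if anchor.text.lower() != element.label.lower():
                continue
            # Criterion: same line, to the right of the label, within max_gap.
            candidates = [
                w for w in words
                if w.y == anchor.y and 0 < w.x - anchor.x <= element.max_gap
            ]
            if candidates:
                return min(candidates, key=lambda w: w.x).text
    return None
```

Listing more than one search element per field is what makes the description flexible in this sketch: a form that prints "Amount:" and another that prints "Total:" can both be handled by the same field definition.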

    Method and System for Creating Flexible Structure Descriptions
    3.
    Type: Invention application
    Legal status: In force

    Publication number: US20090175532A1

    Publication date: 2009-07-09

    Application number: US12364266

    Filing date: 2009-02-02

    IPC classification: G06K9/62 G06F3/048

    Abstract: In one embodiment, the invention provides a method, comprising detecting data fields on a scanned image; generating a flexible document description based on the detected data fields, including creating a set of search elements for each data field, each search element having associated search criteria; and training the flexible document description using a search algorithm to detect the data fields on additional training images based on the set of search elements.

    Flexible structure descriptions for multi-page documents
    4.
    Type: Granted invention patent
    Legal status: In force

    Publication number: US09390321B2

    Publication date: 2016-07-12

    Application number: US13242653

    Filing date: 2011-09-23

    IPC classification: G06K9/00 H04N1/32 H04N1/00

    Abstract: A method for processing a batch of scanned images is provided. The method comprises processing the scanned images into documents. For documents of multiple pages, the method comprises maintaining a page-based coordinate system to specify a location of structures within a page and joining the pages to form a multi-page sheet having a sheet-based coordinate system to specify a location of structures within the multi-page sheet. The method comprises performing a data extraction operation to extract data from each document, said data extraction operation including a page mode wherein structures are detected on individual pages using the page-based coordinate system and a document mode wherein structures are detected within the entire document using the sheet-based coordinate system.

    DATA CAPTURE FROM MULTI-PAGE DOCUMENTS
    5.
    Type: Invention application
    Legal status: In force

    Publication number: US20100060947A1

    Publication date: 2010-03-11

    Application number: US12470425

    Filing date: 2009-05-21

    IPC classification: H04N1/04

    Abstract: A method for processing a batch of scanned images is provided. The method comprises processing the scanned images into documents; for documents comprising multiple pages, maintaining a page-based coordinate system to specify a location of structures within a page and joining the pages to form a multi-page sheet having a sheet-based coordinate system to specify a location of structures within the multi-page sheet; and performing a data extraction operation to extract data from each document, said data extraction operation comprising a page mode wherein structures are detected on individual pages using the page-based coordinate system and a document mode wherein structures are detected within the entire document using the sheet-based coordinate system.

    Data capture from multi-page documents
    6.
    Type: Granted invention patent
    Legal status: In force

    Publication number: US08547589B2

    Publication date: 2013-10-01

    Application number: US12470425

    Filing date: 2009-05-21

    IPC classification: G06K9/64

    Abstract: A method for processing a batch of scanned images is provided. The method comprises processing the scanned images into documents; for documents comprising multiple pages, maintaining a page-based coordinate system to specify a location of structures within a page and joining the pages to form a multi-page sheet having a sheet-based coordinate system to specify a location of structures within the multi-page sheet; and performing a data extraction operation to extract data from each document, said data extraction operation comprising a page mode wherein structures are detected on individual pages using the page-based coordinate system and a document mode wherein structures are detected within the entire document using the sheet-based coordinate system.

    DATA CAPTURE FROM MULTI-PAGE DOCUMENTS
    7.
    Type: Invention application
    Legal status: In force

    Publication number: US20120183226A1

    Publication date: 2012-07-19

    Application number: US13431767

    Filing date: 2012-03-27

    IPC classification: G06K9/46

    Abstract: A method for processing a batch of scanned images is provided. The method comprises processing the scanned images into documents. For documents comprising multiple pages, the method maintains a page-based coordinate system to specify a location of structures within a page and joins the pages to form a multi-page sheet having a sheet-based coordinate system to specify a location of structures within the multi-page sheet. Data may be extracted from each document, such operation comprising a page mode wherein structures are detected on individual pages using the page-based coordinate system and a document mode wherein structures are detected within the entire document using the sheet-based coordinate system.

    Method for object recognition and describing structure of graphical objects
    8.
    Type: Granted invention patent
    Legal status: In force

    Publication number: US09224040B2

    Publication date: 2015-12-29

    Application number: US13242218

    Filing date: 2011-09-23

    IPC classification: G06F3/00 G06K9/00

    CPC classification: G06K9/00463 G06K9/00456

    Abstract: The invention involves a method for processing of machine-readable forms or documents of non-fixed format. The method makes use of, for example, a structural description of characteristics of document elements, a description of a logical structure of the document, and methods of searching for document elements by using the structural description. A structural description of the spatial and parametric characteristics of document elements and the logical connections between elements may include a hierarchical logical structure of the elements, specification of an algorithm of determining the search constraints, specification of characteristics of every searched element, and specification of a set of parameters for a compound element identified on the basis of the aggregate of its components. The method of describing the logical structure of a document and methods of searching for elements of a document may be based on the use of the structural description.
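
The idea of a hierarchical description in which a compound element is "identified on the basis of the aggregate of its components" can be sketched as follows. This is only one plausible reading: the `Element` type, the `locate` function, and the rule that a compound element's bounding box is the union of its components' boxes are assumptions for illustration, not details stated in the abstract.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # left, top, right, bottom


@dataclass
class Element:
    """A node in a hypothetical hierarchical structure description."""
    name: str
    children: List["Element"] = field(default_factory=list)


def locate(element: Element, found_boxes: Dict[str, Box]) -> Optional[Box]:
    """Locate an element given the boxes already found for simple elements.

    A simple element (no children) is looked up directly. A compound element
    is accepted only if every component was located; its box is then the
    union of the component boxes.
    """
    if not element.children:
        return found_boxes.get(element.name)
    boxes = [locate(child, found_boxes) for child in element.children]
    if any(box is None for box in boxes):
        return None  # completeness check failed: a component is missing
    return (
        min(box[0] for box in boxes),
        min(box[1] for box in boxes),
        max(box[2] for box in boxes),
        max(box[3] for box in boxes),
    )
```

For example, an `address` element composed of `street` and `city` would be reported only when both components are present on the page.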

    Method of describing the structure of graphical objects
    9.
    Type: Granted invention patent
    Legal status: In force

    Publication number: US08171391B2

    Publication date: 2012-05-01

    Application number: US11556196

    Filing date: 2006-11-03

    IPC classification: G06F17/27 G06K9/34

    CPC classification: G06F17/30958

    Abstract: The proposed technical solution allows processing of machine-readable forms of unfixed format. It comprises a method of specifying the logical structure of a document characterized by: preliminary specification of the list and descriptions of varieties of elements which may be present in the form; specifying an algorithm of setting the search constraints for every element; description of at least the following search characteristics for every simple or compound element: the spatial characteristics of the search area and the parametric characteristics of the element; description of the method of identification of obtained elements; testing the type of the element; testing the properties which are typical of the type; and testing the completeness of composition of the parts of the element.

    METHODS OF OBJECT SEARCH AND RECOGNITION
    10.
    Type: Invention application
    Legal status: In force

    Publication number: US20130322773A1

    Publication date: 2013-12-05

    Application number: US13963616

    Filing date: 2013-08-09

    IPC classification: G06T7/00

    Abstract: Embodiments of the invention disclose techniques for processing of machine-readable forms of unfixed or flexible format. An auxiliary brief description may be optionally specified to determine the spatial orientation of the image. A method of searching for elements of a document comprises the following main operations in addition to the operations of preliminary image processing: selecting the varieties of structural description from several available variants; determining the orientation of the image; selecting the text objects whose text must be recognized and determining the minimal required volume of recognition; recognizing the text objects; and searching for elements of the form. Searching for elements of the form comprises the following actions: selecting a searched element in the structural description; obtaining the algorithm of search constraints from the structural description; searching for the element; and testing the obtained variants.
