Creating flexible structure descriptions
    1.
    Granted patent
    Creating flexible structure descriptions (in force)

    Publication number: US08908969B2

    Publication date: 2014-12-09

    Application number: US13562791

    Filing date: 2012-07-31

    Abstract: In one embodiment, the invention provides a method, comprising detecting data fields on a scanned document image; generating a flexible document description based on the detected data fields, including creating a set of search elements for each data field, each search element having associated search criteria; and training or modifying the flexible document description using, for example, a search algorithm to detect the data fields on additional training images based on the set of search elements.
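
    The abstract's core idea, a per-field set of search elements that each carry their own search criteria, can be illustrated with a minimal Python sketch. All names here (`Word`, `SearchElement`, `FieldDescription`, `missing_fields`) are hypothetical and are not taken from the patent; the criteria are plain predicates over recognized words, which is an assumption made for brevity.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Word:
    """A recognized word and its position on the scanned image (hypothetical)."""
    text: str
    x: int
    y: int


@dataclass
class SearchElement:
    """One way of locating a field; `criteria` decides whether a word qualifies."""
    name: str
    criteria: Callable[[Word], bool]


@dataclass
class FieldDescription:
    """A data field together with its set of search elements."""
    field_name: str
    elements: List[SearchElement] = field(default_factory=list)

    def locate(self, words: List[Word]) -> Optional[Word]:
        # Try each search element in order and return the first matching word.
        for element in self.elements:
            for word in words:
                if element.criteria(word):
                    return word
        return None


def missing_fields(descriptions: List[FieldDescription], words: List[Word]) -> List[str]:
    """Names of fields not found on an image; these hint at where to modify the description."""
    return [d.field_name for d in descriptions if d.locate(words) is None]


invoice_number = FieldDescription(
    "invoice_number",
    [SearchElement("pattern", lambda w: re.fullmatch(r"INV-\d+", w.text) is not None)],
)
total = FieldDescription(
    "total",
    [SearchElement("currency", lambda w: w.text.startswith("$"))],
)
training_image = [Word("Invoice", 10, 10), Word("INV-0042", 80, 10), Word("Total", 10, 90)]
```

    Running `missing_fields` over additional training images is one simple way to see which field descriptions need further search elements or looser criteria.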

    Method of pre-analysis of a machine-readable form image
    2.
    Granted patent
    Method of pre-analysis of a machine-readable form image (in force)

    Publication number: US08805093B2

    Publication date: 2014-08-12

    Application number: US12977016

    Filing date: 2010-12-22

    Abstract: In one embodiment, the invention provides a method for a machine to perform machine-readable form pre-recognition analysis. The method comprises preliminarily assigning at least one graphic image in a form for identification of form type, preliminarily creating at least one model of the said graphic image for identification of the form type, parsing a form image into regions, and determining an image form type for the form image, comprising: (a) detecting on the form image at least one of said graphic images for identification of the form type, (b) performing a primary identification of the form image type based on a comparison of the detected graphic image with the said model, and (c) performing a profound analysis using supplementary data if said primary identification results in multiple possibilities for the form image type.
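
    Steps (b) and (c) can be sketched as a two-stage decision. This is only an illustration under stated assumptions: graphic images are modeled as small binary grids, the comparison is a cell-by-cell match ratio, and the supplementary data is a set of keywords per form type. The function names, the threshold, and the keyword sets are all hypothetical and do not come from the patent.

```python
def similarity(image, model):
    """Fraction of equal cells between two same-sized binary grids (assumed metric)."""
    cells = [(a, b) for row_i, row_m in zip(image, model) for a, b in zip(row_i, row_m)]
    return sum(a == b for a, b in cells) / len(cells)


def identify_form_type(image, models, keywords_by_type, page_words, threshold=0.75):
    # Primary identification: keep every form type whose model resembles the image.
    candidates = [t for t, m in models.items() if similarity(image, m) >= threshold]
    if len(candidates) <= 1:
        return candidates[0] if candidates else None
    # Deeper analysis: several types matched, so break the tie with supplementary
    # data, here the overlap between the page's words and each type's keywords.
    return max(candidates, key=lambda t: len(keywords_by_type[t] & page_words))


models = {"tax": ["1100", "1100"], "claim": ["1100", "1000"]}
keywords_by_type = {"tax": {"income", "deduction"}, "claim": {"policy", "damage"}}
detected = ["1100", "1100"]
page_words = {"policy", "number", "damage"}
```

    With the default threshold both models match the detected image, so the keyword overlap decides; with a stricter threshold the primary identification alone is enough.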

    Method and system of pre-analysis and automated classification of documents

    Publication number: US09633257B2

    Publication date: 2017-04-25

    Application number: US14314892

    Filing date: 2014-06-25

    Abstract: Automatic classification of different types of documents is disclosed. An image of a form or document is captured. The document is assigned to one or more type definitions by identifying one or more objects within the image of the document. A matching model is selected via identification of the document image. In the case of multiple identifications, a profound analysis of the document type is performed—either automatically or manually. An automatic classifier may be trained with document samples of each of a plurality of document classes or document types where the types are known in advance or a system of classes may be formed automatically without a priori information about types of samples. An automatic classifier determines possible features and calculates a range of feature values and possible other feature parameters for each type or class of document. A decision tree, based on rules specified by a user, may be used for classifying documents. Processing, such as optical character recognition (OCR), may be used in the classification process.
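
    The statement that a classifier "calculates a range of feature values" for each document type can be made concrete with a small sketch. It assumes numeric features and simple min/max ranges learned from labeled samples; the feature names and the functions `train_ranges` and `classify` are hypothetical, and the patent may cover far richer feature parameters.

```python
def train_ranges(samples):
    """Learn a (min, max) range per feature for each class.

    `samples` maps a class name to a list of feature dictionaries.
    """
    ranges = {}
    for doc_class, docs in samples.items():
        features = docs[0].keys()
        ranges[doc_class] = {
            f: (min(d[f] for d in docs), max(d[f] for d in docs)) for f in features
        }
    return ranges


def classify(doc, ranges):
    """Return every class whose learned ranges contain all of the document's features."""
    return [
        doc_class
        for doc_class, feature_ranges in ranges.items()
        if all(lo <= doc[f] <= hi for f, (lo, hi) in feature_ranges.items())
    ]


samples = {
    "invoice": [{"lines": 20, "tables": 1}, {"lines": 35, "tables": 2}],
    "letter": [{"lines": 30, "tables": 0}, {"lines": 50, "tables": 0}],
}
ranges = train_ranges(samples)
```

    Returning a list rather than a single label mirrors the abstract's wording that a document may be assigned to one or more type definitions; an empty or multi-element result is the point at which a deeper analysis would take over.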

    METHOD AND SYSTEM OF PRE-ANALYSIS AND AUTOMATED CLASSIFICATION OF DOCUMENTS
    4.
    Patent application
    METHOD AND SYSTEM OF PRE-ANALYSIS AND AUTOMATED CLASSIFICATION OF DOCUMENTS (under examination, published)

    Publication number: US20140307959A1

    Publication date: 2014-10-16

    Application number: US14314892

    Filing date: 2014-06-25

    Abstract: Automatic classification of different types of documents is disclosed. An image of a form or document is captured. The document is assigned to one or more type definitions by identifying one or more objects within the image of the document. A matching model is selected via identification of the document image. In the case of multiple identifications, a profound analysis of the document type is performed—either automatically or manually. An automatic classifier may be trained with document samples of each of a plurality of document classes or document types where the types are known in advance or a system of classes may be formed automatically without a priori information about types of samples. An automatic classifier determines possible features and calculates a range of feature values and possible other feature parameters for each type or class of document. A decision tree, based on rules specified by a user, may be used for classifying documents. Processing, such as optical character recognition (OCR), may be used in the classification process.

    Data capture from multi-page documents
    5.
    Granted patent
    Data capture from multi-page documents (in force)

    Publication number: US08547589B2

    Publication date: 2013-10-01

    Application number: US12470425

    Filing date: 2009-05-21

    CPC classification number: H04N1/00795 H04N1/00803 H04N2101/00 H04N2201/3216

    Abstract: A method for processing a batch of scanned images is provided. The method comprises processing the scanned images into documents; for documents comprising multiple pages, maintaining a page-based coordinate system to specify a location of structures within a page and joining the pages to form a multi-page sheet having a sheet-based coordinate system to specify a location of structures within the multi-page sheet; and performing a data extraction operation to extract data from each document, said data extraction operation comprising a page mode wherein structures are detected on individual pages using the page-based coordinate system and a document mode wherein structures are detected within the entire document using the sheet-based coordinate system.
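
    The relationship between the two coordinate systems can be sketched as follows, assuming the pages are joined top to bottom into one tall sheet. The abstract does not say how pages are joined, so the vertical stacking, the `Page` type, and the functions `to_sheet` and `to_page` are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    width: int
    height: int


def to_sheet(pages, page_index, x, y):
    """Convert page-based (x, y) to sheet-based coordinates by adding the
    heights of all preceding pages."""
    offset = sum(page.height for page in pages[:page_index])
    return x, offset + y


def to_page(pages, x, sheet_y):
    """Convert sheet-based coordinates back to (page_index, x, y)."""
    top = 0
    for index, page in enumerate(pages):
        if top <= sheet_y < top + page.height:
            return index, x, sheet_y - top
        top += page.height
    raise ValueError("sheet_y lies outside the joined sheet")


pages = [Page(2480, 3508), Page(2480, 3508), Page(2480, 1000)]
```

    In page mode a detector would work with the per-page coordinates directly; in document mode it would search the joined sheet and map any hit back to a page with `to_page`.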

    DATA CAPTURE FROM MULTI-PAGE DOCUMENTS
    6.
    Patent application
    DATA CAPTURE FROM MULTI-PAGE DOCUMENTS (in force)

    Publication number: US20120183226A1

    Publication date: 2012-07-19

    Application number: US13431767

    Filing date: 2012-03-27

    CPC classification number: H04N1/00795 H04N1/00803 H04N2101/00 H04N2201/3216

    Abstract: A method for processing a batch of scanned images is provided. The method comprises processing the scanned images into documents. For documents comprising multiple pages, the method maintains a page-based coordinate system to specify a location of structures within a page and joins the pages to form a multi-page sheet having a sheet-based coordinate system to specify a location of structures within the multi-page sheet. Data may be extracted from each document, such operation comprising a page mode wherein structures are detected on individual pages using the page-based coordinate system and a document mode wherein structures are detected within the entire document using the sheet-based coordinate system.

    Flexible structure descriptions for multi-page documents
    7.
    Granted patent
    Flexible structure descriptions for multi-page documents (in force)

    Publication number: US09390321B2

    Publication date: 2016-07-12

    Application number: US13242653

    Filing date: 2011-09-23

    CPC classification number: G06K9/00449 G06K9/00483 H04N1/00803 H04N1/32128

    Abstract: A method for processing a batch of scanned images is provided. The method comprises processing the scanned images into documents. For documents of multiple pages, the method comprises maintaining a page-based coordinate system to specify a location of structures within a page and joining the pages to form a multi-page sheet having a sheet-based coordinate system to specify a location of structures within the multi-page sheet. The method comprises performing a data extraction operation to extract data from each document, said data extraction operation including a page mode wherein structures are detected on individual pages using the page-based coordinate system and a document mode wherein structures are detected within the entire document using the sheet-based coordinate system.

    Data capture from multi-page documents
    8.
    Granted patent
    Data capture from multi-page documents (in force)

    Publication number: US08538162B2

    Publication date: 2013-09-17

    Application number: US13431767

    Filing date: 2012-03-27

    CPC classification number: H04N1/00795 H04N1/00803 H04N2101/00 H04N2201/3216

    Abstract: A method for processing a batch of scanned images is disclosed. The method includes processing the scanned images into documents. For documents of multiple pages, the method maintains a page-based coordinate system to specify a location of structures within a page and joins the pages to form a multi-page sheet associated with a sheet-based coordinate system to specify a location of structures within the multi-page sheet. Data may be extracted from each document through a page mode wherein structures are detected on individual pages using the page-based coordinate system and a document mode wherein structures are detected within the entire document using the sheet-based coordinate system.

    Creating Flexible Structure Descriptions
    9.
    Patent application
    Creating Flexible Structure Descriptions (in force)

    Publication number: US20130198615A1

    Publication date: 2013-08-01

    Application number: US13562791

    Filing date: 2012-07-31

    Abstract: In one embodiment, the invention provides a method, comprising detecting data fields on a scanned document image; generating a flexible document description based on the detected data fields, including creating a set of search elements for each data field, each search element having associated search criteria; and training or modifying the flexible document description using, for example, a search algorithm to detect the data fields on additional training images based on the set of search elements.

    Method and System of Pre-Analysis and Automated Classification of Documents
    10.
    Patent application
    Method and System of Pre-Analysis and Automated Classification of Documents (under examination, published)

    Publication number: US20110188759A1

    Publication date: 2011-08-04

    Application number: US13087242

    Filing date: 2011-04-14

    Abstract: Automatic classification of different types of documents is disclosed. An image of a form or document is captured. The document is assigned to one or more type definitions by identifying one or more objects within the image of the document. A matching model is selected via identification of the document image. In the case of multiple identifications, a profound analysis of the document type is performed—either automatically or manually. An automatic classifier may be trained with document samples of each of a plurality of document classes or document types where the types are known in advance or a system of classes may be formed automatically without a priori information about types of samples. An automatic classifier determines possible features and calculates a range of feature values and possible other feature parameters for each type or class of document. A decision tree, based on rules specified by a user, may be used for classifying documents. Processing, such as optical character recognition (OCR), may be used in the classification process.
