-
1.
Publication No.: US20120243055A1
Publication date: 2012-09-27
Application No.: US13242653
Filing date: 2011-09-23
Applicant: Diar Tuganbaev, Marinos Dimosthenos, Sergey Zlobin, Irina Filimonova
Inventor: Diar Tuganbaev, Marinos Dimosthenos, Sergey Zlobin, Irina Filimonova
IPC: H04N1/40
CPC classification number: G06K9/00449, G06K9/00483, H04N1/00803, H04N1/32128
Abstract: A method for processing a batch of scanned images is provided. The method comprises processing the scanned images into documents. For documents of multiple pages, the method comprises maintaining a page-based coordinate system to specify a location of structures within a page and joining the pages to form a multi-page sheet having a sheet-based coordinate system to specify a location of structures within the multi-page sheet. The method comprises performing a data extraction operation to extract data from each document, said data extraction operation including a page mode wherein structures are detected on individual pages using the page-based coordinate system and a document mode wherein structures are detected within the entire document using the sheet-based coordinate system.
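The abstract relies on two coordinate systems: page coordinates for page mode and sheet coordinates for document mode. A minimal sketch of how the two could be related, assuming (this is not stated in the abstract) that the pages are joined by stacking them top to bottom with no gap. The names `Page`, `page_offsets`, `page_to_sheet` and `sheet_to_page` are hypothetical and illustrative only.

```python
from dataclasses import dataclass
from itertools import accumulate


@dataclass(frozen=True)
class Page:
    """Size of one scanned page, in pixels (hypothetical representation)."""
    width: int
    height: int


def page_offsets(pages):
    """Top edge of each page in the joined sheet, assuming pages are stacked
    top to bottom with no gap between them."""
    heights = [p.height for p in pages]
    return [0] + list(accumulate(heights))[:-1]


def page_to_sheet(pages, page_index, x, y):
    """Convert a point from page coordinates to sheet coordinates."""
    return x, y + page_offsets(pages)[page_index]


def sheet_to_page(pages, x, y):
    """Convert a point from sheet coordinates back to (page_index, x, y)."""
    offsets = page_offsets(pages)
    total_height = offsets[-1] + pages[-1].height
    if not 0 <= y < total_height:
        raise ValueError("point lies outside the sheet")
    # The owning page is the last one whose top edge is at or above y.
    for i in range(len(pages) - 1, -1, -1):
        if y >= offsets[i]:
            return i, x, y - offsets[i]
```

For three A4 pages scanned at 300 dpi (`Page(2480, 3508)` each), a structure found in page mode at (100, 50) on the third page lies at (100, 7066) on the sheet. In document mode, a structure that continues across a page break, such as a table, therefore occupies one contiguous region instead of two fragments.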
-
2.
Publication No.: US08233714B2
Publication date: 2012-07-31
Application No.: US12364266
Filing date: 2009-02-02
Applicant: Konstantin Zuev, Diar Tuganbaev, Irina Filimonova, Sergey Zlobin
Inventor: Konstantin Zuev, Diar Tuganbaev, Irina Filimonova, Sergey Zlobin
IPC: G06K9/34
CPC classification number: G06F17/212, G06K9/00469, G06K9/2072, G06K2209/01, Y10S707/99933
Abstract: A method related to data capture from forms involving optical character recognition comprises detecting data fields on a scanned image; generating a flexible document description based on the detected data fields, including creating a set of search elements for each data field, each search element having associated search criteria; and training the flexible document description using a search algorithm to detect the data fields on additional training images based on the set of search elements.
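A flexible document description, as the abstract defines it, is a set of search elements per data field, each with search criteria, which is then trained on additional images. A minimal sketch, assuming two criteria per element (a keyword and a rectangular search area in pixels) and a hypothetical training rule that widens the search area whenever detection misses or mislocates an element. `SearchElement`, `FlexibleDescription`, `detect`, `train` and the `margin` parameter are illustrative names, not the patented method.

```python
from dataclasses import dataclass, field


@dataclass
class SearchElement:
    """One search element of a data field (hypothetical structure).

    Search criteria: the keyword that must be recognized, and the
    rectangular area (x0, y0, x1, y1), in pixels, where it may appear."""
    name: str
    keyword: str
    area: tuple


@dataclass
class FlexibleDescription:
    elements: dict = field(default_factory=dict)

    def add(self, element):
        self.elements[element.name] = element


def detect(description, words):
    """Search OCR output, a list of (text, x, y), for every element.
    Returns {element name: (x, y)} for the elements that were found."""
    found = {}
    for el in description.elements.values():
        x0, y0, x1, y1 = el.area
        for text, x, y in words:
            if text.casefold() == el.keyword.casefold() and x0 <= x <= x1 and y0 <= y <= y1:
                found[el.name] = (x, y)
                break
    return found


def train(description, training_images, margin=20):
    """Run detection on each training image, given as (words, truth) where truth
    maps element name -> correct (x, y). Whenever an element is missed or
    mislocated, widen its search area to cover the correct position plus a margin."""
    for words, truth in training_images:
        found = detect(description, words)
        for name, (x, y) in truth.items():
            if found.get(name) == (x, y):
                continue
            el = description.elements[name]
            x0, y0, x1, y1 = el.area
            el.area = (min(x0, x - margin), min(y0, y - margin),
                       max(x1, x + margin), max(y1, y + margin))
```

Widening the area is only one possible training rule. A real system would likely combine several criteria (relative position to other elements, field type, line geometry) rather than a single keyword.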
-
3.
Publication No.: US20090175532A1
Publication date: 2009-07-09
Application No.: US12364266
Filing date: 2009-02-02
Applicant: Konstantin Zuev, Diar Tuganbaev, Irina Filimonova, Sergey Zlobin
Inventor: Konstantin Zuev, Diar Tuganbaev, Irina Filimonova, Sergey Zlobin
CPC classification number: G06F17/212, G06K9/00469, G06K9/2072, G06K2209/01, Y10S707/99933
Abstract: In one embodiment, the invention provides a method, comprising detecting data fields on a scanned image; generating a flexible document description based on the detected data fields, including creating a set of search elements for each data field, each search element having associated search criteria; and training the flexible document description using a search algorithm to detect the data fields on additional training images based on the set of search elements.
-
4.
Publication No.: US09390321B2
Publication date: 2016-07-12
Application No.: US13242653
Filing date: 2011-09-23
Applicant: Diar Tuganbaev, Marinos Dimosthenos, Sergey Zlobin, Irina Filimonova
Inventor: Diar Tuganbaev, Sergey Zlobin, Irina Filimonova
CPC classification number: G06K9/00449, G06K9/00483, H04N1/00803, H04N1/32128
Abstract: A method for processing a batch of scanned images is provided. The method comprises processing the scanned images into documents. For documents of multiple pages, the method comprises maintaining a page-based coordinate system to specify a location of structures within a page and joining the pages to form a multi-page sheet having a sheet-based coordinate system to specify a location of structures within the multi-page sheet. The method comprises performing a data extraction operation to extract data from each document, said data extraction operation including a page mode wherein structures are detected on individual pages using the page-based coordinate system and a document mode wherein structures are detected within the entire document using the sheet-based coordinate system.
-
5.
Publication No.: US20100060947A1
Publication date: 2010-03-11
Application No.: US12470425
Filing date: 2009-05-21
Applicant: Diar Tuganbaev, Sergey Zlobin, Irina Filimonova
Inventor: Diar Tuganbaev, Sergey Zlobin, Irina Filimonova
IPC: H04N1/04
CPC classification number: H04N1/00795, H04N1/00803, H04N2101/00, H04N2201/3216
Abstract: A method for processing a batch of scanned images is provided. The method comprises processing the scanned images into documents; for documents comprising multiple pages maintaining a page-based coordinate system to specify a location of structures within a page and joining the pages to form a multi-page sheet having a sheet-based coordinate system to specify a location of structures within the multi-page sheet; performing a data extraction operation to extract data from each document, said data extraction operation comprising a page mode wherein structures are detected on individual pages using the page-based coordinate system and a document mode wherein structures are detected within the entire document using the sheet-based coordinate system.
-
6.
Publication No.: US08547589B2
Publication date: 2013-10-01
Application No.: US12470425
Filing date: 2009-05-21
Applicant: Diar Tuganbaev, Sergey Zlobin, Irina Filimonova
Inventor: Diar Tuganbaev, Sergey Zlobin, Irina Filimonova
IPC: G06K9/64
CPC classification number: H04N1/00795, H04N1/00803, H04N2101/00, H04N2201/3216
Abstract: A method for processing a batch of scanned images is provided. The method comprises processing the scanned images into documents; for documents comprising multiple pages maintaining a page-based coordinate system to specify a location of structures within a page and joining the pages to form a multi-page sheet having a sheet-based coordinate system to specify a location of structures within the multi-page sheet; performing a data extraction operation to extract data from each document, said data extraction operation comprising a page mode wherein structures are detected on individual pages using the page-based coordinate system and a document mode wherein structures are detected within the entire document using the sheet-based coordinate system.
-
7.
Publication No.: US20120183226A1
Publication date: 2012-07-19
Application No.: US13431767
Filing date: 2012-03-27
Applicant: Diar Tuganbaev, Sergey Zlobin, Irina Filimonova
Inventor: Diar Tuganbaev, Sergey Zlobin, Irina Filimonova
IPC: G06K9/46
CPC classification number: H04N1/00795, H04N1/00803, H04N2101/00, H04N2201/3216
Abstract: A method for processing a batch of scanned images is provided. The method comprises processing the scanned images into documents. For documents comprising multiple pages, the method maintains a page-based coordinate system to specify a location of structures within a page and joins the pages to form a multi-page sheet having a sheet-based coordinate system to specify a location of structures within the multi-page sheet. Data may be extracted from each document, such operation comprising a page mode wherein structures are detected on individual pages using the page-based coordinate system and a document mode wherein structures are detected within the entire document using the sheet-based coordinate system.
-
8.
Publication No.: US08538162B2
Publication date: 2013-09-17
Application No.: US13431767
Filing date: 2012-03-27
Applicant: Diar Tuganbaev, Maryana Skuratovskaya, Sergey Zlobin
Inventor: Diar Tuganbaev, Maryana Skuratovskaya, Sergey Zlobin
IPC: G06K9/46
CPC classification number: H04N1/00795, H04N1/00803, H04N2101/00, H04N2201/3216
Abstract: A method for processing a batch of scanned images is disclosed. The method includes processing the scanned images into documents. For documents of multiple pages, the method maintains a page-based coordinate system to specify a location of structures within a page and joins the pages to form a multi-page sheet associated with a sheet-based coordinate system to specify a location of structures within the multi-page sheet. Data may be extracted from each document through a page mode wherein structures are detected on individual pages using the page-based coordinate system and a document mode wherein structures are detected within the entire document using the sheet-based coordinate system.
-
9.
Publication No.: US08908969B2
Publication date: 2014-12-09
Application No.: US13562791
Filing date: 2012-07-31
Applicant: Konstantin Zuev, Irina Filimonova, Sergey Zlobin, Maryana Skuratovskaya
Inventor: Konstantin Zuev, Diar Tuganbaev, Irina Filimonova, Sergey Zlobin
CPC classification number: G06F17/212, G06K9/00469, G06K9/2072, G06K2209/01, Y10S707/99933
Abstract: In one embodiment, the invention provides a method, comprising detecting data fields on a scanned document image; generating a flexible document description based on the detected data fields, including creating a set of search elements for each data field, each search element having associated search criteria; and training or modifying the flexible document description using, for example, a search algorithm to detect the data fields on additional training images based on the set of search elements.
-
10.
Publication No.: US08805093B2
Publication date: 2014-08-12
Application No.: US12977016
Filing date: 2010-12-22
Applicant: Konstantin Zuev, Irina Filimonova, Sergey Zlobin
Inventor: Konstantin Zuev, Irina Filimonova, Sergey Zlobin
CPC classification number: G06K9/00469, G06K9/00449, G06K9/46, G06K9/6202, G06K9/626, G06K2209/01
Abstract: In one embodiment, the invention provides a method for a machine to perform machine-readable form pre-recognition analysis. The method comprises preliminarily assigning at least one graphic image in a form for identification of form type, preliminarily creating at least one model of the said graphic image for identification of the form type, parsing a form image into regions, and determining an image form type for the form image, comprising: (a) detecting on the form image at least one of said graphic images for identification of the form type, (b) performing a primary identification of the form image type based on a comparison of the detected graphic image with the said model, and (c) performing a profound analysis using supplementary data if said primary identification results in multiple possibilities for the form image type.
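Steps (b) and (c) of this abstract form a two-stage decision: a cheap comparison against stored models, and a deeper analysis only when that comparison leaves more than one candidate. A minimal sketch, assuming the detected graphic image and each model are equal-size binary grids compared by pixel agreement, and that the supplementary data are keywords found in the recognized page text. Both choices are hypothetical stand-ins, and region parsing and graphic detection (step (a)) are omitted. `match_score`, `primary_identification`, `identify_form_type` and `threshold` are illustrative names.

```python
def match_score(image, model):
    """Fraction of cells that agree between two equal-size binary grids."""
    pairs = [(a, b) for row_a, row_b in zip(image, model) for a, b in zip(row_a, row_b)]
    return sum(a == b for a, b in pairs) / len(pairs)


def primary_identification(image, models, threshold):
    """Form types whose model matches the detected graphic image well enough."""
    return [form_type for form_type, model in models.items()
            if match_score(image, model) >= threshold]


def identify_form_type(image, models, page_text, keywords, threshold=0.9):
    """Return the identified form type, or None if it cannot be decided."""
    candidates = primary_identification(image, models, threshold)
    if len(candidates) <= 1:
        return candidates[0] if candidates else None
    # Profound analysis: the primary step left several possibilities, so use
    # supplementary data (here: keywords found in the recognized text).
    words = set(page_text.casefold().split())
    votes = {t: sum(k.casefold() in words for k in keywords.get(t, ())) for t in candidates}
    best = max(votes.values())
    winners = [t for t in candidates if votes[t] == best]
    return winners[0] if len(winners) == 1 else None
```

Deferring the keyword check until the primary identification is ambiguous keeps the common case cheap, which matches the abstract's ordering of steps (b) and (c).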