-
Publication No.: US20120243055A1
Publication date: 2012-09-27
Application No.: US13242653
Filing date: 2011-09-23
IPC classification: H04N1/40
CPC classification: G06K9/00449, G06K9/00483, H04N1/00803, H04N1/32128
Abstract: A method for processing a batch of scanned images is provided. The method comprises processing the scanned images into documents. For documents of multiple pages, the method comprises maintaining a page-based coordinate system to specify a location of structures within a page and joining the pages to form a multi-page sheet having a sheet-based coordinate system to specify a location of structures within the multi-page sheet. The method comprises performing a data extraction operation to extract data from each document, said data extraction operation including a page mode wherein structures are detected on individual pages using the page-based coordinate system and a document mode wherein structures are detected within the entire document using the sheet-based coordinate system.
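The relationship between the page-based and sheet-based coordinate systems described in this abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that pages are joined by stacking them vertically; the abstract does not say how the pages are joined, and the names `Page`, `page_to_sheet`, and `sheet_to_page` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Page:
    """Hypothetical page record: size in pixels."""
    width: int
    height: int


def page_offsets(pages: List[Page]) -> List[int]:
    """Vertical offset of each page's top edge within the joined sheet.

    Assumption: pages are stacked top to bottom with no gap.
    """
    offsets, y = [], 0
    for page in pages:
        offsets.append(y)
        y += page.height
    return offsets


def page_to_sheet(pages: List[Page], page_index: int, x: int, y: int) -> Tuple[int, int]:
    """Convert page-based (x, y) on a given page to sheet-based (x, y)."""
    return x, page_offsets(pages)[page_index] + y


def sheet_to_page(pages: List[Page], x: int, sheet_y: int) -> Tuple[int, int, int]:
    """Convert sheet-based (x, y) back to (page_index, x, y) on that page."""
    total_height = sum(page.height for page in pages)
    if not 0 <= sheet_y < total_height:
        raise ValueError("sheet_y lies outside the joined sheet")
    offsets = page_offsets(pages)
    # Walk from the last page upward to find the page containing sheet_y.
    for index in reversed(range(len(pages))):
        if sheet_y >= offsets[index]:
            return index, x, sheet_y - offsets[index]
    raise ValueError("no pages in sheet")
```

Under this assumption, a structure that spans a page break has contiguous coordinates on the sheet, which is what lets a document mode detect it as a single structure.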
-
Publication No.: US08233714B2
Publication date: 2012-07-31
Application No.: US12364266
Filing date: 2009-02-02
IPC classification: G06K9/34
CPC classification: G06F17/212, G06K9/00469, G06K9/2072, G06K2209/01, Y10S707/99933
Abstract: A method related to data capture from forms involving optical character recognition comprises detecting data fields on a scanned image; generating a flexible document description based on the detected data fields, including creating a set of search elements for each data field, each search element having associated search criteria; and training the flexible document description using a search algorithm to detect the data fields on additional training images based on the set of search elements.
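As a rough illustration of a "search element with associated search criteria", the sketch below locates a field value relative to a keyword among OCR'd words. It is not the patented method: the `Word` and `SearchElement` types, the keyword-anchor criterion, and the distance limits are all assumptions chosen for the example, and the training step mentioned in the abstract is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Word:
    """Hypothetical OCR output: recognized text and its top-left position."""
    text: str
    x: int
    y: int


@dataclass(frozen=True)
class SearchElement:
    """Hypothetical search element: find the value to the right of an anchor keyword."""
    field: str
    anchor: str   # keyword expected to the left of the field value
    max_dx: int   # maximum horizontal distance from the anchor
    max_dy: int   # maximum vertical drift from the anchor's line


def find_field(element: SearchElement, words: List[Word]) -> Optional[Word]:
    """Return the nearest word to the right of the anchor that meets the criteria."""
    best = None
    for anchor in words:
        if anchor.text.lower() != element.anchor.lower():
            continue
        for word in words:
            dx = word.x - anchor.x
            if word is anchor or dx <= 0 or dx > element.max_dx:
                continue
            if abs(word.y - anchor.y) > element.max_dy:
                continue
            if best is None or dx < best[0]:
                best = (dx, word)
    return best[1] if best else None
```

A flexible document description in this simplified picture would be a list of such elements, one set per data field, evaluated against each new image.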
-
Publication No.: US20090175532A1
Publication date: 2009-07-09
Application No.: US12364266
Filing date: 2009-02-02
CPC classification: G06F17/212, G06K9/00469, G06K9/2072, G06K2209/01, Y10S707/99933
Abstract: In one embodiment, the invention provides a method, comprising detecting data fields on a scanned image; generating a flexible document description based on the detected data fields, including creating a set of search elements for each data field, each search element having associated search criteria; and training the flexible document description using a search algorithm to detect the data fields on additional training images based on the set of search elements.
-
Publication No.: US09390321B2
Publication date: 2016-07-12
Application No.: US13242653
Filing date: 2011-09-23
Inventors: Diar Tuganbaev, Sergey Zlobin, Irina Filimonova
CPC classification: G06K9/00449, G06K9/00483, H04N1/00803, H04N1/32128
Abstract: A method for processing a batch of scanned images is provided. The method comprises processing the scanned images into documents. For documents of multiple pages, the method comprises maintaining a page-based coordinate system to specify a location of structures within a page and joining the pages to form a multi-page sheet having a sheet-based coordinate system to specify a location of structures within the multi-page sheet. The method comprises performing a data extraction operation to extract data from each document, said data extraction operation including a page mode wherein structures are detected on individual pages using the page-based coordinate system and a document mode wherein structures are detected within the entire document using the sheet-based coordinate system.
-
Publication No.: US20100060947A1
Publication date: 2010-03-11
Application No.: US12470425
Filing date: 2009-05-21
Applicants: Diar Tuganbaev, Sergey Zlobin, Irina Filimonova
Inventors: Diar Tuganbaev, Sergey Zlobin, Irina Filimonova
IPC classification: H04N1/04
CPC classification: H04N1/00795, H04N1/00803, H04N2101/00, H04N2201/3216
Abstract: A method for processing a batch of scanned images is provided. The method comprises processing the scanned images into documents; for documents comprising multiple pages, maintaining a page-based coordinate system to specify a location of structures within a page and joining the pages to form a multi-page sheet having a sheet-based coordinate system to specify a location of structures within the multi-page sheet; and performing a data extraction operation to extract data from each document, said data extraction operation comprising a page mode wherein structures are detected on individual pages using the page-based coordinate system and a document mode wherein structures are detected within the entire document using the sheet-based coordinate system.
-
Publication No.: US08547589B2
Publication date: 2013-10-01
Application No.: US12470425
Filing date: 2009-05-21
Applicants: Diar Tuganbaev, Sergey Zlobin, Irina Filimonova
Inventors: Diar Tuganbaev, Sergey Zlobin, Irina Filimonova
IPC classification: G06K9/64
CPC classification: H04N1/00795, H04N1/00803, H04N2101/00, H04N2201/3216
Abstract: A method for processing a batch of scanned images is provided. The method comprises processing the scanned images into documents; for documents comprising multiple pages, maintaining a page-based coordinate system to specify a location of structures within a page and joining the pages to form a multi-page sheet having a sheet-based coordinate system to specify a location of structures within the multi-page sheet; and performing a data extraction operation to extract data from each document, said data extraction operation comprising a page mode wherein structures are detected on individual pages using the page-based coordinate system and a document mode wherein structures are detected within the entire document using the sheet-based coordinate system.
-
Publication No.: US20120183226A1
Publication date: 2012-07-19
Application No.: US13431767
Filing date: 2012-03-27
Applicants: Diar Tuganbaev, Sergey Zlobin, Irina Filimonova
Inventors: Diar Tuganbaev, Sergey Zlobin, Irina Filimonova
IPC classification: G06K9/46
CPC classification: H04N1/00795, H04N1/00803, H04N2101/00, H04N2201/3216
Abstract: A method for processing a batch of scanned images is provided. The method comprises processing the scanned images into documents. For documents comprising multiple pages, the method maintains a page-based coordinate system to specify a location of structures within a page and joins the pages to form a multi-page sheet having a sheet-based coordinate system to specify a location of structures within the multi-page sheet. Data may be extracted from each document, such operation comprising a page mode wherein structures are detected on individual pages using the page-based coordinate system and a document mode wherein structures are detected within the entire document using the sheet-based coordinate system.
-
Publication No.: US08171391B2
Publication date: 2012-05-01
Application No.: US11556196
Filing date: 2006-11-03
CPC classification: G06F17/30958
Abstract: The proposed technical solution allows processing of machine-readable forms of unfixed format. It comprises a method of specifying the logical structure of a document characterized by: preliminary specification of the list and descriptions of varieties of elements which may be present in the form; specifying an algorithm of setting the search constraints for every element; description of at least the following characteristics of search for every simple or compound element (the spatial characteristics of the search area and the parametric characteristics of the element); description of the method of identification of obtained elements; testing the type of the element; testing the properties which are typical of the type; and testing the completeness of composition of the parts of the element.
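The last step of this abstract, testing the completeness of composition of a compound element, can be sketched as a recursive check over a tree of simple and compound elements. This is an illustrative assumption rather than the patented procedure: `ElementSpec`, `missing_parts`, and the dotted-path naming of found elements are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ElementSpec:
    """Hypothetical element description: simple if it has no parts, compound otherwise."""
    name: str
    parts: List["ElementSpec"] = field(default_factory=list)


def missing_parts(spec: ElementSpec, found: Dict[str, str], prefix: str = "") -> List[str]:
    """List the dotted paths of simple elements that were not found on the form.

    `found` maps dotted paths (e.g. "address.zip") to recognized values.
    A compound element is complete when this list is empty.
    """
    path = prefix + spec.name
    if not spec.parts:
        return [] if path in found else [path]
    missing: List[str] = []
    for part in spec.parts:
        missing.extend(missing_parts(part, found, path + "."))
    return missing
```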
-
Publication No.: US20130322773A1
Publication date: 2013-12-05
Application No.: US13963616
Filing date: 2013-08-09
IPC classification: G06T7/00
CPC classification: G06T7/0044, G06F17/30244, G06T7/74
Abstract: Embodiments of the invention disclose techniques for processing of machine-readable forms of unfixed or flexible format. An auxiliary brief description may be optionally specified to determine the spatial orientation of the image. A method of searching for elements of a document comprises the following main operations in addition to the operations of preliminary image processing: selecting the varieties of structural description from several available variants, determining the orientation of the image, selecting the text objects, where the text must be recognized, and determining the minimal required volume of recognition, recognizing the text objects, searching for elements of the form. Searching for elements of the form comprises the following actions: selecting a searched element in the structural description, gaining the algorithm of search constraints from the structural description, searching for the element, testing the obtained variants.
-
Publication No.: US08571262B2
Publication date: 2013-10-29
Application No.: US12877954
Filing date: 2010-09-08
CPC classification: G06T7/0044, G06F17/30244, G06T7/74
Abstract: Embodiments of the invention disclose techniques for processing of machine-readable forms of unfixed or flexible format. An auxiliary brief description may be optionally specified to determine the spatial orientation of the image. A method of searching for elements of a document comprises the following main operations in addition to the operations of preliminary image processing: selecting the varieties of structural description from several available variants, determining the orientation of the image, selecting the text objects, where the text must be recognized, and determining the minimal required volume of recognition, recognizing the text objects, searching for elements of the form. Searching for elements of the form comprises the following actions: selecting a searched element in the structural description, gaining the algorithm of search constraints from the structural description, searching for the element, testing the obtained variants.