摘要:
A method related to data capture from forms involving optical character recognition comprises detecting data fields on a scanned image; generating a flexible document description based on the detected data fields, including creating a set of search elements for each data field, each search element having associated search criteria; and training the flexible document description using a search algorithm to detect the data fields on additional training images based on the set of search elements.
摘要:
In one embodiment, the invention provides a method, comprising detecting data fields on a scanned image; generating a flexible document description based on the detected data fields, including creating a set of search elements for each data field, each search element having associated search criteria; and training the flexible document description using a search algorithm to detect the data fields on additional training images based on the set of search elements.
摘要:
The proposed technical solution allows processing of machine-readable forms of unfixed format. It comprises a method of specifying the logical structure of a document characterized by: preliminary specification of the list and descriptions of varieties of elements which may be present in the form, specifying an algorithm of setting the search constraints for every element, description of at least the following characteristics of search for every simple or compound element—the spatial characteristics of the search area and the parametric characteristics of the element, description of the method of identification of obtained elements, testing the type of the element, testing the properties which are typical of the type, testing the completeness of composition of the parts of the element.
摘要:
Embodiments of the invention disclose techniques for processing of machine-readable forms of unfixed or flexible format. An auxiliary brief description may be optionally specified to determine the spatial orientation of the image. A method of searching for elements of a document comprises the following main operations in addition to the operations of preliminary image processing: selecting the varieties of structural description from several available variants, determining the orientation of the image, selecting the text objects, where the text must be recognized, and determining the minimal required volume of recognition, recognizing the text objects, searching for elements of the form. Searching for elements of the form comprises the following actions: selecting a searched element in the structural description, gaining the algorithm of search constraints from the structural description, searching for the element, testing the obtained variants.
摘要:
Embodiments of the invention disclose techniques for processing of machine-readable forms of unfixed or flexible format. An auxiliary brief description may be optionally specified to determine the spatial orientation of the image. A method of searching for elements of a document comprises the following main operations in addition to the operations of preliminary image processing: selecting the varieties of structural description from several available variants, determining the orientation of the image, selecting the text objects, where the text must be recognized, and determining the minimal required volume of recognition, recognizing the text objects, searching for elements of the form. Searching for elements of the form comprises the following actions: selecting a searched element in the structural description, gaining the algorithm of search constraints from the structural description, searching for the element, testing the obtained variants.
摘要:
Methods for processing machine-readable forms or documents of non-fixed format are disclosed. The methods make use of, for example, a structural description of characteristics of document elements, a description of a logical structure of the document, and methods of searching for document elements by using the structural description. A structural description of the spatial and parametric characteristics of document elements and the logical connections between elements may include a hierarchical logical structure of the elements, specification of an algorithm of determining the search constraints, specification of characteristics of searched elements, and specification of a set of parameters for a compound element identified on the basis of the aggregate of its components. The method of describing the logical structure of a document and methods of searching for elements of a document may be based on the use of the structural description.
摘要:
Embodiments of the invention disclose techniques for processing of machine-readable forms of unfixed or flexible format. An auxiliary brief description may be optionally specified to determine the spatial orientation of the image. A method of searching for elements of a document comprises the following main operations in addition to the operations of preliminary image processing: selecting the varieties of structural description from several available variants, determining the orientation of the image, selecting the text objects, where the text must be recognized, and determining the minimal required volume of recognition, recognizing the text objects, searching for elements of the form. Searching for elements of the form comprises the following actions: selecting a searched element in the structural description, gaining the algorithm of search constraints from the structural description, searching for the element, testing the obtained variants.
摘要:
The proposed technical solution allows processing of machine-readable forms of unfixed format. An auxiliary brief description may be optionally specified to determine the spatial orientation of the image. A method of searching for elements of a document comprises the following main operations in addition to the operations of preliminary image processing: selecting the varieties of structural description from several available variants, determining the orientation of the image, selecting the text objects, where the text must be recognized, and determining the minimal required volume of recognition, recognizing the text objects, searching for elements of the form. Searching for elements of the form comprises the following actions: selecting a searched element in the structural description, gaining the algorithm of search constraints from the structural description, searching for the element, testing the obtained variants.
摘要:
The invention deals with the processing of machine-readable forms of non-fixed format. It comprises the structural description of characteristics of a document elements, a method of describing the logical structure of a document, methods of searching for elements of a document with the use of the structural description. A structural description of the spatial, parametric characteristics of document elements and the logical connections between elements comprises the hierarchical logical structure of the elements, specification of an algorithm of determining the search constraints, specification of every searched element characteristics, specification of the parameters set for a compound element identification on the basis of the aggregate of its components. The method of describing the logical structure of a document and methods of searching for elements of a document are based on the use of the structural description.
摘要:
A method for processing a batch of scanned images is provided. The method comprises processing the scanned images into documents. For documents of multiple pages, the method comprises maintaining a page-based coordinate system to specify a location of structures within a page and joining the pages to form a multi-page sheet having a sheet-based coordinate system to specify a location of structures within the multi-page sheet. The method comprises performing a data extraction operation to extract data from each document, said data extraction operation including a page mode wherein structures are detected on individual pages using the page-based coordinate system and a document mode wherein structures are detected within the entire document using the sheet-based coordinate system.