摘要:
A system and method to classify forms. An image representing a form of an unknown document type is received. The image includes line-art. Further, a plurality of template models corresponding to a plurality of different document types is received. The plurality of different document types is intended to include the correct document type of the unknown document. A subset of the plurality of template models are selected as candidate template models. The candidate template models include line-art junctions best matching line-art junctions of the received image. One of the candidate template models is selected as a best candidate template model. The best candidate template model includes horizontal and vertical lines best matching horizontal and vertical lines of the received image, respectively, aligned to the best candidate template model.
摘要:
A system and method to classify forms. An image representing a form of an unknown document type is received. The image includes line-art. Further, a plurality of template models corresponding to a plurality of different document types is received. The plurality of different document types is intended to include the correct document type of the unknown document. A subset of the plurality of template models are selected as candidate template models. The candidate template models include line-art junctions best matching line-art junctions of the received image. One of the candidate template models is selected as a best candidate template model. The best candidate template model includes horizontal and vertical lines best matching horizontal and vertical lines of the received image, respectively, aligned to the best candidate template model.
摘要:
Methods of generating image anchor templates from low variance regions of document images of a first class are provided. The methods select a document image from the document images of the first class and align the other document images of the first class to the selected document image. Low variance regions are then determined by comparing the aligned document images and the selected document image and used to generate image anchor templates.
摘要:
Methods of generating image anchor templates from low variance regions of document images of a first class are provided. The methods select a document image from the document images of the first class and align the other document images of the first class to the selected document image. Low variance regions are then determined by comparing the aligned document images and the selected document image and used to generate image anchor templates.
摘要:
A method and system generates an idealized image of a form. An image of a form and a template model of the form are received. The form includes data fields. Word boxes of the image are identified. The word boxes are assigned to corresponding data fields of the form. An idealized image of the from is generated based on the assignments and the template model.
摘要:
In accordance with one aspect of the present invention, disclosed is an image analysis and conversion method and system, where digital ink images are converted to structured object representations of the digital ink images, capable of being edited by a structured text/graphics editor.
摘要:
A document recognition system and method, where images are represented as a collection of primitive features whose spatial relations are represented as a graph. Useful subsets of all the possible subgraphs representing different portions of images are represented over a corpus of many images. The data structure is a lattice of subgraphs, and algorithms are provided means to build and use the graph lattice efficiently and effectively.
摘要:
A method and system to localize data fields of a form. An image of a form is received, where the form includes data fields. Word boxes of the image are identified. The word boxes are grouped into candidate zones, where each of the candidate zones includes one or more of the word boxes. Hypotheses are formed from the data fields and the candidate zones, where each hypothesis assigns one of the candidate zones to one of the data fields or a null data field. A constrained optimization search of the hypotheses is performed for an optimal set of hypotheses. The optimal set of hypotheses assigns word box groups to corresponding data fields.
摘要:
A document recognition system and method, where images are represented as a collection of primitive features whose spatial relations are represented as a graph. Useful subsets of all the possible subgraphs representing different portions of images are represented over a corpus of many images. The data structure is a lattice of subgraphs, and algorithms are provided means to build and use the graph lattice efficiently and effectively.
摘要:
A system and method generate a graph lattice from exemplary images. At least one processor receives exemplary data graphs of the exemplary images and generates graph lattice nodes of size one from primitives. Until a termination condition is met, the at least one processor repeatedly: 1) generates candidate graph lattice nodes from accepted graph lattice nodes; 2) selects one or more candidate graph lattice nodes preferentially discriminating exemplary data graphs which are less discriminable than other exemplary data graphs using the accepted graph lattice nodes; and 3) promotes the selected graph lattice nodes to accepted status. The graph lattice is formed from the accepted graph lattice nodes and relations between the accepted graph lattice nodes.