Abstract:
A system and method for classifying forms are provided. An image representing a form of an unknown document type is received. The image includes line-art. Further, a plurality of template models corresponding to a plurality of different document types is received. The plurality of different document types is intended to include the correct document type of the unknown document. A subset of the plurality of template models is selected as candidate template models. The candidate template models include line-art junctions best matching line-art junctions of the received image. One of the candidate template models is selected as a best candidate template model. The best candidate template model includes horizontal and vertical lines best matching horizontal and vertical lines of the received image, respectively, aligned to the best candidate template model.
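The two-stage selection described above can be illustrated with a minimal sketch. It is not the patented method: the data layout (a dict with `junctions`, `h_lines`, `v_lines`), the Jaccard overlap used for junction matching, the pixel tolerance `tol`, and the candidate count `k` are all assumptions made for illustration, and the alignment of the image to each candidate is assumed to have been done already.

```python
def junction_score(image_junctions, template_junctions):
    """Jaccard overlap between two sets of (type, x, y) junctions (assumed metric)."""
    a, b = set(image_junctions), set(template_junctions)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def line_score(image_lines, template_lines, tol=2):
    """Fraction of template line positions with an image line within tol pixels."""
    if not template_lines:
        return 0.0
    hits = sum(1 for t in template_lines
               if any(abs(t - i) <= tol for i in image_lines))
    return hits / len(template_lines)


def classify_form(image, templates, k=2):
    """Stage 1: keep the k templates with the best junction match.
    Stage 2: among those, pick the one whose horizontal and vertical
    lines best match the (already aligned) image."""
    ranked = sorted(
        templates,
        key=lambda name: junction_score(image["junctions"],
                                        templates[name]["junctions"]),
        reverse=True,
    )
    candidates = ranked[:k]
    return max(
        candidates,
        key=lambda name: (
            line_score(image["h_lines"], templates[name]["h_lines"])
            + line_score(image["v_lines"], templates[name]["v_lines"])
        ),
    )
```

The junction stage acts as a cheap filter, so the more detailed line comparison only runs on a few candidates.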
Abstract:
A method of classifying an input image includes the initial steps of labeling an input image in accordance with a class and extracting at least one connected component from the input image. The method also includes the steps of calculating at least one feature of the input image and generating a model based on the at least one calculated feature. The method also includes the steps of repeating at least one of the previous steps for at least one other input image and comparing the at least one other input image with the model. The at least one other input image is classified in accordance with the class of the model if the at least one calculated feature of the at least one other input image is substantially similar to that of the model.
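As a rough illustration of the steps above, the following sketch labels a training image, extracts connected components, computes features, and compares a new image against the resulting model. The feature choice (component count and mean component size), the 4-connectivity, and the relative `tolerance` standing in for "substantially similar" are assumptions for this example; the abstract does not specify them.

```python
from collections import deque


def connected_components(img):
    """Return 4-connected foreground components of a binary image
    (list of rows of 0/1) as lists of (row, col) pixels."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    comps = []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, comp = deque([(r, c)]), []
                while queue:
                    y, x = queue.popleft()
                    comp.append((y, x))
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                comps.append(comp)
    return comps


def features(img):
    """Assumed features: (number of components, mean component size)."""
    comps = connected_components(img)
    count = len(comps)
    mean_size = sum(len(c) for c in comps) / count if count else 0.0
    return (count, mean_size)


def build_model(label, img):
    """Generate a model from one labeled training image."""
    return {"label": label, "features": features(img)}


def classify(img, model, tolerance=0.25):
    """Return the model's class if every feature is within the relative
    tolerance of the model's value, otherwise None."""
    for a, b in zip(features(img), model["features"]):
        if abs(a - b) > tolerance * max(abs(b), 1.0):
            return None
    return model["label"]
```

A practical system would likely build the model from many labeled images per class and use richer features, but the control flow would follow the same outline.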
Abstract:
A computer-implemented method and system for reconstructing a clean document from annotated document images and/or extracting annotations therefrom are provided. The method includes receiving a set of at least two annotated document images into computer memory, selecting a representative image from the set of annotated document images, performing a global alignment on each of the set of annotated document images with respect to the selected representative image, and forming a consensus document image based at least on the aligned annotated document images. A clean document based at least on the consensus document image is then formed which can be used for extracting the annotations.
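One simple way to picture the consensus step is a pixel-wise majority vote over binary images that have already been globally aligned to the representative image. This is an assumed simplification: the abstract does not say how the consensus is formed, and the alignment itself is not shown here. The function names are hypothetical.

```python
def consensus_image(aligned):
    """Pixel-wise strict majority vote over aligned binary images
    (each a list of rows of 0/1 with identical dimensions)."""
    n = len(aligned)
    rows, cols = len(aligned[0]), len(aligned[0][0])
    return [
        [1 if 2 * sum(img[r][c] for img in aligned) > n else 0
         for c in range(cols)]
        for r in range(rows)
    ]


def extract_annotations(img, clean):
    """Pixels set in the annotated image but absent from the clean document."""
    return [
        [1 if p and not q else 0 for p, q in zip(img_row, clean_row)]
        for img_row, clean_row in zip(img, clean)
    ]
```

Content shared by most copies survives the vote, while marks that appear on only a minority of copies drop out and can be recovered by differencing each copy against the clean result.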
Abstract:
Methods of generating image anchor templates from low variance regions of document images of a first class are provided. The methods select a document image from the document images of the first class and align the other document images of the first class to the selected document image. Low variance regions are then determined by comparing the aligned document images and the selected document image and used to generate image anchor templates.
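The low variance idea can be sketched as follows, assuming the grayscale images of the class have already been aligned to the selected image. The per-pixel population variance, the `threshold` value, and the fixed-size window search are illustrative assumptions rather than details taken from the abstract.

```python
from statistics import pvariance


def low_variance_mask(aligned, threshold=100.0):
    """Mark pixels whose intensity variance across the aligned images
    is at or below the threshold."""
    rows, cols = len(aligned[0]), len(aligned[0][0])
    return [
        [pvariance([img[r][c] for img in aligned]) <= threshold
         for c in range(cols)]
        for r in range(rows)
    ]


def anchor_windows(mask, height, width):
    """Top-left corners of height x width windows lying entirely in
    low variance regions; these are candidate anchor template locations."""
    rows, cols = len(mask), len(mask[0])
    return [
        (r, c)
        for r in range(rows - height + 1)
        for c in range(cols - width + 1)
        if all(mask[r + dr][c + dc]
               for dr in range(height) for dc in range(width))
    ]
```

Windows cut from the selected image at the returned positions would then serve as candidate image anchor templates, since their content is stable across the class.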