Abstract:
A method for authenticating a printed document that carries a barcode encoding authentication data, including word bounding boxes for each word in the original document image and data for reconstructing the original image. The printed document is scanned to generate a target document image, which is then segmented into text words. The word bounding boxes of the original and target document images are used to align the target document image. Then, each word in the original document image is compared to the corresponding word in the target document image using a word difference map and the Hausdorff distance between them. Symbols of the original document image are further compared to corresponding symbols in the target document image using feature comparison, a symbol difference map and Hausdorff distance comparison, and point matching. These comparison results can identify alterations in the target document with respect to the original document, and the alterations can be visualized.
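The Hausdorff distance used in the word and symbol comparisons can be illustrated with a minimal sketch. It assumes each word or symbol is a small binary bitmap with at least one foreground pixel, and it measures Euclidean distance between foreground pixel coordinates. The function names and the sample bitmaps are hypothetical and are not taken from the abstract.

```python
import math

def foreground_points(bitmap):
    """Return (row, col) coordinates of all nonzero pixels in a binary bitmap."""
    return [(r, c) for r, row in enumerate(bitmap) for c, v in enumerate(row) if v]

def directed_hausdorff(a, b):
    """Largest distance from any point in a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

# Hypothetical 3x3 symbol bitmaps: the target has one extra foreground pixel.
original = [[1, 1, 1],
            [0, 1, 0],
            [0, 1, 0]]
altered = [[1, 1, 1],
           [0, 1, 0],
           [0, 1, 1]]

pts_original = foreground_points(original)
pts_altered = foreground_points(altered)
distance = hausdorff(pts_original, pts_altered)
```

A distance of zero means the two shapes coincide exactly; a larger value indicates that some stroke in one image lies far from every stroke in the other, which may suggest an alteration. Any decision threshold would have to be chosen for the scan resolution in use.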
Abstract:
A method for compressing a bi-level document image containing text is disclosed. The document image is segmented into symbol images, each representing a letter, numeral, etc. in the document. The symbol images are classified into a plurality of classes, each class being associated with a template image and a class index. Classification is done by comparing each symbol to be classified with the templates of existing classes, using a number of image features including zoning profiles, side profiles, topology statistics, and low-order image moments. These image features are compared using a tolerance-based method to determine whether the symbol matches the template. After classification, certain classes that have few symbols classified into them may be merged with other classes. In addition, the template images of the classes are down-sampled, where the final size of each template image depends on the likelihood of confusing that template with other templates.
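The tolerance-based classification step can be sketched as follows. This is an illustration under stated assumptions, not the disclosed method: each symbol is reduced to a short numeric feature vector, a symbol matches a template when every feature differs by no more than a per-feature tolerance, and the first symbol of a new class becomes its template. The feature values, tolerances, and function names are hypothetical.

```python
def matches(features, template, tolerances):
    """True when every feature is within its tolerance of the template's feature."""
    return all(abs(f - t) <= tol for f, t, tol in zip(features, template, tolerances))

def classify(symbols, tolerances):
    """Assign each symbol a class index, creating a new class when nothing matches."""
    templates = []  # templates[i] is the feature vector representing class i
    indices = []    # indices[k] is the class index assigned to symbols[k]
    for feats in symbols:
        for idx, tmpl in enumerate(templates):
            if matches(feats, tmpl, tolerances):
                indices.append(idx)
                break
        else:
            # No existing template matched: this symbol starts a new class.
            templates.append(feats)
            indices.append(len(templates) - 1)
    return templates, indices

# Hypothetical feature vectors, e.g. (zone density, side-profile depth, hole count).
symbols = [(10, 4, 1), (11, 4, 1), (20, 9, 0), (10, 5, 1)]
tolerances = (1, 1, 0)  # assumed: the hole count must match exactly
templates, indices = classify(symbols, tolerances)
```

Only the templates and the sequence of class indices would need to be stored, which is where the compression comes from. Merging sparse classes and down-sampling templates are separate steps that this sketch does not cover.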
Abstract:
An improved document authentication method in which critical content, such as signatures, is preserved at high resolution in the authentication data carried on the self-authenticating document. When generating authentication data, signatures are compressed without down-sampling to preserve their resolution and quality. The compressed signature data (a bit string) is embedded in an image segment on the document. For example, each bit of the bit string is stored in the low bits of one or more image pixels. A hash code is calculated from the bit string and stored in a barcode printed on the document. To authenticate a scanned-back document, the bit string is recovered from the image segment. A hash code is calculated from the recovered bit string and compared to the hash code extracted from the barcode. The signatures regenerated from the recovered bit string are compared to the signatures in the scanned document.
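The embed, hash, and verify cycle described above can be sketched in a simplified form. The sketch assumes one bit per pixel stored in the least significant bit of an 8-bit grayscale value, uses SHA-256 as the hash, and works on purely digital pixel values without print-and-scan noise. These are illustrative assumptions; the abstract does not specify the hash function or the exact embedding layout, and all names below are hypothetical.

```python
import hashlib

def embed_bits(pixels, bits):
    """Store each bit in the least significant bit of one pixel value."""
    if len(bits) > len(pixels):
        raise ValueError("image segment is too small for the bit string")
    out = list(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | bit  # clear the low bit, then set it to the data bit
    return out

def recover_bits(pixels, n):
    """Read back the first n embedded bits."""
    return [p & 1 for p in pixels[:n]]

def hash_bits(bits):
    """Hash code of a bit string (assumed here: SHA-256 over one byte per bit)."""
    return hashlib.sha256(bytes(bits)).hexdigest()

# Hypothetical compressed signature data and image segment.
bits = [1, 0, 1, 1, 0, 0, 1, 0]
segment = [200, 201, 202, 203, 204, 205, 206, 207, 208, 209]

stego = embed_bits(segment, bits)
barcode_hash = hash_bits(bits)            # would be stored in the printed barcode

recovered = recover_bits(stego, len(bits))
authentic = hash_bits(recovered) == barcode_hash
```

If any embedded bit changes, the recomputed hash no longer matches the one carried in the barcode, so the recovered signature data would not be trusted for the final comparison.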
Abstract:
A document authentication method is disclosed in which a plurality of two-dimensional barcode stamps are generated and printed on the back side of the document, forming a color mosaic pattern. Each barcode stamp by itself is a binary barcode, but the barcode stamps as a whole are printed with different colors and/or color intensities. The barcode stamps collectively encode the content of the document to be used for document authentication.
Abstract:
A document alteration detection method compares a target image with an original image using a two-step process. In the first step, the original and target images are divided into connected image components and their centroids are obtained, and the centroids of the image components in the original and target images are compared. Each centroid in the target image that is not in the original image is deemed to represent an addition, and each centroid in the original image that is not in the target image is deemed to represent a deletion. In the second step, sub-images containing the image components corresponding to each pair of matching centroids in the original and target images are compared to detect any alterations.
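The first step, comparing centroids to find additions and deletions, can be sketched as below. The sketch assumes the centroids of the connected components have already been computed, and it pairs them with a greedy nearest-neighbor match within a distance tolerance. The tolerance value, the greedy strategy, and the function name are assumptions made for illustration; the abstract does not say how matching centroids are determined.

```python
import math

def compare_centroids(original, target, tol=2.0):
    """Pair centroids within tol; leftovers are reported as additions or deletions."""
    unmatched = list(target)  # target centroids not yet paired
    matched, deletions = [], []
    for o in original:
        # Nearest remaining target centroid, or None if none are left.
        best = min(unmatched, key=lambda t: math.dist(o, t), default=None)
        if best is not None and math.dist(o, best) <= tol:
            matched.append((o, best))
            unmatched.remove(best)
        else:
            deletions.append(o)  # present in the original, absent from the target
    additions = unmatched        # present in the target, absent from the original
    return matched, additions, deletions

# Hypothetical centroids (x, y) of connected components.
original = [(10, 10), (50, 10), (90, 10)]
target = [(10.5, 9.8), (90, 10.4), (130, 10)]

matched, additions, deletions = compare_centroids(original, target)
```

Each pair in `matched` would then go to the second step, where the corresponding sub-images are compared to detect alterations within components that appear in both images.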
Abstract:
The present application relates to an image processing apparatus and a method for matching and combining two documents that share at least some overlapping area. Layout features are extracted from the two documents and used to determine common layout areas of the first and second documents, where the common layout areas have the same layout in the first and the second documents. Text data in the common layout areas of the first and second documents are also detected and used to determine common text data of the first and second documents, where the common text data is the same in the first and the second documents. Feature points are extracted from the common layout areas of the first and second documents based on the common text data, and the first and second documents may be combined based on the feature points.
Abstract:
A method and apparatus are provided for matching two images, represented by first image data and second image data, that have areas of overlapping text. The method includes dividing the first image data into a plurality of scene segments and dividing the second image data into a plurality of scene segments, finding a text segment among the scene segments of the first image data and the second image data, and detecting common text data in the text segments of the first image data and the second image data, the common text data being identical in the text segments of the first image data and the second image data. The method further includes extracting feature points from the first image data and the second image data based on the common text data, and combining the first image data and the second image data according to the feature points.
Abstract:
A method for compensating for color variations introduced by printer hardware limitations and other factors is described. First, the extent of color variation throughout a printed page is determined. Based on this determination, each page is partitioned into a plurality of image areas. A color profile is generated for each image area. The partition and the multiple color profiles are stored in the printer. In an actual printing process, the page image to be printed is divided into a plurality of image areas based on the paper size and the stored partition, and the respective stored color profiles for the image areas are retrieved and used to process the digital image for printing.