Abstract:
A system and method are provided that facilitate document image compression by using a mask to separate the foreground of a document image from its background. The invention includes a pixel energy analyzer adapted to partition regions into a foreground and a background. The invention further provides a merge region component adapted to attempt to merge regions if the merged region would not exceed a threshold energy. Merged regions are partitioned into a new foreground and a new background. Thereafter, a mask storage component stores the partitioning information in a binary mask.
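The partition-and-merge step described above can be illustrated with a minimal Python sketch. The abstract does not define the energy measure, so this sketch assumes energy is the summed squared deviation of the foreground and background pixel values from their respective means. The names `variance_energy`, `partition`, and `try_merge` are hypothetical, and regions are simplified to flat lists of grayscale values.

```python
def variance_energy(values):
    """Sum of squared deviations from the mean (assumed energy measure)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def partition(pixels):
    """Split pixel values into a darker foreground and a lighter background.

    Tries every cut of the sorted values and keeps the one with the lowest
    combined energy. Returns (energy, foreground, background).
    """
    ordered = sorted(pixels)
    if len(ordered) < 2:
        return 0.0, ordered, []
    best = None
    for cut in range(1, len(ordered)):
        fg, bg = ordered[:cut], ordered[cut:]
        energy = variance_energy(fg) + variance_energy(bg)
        if best is None or energy < best[0]:
            best = (energy, fg, bg)
    return best


def try_merge(region_a, region_b, threshold):
    """Merge two regions only if the merged energy stays within the threshold.

    Returns the new (foreground, background) on success, or None if the
    merge is rejected.
    """
    energy, fg, bg = partition(region_a + region_b)
    if energy <= threshold:
        return fg, bg
    return None


region_a = [10, 12, 200, 202]
region_b = [11, 201]
region_c = [100, 105]
merged_ab = try_merge(region_a, region_b, threshold=10.0)
merged_ac = try_merge(region_a, region_c, threshold=10.0)
```

In this sketch, merging `region_a` with `region_b` succeeds because the merged pixels still separate cleanly into two tight groups, while merging with `region_c` is rejected because the mid-gray values push the energy above the threshold. A binary mask would then record, per pixel, which of the two groups it fell into.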
Abstract:
Systems and methods for performing clustering of a document image are disclosed. A property of a mark extracted from a document is compared to the properties of the existing clusters. If the property of the mark fails to match any of the properties of the existing clusters, the mark is added to the existing clusters as a new cluster. One property that can be utilized is x size and y size, namely the width and height of the existing clusters. Another property that can be employed is ink size, which refers to the ratio of black pixels to total pixels in a cluster. Yet another property that can be utilized is a reduced mark or image, which is a reduced-pixel-size version of the bitmap of the mark and/or cluster. These properties can be employed to identify mismatches early and so reduce the number of bit-by-bit comparisons performed.
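The screening idea can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the disclosed implementation: the class `Mark`, the functions `matches` and `cluster_marks`, and all tolerance values are hypothetical, each cluster is represented by its first mark, and the reduced-image screen is omitted for brevity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Mark:
    bitmap: List[List[int]]  # rows of 0 (white) / 1 (black) pixels

    @property
    def width(self) -> int:  # x size
        return len(self.bitmap[0])

    @property
    def height(self) -> int:  # y size
        return len(self.bitmap)

    @property
    def ink(self) -> float:  # ratio of black pixels to total pixels
        black = sum(sum(row) for row in self.bitmap)
        return black / (self.width * self.height)


def bitwise_mismatch(a: Mark, b: Mark) -> float:
    """Fraction of differing pixels over the overlapping area (assumed metric)."""
    rows, cols = min(a.height, b.height), min(a.width, b.width)
    diff = sum(a.bitmap[r][c] != b.bitmap[r][c]
               for r in range(rows) for c in range(cols))
    return diff / (rows * cols)


def matches(mark: Mark, cluster: Mark, size_tol: int = 1,
            ink_tol: float = 0.15, max_mismatch: float = 0.1) -> bool:
    # Cheap screens first: a failure here skips the bit-by-bit comparison.
    if abs(mark.width - cluster.width) > size_tol:
        return False
    if abs(mark.height - cluster.height) > size_tol:
        return False
    if abs(mark.ink - cluster.ink) > ink_tol:
        return False
    # Only marks that survive the screens pay for the full comparison.
    return bitwise_mismatch(mark, cluster) <= max_mismatch


def cluster_marks(marks: List[Mark]):
    clusters: List[Mark] = []    # one representative mark per cluster
    assignments: List[int] = []  # cluster index chosen for each mark
    for mark in marks:
        for idx, rep in enumerate(clusters):
            if matches(mark, rep):
                assignments.append(idx)
                break
        else:
            # No existing cluster matched: the mark starts a new cluster.
            clusters.append(mark)
            assignments.append(len(clusters) - 1)
    return clusters, assignments


a = Mark([[1, 0], [0, 1]])
b = Mark([[1, 0], [0, 1]])
c = Mark([[1] * 4 for _ in range(4)])
d = Mark([[0, 1], [1, 0]])
clusters, assignments = cluster_marks([a, b, c, d])
```

Here `c` is rejected against the first cluster by the size screen alone, whereas `d` passes both the size and ink screens and is only separated by the full bitmap comparison. That is the case the cheaper screens cannot catch.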
Abstract:
A system and method facilitating compression of bi-level images with explicit representation of ink clusters are provided. The present invention includes a cluster shape estimator that analyzes connected component information, extracts clusters, and stores each cluster in a global dictionary, a page dictionary, or a store of unclustered shapes. A bitmap estimation from clusters component determines dictionary positions for clusters stored in the global dictionary, which are then encoded. A cluster position estimator determines page positions of clusters of the global dictionary and/or the page dictionary, which are then encoded. Further, the global dictionary, the page dictionary, and the store of unclustered shapes are also encoded.
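The three-way storage described above can be pictured with a small Python sketch. The abstract does not state how a cluster is assigned to one store or another, so the rule used here is purely an assumption for illustration: shapes seen on more than one page go to the global dictionary, shapes repeated on a single page go to that page's dictionary, and shapes seen once are kept as unclustered. The function `route_clusters` and its input format are hypothetical.

```python
from collections import defaultdict


def route_clusters(occurrences):
    """Assign clusters to a store based on where they occur.

    occurrences: iterable of (cluster_id, page_number) pairs, one per
    extracted shape instance. The routing rule is an assumed heuristic.
    """
    pages_by_cluster = defaultdict(list)
    for cluster_id, page in occurrences:
        pages_by_cluster[cluster_id].append(page)

    global_dict = set()            # shapes shared across pages
    page_dicts = defaultdict(set)  # page number -> shapes repeated on that page
    unclustered = []               # (cluster_id, page) for one-off shapes

    for cluster_id, pages in pages_by_cluster.items():
        if len(set(pages)) > 1:
            global_dict.add(cluster_id)
        elif len(pages) > 1:
            page_dicts[pages[0]].add(cluster_id)
        else:
            unclustered.append((cluster_id, pages[0]))
    return global_dict, dict(page_dicts), unclustered


occurrences = [("a", 1), ("a", 2), ("b", 1), ("b", 1), ("c", 2)]
global_dict, page_dicts, unclustered = route_clusters(occurrences)
```

With these inputs, `a` appears on two pages and lands in the global dictionary, `b` repeats only on page 1 and lands in that page's dictionary, and `c` occurs once and is stored as an unclustered shape.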