Abstract:
Examples of systems and methods for enhancing image quality for documents with highlighted portions are described. A highlighted portion is detected in a received input. The highlighted portion is automatically segmented as a text layer, and the remaining portion is further segmented as a separate text layer and an image layer. To improve the quality of the highlighted portion, it is assigned a resolution greater than the respective resolutions of the separate text layer and the image layer. The text layer corresponding to the highlighted portion, the separate text layer, and the image layer are then integrated to generate a scanned document in a Mixed Raster Content (MRC) file format.
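The resolution rule in this abstract can be illustrated with a minimal sketch. The abstract does not say how highlights are detected or how much higher the resolution should be, so the HSV thresholds in `is_highlight`, the function names, and the `factor` parameter below are all assumptions made for illustration.

```python
import colorsys

def is_highlight(rgb, min_sat=0.5, min_val=0.8):
    """Assumed heuristic: a highlighter pixel is bright and strongly saturated."""
    r, g, b = (v / 255 for v in rgb)
    _, s, v = colorsys.rgb_to_hsv(r, g, b)
    return s >= min_sat and v >= min_val

def assign_resolutions(text_dpi, image_dpi, factor=2):
    """Give the highlighted text layer a resolution above both other layers.

    The multiplier `factor` is hypothetical; the abstract only requires
    that the highlighted layer's resolution be the greatest.
    """
    return {
        "highlight_text": factor * max(text_dpi, image_dpi),
        "text": text_dpi,
        "image": image_dpi,
    }

yellow_marker = is_highlight((255, 255, 0))     # bright, saturated
white_paper = is_highlight((255, 255, 255))     # bright, unsaturated
layers = assign_resolutions(text_dpi=300, image_dpi=150)
```

The three entries of `layers` would then be encoded as separate MRC layers; that encoding step is outside the scope of this sketch.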
Abstract:
A method and system are provided for optical character recognition (OCR) of multi-language content. The method includes extracting a text portion from an image received from a user-computing device. The text portion comprises a plurality of keywords associated with a plurality of languages. The method further includes segmenting the plurality of keywords into a plurality of layers. Each layer of the plurality of layers comprises one or more keywords that are associated with a single language. The method further comprises generating an OCR output of each of the plurality of layers based on the language associated with the one or more keywords in each of the plurality of layers. The method further comprises generating an electronic document of the received image based on the generated OCR output of each of the plurality of layers. The method further includes transmitting the generated electronic document to the user-computing device.
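The segmentation step can be sketched as follows. The abstract does not state how a keyword's language is determined; this sketch assumes the Unicode script of the first letter is an adequate proxy, which cannot distinguish languages that share a script. The names `script_of` and `segment_into_layers` are hypothetical.

```python
import unicodedata
from collections import defaultdict

def script_of(word):
    """Return a coarse script label (e.g. 'LATIN') from the first letter."""
    for ch in word:
        if ch.isalpha():
            # Unicode character names begin with the script name,
            # e.g. 'LATIN SMALL LETTER A' or 'CYRILLIC SMALL LETTER ES'.
            return unicodedata.name(ch, "UNKNOWN").split(" ")[0]
    return "UNKNOWN"

def segment_into_layers(keywords):
    """Group keywords into one layer per detected script."""
    layers = defaultdict(list)
    for kw in keywords:
        layers[script_of(kw)].append(kw)
    return dict(layers)

layers = segment_into_layers(["invoice", "चालान", "total", "счёт"])
```

Each resulting layer could then be passed to an OCR engine configured for the matching language, and the per-layer outputs merged into one electronic document.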
Abstract:
The present disclosure describes methods and systems for creating a multi-layered Optical Character Recognition (OCR) document that facilitates selection of desired text. The method includes receiving a scanned image corresponding to a document that includes text information. A binary image is generated from the scanned image. A morphological dilation operation is then performed, using a horizontal structuring element and a vertical structuring element, to create one or more text groups. Thereafter, an OCR operation is applied to each text group to generate a corresponding OCR layer. The one or more OCR layers are then combined. Finally, the combined OCR layers are superimposed as invisible text layers over the scanned image to create the multi-layered OCR document.
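The dilation and grouping steps can be sketched in plain Python. Applying the horizontal element first and the vertical element second, using 4-connectivity, and reporting each text group as a bounding box are assumptions of this sketch; the abstract does not specify them. The function names are hypothetical.

```python
def dilate(img, half_w, half_h):
    """Binary dilation with a (2*half_h+1) x (2*half_w+1) rectangular element."""
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if img[r][c]:
                for rr in range(max(0, r - half_h), min(rows, r + half_h + 1)):
                    for cc in range(max(0, c - half_w), min(cols, c + half_w + 1)):
                        out[rr][cc] = 1
    return out

def label_boxes(img):
    """Return (top, left, bottom, right) for each 4-connected component."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] and not seen[r][c]:
                stack = [(r, c)]
                seen[r][c] = True
                top, left, bottom, right = r, c, r, c
                while stack:
                    y, x = stack.pop()
                    top, bottom = min(top, y), max(bottom, y)
                    left, right = min(left, x), max(right, x)
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                boxes.append((top, left, bottom, right))
    return boxes

def text_groups(binary, half_w=1, half_h=1):
    """Dilate with a horizontal element, then a vertical one, then group."""
    horizontal = dilate(binary, half_w, 0)    # 1-row-tall, wide element
    merged = dilate(horizontal, 0, half_h)    # 1-column-wide, tall element
    return label_boxes(merged)

binary = [
    [1, 0, 1, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
groups = text_groups(binary)
```

In this example the two nearby pixels on the left merge into one group while the distant pixel on the right stays separate. Each box would then be cropped and sent to an OCR engine to produce its own layer.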
Abstract:
A system and method are provided for generating a mixed raster content (MRC) representation of an input image. The input image is segmented into image and text layers. Connected component analysis is performed on the image layer, and each group of connected pixels is labeled. For each group, an average color is determined. When a mask layer already exists for that color, the pixels in the group are enabled in that mask layer. When no mask layer exists for the color, a new mask layer is created and the corresponding pixels are enabled in it. The image layer is then removed and the mask layers are combined into the text layer, whereupon a text-only MRC compression file is output.
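The per-color mask logic can be sketched as below. The abstract does not define when two average colors count as the same color, so this sketch quantizes each channel with a hypothetical `step` and uses the quantized value as the mask key. Representing background pixels as `None` and using 4-connectivity are likewise assumptions.

```python
def build_mask_layers(pixels, step=64):
    """Map each quantized average color to a binary mask of its pixel groups.

    `pixels` is a 2D list of (r, g, b) tuples, with None for background.
    """
    rows, cols = len(pixels), len(pixels[0])
    seen = [[False] * cols for _ in range(rows)]
    masks = {}
    for r in range(rows):
        for c in range(cols):
            if pixels[r][c] is None or seen[r][c]:
                continue
            # Collect one group of 4-connected foreground pixels.
            group, stack = [], [(r, c)]
            seen[r][c] = True
            while stack:
                y, x = stack.pop()
                group.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and pixels[ny][nx] is not None and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            # Average color of the group, then quantize it to form the key.
            n = len(group)
            avg = tuple(sum(pixels[y][x][ch] for y, x in group) // n
                        for ch in range(3))
            key = tuple(v // step * step for v in avg)
            if key not in masks:          # no mask layer for this color yet
                masks[key] = [[0] * cols for _ in range(rows)]
            for y, x in group:            # enable the group's pixels
                masks[key][y][x] = 1
    return masks

R, B, R2 = (250, 10, 10), (10, 10, 250), (240, 20, 5)
pixels = [
    [R,    R,    None, B],
    [None, None, None, B],
    [R2,   None, None, None],
]
masks = build_mask_layers(pixels)
```

Here the two reddish groups are not connected but share one mask layer because their quantized averages match, while the blue group gets its own layer.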
Abstract:
According to embodiments illustrated herein, there is provided a method of image compression. The method includes generating a modified image based on a compression of an image. The method further includes generating a first residual layer and a second residual layer based on a comparison of the modified image and the image. The method further includes filtering a set of pixels from the first residual layer and the second residual layer. The method further includes compressing the filtered first residual layer and the filtered second residual layer to generate a compressed first residual layer and a compressed second residual layer. Additionally, the method includes generating a second compressed image based on the modified image, the compressed first residual layer, and the compressed second residual layer.
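One possible reading of the residual pipeline is sketched below on a flat list of 8-bit grayscale values. The abstract does not say what distinguishes the two residual layers or how pixels are filtered; this sketch assumes the first layer holds positive differences, the second holds negative differences, and filtering zeroes residuals below a hypothetical `threshold`. Quantization stands in for the lossy compression step, and `zlib` for the residual compressor.

```python
import zlib

def lossy(image, step=16):
    """Stand-in for lossy compression: round each value to a multiple of step."""
    return [min(255, (p + step // 2) // step * step) for p in image]

def residual_layers(image, modified):
    """Split the difference into a positive layer and a negative layer."""
    first = [max(o - m, 0) for o, m in zip(image, modified)]
    second = [max(m - o, 0) for o, m in zip(image, modified)]
    return first, second

def filter_layer(layer, threshold=2):
    """Drop residuals too small to matter (assumed filtering rule)."""
    return [v if v >= threshold else 0 for v in layer]

image = [0, 5, 17, 30, 200, 255]
modified = lossy(image)
first, second = residual_layers(image, modified)
first_f, second_f = filter_layer(first), filter_layer(second)

# Package the modified image with the two compressed residual layers.
packed = {
    "modified": modified,
    "first": zlib.compress(bytes(first_f)),
    "second": zlib.compress(bytes(second_f)),
}

# Decoding adds the first layer back and subtracts the second.
restored = [
    m + a - b
    for m, a, b in zip(
        packed["modified"],
        zlib.decompress(packed["first"]),
        zlib.decompress(packed["second"]),
    )
]
```

Under these assumptions the restored values differ from the original only where a residual was filtered out, so the error stays below `threshold`.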
Abstract:
The disclosed embodiments illustrate methods and systems for estimating a half-tone frequency of an image. The method includes combining, by one or more processors, a first binary block, obtained from a portion of the image, with one or more second binary blocks to create a third binary block. Each of the one or more second binary blocks is obtained by shifting the first binary block. The method further includes estimating, by the one or more processors, the half-tone frequency of the portion of the image, based on the first binary block and the third binary block.
Abstract:
Various embodiments of methods and systems for processing a document are disclosed. A font size associated with each of one or more text regions included in the document is determined. A first resolution of each of the one or more text regions is modified based on the respective font size associated with that text region to generate a multi-resolution document.
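A minimal sketch of the font-size-to-resolution mapping follows. The abstract does not give the mapping or its direction; this sketch assumes smaller fonts need more pixels to stay legible, and the breakpoints, DPI values, and region records are hypothetical.

```python
def resolution_for_font(font_pt):
    """Pick a resolution (dpi) for a text region from its font size in points."""
    if font_pt <= 8:
        return 600   # fine print: keep the most detail
    if font_pt <= 12:
        return 300   # body text
    return 150       # large headings tolerate downsampling

regions = [
    {"id": "heading", "font_pt": 24},
    {"id": "body", "font_pt": 11},
    {"id": "footnote", "font_pt": 7},
]
multi_res = {r["id"]: resolution_for_font(r["font_pt"]) for r in regions}
```

Each region would then be resampled to its assigned resolution before the regions are assembled into the multi-resolution document.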
Abstract:
The disclosed embodiments illustrate methods and systems for image processing. The method includes dividing a portion of an image into a set of blocks, each of which is divided into a set of sub-blocks. Thereafter, a measurable block is identified from the set of blocks based on measurability criteria. Applying the criteria comprises determining an average pixel value for each of the sub-blocks based on one or more pixels encompassed by the respective sub-block. Further, a maximum average pixel value, a minimum average pixel value, and a range of average pixel values are determined among the set of sub-blocks. The measurability criteria further include comparing the maximum average pixel value, the minimum average pixel value, and the range of average pixel values with respective pre-determined thresholds. The method further includes estimating a half-tone frequency of the portion based on a processing of the identified measurable block.
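The measurability test can be sketched as follows. The abstract names the three statistics and says each is compared with a pre-determined threshold, but it gives neither the threshold values nor the direction of each comparison. This sketch assumes a block is measurable when its sub-block averages are neither saturated nor widely spread; the limits are hypothetical.

```python
def sub_block_means(block, sub):
    """Average pixel value of each sub x sub tile of a square block."""
    n = len(block)
    means = []
    for r0 in range(0, n, sub):
        for c0 in range(0, n, sub):
            vals = [block[r][c]
                    for r in range(r0, r0 + sub)
                    for c in range(c0, c0 + sub)]
            means.append(sum(vals) / len(vals))
    return means

def is_measurable(block, sub=2, max_limit=230, min_limit=25, range_limit=40):
    """Compare max, min, and range of sub-block averages with thresholds."""
    means = sub_block_means(block, sub)
    hi, lo = max(means), min(means)
    return hi <= max_limit and lo >= min_limit and (hi - lo) <= range_limit

# A fine checkerboard averages to mid-gray in every sub-block.
checker = [[255 if (r + c) % 2 else 0 for c in range(4)] for r in range(4)]
# A hard edge leaves some sub-blocks fully dark and others fully bright.
edge = [[0, 0, 255, 255] for _ in range(4)]

checker_ok = is_measurable(checker)
edge_ok = is_measurable(edge)
```

Only blocks that pass this test would be forwarded to the half-tone frequency estimator, which is not shown here.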