Abstract:
A method and apparatus for processing an image that includes a character are disclosed. The method may include: searching a set of characters for the one or more characters most similar in shape to a given character in the set (hereinafter, the first character), the characters found forming a similar character list of the first character; searching the set of characters for the one or more characters most similar in shape to each character in the similar character list of the first character, to form a similar character list for each of those characters; and selecting, from the similar character lists, one or more characters having high mutual similarity as a character cluster.
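The list-building and selection steps can be sketched as follows. This is a minimal illustration, not the claimed implementation: the feature table `FEATURES`, the similarity function `toy_sim`, the list length `k`, and the mutual-similarity criterion (a candidate is kept only if the first character also appears in the candidate's own list) are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Sequence, Set

# Hypothetical 1-D "shape features"; a real system would compare glyph images.
FEATURES: Dict[str, int] = {"O": 0, "0": 1, "Q": 2, "D": 5, "I": 50, "l": 51, "1": 52}
CHARSET: List[str] = list(FEATURES)


def toy_sim(a: str, b: str) -> float:
    """Toy shape similarity: closer feature values mean more similar shapes."""
    return -abs(FEATURES[a] - FEATURES[b])


def similar_list(ch: str, charset: Sequence[str],
                 sim: Callable[[str, str], float], k: int) -> List[str]:
    """Return the k characters in charset most similar in shape to ch."""
    others = [c for c in charset if c != ch]
    return sorted(others, key=lambda c: sim(ch, c), reverse=True)[:k]


def character_cluster(first: str, charset: Sequence[str],
                      sim: Callable[[str, str], float], k: int = 2) -> Set[str]:
    """Build the similar character lists, then keep mutually similar characters."""
    # Similar character list of the first character.
    lists: Dict[str, List[str]] = {first: similar_list(first, charset, sim, k)}
    # Similar character list of each character in that list.
    for c in lists[first]:
        lists[c] = similar_list(c, charset, sim, k)
    # Assumed criterion: a candidate joins the cluster only if the first
    # character also appears in the candidate's own similar character list.
    cluster = {first}
    for c in lists[first]:
        if first in lists[c]:
            cluster.add(c)
    return cluster


print(sorted(character_cluster("O", CHARSET, toy_sim)))
print(sorted(character_cluster("D", CHARSET, toy_sim)))
```

With this toy data, "O", "0", and "Q" list one another and form a cluster, while "D" finds neighbors that do not list it back and therefore stays alone.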
Abstract:
A method for processing a document image includes: performing horizontal and vertical text line extraction on the document image; providing an overlapping matrix, the value of each element of which indicates an overlapping relation between a horizontal text line and a vertical text line; merging the overlapping matrix in the vertical and horizontal directions; determining one or more text overlapping regions in the document image, based on the values of the elements of the merged overlapping matrix; counting the total number of strokes or pixel points in the horizontal text lines and in the vertical text lines, respectively, within one of the one or more text overlapping regions; and determining that the orientation of the text overlapping region is horizontal if the total number of strokes or pixel points in the horizontal text lines is larger than that in the vertical text lines, and otherwise determining that the orientation is vertical.
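The overlap matrix and the final orientation vote can be illustrated with a short sketch. The `TextLine` type, the rectangle-intersection test, and the use of union-find to group rows and columns linked by nonzero entries are assumptions standing in for the merging step, which the abstract does not specify in detail.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class TextLine:
    box: Tuple[int, int, int, int]  # (left, top, right, bottom), hypothetical layout
    strokes: int                    # stroke (or pixel) count of the line


def overlaps(a: TextLine, b: TextLine) -> bool:
    """True when the two bounding boxes intersect."""
    return (a.box[0] < b.box[2] and b.box[0] < a.box[2]
            and a.box[1] < b.box[3] and b.box[1] < a.box[3])


def overlap_matrix(h_lines: List[TextLine], v_lines: List[TextLine]) -> List[List[int]]:
    """Element [i][j] is 1 when horizontal line i overlaps vertical line j."""
    return [[1 if overlaps(h, v) else 0 for v in v_lines] for h in h_lines]


def overlap_regions(matrix: List[List[int]]) -> List[Tuple[List[int], List[int]]]:
    """Group rows and columns connected through nonzero entries (assumed merge)."""
    n_h = len(matrix)
    n_v = len(matrix[0]) if matrix else 0
    parent = list(range(n_h + n_v))  # rows first, then columns

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n_h):
        for j in range(n_v):
            if matrix[i][j]:
                parent[find(i)] = find(n_h + j)

    groups: Dict[int, Tuple[Set[int], Set[int]]] = {}
    for i in range(n_h):
        for j in range(n_v):
            if matrix[i][j]:
                rows, cols = groups.setdefault(find(i), (set(), set()))
                rows.add(i)
                cols.add(j)
    return [(sorted(r), sorted(c)) for r, c in groups.values()]


def region_orientation(rows: List[int], cols: List[int],
                       h_lines: List[TextLine], v_lines: List[TextLine]) -> str:
    """Horizontal wins only if its stroke total is strictly larger."""
    h_total = sum(h_lines[i].strokes for i in rows)
    v_total = sum(v_lines[j].strokes for j in cols)
    return "horizontal" if h_total > v_total else "vertical"


H_LINES = [TextLine((0, 0, 100, 10), 40), TextLine((0, 12, 100, 22), 38),
           TextLine((200, 0, 210, 10), 3)]
V_LINES = [TextLine((0, 0, 10, 22), 6), TextLine((12, 0, 22, 22), 5),
           TextLine((200, 0, 210, 100), 30)]
REGIONS = overlap_regions(overlap_matrix(H_LINES, V_LINES))
ORIENTATIONS = [region_orientation(r, c, H_LINES, V_LINES) for r, c in REGIONS]
print(ORIENTATIONS)
```

In this toy page, the first region contains two dense horizontal lines crossed by two sparse vertical candidates, so it is judged horizontal; the second region is dominated by one vertical line.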
Abstract:
A method and apparatus for automatically generating degraded character images at various levels of degradation are presented. The method comprises: rendering the character image on a scene plane; translating and rotating the scene plane according to various parameters; determining a projection region of the character image on an image plane according to various parameters; generating a pixel region mask; and generating a final degraded image by super-sampling. Degraded character images can thus be generated under various degradation conditions. The generated synthetic characters can be used for performance evaluation and training data augmentation in optical character recognition (OCR).
Abstract:
A table image processing device processes a table image, and a memory medium stores the processing program. The device accurately processes a table image containing round corners and includes: a line extracting device that extracts vertical and horizontal lines from an input image; a round corner candidate finding device that extracts an oblique line starting from an end point of a line found by the line extracting device and thereby finds a candidate round corner region; a cell extracting device that extracts a cell containing the round corner candidate found by the round corner candidate finding device; and a round corner deciding device that decides the round corner from the cells extracted by the cell extracting device.
Abstract:
A precise grayscale character segmentation apparatus and method are disclosed. The apparatus comprises: an adjustment and segmentation unit for adjusting and segmenting an input low-resolution text line image that has undergone coarse segmentation, so as to generate an adjusted character image; a character image binarization unit for generating a binary character image from the character image input to it; a noise removal unit for removing noise information from the binary character image generated by the binarization unit; and a final character image segmentation unit for generating a precisely segmented character image from the binary character image from which noise has been removed.
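The last three units (binarization, noise removal, final segmentation) can be pictured with a small sketch. The specific choices here are assumptions, since the abstract does not name them: a global mean threshold, removal of foreground pixels with no 8-connected neighbor, and a tight bounding-box crop as the final segmentation. The adjustment step for the coarse segmentation is not modeled.

```python
from typing import List, Optional

Gray = List[List[int]]
Binary = List[List[int]]


def binarize(gray: Gray, threshold: Optional[float] = None) -> Binary:
    """Mark dark pixels as ink (1); the mean threshold is an assumed default."""
    flat = [p for row in gray for p in row]
    t = sum(flat) / len(flat) if threshold is None else threshold
    return [[1 if p < t else 0 for p in row] for row in gray]


def remove_isolated(binary: Binary) -> Binary:
    """Drop ink pixels that have no ink pixel among their 8 neighbors."""
    h, w = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    for y in range(h):
        for x in range(w):
            if not binary[y][x]:
                continue
            has_neighbor = any(
                binary[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
                if (j, i) != (y, x)
            )
            if not has_neighbor:
                out[y][x] = 0
    return out


def tight_crop(binary: Binary) -> Binary:
    """Crop to the bounding box of the remaining ink pixels."""
    rows = [y for y, row in enumerate(binary) if any(row)]
    if not rows:
        return []
    cols = [x for x in range(len(binary[0])) if any(row[x] for row in binary)]
    return [row[cols[0]:cols[-1] + 1] for row in binary[rows[0]:rows[-1] + 1]]


# Toy 5x5 character cell: a vertical stroke plus one isolated noise pixel.
CELL: Gray = [
    [255, 255, 255, 255, 30],
    [255, 20, 255, 255, 255],
    [255, 20, 255, 255, 255],
    [255, 20, 255, 255, 255],
    [255, 255, 255, 255, 255],
]
SEGMENTED = tight_crop(remove_isolated(binarize(CELL)))
print(SEGMENTED)
```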
Abstract:
A grayscale character dictionary generation apparatus comprises: a first synthetic grayscale degraded character image generation unit for generating first synthetic grayscale degraded character images from binary character images input to it; a clustering unit for dividing each category of the first synthetic grayscale degraded character images into a plurality of clusters; a template generation unit for generating a template for each of the clusters; a transformation matrix generation unit for generating a transformation matrix for each of the templates; and a second synthetic grayscale degraded character dictionary generation unit for obtaining, using the transformation matrix, a character feature of every grayscale degraded character in each of the clusters, and for constructing an eigenspace for each category of the synthetic grayscale degraded characters, the eigenspace being the second synthetic grayscale character dictionary.
Abstract:
A method and apparatus for automatically generating a degraded dictionary are presented. A degraded pattern generating means generates a plurality of degraded patterns from an original character image, based on a plurality of degradation parameters. A degraded dictionary generating means generates a plurality of degraded dictionaries corresponding to the plurality of degradation parameters, based on the plurality of degraded patterns. Finally, a dictionary matching means selects, as the final degraded dictionary, the one of the plurality of dictionaries that best matches the degradation level of a test sample set. Because the degraded patterns used to build the dictionaries are generated by simple scaling and blurring processes, the invention is simple to implement. The method and apparatus can be used not only in character recognition but also in other fields such as speech recognition and face recognition.
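The three means can be sketched end to end on tiny images. Everything concrete below is an assumption made for illustration: scaling is modeled as block averaging followed by pixel replication, blurring as a box filter, each dictionary holds one degraded template per class, and the match criterion is the mean squared distance from each test sample to its nearest template.

```python
from typing import Dict, List, Sequence, Tuple

Image = List[List[float]]
Param = Tuple[int, int]  # (scale factor, blur radius), hypothetical parameters


def scale_down_up(img: Image, factor: int) -> Image:
    """Shrink by block averaging, then enlarge again by pixel replication."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for by in range(0, h, factor):
        for bx in range(0, w, factor):
            ys = range(by, min(by + factor, h))
            xs = range(bx, min(bx + factor, w))
            mean = sum(img[y][x] for y in ys for x in xs) / (len(ys) * len(xs))
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out


def box_blur(img: Image, radius: int) -> Image:
    """Average each pixel with its neighbors inside a square window."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - radius), min(h, y + radius + 1))
                    for i in range(max(0, x - radius), min(w, x + radius + 1))]
            out[y][x] = sum(vals) / len(vals)
    return out


def degrade(img: Image, factor: int, radius: int) -> Image:
    """Degraded pattern = scaling followed by blurring."""
    return box_blur(scale_down_up(img, factor), radius)


def distance(a: Image, b: Image) -> float:
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def build_dictionaries(originals: Dict[str, Image],
                       params: Sequence[Param]) -> Dict[Param, Dict[str, Image]]:
    """One degraded dictionary per degradation parameter set."""
    return {p: {label: degrade(img, *p) for label, img in originals.items()}
            for p in params}


def select_dictionary(dictionaries: Dict[Param, Dict[str, Image]],
                      samples: Sequence[Image]) -> Param:
    """Pick the parameter set whose dictionary lies closest to the samples."""
    def mean_dist(dictionary: Dict[str, Image]) -> float:
        return sum(min(distance(s, t) for t in dictionary.values())
                   for s in samples) / len(samples)
    return min(dictionaries, key=lambda p: mean_dist(dictionaries[p]))


ORIGINALS: Dict[str, Image] = {
    "I": [[0.0, 1.0, 0.0, 0.0] for _ in range(4)],                   # vertical bar
    "-": [[0.0] * 4, [1.0] * 4, [0.0] * 4, [0.0] * 4],               # horizontal bar
}
PARAMS: List[Param] = [(1, 0), (2, 1), (4, 1)]
DICTIONARIES = build_dictionaries(ORIGINALS, PARAMS)
TEST_SAMPLES = [degrade(ORIGINALS["I"], 2, 1)]  # stand-in for real degraded samples
BEST = select_dictionary(DICTIONARIES, TEST_SAMPLES)
print(BEST)
```

Here the test sample is produced with the parameters (2, 1), so the dictionary built with those parameters matches it exactly and is selected.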
Abstract:
An image processing apparatus for extracting specified objects has: a background image extract unit for extracting a background; a first average background extract unit that extracts an image including the background and a plurality of stationary and moving objects each having a speed not higher than a predetermined first speed; a second average background extract unit that extracts an image including the background and the stationary and moving objects each having a speed not higher than a predetermined second speed; a first difference-calculation processing unit that calculates, as a first speed image, a difference between an output of the background image extract unit and an output of the first average background extract unit; a second difference-calculation processing unit that calculates, as a second speed image, a difference between the outputs of the first and second average background extract units; and a third difference-calculation processing unit that calculates, as a third speed image, a difference between an original image and either one of the outputs of the first and second average background extract units.
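The three difference calculations can be sketched on one-dimensional "frames". The abstract does not say how the average backgrounds are computed; the sketch assumes exponential moving averages with two update rates (`alpha_slow` for the first unit, `alpha_fast` for the second) and assumes the first speed is lower than the second. All names are hypothetical.

```python
from typing import List, Sequence, Tuple

Frame = List[float]


def ema_update(avg: Frame, frame: Frame, alpha: float) -> Frame:
    """Assumed averaging rule: exponential moving average per pixel."""
    return [(1 - alpha) * a + alpha * f for a, f in zip(avg, frame)]


def abs_diff(a: Frame, b: Frame) -> Frame:
    return [abs(x - y) for x, y in zip(a, b)]


def speed_images(frames: Sequence[Frame], background: Frame,
                 alpha_slow: float, alpha_fast: float) -> Tuple[Frame, Frame, Frame]:
    """Return the first, second, and third speed images for the last frame."""
    avg1 = list(background)  # first average background (slow objects persist)
    avg2 = list(background)  # second average background (faster objects persist)
    for frame in frames:
        avg1 = ema_update(avg1, frame, alpha_slow)
        avg2 = ema_update(avg2, frame, alpha_fast)
    original = frames[-1]
    first = abs_diff(background, avg1)   # background vs. first average
    second = abs_diff(avg1, avg2)        # first average vs. second average
    third = abs_diff(original, avg2)     # original vs. one of the averages
    return first, second, third


# Pixel 0 holds a stationary object; a second object moves one pixel per frame.
BACKGROUND: Frame = [0.0, 0.0, 0.0, 0.0]
FRAMES: List[Frame] = [[1.0, 1.0, 0.0, 0.0],
                       [1.0, 0.0, 1.0, 0.0],
                       [1.0, 0.0, 0.0, 1.0]]
FIRST, SECOND, THIRD = speed_images(FRAMES, BACKGROUND, alpha_slow=0.1, alpha_fast=0.5)
print(FIRST, SECOND, THIRD)
```

In this toy sequence the first speed image responds most strongly at the stationary object, while the third responds most strongly at the current position of the moving object.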
Abstract:
A title extracting apparatus scans black pixels in a document image and extracts, as character rectangles, rectangular regions that circumscribe connected regions of the black pixels. The apparatus then unifies a plurality of adjoining character rectangles and extracts, as character string rectangles, rectangular regions that circumscribe the unified character rectangles. Thereafter, the apparatus calculates points representing the likelihood of being a title for each character string rectangle, based on attributes such as an underline attribute, a frame attribute, and a ruled line attribute, on the positions of the character string rectangles in the document image, and on their mutual positional relations, and extracts the character string rectangle with the highest points as a title rectangle. In the case of a tabulated document, the apparatus can extract a title rectangle from inside the table. Characters obtained from the title rectangle by the character recognizing process are used as keywords of the document image.
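The point-based selection can be sketched as follows. The `StringRect` type, the individual point values, and the position thresholds are all hypothetical; the abstract names the attributes but not their weights, and the mutual positional relations between rectangles are left out of this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StringRect:
    name: str                # hypothetical identifier for the rectangle
    top: int
    left: int
    right: int
    underlined: bool = False
    framed: bool = False
    ruled: bool = False


def title_points(rect: StringRect, page_width: int, page_height: int) -> int:
    """Sum illustrative points for attributes and position (assumed weights)."""
    points = 0
    if rect.underlined:
        points += 2
    if rect.framed:
        points += 2
    if rect.ruled:
        points += 1
    if rect.top < page_height // 4:          # near the top of the page
        points += 2
    center = (rect.left + rect.right) / 2
    if abs(center - page_width / 2) < page_width * 0.1:  # roughly centered
        points += 1
    return points


def extract_title(rects: List[StringRect], page_width: int, page_height: int) -> StringRect:
    """Return the character string rectangle with the highest points."""
    return max(rects, key=lambda r: title_points(r, page_width, page_height))


RECTS = [
    StringRect("body", top=400, left=50, right=550),
    StringRect("header", top=40, left=200, right=400, underlined=True),
    StringRect("note", top=700, left=50, right=200, framed=True),
]
TITLE = extract_title(RECTS, page_width=600, page_height=800)
print(TITLE.name)
```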
Abstract:
A method and apparatus assign a temporary label to each connected area in an image by scanning the image with a window that is two pixels high and a plurality of pixels wide. At each window location, a set of values of the pixels contained in the window is obtained, and the one of a set of predetermined temporary label assignment rules that corresponds to the obtained set of pixel values is selected. A temporary label is assigned to each pixel contained in the window, based on the selected temporary label assignment rule and on the temporary labels of the pixels in the second group in the window at that location. In addition, the temporary labels are converted to true labels by scanning the image pixels within at least one circumscribing area only, where each circumscribing area is predetermined so that the at least one circumscribing area contains all pixels that do not belong to a background area in the image.
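The temporary-label and true-label passes can be illustrated with the classic two-pass connected-component algorithm. This is a substitute technique, not the windowed rule table described above: it inspects only the upper and left neighbors (4-connectivity) and records label equivalences with union-find. The second pass is restricted to the bounding box of the non-background pixels, which mirrors the idea of scanning only within a circumscribing area.

```python
from typing import Dict, List

Grid = List[List[int]]


def label_components(img: Grid) -> Grid:
    """Two-pass labeling; 0 is background, nonzero input pixels are foreground."""
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    parent = [0]  # parent[i] is the union-find parent of temporary label i

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    top, left, bottom, right = h, w, -1, -1
    # First pass: assign temporary labels and record equivalences.
    for y in range(h):
        for x in range(w):
            if not img[y][x]:
                continue
            top, bottom = min(top, y), max(bottom, y)
            left, right = min(left, x), max(right, x)
            up = labels[y - 1][x] if y > 0 else 0
            lf = labels[y][x - 1] if x > 0 else 0
            if up and lf:
                r_up, r_lf = find(up), find(lf)
                low, high = min(r_up, r_lf), max(r_up, r_lf)
                parent[high] = low
                labels[y][x] = low
            elif up or lf:
                labels[y][x] = up or lf
            else:
                parent.append(len(parent))
                labels[y][x] = len(parent) - 1

    # Second pass: convert temporary labels to true labels, scanning only
    # the circumscribing area (bounding box) of the foreground pixels.
    remap: Dict[int, int] = {}
    for y in range(top, bottom + 1):
        for x in range(left, right + 1):
            if labels[y][x]:
                root = find(labels[y][x])
                labels[y][x] = remap.setdefault(root, len(remap) + 1)
    return labels


IMAGE: Grid = [
    [1, 0, 1, 0, 0],
    [1, 0, 1, 0, 1],
    [1, 1, 1, 0, 1],
    [0, 0, 0, 0, 0],
]
LABELED = label_components(IMAGE)
for row in LABELED:
    print(row)
```

The U-shaped area first receives two temporary labels, one per arm, which are merged when the scan reaches the bottom row; the separate bar on the right keeps its own label.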