Abstract:
Character code data and vector drawing data are both listed and provided in a re-editable form. Electronic data is generated in which information obtained by vectorizing character areas in an image and information obtained by recognizing the characters in the image are stored in separate storage locations. A display and edit program presents both the character code data and the vector drawing data generated from the input image, so a user can immediately use both kinds of data.
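The following is a minimal sketch of how such electronic data might keep the two results in separate storage locations. It assumes SVG as the container, which the abstract does not name. The vectorized outlines go into one group and the recognized character codes into another, so an editor can show and edit either one. `CharRegion`, `build_svg`, the outline format, and the group ids are all hypothetical.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape


@dataclass
class CharRegion:
    """One character area: its vectorized outline and its recognition result."""
    x: float
    y: float
    outline: list  # hypothetical format: closed contour as [(x, y), ...]
    text: str      # character codes recognized in the same area


def outline_to_path(points):
    """Turn a closed contour into SVG path data."""
    (x0, y0), *rest = points
    segments = " ".join(f"L {x} {y}" for x, y in rest)
    return f"M {x0} {y0} {segments} Z"


def build_svg(regions, width, height):
    """Store vector drawing data and character code data in separate groups."""
    paths = "\n".join(f'    <path d="{outline_to_path(r.outline)}"/>' for r in regions)
    texts = "\n".join(
        f'    <text x="{r.x}" y="{r.y}">{escape(r.text)}</text>' for r in regions
    )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">\n'
        f'  <g id="vector-drawing">\n{paths}\n  </g>\n'
        f'  <g id="character-codes">\n{texts}\n  </g>\n'
        "</svg>"
    )
```

Because each kind of data lives in its own group, an editing program can hide, move, or re-edit one without touching the other.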
Abstract:
An electronic document may contain drawing descriptions of both a page image and its characters. In such a document, the font data needed to draw the characters should be held in the document while the document size is kept to a minimum. Visibility should also be ensured when search hits are highlighted. To this end, an electronic document is generated that stores three things: a document image; a plurality of character codes obtained by executing character recognition processing on the document image; and a plurality of kinds of glyph data that the character codes share when the corresponding characters are drawn. The kinds of glyph data are used selectively when the characters corresponding to the character codes are drawn. The glyph data preferably has a simple form.
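To illustrate sharing a few simple glyphs across many character codes, here is a minimal sketch. It assumes the "simple form" glyphs are plain rectangles that differ only in relative width, and that a glyph is selected by the character's aspect ratio. Both are illustrative assumptions, not the patented selection rule. `GLYPH_WIDTHS`, `pick_glyph`, and `build_text_layer` are hypothetical names.

```python
from dataclasses import dataclass

# Hypothetical glyph kinds in a "simple form": rectangles that differ only in
# width relative to the character height (em units).
GLYPH_WIDTHS = {0: 0.25, 1: 0.5, 2: 0.75, 3: 1.0}


@dataclass
class RecognizedChar:
    code: str    # character code from recognition
    width: int   # bounding-box width in pixels
    height: int  # bounding-box height in pixels


def pick_glyph(ch):
    """Select the glyph kind whose width best matches the character's aspect ratio."""
    ratio = ch.width / ch.height
    return min(GLYPH_WIDTHS, key=lambda gid: abs(GLYPH_WIDTHS[gid] - ratio))


def build_text_layer(chars):
    """Character codes plus a small glyph table shared by all of them."""
    glyph_of_char = [pick_glyph(c) for c in chars]
    used = sorted(set(glyph_of_char))
    return {
        "codes": "".join(c.code for c in chars),
        "glyphs": {gid: GLYPH_WIDTHS[gid] for gid in used},  # embed only used kinds
        "glyph_of_char": glyph_of_char,  # which glyph draws each code
    }
```

The stored glyph table grows with the number of glyph kinds rather than the number of distinct characters, which is where the size saving in this sketch comes from.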
Abstract:
The invention significantly improves operability by automatically discriminating the orientations of a plurality of images that are not guaranteed to be fed in a common orientation. This reduces the operator's burden by eliminating the effort of arranging the images in a common orientation before feeding, or of correcting each image to a common orientation after feeding. The invention further improves operability by providing modes in which orientation discrimination and tilt correction are performed before the operator gives instructions, when the Auto mode has been specified for the orientation recognition function. The invention also improves processing accuracy by determining whether the orientation or tilt recognition is proper and presenting the result to the operator.
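Orientation discrimination with a reliability check can be sketched as follows. Each of the four 90-degree orientations is scored, and the decision is flagged when the top two scores are close. This assumes a hypothetical `score` callable, for example mean OCR confidence, that rates an upright page higher; the abstract does not say how orientation is actually judged. The `min_margin` threshold is also an assumption. Tilt correction is not covered.

```python
def rotate90(img):
    """Rotate a 2-D grid 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def discriminate_orientation(img, score, min_margin=0.1):
    """Try all four orientations and keep the one the scorer rates highest.

    `score` is a hypothetical callable returning a higher value for an
    upright page. Returns the clockwise rotation to apply and whether the
    decision looks reliable enough to skip operator review.
    """
    scores = {}
    current = img
    for angle in (0, 90, 180, 270):
        scores[angle] = score(current)
        current = rotate90(current)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    (best_angle, best_score), (_, runner_up) = ranked[0], ranked[1]
    # A small margin between the two best candidates is reported to the operator.
    reliable = best_score - runner_up >= min_margin
    return best_angle, reliable
```

The returned `reliable` flag corresponds to the abstract's step of determining whether recognition is proper and reporting the result to the operator.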
Abstract:
The present invention relates to an image processing method, an image processing apparatus, and an image processing program that handle inverted characters (outlined characters), constituted by white pixels on a black ground, in the same tree structure as normal characters constituted by black pixels on a white ground. In the present invention, black pixel blocks and white pixel blocks are sampled recursively from a binary image, and tree structure data indicating the positional relation between the sampled black and white pixel blocks is created. Among the black pixel blocks in the tree structure data, those that can include inverted characters have their insides white-black-inverted to create an inverted image. White pixel blocks and black pixel blocks are then sampled from the inverted image, and data regarding these sampled blocks is added to the corresponding nodes of the tree structure data.
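The inversion step can be sketched on a binary image given as a 2-D list of 0/1, where 1 is black. Black blocks are sampled and one level of tree is built from them. For blocks that pass a hypothetical "may hold inverted text" test, the inside is inverted and re-sampled, so the white strokes become ordinary black blocks under the same node. Several simplifications apply. The size/density test is an assumed criterion. The recursive sampling of white blocks inside normal black blocks is omitted. Only black blocks are sampled from the inverted image.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Block:
    bbox: tuple             # (top, left, bottom, right); bottom/right exclusive
    color: str              # "black" or "white"
    inverted: bool = False  # True if sampled from the inverted image
    children: list = field(default_factory=list)


def find_blocks(img, value, bbox):
    """Bounding boxes of 4-connected pixel blocks equal to `value` inside `bbox`."""
    top, left, bottom, right = bbox
    seen = set()
    boxes = []
    for y in range(top, bottom):
        for x in range(left, right):
            if img[y][x] != value or (y, x) in seen:
                continue
            seen.add((y, x))
            queue = deque([(y, x)])
            y0, x0, y1, x1 = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                y0, x0 = min(y0, cy), min(x0, cx)
                y1, x1 = max(y1, cy), max(x1, cx)
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if (top <= ny < bottom and left <= nx < right
                            and (ny, nx) not in seen and img[ny][nx] == value):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            boxes.append((y0, x0, y1 + 1, x1 + 1))
    return boxes


def may_hold_inverted_text(img, bbox, min_side=3, min_density=0.5):
    """Hypothetical test: large, mostly black blocks may contain white text."""
    top, left, bottom, right = bbox
    h, w = bottom - top, right - left
    if min(h, w) < min_side:
        return False
    black = sum(img[y][x] for y in range(top, bottom) for x in range(left, right))
    return black / (h * w) >= min_density


def build_tree(img):
    """One level of black blocks, plus blocks sampled from inverted insides."""
    root = Block((0, 0, len(img), len(img[0])), "white")
    for bbox in find_blocks(img, 1, root.bbox):
        node = Block(bbox, "black")
        root.children.append(node)
        if not may_hold_inverted_text(img, bbox):
            continue
        top, left, bottom, right = bbox
        inverted = [row[:] for row in img]
        for y in range(top, bottom):
            for x in range(left, right):
                inverted[y][x] = 1 - img[y][x]
        # White strokes inside the block are now black, so they are sampled
        # exactly like normal characters and attached to the same node.
        for child in find_blocks(inverted, 1, bbox):
            node.children.append(Block(child, "black", inverted=True))
    return root
```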
Abstract:
A document processing apparatus that segments a color document image into regions obtains a binary image by binarizing the color image. It also extracts regions having different background colors from the color image and generates region information indicating the position and size of each extracted region. Performing region segmentation on the basis of both the binary image and the region information yields a segmentation result that reflects the background colors. In this way, region segmentation can preserve the region distinctions that a color document expresses through color.
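Combining a binary image with background-color region information can be sketched on a color image given as a 2-D list of RGB tuples. The sketch binarizes by luminance, extracts light non-page background regions as the region information (position, size, color), and then groups each foreground block by the background region that contains it. The luminance threshold, the minimum region area, and grouping by bounding-box containment are all assumptions, not the patented segmentation procedure.

```python
from collections import deque

WHITE = (255, 255, 255)


def luminance(rgb):
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def components(grid, match):
    """Bounding boxes of 4-connected cells for which match(cell) is true."""
    h, w = len(grid), len(grid[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if seen[y][x] or not match(grid[y][x]):
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, left = min(top, cy), min(left, cx)
                bottom, right = max(bottom, cy), max(right, cx)
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and match(grid[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom + 1, right + 1))
    return boxes


def background_regions(img, threshold=128, page_color=WHITE, min_area=4):
    """Region information: position, size, and color of light non-page backgrounds."""
    colors = {p for row in img for p in row if p != page_color and luminance(p) >= threshold}
    regions = []
    for color in sorted(colors):
        for top, left, bottom, right in components(img, lambda p, c=color: p == c):
            if (bottom - top) * (right - left) >= min_area:
                regions.append({"bbox": (top, left, bottom, right), "color": color})
    return regions


def inside(outer, inner):
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def segment(img, threshold=128):
    """Group foreground blocks of the binary image by the background region holding them."""
    binary = [[1 if luminance(p) < threshold else 0 for p in row] for row in img]
    regions = background_regions(img, threshold)
    groups = {"page": []}
    groups.update({i: [] for i in range(len(regions))})
    for box in components(binary, lambda v: v == 1):
        owner = next((i for i, reg in enumerate(regions) if inside(reg["bbox"], box)), "page")
        groups[owner].append(box)
    return regions, groups
```

Binarization alone would merge the two black blocks in the test image into one undifferentiated foreground. The region information is what keeps the one on the colored background separate.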
Abstract:
An image processing apparatus includes a character recognition unit and a generation unit. The character recognition unit is configured to perform character recognition on a plurality of character images in a document image to acquire a character code corresponding to each character image. The generation unit is configured to generate an electronic document that includes the document image, the plurality of character codes acquired by the character recognition unit, a plurality of glyphs, and data indicating which glyph is to be used to render each of the character codes. Based on this data, each of the glyphs is shared by different character codes when the characters corresponding to the acquired character codes are rendered.
Abstract:
Image processing that achieves both high compressibility and high image quality by vectorizing characters in character regions and graphics in graphic regions. If a pixel of a character in a character region overlaps a graphic in a graphic region, graphic region vectorization is performed first. If the character pixels in the character region do not overlap the graphic in the graphic region, character region vectorization is performed first.
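The ordering rule itself fits in a few lines. The sketch below represents both regions as binary masks of equal size; that representation and the placeholder `vectorizers` callables are assumptions, and the actual vectorization is not shown.

```python
def pixel_set(mask):
    """Coordinates of the set pixels in a binary mask (2-D list of 0/1)."""
    return {(y, x) for y, row in enumerate(mask) for x, v in enumerate(row) if v}


def vectorization_order(char_mask, graphic_mask):
    """Graphic region first if any character pixel overlaps the graphic."""
    if pixel_set(char_mask) & pixel_set(graphic_mask):
        return ["graphic", "character"]
    return ["character", "graphic"]


def vectorize_page(char_mask, graphic_mask, vectorizers):
    """Run hypothetical per-region vectorizers in the chosen order."""
    masks = {"character": char_mask, "graphic": graphic_mask}
    order = vectorization_order(char_mask, graphic_mask)
    return [(kind, vectorizers[kind](masks[kind])) for kind in order]
```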
Abstract:
An image processing apparatus comprises the following units:
a character information acquisition unit configured to acquire the character information included in each of a body region and a caption region;
an accumulation unit configured to divide the character information acquired from the body region into predetermined set units and to accumulate, in a memory, the character information and position information of each divided set unit;
an anchor term extraction unit configured to extract an anchor term from the character information acquired from the caption region;
an anchor term search unit configured to search the character information accumulated in the memory for each set unit, looking for a set unit that includes the extracted anchor term; and
a link information generation unit configured to generate link generation information that associates the set unit found by the anchor term search unit with the object region to which the caption region containing the anchor term is appended.
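Anchor term extraction and search can be sketched with a regular expression. The sketch assumes that anchor terms look like "Fig. 1", "Figure 1", or "Table 2", and that each set unit is one line of body text with its position. Both are illustrative choices; the abstract leaves the anchor format and the set unit open. `SetUnit`, `ANCHOR`, and `generate_links` are hypothetical names.

```python
import re
from dataclasses import dataclass

# Hypothetical anchor-term pattern: "Fig. 1", "Figure 1", "Table 2", ...
ANCHOR = re.compile(r"\b(fig(?:ure)?\.?|table)\s*(\d+)", re.IGNORECASE)


@dataclass
class SetUnit:
    text: str    # character information of one set unit (here: one line)
    bbox: tuple  # position information of the unit on the page


def normalize(match):
    """Map 'Fig. 1' and 'Figure 1' to the same key."""
    kind = "table" if match.group(1).lower() == "table" else "fig"
    return f"{kind} {match.group(2)}"


def extract_anchor(caption):
    match = ANCHOR.search(caption)
    return normalize(match) if match else None


def search_units(units, anchor):
    return [u for u in units if any(normalize(m) == anchor for m in ANCHOR.finditer(u.text))]


def generate_links(units, captions):
    """captions maps an object-region id to the text of its caption region."""
    links = []
    for object_id, caption in captions.items():
        anchor = extract_anchor(caption)
        if anchor is None:
            continue
        for unit in search_units(units, anchor):
            links.append({"anchor": anchor, "object": object_id, "unit_bbox": unit.bbox})
    return links
```

Normalizing the anchor lets a caption written "Fig. 1" match body text that says "Figure 1".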
Abstract:
A region division portion extracts an “object”, an “anchor expression accompanying the object”, and a “text including the anchor expression” from image data based on a paper document and an electronic document. A link processing portion generates link information that bidirectionally associates the “object” with the “anchor expression included in the text” or with the “text including the anchor expression”. A format conversion portion then converts the link information into electronic document data that includes two-way link information. When an application displays this electronic document data and either the “object” or the “anchor expression included in the text” is selected, the other can be displayed according to the link information.
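Two-way link information can be sketched as a pair of mappings that are always updated together, so that selecting either side finds the other. The id strings and the flat `{"from", "to"}` export format are assumptions; a real converter would emit whatever link syntax the target document format defines.

```python
from collections import defaultdict


class TwoWayLinks:
    """Bidirectional links between objects and anchor expressions in the text."""

    def __init__(self):
        self._anchors_of_object = defaultdict(list)
        self._object_of_anchor = {}

    def link(self, object_id, anchor_id):
        # Record both directions at once so either side can be selected later.
        self._anchors_of_object[object_id].append(anchor_id)
        self._object_of_anchor[anchor_id] = object_id

    def from_object(self, object_id):
        """Anchor expressions to show when the object is selected."""
        return list(self._anchors_of_object.get(object_id, []))

    def from_anchor(self, anchor_id):
        """Object to show when an anchor expression is selected."""
        return self._object_of_anchor.get(anchor_id)

    def export(self):
        """Flatten into link records for the converted electronic document."""
        records = []
        for object_id, anchor_ids in self._anchors_of_object.items():
            for anchor_id in anchor_ids:
                records.append({"from": object_id, "to": anchor_id})
                records.append({"from": anchor_id, "to": object_id})
        return records
```

One object can be referenced from several places in the text, so the object side keeps a list while each anchor points back to a single object.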