Abstract:
An image processing apparatus comprises: a character information acquisition unit configured to acquire character information included in each of a body region and a caption region; an accumulation unit configured to divide the character information acquired from the body region into predetermined set units and to accumulate the character information and position information of each divided set unit in a memory; an anchor term extraction unit configured to extract an anchor term from the character information acquired from the caption region; an anchor term search unit configured to search, based on the character information accumulated in the memory for each set unit, for the set unit including the extracted anchor term; and a link information generation unit configured to generate link generation information that associates the set unit found by the anchor term search unit with the object region to which the caption region including the anchor term is appended.
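The flow in this abstract (accumulate set units with positions, extract anchor terms from captions, search the set units, emit link information) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not define the set unit or the anchor term syntax, so the sketch assumes one set unit per non-empty line and anchor terms of the form "Fig. N", "Figure N" or "Table N". The names `SetUnit`, `accumulate`, `extract_anchor_terms` and `generate_links` are hypothetical.

```python
import re
from dataclasses import dataclass

# Assumed anchor-term syntax; the abstract does not specify one.
ANCHOR_RE = re.compile(r"\b(?:Fig\.|Figure|Table)\s*\d+")


@dataclass
class SetUnit:
    text: str    # character information of the unit
    page: int    # position information: page number
    offset: int  # position information: character offset within the page text


def accumulate(body_text: str, page: int) -> list[SetUnit]:
    """Divide body text into set units (assumed: one per non-empty line)."""
    units = []
    offset = 0
    for line in body_text.split("\n"):
        if line.strip():
            units.append(SetUnit(line, page, offset))
        offset += len(line) + 1  # +1 for the newline that was split away
    return units


def extract_anchor_terms(caption: str) -> list[str]:
    """Extract anchor terms such as 'Fig. 1' from a caption."""
    return ANCHOR_RE.findall(caption)


def generate_links(units: list[SetUnit], captions: dict[str, str]):
    """Associate each object (keyed by a hypothetical object id) with the
    set units that mention the anchor term found in the object's caption."""
    links = []
    for object_id, caption in captions.items():
        for anchor in extract_anchor_terms(caption):
            # The lookahead keeps 'Fig. 1' from matching inside 'Fig. 10'.
            pattern = re.compile(re.escape(anchor) + r"(?!\d)")
            for unit in units:
                if pattern.search(unit.text):
                    links.append((object_id, anchor, unit.page, unit.offset))
    return links


body = "The pipeline is shown in Fig. 1.\nResults appear in Fig. 10.\nNo reference here."
units = accumulate(body, page=1)
links = generate_links(units, {"img-1": "Fig. 1: Pipeline overview"})
```

Storing the position with each set unit is what lets the link point back into the body text without rescanning the page.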
Abstract:
A region division portion extracts an “object”, an “anchor expression accompanying the object” and a “text including the anchor expression” from image data based on a paper document and an electronic document. A link processing portion generates link information that associates, in both directions, the “object” with the “anchor expression included in the text” or the “text including the anchor expression”. Then, a format conversion portion converts the link information into electronic document data including two-way link information. When an application displays this electronic document data and either the “object” or the “anchor expression included in the text” is selected, the other can be displayed according to the link information.
Abstract:
Even when captions of a plurality of objects use an identical anchor expression, the present invention can associate an appropriately explanatory text in a body text as metadata with the objects.
Abstract:
An image processing apparatus successively designates each page of an input page image as a processing target, detects an anchor expression constituted by a specific character string, and associates a highlight position corresponding to the anchor expression with a link identifier. When the anchor expression and the link identifier are registered in a link configuration management table, if the same anchor expression is already registered in the table, the apparatus updates the table in such a way as to mutually associate the link identifiers of the same anchor expression. The apparatus generates page data of an electronic document based on a link identifier relating to a processing target page image and its highlight position and transmits the generated page data. The apparatus generates information usable to link the relevant link identifiers based on the link configuration management table, after completing the processing for all pages, and transmits the generated information.
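A minimal sketch of the link configuration management table described above, assuming anchor expressions of the form "Fig. N", "Figure N" or "Table N" and sequential integer link identifiers (neither is specified in the abstract). The function name `process_pages` is hypothetical. For brevity the sketch collects the page data in a list, whereas the abstract describes transmitting each page's data as soon as that page has been processed.

```python
import re
from collections import defaultdict
from itertools import count

# Assumed anchor-expression syntax; not taken from the abstract.
ANCHOR_RE = re.compile(r"(?:Fig\.|Figure|Table)\s*\d+")


def process_pages(pages):
    """Process page texts in order and return (page_data, link_info).

    page_data: one (page_no, [(link_id, start, end), ...]) entry per page.
    link_info: anchor expression -> link ids that must be linked together,
               produced only after all pages have been processed.
    """
    table = defaultdict(list)  # link configuration management table
    next_id = count(1)
    page_data = []
    for page_no, text in enumerate(pages, start=1):
        highlights = []
        for match in ANCHOR_RE.finditer(text):
            link_id = next(next_id)
            # Associate the highlight position with the link identifier.
            highlights.append((link_id, match.start(), match.end()))
            # Registering under the same key mutually associates the ids
            # of identical anchor expressions.
            table[match.group()].append(link_id)
        page_data.append((page_no, highlights))
    # Only expressions seen more than once yield links between identifiers.
    link_info = {expr: ids for expr, ids in table.items() if len(ids) > 1}
    return page_data, link_info


pages = [
    "Fig. 1 shows the scanner.",
    "As noted, Fig. 1 is schematic. See Table 2.",
]
page_data, link_info = process_pages(pages)
```

Deferring `link_info` until the last page is what allows each page to be emitted without knowing whether a later page repeats one of its anchor expressions.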
Abstract:
An apparatus comprises: a unit configured to divide input document data into a body region, a caption region, and an object region; a unit configured to acquire text information included in each of the body region and the caption region; a unit configured to extract an anchor term from the text information in the caption region, to search the text information in the body region for the anchor term, and to generate a bi-directional link between a portion corresponding to the anchor term in the body region and a portion of the object region to which the caption region is appended; and a unit configured to convert the input document data into digital document data in which the portion corresponding to the anchor term in the body region and the portion corresponding to the object region to which the caption region is appended are bi-directionally linked based on the link.
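The bi-directional link can be illustrated with a small lookup table in which each side of a link resolves to the other. This is a sketch under simplifying assumptions: each anchor term is taken to occur in exactly one body portion and one caption, and the names `Region` and `make_bidirectional_links` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    region_id: str  # hypothetical identifier of a body portion or object region
    page: int


def make_bidirectional_links(anchor_portions, object_regions):
    """Build a table that maps each link endpoint to its counterpart.

    anchor_portions: anchor term -> body portion containing the term
    object_regions:  anchor term -> object region whose caption has the term
    """
    links = {}
    for term, body_portion in anchor_portions.items():
        obj = object_regions.get(term)
        if obj is None:
            continue  # the term appears in the body but in no caption
        links[body_portion.region_id] = obj   # body -> object
        links[obj.region_id] = body_portion   # object -> body
    return links


anchor_portions = {"Fig. 1": Region("body-3", page=2)}
object_regions = {
    "Fig. 1": Region("photo-1", page=1),
    "Fig. 2": Region("photo-2", page=4),
}
links = make_bidirectional_links(anchor_portions, object_regions)
```

Selecting either endpoint in a viewer then reduces to a single lookup by its identifier.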
Abstract:
According to the present invention, it is possible to create electronic document data in which an object detected through a search can be highlighted so that a user can easily recognize it. An image processing apparatus extracts an object from an input image and extracts metadata related to the object. When the image processing apparatus determines that the frame is to be described with a shape that follows the shape of the object, it creates a vector path description of a frame with that shape. Then, the image processing apparatus creates an electronic document including data of the input image and the vector path description of the frame, with which the metadata is associated. When a keyword search is performed on the created electronic document, highlight display is performed in accordance with the vector path description of the frame associated with metadata that matches the keyword.
Abstract:
An image processing apparatus extracts an object area (e.g., character, picture, line drawing, and table) from an input image and acquires metadata to be associated with the object. The image processing apparatus generates a transparent graphics description for an object area having an attribute that requires generation of the transparent graphics description, and generates an electronic document while associating the transparent graphics description with the metadata. Graphics of an arbitrary shape can be used as the transparent graphics description. Accordingly, the image processing apparatus can generate electronic document data suitable for a highlight expression that is easy for users to recognize when they search for an object included in an electronic document using a keyword.
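One way to picture the transparent graphics description is as an invisible outline path that carries the object's metadata and is laid over the page image. The sketch below assumes SVG as the output format and a `data-metadata` attribute for the association; both are illustrative choices, not details taken from the abstract, and `build_document` and `find_highlights` are hypothetical names.

```python
import xml.etree.ElementTree as ET


def build_document(image_href, objects):
    """Create an SVG tree holding the page image plus one transparent path
    per object, each path carrying the object's metadata."""
    svg = ET.Element("svg", {"xmlns": "http://www.w3.org/2000/svg"})
    ET.SubElement(svg, "image", {"href": image_href})
    for obj in objects:
        points = " L ".join(f"{x} {y}" for x, y in obj["outline"])
        ET.SubElement(svg, "path", {
            "d": f"M {points} Z",   # closed outline of arbitrary shape
            "fill": "none",         # nothing is painted until highlighted
            "stroke": "none",
            "data-metadata": obj["metadata"],
        })
    return svg


def find_highlights(svg, keyword):
    """Return the path data of every frame whose metadata matches keyword."""
    hits = []
    for path in svg.iter("path"):
        if keyword.lower() in path.get("data-metadata", "").lower():
            hits.append(path.get("d"))
    return hits


objects = [
    {"outline": [(0, 0), (10, 0), (5, 8)], "metadata": "Fig. 1 scanner unit"},
    {"outline": [(20, 20), (40, 20), (40, 30), (20, 30)], "metadata": "Table 2 results"},
]
doc = build_document("page1.png", objects)
hits = find_highlights(doc, "scanner")
```

A viewer would then paint the returned outlines, so the highlight follows the object's shape rather than its bounding rectangle.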