摘要:
A character area extracting device includes a reflective and non-reflective area separation unit separating image data into reflective and non-reflective areas, and binarizing the image data by changing a first threshold value when it is inappropriate; a reflective area binarizing unit separating the reflective area into character and background areas, and binarizing it by changing a second threshold value when it is inappropriate; a non-reflective area binarizing unit separating the non-reflective area into the character and background areas, and binarizing it by changing a third threshold value when it is inappropriate; a reflective and non-reflective area separation evaluation unit; and a line extracting unit connecting the character areas of the reflective and non-reflective areas and extracting positional information of the connected character areas in the image data.
摘要:
A character area extracting device includes a reflective and non-reflective area separation unit separating image data into reflective and non-reflective areas, and binarizing the image data by changing a first threshold value when it is inappropriate; a reflective area binarizing unit separating the reflective area into character and background areas, and binarizing it by changing a second threshold value when it is inappropriate; a non-reflective area binarizing unit separating the non-reflective area into the character and background areas, and binarizing it by changing a third threshold value when it is inappropriate; a reflective and non-reflective area separation evaluation unit; and a line extracting unit connecting the character areas of the reflective and non-reflective areas and extracting positional information of the connected character areas in the image data.
摘要:
A document recognizing apparatus includes a display control unit which displays a document data including a character string related to a character string selected by a user, and an area that includes at least a character string of the document data.
摘要:
A form processing apparatus extracts layout information and character information from a form document. A candidate extracting unit extracts word candidates from the character information. A frequency digitizing unit calculates emission probability of a word candidate from each element. A relation digitizing unit calculates transition probability that relationship between word candidates is established. An evaluating unit calculates an evaluation value indicative of a probability of appearance of word candidates in respective logical elements. A determining unit determines the element and a word candidate thereof as the element and a character string thereof in the form document, based on the evaluation value.
摘要:
A method for assisting in the creation of a logical structure model, which stores, from an image in which character strings associated respectively with a plurality of logical elements constituting a logical structure are described, the logical elements, character strings associated with the logical elements, and the logical structure, wherein character strings in an input image and the logical structure among the character strings in the input image are extracted, a logical element is selected among the plurality of logical elements according to the degrees of similarity between the extracted character strings and the character string associated respectively with the plurality of logical elements stored in the logical structure model, a character string associated with the selected logical element and a character string in the input image associated with the logical element based on the logical structure among the extracted character strings in the input image are extracted.
摘要:
An image recognition method is conducted by recognizing logical elements based on a logical structure model set to correspond to the logical structure of an image of individual character strings, collecting information processed with the logical structure model of images of a logical structure, acquiring a recognition result when recognizing an image of a logical structure by processing information collected with a post-update logical structure model,and outputting warning information about the post-update logical structure model to an output unit when a result of the comparison is a non-match.
摘要:
According to an aspect of an embodiment, a method of detecting boundary line information contained in image information comprising a plurality of pixels in either one of first and second states, comprising: detecting a first group of pixels in the first state disposed continuously in said image information to determine first line information and detecting a second group of pixels in the first state disposed adjacently with each other and surrounded by pixels in the second state to determine edge information based on the contour of the second group of pixels; and determining the boundary line information on the basis of the information of the relation of relative position of the line information and the edge information and the size of the first and second group of pixels.
摘要:
A document type identifying apparatus includes in advance a database storing therein keywords used as keys that identify document types in association with each document type. The document type identifying apparatus aligns word strings written on a document and generates partial keyword strings for each keyword by using the keywords stored in the database. The partial keyword strings are to be checked for matching with the word strings written on the document. Then, the document type identifying apparatus checks matching of the grouped and aligned word strings with the partial keyword strings and obtains, for each keyword, each number of matched words with the highest matching rates between the grouped word strings that are successfully matched and the partial keyword strings. Then, each number of matched words is used to calculate each evaluation value to determine the document type.
摘要:
An image recognition method is conducted by recognizing logical elements based on a logical structure model set to correspond to the logical structure of an image of individual character strings, collecting information processed with the logical structure model of images of a logical structure, acquiring a recognition result when recognizing an image of a logical structure by processing information collected with a post-update logical structure model,and outputting warning information about the post-update logical structure model to an output unit when a result of the comparison is a non-match.
摘要:
According to an aspect of an embodiment, a method of detecting boundary line information contained in image information comprising a plurality of pixels in either one of first and second states, comprising: detecting a first group of pixels in the first state disposed continuously in said image information to determine first line information and detecting a second group of pixels in the first state disposed adjacently with each other and surrounded by pixels in the second state to determine edge information based on the contour of the second group of pixels; and determining the boundary line information on the basis of the information of the relation of relative position of the line information and the edge information and the size of the first and second group of pixels.