Abstract:
A character area extracting device includes a reflective and non-reflective area separation unit separating image data into reflective and non-reflective areas, and binarizing the image data by changing a first threshold value when it is inappropriate; a reflective area binarizing unit separating the reflective area into character and background areas, and binarizing it by changing a second threshold value when it is inappropriate; a non-reflective area binarizing unit separating the non-reflective area into the character and background areas, and binarizing it by changing a third threshold value when it is inappropriate; a reflective and non-reflective area separation evaluation unit; and a line extracting unit connecting the character areas of the reflective and non-reflective areas and extracting positional information of the connected character areas in the image data.
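The idea of binarizing with a threshold and changing that threshold when the result is judged inappropriate can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how appropriateness is evaluated, so the foreground-ratio check, the step size, and the names `binarize` and `binarize_with_retry` are all hypothetical.

```python
from typing import List, Tuple


def binarize(pixels: List[int], threshold: int) -> List[int]:
    """Mark pixels brighter than the threshold as 1 and all others as 0."""
    return [1 if p > threshold else 0 for p in pixels]


def binarize_with_retry(
    pixels: List[int],
    threshold: int = 128,
    min_ratio: float = 0.2,
    max_ratio: float = 0.8,
    step: int = 16,
    max_tries: int = 16,
) -> Tuple[List[int], int]:
    """Binarize, then change the threshold while the result looks inappropriate.

    Assumption (not from the abstract): a result is "appropriate" when the
    share of pixels marked 1 lies between min_ratio and max_ratio.
    """
    if not pixels:
        raise ValueError("pixels must not be empty")
    mask = binarize(pixels, threshold)
    for _ in range(max_tries):
        ratio = sum(mask) / len(mask)
        if min_ratio <= ratio <= max_ratio:
            break
        # Too many pixels marked 1: raise the threshold; too few: lower it.
        threshold += step if ratio > max_ratio else -step
        mask = binarize(pixels, threshold)
    return mask, threshold


if __name__ == "__main__":
    gray = [10, 20, 30, 40, 200, 50, 60, 70, 85, 90]
    print(binarize_with_retry(gray))
```

The same loop could in principle be applied three times with independent thresholds: once to split reflective from non-reflective areas, and once within each of those areas to split characters from background.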
Abstract:
A document recognizing apparatus includes a display control unit that displays document data including a character string related to a character string selected by a user, and an area that includes at least one character string of the document data.
Abstract:
A form processing program is capable of automatically extracting keywords. When the image of a scanned form is entered, a layout recognizer extracts a readout region of the form image, and a character recognizer recognizes characters within the readout region. A form logical definition database stores form logical definitions that define strings as keywords according to logical structures common to forms of the same type. A possible string extractor extracts, as possible strings, combinations of recognized characters that each satisfy the defined relationships of a string. A linking unit links the possible strings according to positional relationships and determines a combination of possible strings as keywords.
Abstract:
A ruled line extracting apparatus, a ruled line extracting program and a ruled line extracting method re-extract a ruled line by changing the predetermined requirements to be met by ruled lines when a ruled line candidate extracted according to those requirements shows low reliability. A ruled line extracting program that causes a computer to extract a ruled line in an image of a document comprises: an extraction step that extracts a ruled line candidate from the image of the document according to a first requirement predefined to be met by the figures of the elements of the ruled lines; a judgment step that judges whether the ruled line candidate is stable or unstable according to the structural stability of the ruled line candidate extracted in the extraction step; a requirement determination step that determines, according to the ruled line candidate judged as stable in the judgment step and the first requirement, a second requirement, different from the first requirement, to be met by the figures of the elements of the ruled line; and a re-extraction step that re-extracts a ruled line candidate according to the second requirement determined in the requirement determination step.
Abstract:
A form processing apparatus extracts layout information and character information from a form document. A candidate extracting unit extracts word candidates from the character information. A frequency digitizing unit calculates an emission probability of a word candidate from each element. A relation digitizing unit calculates a transition probability that a relationship between word candidates is established. An evaluating unit calculates an evaluation value indicative of a probability of appearance of word candidates in respective logical elements. A determining unit determines the element and a word candidate thereof as the element and a character string thereof in the form document, based on the evaluation value.
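One way to read the emission and transition probabilities above is as the inputs to a Viterbi-style search that assigns a logical element to each word candidate. The sketch below assumes that reading; the abstract does not name the algorithm, and the function `best_labeling`, the example elements, and all probability values are hypothetical. It also assumes a uniform prior over the first element.

```python
from typing import Dict, List, Tuple


def best_labeling(
    words: List[str],
    elements: List[str],
    emission: Dict[str, Dict[str, float]],
    transition: Dict[str, Dict[str, float]],
) -> Tuple[List[str], float]:
    """Return the element sequence with the highest evaluation value.

    The evaluation value of a labeling is the product of the emission
    probability of each word from its element and the transition
    probability between consecutive elements.
    """
    if not words:
        return [], 0.0
    # score[e]: best evaluation value of any labeling that ends in element e.
    score = {e: emission[e].get(words[0], 0.0) for e in elements}
    back: List[Dict[str, str]] = []
    for word in words[1:]:
        new_score: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for e in elements:
            prev = max(elements, key=lambda p: score[p] * transition[p][e])
            new_score[e] = (
                score[prev] * transition[prev][e] * emission[e].get(word, 0.0)
            )
            pointers[e] = prev
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the best final element.
    last = max(elements, key=lambda e: score[e])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]


if __name__ == "__main__":
    elements = ["title", "date"]
    emission = {
        "title": {"Invoice": 0.9, "2024-01-05": 0.1},
        "date": {"Invoice": 0.1, "2024-01-05": 0.9},
    }
    transition = {
        "title": {"title": 0.2, "date": 0.8},
        "date": {"title": 0.5, "date": 0.5},
    }
    print(best_labeling(["Invoice", "2024-01-05"], elements, emission, transition))
```

In this toy example the labeling `title` followed by `date` wins because both the emissions and the `title` to `date` transition favor it.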
Abstract:
A program causes a computer to function as a document recognition apparatus having: an extraction unit for extracting connected components of pixels from an input image; a generation unit for generating, as elements to be estimated, a reference element consisting of connected components of pixels extracted by the extraction unit and combined elements obtained by combining the reference element with connected components of pixels adjacent to it; a calculation unit for calculating a degree of certainty that indicates how likely each element to be estimated generated by the generation unit is to be a character; and a determination unit for identifying, based on the degree of certainty calculated by the calculation unit, the elements that seem to be characters among the elements to be estimated.
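The first step above, extracting connected components of pixels, can be sketched with a breadth-first flood fill. This is a minimal illustration, assuming a binary image stored as nested lists and 4-connectivity; the abstract specifies neither, and the function name `connected_components` is hypothetical. The later steps (combining adjacent components and scoring certainty) are not shown.

```python
from collections import deque
from typing import List, Tuple

Pixel = Tuple[int, int]  # (row, column)


def connected_components(image: List[List[int]]) -> List[List[Pixel]]:
    """Group non-zero pixels into 4-connected components, in scan order."""
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    components: List[List[Pixel]] = []
    for y in range(height):
        for x in range(width):
            if image[y][x] == 0 or seen[y][x]:
                continue
            # Start a new component and flood-fill from this pixel.
            queue = deque([(y, x)])
            seen[y][x] = True
            pixels: List[Pixel] = []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and image[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(sorted(pixels))
    return components


if __name__ == "__main__":
    image = [
        [1, 1, 0, 0],
        [0, 1, 0, 1],
        [0, 0, 0, 1],
    ]
    for component in connected_components(image):
        print(component)
```

Each returned component could then serve as a reference element, with neighboring components merged into it to form the combined elements that are scored.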
Abstract:
An area extraction method including obtaining a character lattice showing a connection relation between unit areas, which are obtained by separating a character string pattern in an image into patterns each recognized as corresponding to a single character, judging whether or not all combinations of each of the unit areas in the obtained character lattice and each of the unit areas in a regular lattice defining a regular connection relation between the unit areas are likely to be established, generating a path coupling between nodes corresponding to the combination of the unit areas which is determined as likely to be established, determining an optimum path from the generated paths based on a degree of coincidence with the regular lattice or the character lattice, and extracting from an image the unit areas in the character lattice corresponding to the determined optimum path.
Abstract:
A method for assisting in the creation of a logical structure model, which stores, from an image in which character strings associated respectively with a plurality of logical elements constituting a logical structure are described, the logical elements, character strings associated with the logical elements, and the logical structure, wherein character strings in an input image and the logical structure among the character strings in the input image are extracted, a logical element is selected among the plurality of logical elements according to the degrees of similarity between the extracted character strings and the character string associated respectively with the plurality of logical elements stored in the logical structure model, a character string associated with the selected logical element and a character string in the input image associated with the logical element based on the logical structure among the extracted character strings in the input image are extracted.
Abstract:
An image recognition method is conducted by recognizing logical elements based on a logical structure model set to correspond to the logical structure of an image of individual character strings, collecting information processed with the logical structure model of images of a logical structure, acquiring a recognition result when recognizing an image of a logical structure by processing the collected information with a post-update logical structure model, and outputting warning information about the post-update logical structure model to an output unit when a result of the comparison is a non-match.