Abstract:
A character reading method improves character segmentation accuracy and character string recognition accuracy so that handwritten addresses on postal items are read correctly. The method extracts tentative character patterns from image information of the address character string (step 206), creates a table 219 of the tentative character patterns and classifies them (step 207), extracts periphery information (vertical and horizontal lengths, vertical/horizontal length ratio, pattern spacings, etc.) of the tentative character patterns specifically for characters in the street number portion of the address character string (step 212), and accurately segments the character string into characters based on that information (step 215).
Abstract:
An address reader method and apparatus for recognizing a receiver address on a surface of mail. In the invention, an image of the surface of the mail is input and segmented into at least one character string candidate. At least one address area candidate is extracted from the image based upon the segmented character string candidate. One of the address area candidates extracted from the image is selected as the receiver address of the mail by analyzing each of the address area candidates based on predetermined position information indicating a usual position of a receiver address area, character direction information indicating a character direction of a character string appropriate for the predetermined position information, and key character string information indicating a character string most likely to exist in a receiver address. Characters in the character strings of the selected address area candidate are recognized as the receiver address, which is used to sort the mail.
Abstract:
An address reader method and apparatus for recognizing a receiver address on a surface of mail. An image of the surface of the mail is input and segmented into at least one character string candidate. At least one address area candidate is extracted from the image based upon the segmented character string candidate. One address area candidate extracted from the image is selected as a receiver address by analyzing each address area candidate based on predetermined position information indicating a usual position of a receiver address area, character direction information indicating a character direction of a character string appropriate for the predetermined position information, and key character string information indicating a character string most likely to exist in a receiver address. Characters in character strings of the selected address area candidate are recognized as a receiver address.
Abstract:
An input file document image can have its shading removed to produce a deshaded image suited to highly efficient compression encoding, so that the image can be stored and transmitted in that efficiently encoded form. When the image is retrieved or received, shading may be restored to the decoded, deshaded image by combining it with a shaded template image stored in a template image memory or by synthesizing shading in specified regions. Removing shading from regions also enhances optical character recognition, shading identification, and other processing such as smear prevention.
Abstract:
A method and a machine for generating a dictionary of address phrase expressions. One embodiment of the invention is an apparatus for generating a dictionary of target phrases. The apparatus includes an input interface that receives a first address phrase from a list of address phrase expressions; a memory that stores a dictionary of address phrase variants, including rules for generating variants of address phrase expressions; and a processing device that, based on the input first address phrase and the variant rules in the dictionary of address phrase variants, generates a second address phrase that differs in expression from the first and outputs it to a storage device holding the dictionary of target phrases.
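The rule-driven variant generation described in this abstract can be illustrated with a minimal sketch. The rule table `VARIANT_RULES`, the function name `generate_variants`, and the token-substitution strategy are all assumptions made for illustration; the abstract does not specify what form the rules take.

```python
from itertools import product

# Hypothetical variant rules: each token maps to alternative spellings.
VARIANT_RULES = {
    "street": ["st", "st."],
    "avenue": ["ave", "ave."],
    "north": ["n", "n."],
}

def generate_variants(phrase, rules=VARIANT_RULES):
    """Return every rewrite of `phrase` that differs from the input phrase."""
    tokens = phrase.lower().split()
    # For each token, the original spelling plus any rule-based alternatives.
    options = [[tok] + rules.get(tok, []) for tok in tokens]
    variants = {" ".join(combo) for combo in product(*options)}
    # Keep only "second" phrases that differ in expression from the first.
    variants.discard(" ".join(tokens))
    return sorted(variants)

# The generated variants would then be written to the target-phrase dictionary.
target_dictionary = set(generate_variants("North Main Street"))
```

Here the set `target_dictionary` stands in for the storage device that holds the dictionary of target phrases.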
Abstract:
An information retrieval system designed for ease of use has two distinctive features: a visual interface and natural language interpretation. The visual interface provides visual interaction for local search, and natural language interpretation provides linguistic interaction for global search. The visual interface provides versatile views of the contents of the system's knowledge base, control mechanisms for browsing through the knowledge base, a capability of showing relevant information to users, and a mechanism for editing a query expression that describes the information to retrieve. By using the visual interface for information retrieval, users can easily create query expressions by consulting and interacting with the system. The natural language interpretation makes use of a conceptual network as a knowledge base that stores important concepts and the relationships among them. Based on the knowledge and information represented in the conceptual network, the meaning of a noun phrase or nominal compound (a string of adjectives and nouns with some prepositions) can be inferred. The inferred interpretation of such a noun phrase is paraphrased into an expression that the information retrieval system can handle. The user can therefore obtain the desired information simply by describing it in natural language.
Abstract:
A document analysis system for determining format information of a document, wherein frames and a relationship of the frames are extracted from an image of an unmarked sample document, characters in a frame of the document are recognized, and an image structure is analyzed based on the frame and the recognized characters.
Abstract:
A character stream search system uses a finite-state automaton (FSA) to determine, in a single pass, whether any of a plurality of search-object character streams exist in a search character stream, which undergoes the search operation and comprises a plurality of characters expressed as codes. In the system, each character of the search character stream is collated against the search object characters. When the collation finds a matching search object character, a state transition is carried out from the current state as indicated by the FSA. When no search object character matches, failure processing effects a state transition to a transition destination determined by the configuration of the FSA. The subsequent processing is completed within a count bounded by a predetermined upper-limit value for each character that undergoes the search operation.
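As an illustration of multi-pattern search with match transitions and failure transitions, the following is a minimal sketch of the well-known Aho-Corasick construction. It is not taken from the abstract: the function names `build_fsa` and `search` are hypothetical, and the sketch does not implement the per-character upper bound on processing that the abstract describes (the failure loop here is bounded only in the amortized sense).

```python
from collections import deque

def build_fsa(patterns):
    """Build goto, failure, and output tables for a set of patterns."""
    goto, fail, out = [{}], [0], [set()]   # state 0 is the root
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(p)
    # Breadth-first pass: compute the failure destination of each state.
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] |= out[fail[t]]
    return goto, fail, out

def search(text, fsa):
    """Return (start_index, pattern) for every pattern occurrence in text."""
    goto, fail, out = fsa
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:   # failure processing
            s = fail[s]
        s = goto[s].get(ch, 0)           # match transition, or back to root
        for p in out[s]:
            hits.append((i - len(p) + 1, p))
    return hits

hits = sorted(search("ushers", build_fsa(["he", "she", "his", "hers"])))
```

All patterns are found in one pass over the text, which is the "at a time" property the abstract refers to.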
Abstract:
A picture coding system for document image data selects among a plurality of picture coding methods and converts the document image data into an identification code representing the selected coding method together with the signals obtained by coding the data. The system includes units for detecting and accumulating the code lengths of the codes produced by the respective coding methods, units for detecting and accumulating the difference between the lengths of the codes produced by the respective coding methods, a storage for storing a history of the preceding coding-method selections, and a change-over judging unit for selecting one of the coding methods based on information obtained from these three kinds of units.
Abstract:
A digitized multilevel signal is encoded on the basis of unit combinations each including a run-length bit field indicative of run-length at each of the signal levels and a single continuation bit indicative of transition of the signal levels. The encoding is so modified that a virtual run of zero length is inserted when the transition of the signal levels at adjacent sampling points departs from a predetermined order. Polarity of the continuation bit is inverted upon every transition of the signal level.
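One possible reading of this scheme can be sketched as follows. Several details are assumptions and are not stated in the abstract: the predetermined order is taken to be cyclic (level 0, 1, ..., `num_levels - 1`, then 0 again), decoding starts at level 0, each unit is modelled as a `(run_length_field, continuation_bit)` tuple, and a run longer than the field can hold is split across several units that share the same continuation bit.

```python
def encode(samples, num_levels, field_bits=3):
    """Encode a multilevel signal as (run_length_field, continuation_bit) units."""
    max_len = (1 << field_bits) - 1
    units, bit, expected = [], 0, 0
    i, n = 0, len(samples)
    while i < n:
        level = samples[i]
        j = i
        while j < n and samples[j] == level:
            j += 1
        run = j - i
        # Insert virtual zero-length runs for every level skipped in the order.
        while expected != level:
            units.append((0, bit))
            bit ^= 1
            expected = (expected + 1) % num_levels
        # A run too long for one field is split; the bit stays the same.
        while run > max_len:
            units.append((max_len, bit))
            run -= max_len
        units.append((run, bit))
        bit ^= 1                      # polarity inverts on each level transition
        expected = (expected + 1) % num_levels
        i = j
    return units

def decode(units, num_levels):
    """Rebuild the signal: a change in the continuation bit means 'next level'."""
    samples, level, prev_bit = [], 0, None
    for length, bit in units:
        if prev_bit is not None and bit != prev_bit:
            level = (level + 1) % num_levels
        samples.extend([level] * length)
        prev_bit = bit
    return samples

signal = [0, 0, 1, 1, 1, 2, 0, 2, 2]
assert decode(encode(signal, 3), 3) == signal
```

Because the level is implied by the order, no level value needs to be stored in a unit; the zero-length runs are what keep the decoder in step when the signal jumps out of order.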