Abstract:
Fixed-pitch, fixed-font characters embedded in a noisy gray-scale image of picture elements (pels) within a complex background can be extracted prior to execution of any recognition operations by first deriving a normalized Boolean-coded image from the gray-scale image. Then, a subset of at least three uncontaminated character triples is formed by filtering the Boolean-coded image. Next, an affine transform is approximated from locations in the Boolean-coded image of at least three noncollinear ones of the uncontaminated character triples. Lastly, the locations in a logical matrix array of all possible character triples are estimated according to the affine transform.
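The last two steps can be illustrated with a minimal sketch. It assumes the transform maps logical matrix indices (column, row) to pel coordinates in the image and that exactly three noncollinear correspondences are used, so the six affine coefficients can be solved exactly with Cramer's rule. With more than three correspondences a least-squares fit would be the natural replacement. The function names `solve_affine`, `grid_to_image`, and `estimate_grid` are illustrative and do not come from the abstract.

```python
def solve_affine(grid_pts, image_pts):
    """Solve x' = a*x + b*y + c, y' = d*x + e*y + f from three point pairs.

    grid_pts:  three (col, row) positions in the logical matrix (assumed input)
    image_pts: the three matching (x, y) pel locations in the image
    """
    (x1, y1), (x2, y2), (x3, y3) = grid_pts
    det = x1 * (y2 - y3) - y1 * (x2 - x3) + (x2 * y3 - x3 * y2)
    if abs(det) < 1e-12:
        raise ValueError("grid points are collinear; transform is undetermined")

    def solve(v1, v2, v3):
        # Cramer's rule for one output coordinate (either x' or y').
        p = (v1 * (y2 - y3) - y1 * (v2 - v3) + (v2 * y3 - v3 * y2)) / det
        q = (x1 * (v2 - v3) - v1 * (x2 - x3) + (x2 * v3 - x3 * v2)) / det
        r = (x1 * (y2 * v3 - y3 * v2)
             - y1 * (x2 * v3 - x3 * v2)
             + v1 * (x2 * y3 - x3 * y2)) / det
        return p, q, r

    a, b, c = solve(*(p[0] for p in image_pts))
    d, e, f = solve(*(p[1] for p in image_pts))
    return (a, b, c, d, e, f)


def grid_to_image(transform, col, row):
    """Map one logical matrix position to its estimated image location."""
    a, b, c, d, e, f = transform
    return (a * col + b * row + c, d * col + e * row + f)


def estimate_grid(transform, n_cols, n_rows):
    """Estimate image locations for every cell of the logical matrix."""
    return [[grid_to_image(transform, col, row) for col in range(n_cols)]
            for row in range(n_rows)]
```

For example, fitting the grid positions (0, 0), (1, 0), (0, 1) to image locations (10, 20), (22, 21), (9, 50) yields a transform that places grid position (2, 3) at image location (31, 112), allowing for slight skew between the character grid and the pel raster.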
Abstract:
Digital data is preserved by archiving it on a removable medium. In the long term, the saved data bit stream must be correctly interpreted. For a computer program or system to be archived, the bit stream constituting the program must be archived and the code must be executable at restore time. The program that restores the data does not “see” the contents of the data itself, but accesses it by issuing a function call to an executor. A description of the methods available to restore the information hidden in the data is always available: a text tells the client which functions are available and what their purposes are. The archiving method is based on using a virtual computer instruction set and saving the algorithm as a program written in that virtual machine language. For machine instructions to be executed many years later, for example 100 years, an emulator of the original machine would be written on the future hardware. Any machine manufacturer in the originating year would develop, for each architecture, a Universal Virtual Computer (UVC) description of the machine. Each originating instruction would be mapped into a small program of UVC instructions. All manufacturers of new architectures would then have to write a UVC executor able to execute UVC instructions on the machine running 100 years in the future.
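The executor idea can be illustrated with a toy interpreter. The instruction set below (`IN`, `OUT`, `LOADC`, `SUB`, `JZ`, `JNEG`, `JMP`, `HALT`) is invented for this sketch and is not the actual UVC instruction set, which the abstract does not specify. The archived decoder is a hypothetical run-length decoder stored as a program in that toy language. The client never parses the archived bytes itself. It only calls the executor.

```python
def run_executor(program, data, max_steps=100_000):
    """Interpret a toy virtual-machine program over archived bytes.

    program: list of instruction tuples (hypothetical instruction set)
    data:    the archived bit stream, as bytes
    Returns the restored bytes produced by the program.
    """
    regs = [0, 0, 0, 0]
    pos = 0          # read position in the archived data
    out = []         # restored output bytes
    pc = 0           # program counter
    for _ in range(max_steps):
        op, *args = program[pc]
        pc += 1
        if op == "HALT":
            return bytes(out)
        elif op == "LOADC":              # LOADC r, constant
            regs[args[0]] = args[1]
        elif op == "IN":                 # IN r: next input byte, or -1 at end
            if pos < len(data):
                regs[args[0]] = data[pos]
                pos += 1
            else:
                regs[args[0]] = -1
        elif op == "OUT":                # OUT r: emit low byte of register
            out.append(regs[args[0]] & 0xFF)
        elif op == "SUB":                # SUB rd, ra, rb
            regs[args[0]] = regs[args[1]] - regs[args[2]]
        elif op == "JZ":                 # JZ r, target
            if regs[args[0]] == 0:
                pc = args[1]
        elif op == "JNEG":               # JNEG r, target
            if regs[args[0]] < 0:
                pc = args[1]
        elif op == "JMP":                # JMP target
            pc = args[0]
        else:
            raise ValueError(f"unknown instruction: {op}")
    raise RuntimeError("step limit exceeded")


# Archived alongside the data: a run-length decoder written for the toy machine.
# The input is a sequence of (count, value) byte pairs.
RLE_DECODER = [
    ("IN", 0),          # 0: r0 = count
    ("JNEG", 0, 8),     # 1: end of input, so halt
    ("IN", 1),          # 2: r1 = value
    ("JZ", 0, 0),       # 3: count exhausted, so read the next pair
    ("OUT", 1),         # 4: emit value
    ("LOADC", 2, 1),    # 5: r2 = 1
    ("SUB", 0, 0, 2),   # 6: r0 = r0 - 1
    ("JMP", 3),         # 7: loop
    ("HALT",),          # 8
]

restored = run_executor(RLE_DECODER, bytes([3, 65, 2, 66]))
```

Under these assumptions, only `run_executor` has to be rewritten for future hardware. The decoder program and the data are stored unchanged, which is the division of labour the abstract describes.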
Abstract:
An optical character recognition (OCR) system is provided, in which syntactical and semantic rules, provided along with an input image to be scanned and applicable to the contents of the scanned image, are used in connection with the results of the OCR scan to identify the scanned characters. As a result, the recognition rate and confidence are enhanced. Because the checking based on syntactical and semantic rules is performed within the OCR system, application programs that receive and use the OCR results are freed from the burden of performing their own syntactical and/or semantic checking on those results.
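One way to picture rule-assisted recognition is the sketch below. It assumes the OCR scan yields, for each character position, a short list of candidate characters with confidences, and that the rules supplied with the image are a regular expression (the syntactic rule) plus a predicate function (the semantic rule). The data layout, the date field, and the name `best_reading` are assumptions made for illustration. The abstract does not say how rules or candidates are represented.

```python
import itertools
import re


def best_reading(candidates, syntax, semantic):
    """Pick the highest-confidence reading that satisfies both rules.

    candidates: one list per character position of (char, confidence) pairs
    syntax:     compiled regular expression the whole field must match
    semantic:   function(text) -> bool, checked only on syntactically valid text
    Returns (text, score), or None if no combination passes.
    """
    best = None
    for combo in itertools.product(*candidates):
        text = "".join(ch for ch, _ in combo)
        if not syntax.fullmatch(text) or not semantic(text):
            continue
        score = 1.0
        for _, conf in combo:
            score *= conf
        if best is None or score > best[1]:
            best = (text, score)
    return best


# Hypothetical rules for a month/day field, supplied together with the image.
date_syntax = re.compile(r"\d\d/\d\d")


def date_semantic(text):
    month, day = text.split("/")
    return 1 <= int(month) <= 12 and 1 <= int(day) <= 31


# Raw OCR output: the top-ranked characters alone would read "I2/30".
ocr_candidates = [
    [("I", 0.6), ("1", 0.4)],
    [("2", 0.7), ("7", 0.3)],
    [("/", 1.0)],
    [("3", 0.5), ("8", 0.5)],
    [("0", 0.8), ("O", 0.2)],
]

result = best_reading(ocr_candidates, date_syntax, date_semantic)
```

Here the syntactic rule rejects the letters "I" and "O", and the semantic rule rejects month 17 and day 80, leaving "12/30" as the only surviving reading. Exhaustive enumeration is only practical for short fields. A longer field would need a pruned search.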
Abstract:
An optical character recognition method and system are provided, employing context analysis and operator input, alternatively and in combination, on the same batch of documents. After automatic character recognition, the context analyzer processes the fields whose recognition quality is good enough that resolution can be expected, accepting as many fields as possible without any operator intervention. For some other fields, the process uses operator input to certify the character-level OCR result of, or to enter, a certain percentage of the characters, so that context analysis may accept some of the remaining fields. If the context analyzer successfully identifies a small set of very close hypotheses, the process asks the operator to certify one or two characters to resolve the ambiguity between the hypotheses. For the fields that are still not resolved, the fields and the hypotheses are shown to the operator for acceptance, correction, or entry.
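The step in which the operator certifies one or two characters can be sketched as follows. The sketch assumes the context analyzer returns equal-length hypothesis strings and chooses the position at which the hypotheses disagree most. That selection heuristic and the function names are assumptions of this example, not something stated in the abstract.

```python
def pick_position_to_certify(hypotheses):
    """Return the character position that best separates the hypotheses.

    Chooses the first position with the largest number of distinct characters.
    Returns None when the hypotheses do not differ anywhere.
    """
    length = len(hypotheses[0])
    if any(len(h) != length for h in hypotheses):
        raise ValueError("this sketch assumes equal-length hypotheses")
    best_pos, best_distinct = None, 1
    for i in range(length):
        distinct = len({h[i] for h in hypotheses})
        if distinct > best_distinct:
            best_pos, best_distinct = i, distinct
    return best_pos


def apply_certification(hypotheses, pos, char):
    """Keep only the hypotheses consistent with the operator-certified character."""
    return [h for h in hypotheses if h[pos] == char]


# Three close hypotheses for one field; the operator is asked about one
# character at a time instead of keying in the whole field.
remaining = ["SMITH", "SMYTH", "SMITE"]
first = pick_position_to_certify(remaining)           # position 2: "I" vs "Y"
remaining = apply_certification(remaining, first, "I")
second = pick_position_to_certify(remaining)          # position 4: "H" vs "E"
remaining = apply_certification(remaining, second, "H")
```

In this example two certified characters are enough to reduce the three hypotheses to a single accepted value. If no hypothesis survives, the field would fall through to the final stage, in which the operator accepts, corrects, or enters the field directly.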