Abstract:
A method and apparatus for detection of highlighted regions of a document. A document containing highlighted regions is scanned using a gray scale scanner. Morphology and threshold reduction techniques are used to separate highlighted and non-highlighted portions of the document. Having separated the highlighted and non-highlighted portions, optical character recognition (OCR) techniques can then be used to extract text from the highlighted regions.
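As a rough illustration of the separation step, the sketch below assumes that highlighter ink scans as a mid-gray band, text as dark pixels, and paper as near-white. The threshold values (100, 200), the 3x3 structuring element, and all function names are assumptions made for this example and are not taken from the abstract. The threshold reduction step is not implemented; a plain band threshold followed by a morphological closing and opening stands in for it.

```python
def band_threshold(gray, lo, hi):
    """Mark pixels whose gray level lies in [lo, hi] (candidate highlighter ink)."""
    return [[1 if lo <= v <= hi else 0 for v in row] for row in gray]

def _morph(img, k, combine):
    """Apply all() (erosion) or any() (dilation) over a (2k+1)x(2k+1) square.
    Pixels outside the image count as 0."""
    h, w = len(img), len(img[0])
    return [[int(combine(
                0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx] == 1
                for dy in range(-k, k + 1) for dx in range(-k, k + 1)))
             for x in range(w)] for y in range(h)]

def erode(img, k=1):
    return _morph(img, k, all)

def dilate(img, k=1):
    return _morph(img, k, any)

def highlight_mask(gray, lo=100, hi=200, k=1):
    """Hypothetical pipeline: band threshold, then closing, then opening."""
    band = band_threshold(gray, lo, hi)
    closed = erode(dilate(band, k), k)   # closing fills holes left by dark text
    return dilate(erode(closed, k), k)   # opening drops isolated mid-gray specks

def highlighted_text_pixels(gray, mask, lo=100):
    """Dark pixels that fall inside the highlight mask, as (row, col) pairs."""
    return [(y, x) for y, row in enumerate(gray)
            for x, v in enumerate(row) if mask[y][x] and v < lo]

# Tiny synthetic page: white paper, one highlighted block, one text pixel, one speck.
gray = [[255] * 16 for _ in range(9)]
for y in range(2, 7):
    for x in range(2, 9):
        gray[y][x] = 150          # highlighter ink
gray[4][5] = 20                   # dark text pixel inside the highlight
gray[4][12] = 150                 # isolated mid-gray speck outside it
mask = highlight_mask(gray)
text = highlighted_text_pixels(gray, mask)
```

The pixels listed in `text` are the ones that would be handed to an OCR stage. A real scan would need thresholds tuned to the scanner and highlighter colour.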
Abstract:
Machine readable electronic domain definitions of part or all of the electronic domain descriptions of hardcopy documents, and/or of part or all of the transforms that are performed to produce and reproduce such hardcopy documents, are encoded in codes that are printed on such documents. This permits the electronic domain descriptions of such documents and/or such transforms to be recovered more robustly and reliably when the information carried by such documents is transformed from the hardcopy domain to the electronic domain.
Abstract:
An image markup detection device and method identifies and extracts markup lines and regions marked automatically or interactively by a user with an ordinary pen or pencil. Only morphological image processing operations on a scanned source image are used, resulting in the extrapolation of markup lines and marked region. The markup lines are either extracted from the image, or the background information of the image (e.g., text) is removed, leaving only the markup lines. The marked region can then be printed, transferred or otherwise processed.
Abstract:
A method and apparatus for automatic page orientation of a scanned image, which compare the number of character ascending pixels to the number of character descending pixels in the image to determine whether the image is properly aligned or is 90° or 180° out of orientation. The method and apparatus include morphologically processing the bitmap of the scanned image using structuring elements for isolating the character ascenders and descenders. When page orientation is improper, the bitmap image of the scanned image is rotated to correct the misalignment.
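The ascender versus descender comparison can be sketched on a single binarized text line. This example does not use the structuring elements the abstract refers to. Instead it assumes a simpler substitute: the rows whose ink count reaches at least half of the peak row count are treated as the character body, ink above that band is counted as ascender pixels, and ink below it as descender pixels. The function names and the half-of-peak rule are assumptions for illustration, and only the upright versus 180° case is covered.

```python
def ascender_descender_counts(line):
    """line: rows of 0/1 pixels for one text line. Returns (ascender, descender) ink counts."""
    profile = [sum(row) for row in line]               # ink pixels per row
    peak = max(profile)
    body = [i for i, n in enumerate(profile) if n >= peak / 2]
    top, bottom = body[0], body[-1]                    # assumed x-height band
    return sum(profile[:top]), sum(profile[bottom + 1:])

def orientation(line):
    """Guess orientation on the premise that ascenders outnumber descenders in upright text."""
    asc, desc = ascender_descender_counts(line)
    if asc > desc:
        return "upright"
    if desc > asc:
        return "rotated 180"
    return "undetermined"

def rotate_180(img):
    """Rotate a bitmap by 180 degrees (reverse the rows, then each row)."""
    return [row[::-1] for row in img[::-1]]

# Synthetic line: 3 ascender strokes, a solid body band, 1 descender stroke.
line = ([[1 if x in (1, 4, 7) else 0 for x in range(10)] for _ in range(3)]
        + [[1] * 10 for _ in range(3)]
        + [[1 if x == 2 else 0 for x in range(10)] for _ in range(3)])
upright_guess = orientation(line)
flipped_guess = orientation(rotate_180(line))
```

When the guess is "rotated 180", applying `rotate_180` to the page bitmap corrects it. Detecting a 90° misalignment would need a further check, for example repeating the analysis on the transposed image.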
Abstract:
In a text recognition system, the computational efficiency of a text line image decoding operation is improved by utilizing the characteristic of a graph known as the cut set. The branches of the data structure that represents the image are initially labeled with estimated scores. When estimated scores are used, the decoding operation must iterate over a text line before producing the best path through the data structure. After each iteration, nodes in the best path are re-scored with actual scores. The decoding operation incorporates an operating mode called skip mode. When the change in cumulative path scores between the current and prior iterations is substantially constant over a number of consecutive image positions that exceeds a threshold, this signals the presence of a cut set, and the score change value is added to a previously computed path score until a re-scored node is encountered, thereby eliminating the expensive computation of new cumulative path scores at those image positions.
Abstract:
A method and apparatus for differentiating and extracting handwritten annotations and machine printed text in an image. The method provides for the use of morphological operations, preferably at reduced scale, to eliminate, for example, the handwritten annotations from an image. A separation mask is produced that, for example, covers all the image pixels corresponding to machine printed text, and none of the image pixels corresponding to handwritten or handprinted annotations. The separation mask is used in conjunction with the original image to produce separate handwritten annotation and machine printed text images. The invention also provides a method and apparatus for identifying the location of specialized type styles such as bold and italic. The method erodes a binary image utilizing structuring elements which provide a relatively large number of hits in regions containing the specialized type styles. The destination image resulting from the erosion is coalesced so as to form masks which may be used to extract portions of the original image containing the specialized type styles.
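The final step, using a separation mask together with the original image, can be sketched as two pixelwise operations. The example assumes the mask has already been produced (the morphological operations that build it are not shown) and that both the image and the mask are binary bitmaps of equal size. The function name `split_by_mask` is hypothetical.

```python
def split_by_mask(image, mask):
    """Split a binary image into (machine printed, handwritten) images.

    mask is assumed to be 1 over machine printed text and 0 elsewhere.
    """
    printed = [[p & m for p, m in zip(img_row, mask_row)]
               for img_row, mask_row in zip(image, mask)]
    handwritten = [[p & (1 - m) for p, m in zip(img_row, mask_row)]
                   for img_row, mask_row in zip(image, mask)]
    return printed, handwritten

image = [[1, 1, 0, 1],
         [0, 1, 1, 0]]
mask = [[1, 1, 0, 0],
        [0, 0, 0, 0]]
printed, handwritten = split_by_mask(image, mask)
```

Every ink pixel of the original ends up in exactly one of the two outputs, so the two images can be recombined with a pixelwise OR.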
Abstract:
Methods and apparatus of processing an undecoded document image in a digital computer to modify the document image so as to emphasize semantically significant portions without first converting the document image to character codes. The document image is segmented into image units, and morphological image characteristics of the image units are evaluated to identify significant image units for emphasis. In one embodiment, the significant image units are emphasized by modifying at least one shape characteristic of the significant image units using at least one uniform morphological bitmap operation applied to the entire image unit bitmaps corresponding to the significant image units.
Abstract:
A system for authenticating a hard copy of an original document. The system employs a special copying machine at the sender's end together with a special ID card (smart card) or other user identification for activating the special machine, and a special copying machine at the receiving end. At the sender's station, the original document and ID card are inserted into the machine. The latter digitizes the document text, to produce a digital signature which incorporates unique information from the sender's ID card. This machine then produces a hard copy of the document to which is added the digital signature. The sender retains the original, but forwards the copy to the recipient or receiver. The receiver then inserts the received copy into the machine at his location, which digitizes and processes the document text and signature and indicates whether the digital signature is valid. Preferably a dual key authentication system is used, with the digital signature incorporating the sender's secret signing key, and the receiver using the related public key in the validation process.
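The dual key sign and verify flow can be illustrated with a toy textbook RSA scheme. Everything specific here is an assumption for the sketch: the abstract does not name an algorithm, the key below is built from two small Mersenne primes and is far too short to be secure, and the way the sender's ID is mixed into the digest is invented for the example. Requires Python 3.8 or later for the modular inverse via `pow`.

```python
import hashlib

# Toy key pair (NOT secure): two small Mersenne primes chosen only for illustration.
P, Q = 2**61 - 1, 2**31 - 1
N = P * Q                                  # public modulus
E = 65537                                  # public verification exponent
D = pow(E, -1, (P - 1) * (Q - 1))          # sender's secret signing exponent

def digest(text, sender_id):
    """Hash the digitized text together with the sender's ID, reduced modulo N."""
    data = (sender_id + "\n" + text).encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N

def sign(text, sender_id):
    """Sender's machine: produce the signature printed on the hard copy."""
    return pow(digest(text, sender_id), D, N)

def verify(text, sender_id, signature):
    """Receiver's machine: check the signature using only the public key (N, E)."""
    return pow(signature, E, N) == digest(text, sender_id)

signature = sign("Pay the bearer 100 units.", "card-42")
genuine = verify("Pay the bearer 100 units.", "card-42", signature)
tampered = verify("Pay the bearer 900 units.", "card-42", signature)
```

Here `genuine` is true and `tampered` is false, since changing the text changes the digest. A practical system would use a vetted cryptographic library with a proper padding scheme and key size, and would need the re-scanned text to match the original exactly.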