Abstract:
The current document is directed to methods and systems for identifying symbols corresponding to symbol images in a scanned-document image or other text-containing image, with the symbols corresponding to Chinese or Japanese characters, to Korean morpho-syllabic blocks, or to symbols of other languages that use a large number of symbols for writing and printing. In one implementation, the methods and systems to which the current document is directed create and store a decision tree, the nodes of which include classifiers that each recognizes the symbol that corresponds to a symbol image. Input of a symbol image to the decision tree and processing of the symbol image through one or more nodes of the decision tree returns a symbol corresponding to the symbol image.
Abstract:
The current document is directed to methods and systems for identifying symbols corresponding to symbol images in a scanned-document image or other text-containing image, with the symbols corresponding to Chinese or Japanese characters, to Korean morpho-syllabic blocks, or to symbols of other languages that use a large number of symbols for writing and printing. In one implementation, the methods and systems to which the current document is directed create and store a decision tree, the nodes of which include classifiers that each recognizes the symbol that corresponds to a symbol image. Input of a symbol image to the decision tree and processing of the symbol image through one or more nodes of the decision tree returns a symbol corresponding to the symbol image.
Abstract:
A method for detecting a junction in a received image of the line of text to update a junction list with descriptive data is provided. The method includes creating a color histogram based on a number of color pixels in the received image of the line of text and detecting, based at least in part on the received image of the line of text, a rung within the received image of the line of text. The method also includes identifying a horizontal position of the detected rung in the received image of the line of text and identifying a gateway on the color histogram, wherein the identified gateway is associated with the detected rung. The junction list is updated with data including a description of the identified gateway.
Abstract:
The current document is directed to methods and systems for identifying symbols corresponding to symbol images in a scanned-document image or other text-containing image, with the symbols corresponding to Chinese or Japanese characters, to Korean morpho-syllabic blocks, or to symbols of other languages that use a large number of symbols for writing and printing. In one implementation, the methods and systems to which the current document is directed carry out an initial processing step on one or more scanned images to identify a subset of the total number of symbols frequently used in the scanned document image or images. One or more lists of graphemes for the language of the text are then ordered in most-likely-occurring to least-likely-occurring order to facilitate a second optical-character-recognition step in which symbol images extracted from the one or more scanned-document images are associated with one or more graphemes most likely to correspond to the scanned symbol image.
Abstract:
The current document is directed to methods and systems for identifying symbols corresponding to symbol images in a scanned-document image or other text-containing image, with the symbols corresponding to Chinese or Japanese characters, to Korean morpho-syllabic blocks, or to symbols of other languages that use a large number of symbols for writing and printing. In one implementation, the methods and systems to which the current document is directed carry out an initial processing step on one or more scanned images to identify, for each symbol image within a scanned document, a set of graphemes that match, with high frequency, symbol patterns that, in turn, match the symbol image. The set of graphemes identified for a symbol image is associated with the symbol image as a set of candidate graphemes for the symbol image. The set of candidate graphemes are then used, in one or more subsequent steps, to associate each symbol image with a most likely corresponding symbol code.
Abstract:
The current document is directed to methods and systems for identifying Chinese, Japanese, Korean, or similar language symbols that correspond to symbol images in a scanned-document image or other text-containing image. In a first processing phase, each symbol image is associated with a set of candidate graphemes. In a second processing phase, each symbol image is evaluated with respect to the set of candidate graphemes identified for the symbol image during the first phase. As candidate graphemes are processed, the currently described methods and systems monitor progress towards identifying a matching grapheme and, when insufficient progress is observed, terminate processing of the candidate graphemes and identify the symbol image as a non-symbol-containing area of the scanned-document image or other text-containing image.
Abstract:
The current document is directed to methods and systems for identifying symbols corresponding to symbol images in a scanned-document image or other text-containing image, with the symbols corresponding to Chinese or Japanese characters, to Korean morpho-syllabic blocks, or to symbols of other languages that use a large number of symbols for writing and printing. In one implementation, the methods and systems to which the current document is directed carry out an initial processing step on one or more scanned images to identify a subset of the total number of symbols frequently used in the scanned document image or images. One or more lists of graphemes for the language of the text are then ordered in most-likely-occurring to least-likely-occurring order to facilitate a second optical-character-recognition step in which symbol images extracted from the one or more scanned-document images are associated with one or more graphemes most likely to correspond to the scanned symbol image.
Abstract:
The current document is directed to methods and systems for identifying symbols corresponding to symbol images in a scanned-document image or other text-containing image, with the symbols corresponding to Chinese or Japanese characters, to Korean morpho-syllabic blocks, or to symbols of other languages that use a large number of symbols for writing and printing. In one implementation, the methods and systems to which the current document is directed carry out an initial processing step on one or more scanned images to identify, for each symbol image within a scanned document, a set of graphemes that match, with high frequency, symbol patterns that, in turn, match the symbol image. The set of graphemes identified for a symbol image is associated with the symbol image as a set of candidate graphemes for the symbol image. The set of candidate graphemes are then used, in one or more subsequent steps, to associate each symbol image with a most likely corresponding symbol code.