Abstract:
A logical structure analyzing apparatus includes an extracting unit that extracts word candidates from a form; a first generating unit that classifies each of the word candidates into a group of heading candidates or a group of data candidates and generates, based on positions of the word candidates on the form, first candidate sets each including one heading candidate and one data candidate identifiable by the heading candidate; and a second generating unit that combines the first candidate sets to generate second candidate sets that each include plural different heading candidates and one data candidate. The apparatus also includes a removing unit that, based on positions of the heading candidates and the data candidate in each second candidate set, removes from among the second candidate sets a determined set including a data item and headings identifying the data item, and an output unit that outputs the determined set.
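The two generating steps can be illustrated with a minimal sketch. It is not the apparatus's actual method: the `Word` type, the grid coordinates, and the positional rule (a heading identifies a data candidate when it lies to its left in the same row or above it in the same column) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Word:
    """Hypothetical word candidate with a grid position on the form."""
    text: str
    row: int
    col: int
    is_heading: bool


def first_candidate_sets(words):
    """Pair each data candidate with every heading candidate that could identify it.

    Assumed rule: the heading is left of the data in the same row,
    or above the data in the same column.
    """
    headings = [w for w in words if w.is_heading]
    data = [w for w in words if not w.is_heading]
    pairs = []
    for d in data:
        for h in headings:
            left_of = h.row == d.row and h.col < d.col
            above = h.col == d.col and h.row < d.row
            if left_of or above:
                pairs.append((h, d))
    return pairs


def second_candidate_sets(pairs, size=2):
    """Combine first candidate sets that share a data candidate.

    Each result holds `size` different heading candidates and one data candidate.
    """
    by_data = {}
    for h, d in pairs:
        by_data.setdefault(d, []).append(h)
    result = []
    for d, hs in by_data.items():
        for combo in combinations(hs, size):
            result.append((combo, d))
    return result


words = [
    Word("Qty", 0, 1, True),
    Word("Price", 0, 2, True),
    Word("Apple", 1, 0, True),
    Word("3", 1, 1, False),
    Word("120", 1, 2, False),
]
pairs = first_candidate_sets(words)
combined = second_candidate_sets(pairs)
```

In this toy form, the data candidate "3" ends up with the headings "Qty" and "Apple", which is the kind of multi-heading set the second generating unit produces.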
Abstract:
Even when an input form does not have a logical structure stored in the generic logical structure DB, the form data extracting apparatus extracts a logical element and a logical structure from the input form by using logical elements in the existing logical structure and a registered form obtained on the basis of (a) the logical structure, (b) pieces of position information of the logical elements, and (c) a relation between the logical elements. The extracted logical structure can be defined as a new registered form or a new logical structure.
Abstract:
An image recognition apparatus recognizes the correspondence between character strings and logical elements composing a logical structure in an image in which the character strings are described as the logical elements, in order to recognize each logical element. The image recognition apparatus includes outputting means for outputting the recognized logical elements when the correspondence is recognized or re-recognized; first determining means for determining a certain logical element to be correct when input of a determination request to determine the logical element is received from a user; second determining means for determining the correctness of all the logical elements that are output before the logical element determined by the first determining means and that are positioned according to confirmation by the user; and re-recognizing means for re-recognizing the correspondence between the character strings and the logical elements that have not been determined to be correct, on the basis of the determination content for each logical element.
Abstract:
A dictionary creating apparatus registers probability distributions each including an average vector and a covariance matrix, in a dictionary. The dictionary creating apparatus organizes plural distribution profiles of character categories having similar feature vectors into one typical distribution profile, and registers the typical distribution profile and the character categories to be organized, associated with each other, in the dictionary, without registering eigenvalues and eigenvectors of all character categories, associated with each other, in the dictionary.
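The idea of sharing one typical distribution profile among similar character categories can be sketched as follows. This is only an illustration under stated assumptions: each profile is reduced to a per-dimension variance list (not a full covariance matrix), similarity is Euclidean distance between average vectors, grouping is greedy, and the typical profile is the element-wise mean of the group's profiles. The function name `organize_profiles` and the `threshold` parameter are hypothetical.

```python
import math


def organize_profiles(categories, threshold):
    """Group categories with similar average vectors and share one profile per group.

    categories: dict mapping a category name to (average_vector, variances).
    Returns a list of dictionary entries, each linking the grouped categories
    to a single typical profile (assumed here to be the element-wise mean).
    """
    groups = []
    for name, (mean, variances) in categories.items():
        for group in groups:
            # Greedy assignment: join the first group whose representative is close.
            if math.dist(mean, group["rep_mean"]) <= threshold:
                group["members"].append(name)
                group["profiles"].append(variances)
                break
        else:
            groups.append({"rep_mean": mean, "members": [name], "profiles": [variances]})

    dictionary = []
    for group in groups:
        n = len(group["profiles"])
        typical = [sum(column) / n for column in zip(*group["profiles"])]
        dictionary.append({"categories": group["members"], "typical_profile": typical})
    return dictionary


categories = {
    "O": ([0.0, 0.0], [1.0, 2.0]),
    "0": ([0.1, 0.0], [3.0, 2.0]),
    "X": ([5.0, 5.0], [1.0, 1.0]),
}
entries = organize_profiles(categories, threshold=0.5)
```

Here "O" and "0" share one stored profile, so the dictionary holds two profiles for three categories.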
Abstract:
Upon receiving document data including a character string from outside, for example, a character recognition device detects a line in a line-touching character-string image, in which at least one character (such as a numeral, an alphabet letter, a kana character, or a Chinese character) touches or overlaps a line in the document data, tentatively removes the line, and estimates a character region. The character recognition device extracts a line-touching character image from the line-touching character-string image (the original image) based on the estimated character region. The character recognition device creates a line-added reference character image by adding a quasi-line to a reference character image stored in advance.
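As a rough illustration of the last step, the sketch below adds a quasi-line to a reference character image. It assumes a binary image stored as a list of rows (1 for ink, 0 for background) and a horizontal line; the function name `add_quasi_line` and its `row` and `thickness` parameters are hypothetical and not taken from the abstract.

```python
def add_quasi_line(reference, row, thickness=1):
    """Return a copy of a binary reference image with a horizontal line drawn in.

    reference: list of rows, each a list of 0/1 pixels.
    row: index of the first row covered by the quasi-line (assumed horizontal).
    thickness: number of rows the quasi-line covers.
    """
    height = len(reference)
    line_added = [list(r) for r in reference]  # copy so the stored reference is untouched
    for r in range(row, min(row + thickness, height)):
        line_added[r] = [1] * len(line_added[r])
    return line_added


# A 3x3 vertical stroke, with a quasi-line added along the bottom row.
reference = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
]
line_added = add_quasi_line(reference, row=2)
```

The line-added image can then be compared with the line-touching character image extracted from the original, since both now contain the line.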
Abstract:
This method includes: extracting a feature vector for an input character from a reading result of the input character; calculating distances between the feature vector for the input character and vectors including (i) average vectors stored in a system dictionary that stores, for each character, the average vector and distribution information, and (ii) feature vectors stored in a user dictionary; extracting the top N character codes in ascending order of the calculated distances; obtaining second distribution information for the character codes that are included both in the user dictionary and in the top N character codes; calculating, for each of the top N character codes, a second distance from the feature vector for the input character, using the second distribution information for the character codes that are included both in the user dictionary and in the top N character codes; and identifying the character code whose second distance is shortest.
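The two-stage matching can be sketched as below. Several details are assumptions, since the abstract does not specify them: the first distance is Euclidean, the distribution information is a per-dimension variance list, the second distance is a variance-weighted squared distance, the user dictionary holds one feature vector per character code, and the second distribution information for user entries is a supplied default (`user_variance`). The function name `recognize` is hypothetical.

```python
import math


def recognize(x, system_dict, user_dict, n, user_variance):
    """Two-stage character identification (illustrative sketch).

    system_dict: code -> (average_vector, variances)
    user_dict:   code -> feature_vector
    user_variance: assumed second distribution information for user entries
    """
    # Stage 1: plain distances to system averages and user vectors.
    scored = [(math.dist(x, mean), code) for code, (mean, _var) in system_dict.items()]
    scored += [(math.dist(x, vec), code) for code, vec in user_dict.items()]
    scored.sort()

    # Keep the top N distinct character codes in ascending order of distance.
    top = []
    for _dist, code in scored:
        if code not in top:
            top.append(code)
        if len(top) == n:
            break

    def weighted(center, variances):
        # Variance-weighted squared distance (an assumed form of the second distance).
        return sum((a - c) ** 2 / v for a, c, v in zip(x, center, variances))

    # Stage 2: re-score each top-N code using distribution information.
    best_code, best_dist = None, math.inf
    for code in top:
        candidates = []
        if code in system_dict:
            mean, var = system_dict[code]
            candidates.append(weighted(mean, var))
        if code in user_dict:
            candidates.append(weighted(user_dict[code], user_variance))
        second = min(candidates)
        if second < best_dist:
            best_code, best_dist = code, second
    return best_code


system_dict = {
    "A": ([0.0, 0.0], [1.0, 1.0]),
    "B": ([4.0, 0.0], [16.0, 16.0]),
    "C": ([10.0, 10.0], [1.0, 1.0]),
}
without_user = recognize([1.5, 0.0], system_dict, {}, n=2, user_variance=[1.0, 1.0])
with_user = recognize([1.5, 0.0], system_dict, {"A": [1.4, 0.0]}, n=2, user_variance=[1.0, 1.0])
```

In this toy data, "A" is nearest in the first stage, but the broad distribution of "B" makes it the better match in the second stage; registering a user vector for "A" close to the input reverses that outcome.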
Abstract:
First, a key word is automatically extracted from a character string group to be recognized, and is entered. Then, characters are recognized by segmenting individual characters from a character string image to be recognized, and a character string corresponding to the extracted and entered key word is extracted. Then, a word area delimited by a key word is extracted from the character string image, and a word is recognized. Furthermore, the word recognition result is verified, and a final character string recognition result is output.
Abstract:
A replay control method, executed by a computer, of controlling replay means for replaying video content includes: accepting one or more keywords; retrieving, from pieces of correspondence information each containing (i) fraction part information specifying a piece of video content and a fraction part in the piece of video content and (ii) a word string expressed in the fraction part, each piece of correspondence information whose word string contains at least one of the accepted keywords; and making the replay means replay the fraction part specified by each retrieved piece of correspondence information.
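The retrieval step can be illustrated with a short sketch. The `Correspondence` record, its field names, and the `replay` callback standing in for the replay means are all hypothetical. Representing a fraction part as a start and end time in seconds is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Correspondence:
    """Hypothetical correspondence information for one fraction part of a video."""
    content_id: str
    start_sec: float
    end_sec: float
    words: tuple  # word string expressed in the fraction part


def retrieve(entries, keywords):
    """Return every entry whose word string contains at least one accepted keyword."""
    wanted = set(keywords)
    return [e for e in entries if wanted.intersection(e.words)]


def replay_matches(entries, keywords, replay):
    """Make the replay means (a callback here) replay each retrieved fraction part."""
    hits = retrieve(entries, keywords)
    for e in hits:
        replay(e.content_id, e.start_sec, e.end_sec)
    return hits


entries = [
    Correspondence("news.mp4", 0.0, 12.5, ("weather", "rain", "tomorrow")),
    Correspondence("news.mp4", 12.5, 40.0, ("election", "results")),
    Correspondence("sports.mp4", 3.0, 9.0, ("rain", "delay", "match")),
]
played = []
hits = replay_matches(entries, ["rain"], lambda cid, s, e: played.append((cid, s, e)))
```

Passing the replay means as a callback keeps the retrieval logic independent of any particular video player.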