-
Publication number: US3267258A
Publication date: 1966-08-16
Application number: US33252063
Filing date: 1963-12-23
Applicant: IBM
Inventor: BENE JACK F
CPC classification number: G09B7/066 , G06F7/02 , G06K17/0032
-
Publication number: US3492646A
Publication date: 1970-01-27
Application number: US3492646D
Filing date: 1965-04-26
Applicant: IBM
Inventor: BENE JACK F , GARRY GERALD A
CPC classification number: G06K9/64
-
Publication number: US3384875A
Publication date: 1968-05-21
Application number: US49024465
Filing date: 1965-09-27
Applicant: IBM
Inventor: BENE JACK F , NELSON PAUL E
CPC classification number: G11C13/048 , G06F9/4425 , G06K9/6807
Abstract: 1,102,359. Character recognition. INTERNATIONAL BUSINESS MACHINES CORPORATION. 1 Sept., 1966 [27 Sept., 1965], No. 38990/66. Heading G4R. A first reference to be cross-correlated with an unknown data set to identify the latter is selected from a plurality of references in a store by addressing the store with one of a set of addresses selected from a control word by logic means. As described, the unknown data set is obtained from "measurements" on the clipped and digitized output of a flying-spot scanner which covers a character on a document in a raster of vertical scans. Correlation commences when the character has been completely scanned, e.g. after a predetermined number of vertical scans. Portions of the control word are accessed from the store. The first byte of the control word contains control data (see below) and is always accessed. In addition, one of the remaining bytes is selected in accordance with (a) whether the document field being scanned contains numeric or alphabetic characters, which the circuitry deduces from the identity of the field as indicated by detecting the thick black lines bounding the fields, (b) whether upper case only or both upper and lower case characters are used, as indicated by a document type switch, and (c) whether thin, medium or heavy characters are present, as indicated by the clipping and digitizing circuit. Dependence on font style is also mentioned. This second byte contains a correlation cut-off value (see below), end-of-field and end-of-area bits for controlling termination of correlation, and the address of the first reference word to be correlated with the measurements (the unknown data set).
This reference word is accessed from the store and contains, besides the data to be correlated with the measurements, the address of the next reference word to be used, end-of-field and end-of-area bits to be compared with those from the control word to determine whether this reference word is the last to be used, a code indicating the identity of the character, an additive constant for normalizing the number of mismatches with respect to this reference word, and a branch address (see below). The correlation cut-off value specifies the maximum number of mismatches which can occur during correlation with a given reference word without ruling out the code of that word as a possible identity of the unknown character or, in another mode ("converging cut-off mode"), the initial value of this maximum. In the second mode the maximum is continually updated. The mode is selected by a bit from the control word. If a given reference word yields fewer than the maximum number of mismatches, the code from the word is entered into a stored decision word at a byte position determined by the number of mismatches. After the first such code is found, the correlation cut-off value is increased by the value of a "minimum distance tolerance" field from the control word. When the series of reference words to be correlated has ended, the decision word is scanned from the least-number-of-mismatches end to detect any of the identifying codes stored therein. The first code encountered is taken as identifying the character provided it is within the first so many byte positions, as specified by a field ("field recognition threshold") from the control word, and it is at least so many byte positions from the second code encountered, as specified by another field ("minimum distance criteria") from the control word.
Otherwise, in general, further reference words will be used (the first being taken from the "next address" or "branch address" in the last used reference word, depending on, e.g., the raster height at which the unknown character was scanned), and if these fail to produce positive identification, one or more rescans of the character are performed using a clipping threshold specified by the control word. The maximum number of rescans is also specified by the control word.
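
The cut-off and decision-word logic described in the abstract can be illustrated with a minimal Python sketch. It is not the patented circuitry: all names (`ReferenceWord`, `mismatches`, `recognize`) and the sample bit patterns are hypothetical, and several details are assumptions. Measurements are modelled as bits of an integer, the decision word as a mapping from mismatch count to codes, "within the first so many byte positions" as a position strictly below the threshold, and the "second code encountered" as the next code that differs from the first. Only the fixed cut-off mode is modelled; the converging cut-off mode, the next-address and branch-address chaining, and rescans are left out because the abstract does not give enough detail.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReferenceWord:
    bits: int      # reference pattern, one bit per measurement (assumed encoding)
    code: str      # identity of the character this word stands for
    norm: int = 0  # additive constant normalizing the mismatch count


def mismatches(measurements: int, ref: ReferenceWord) -> int:
    """Count differing bits, then add the word's normalizing constant."""
    return bin(measurements ^ ref.bits).count("1") + ref.norm


def recognize(measurements: int, refs: list, cutoff: int, tolerance: int,
              threshold: int, min_distance: int) -> Optional[str]:
    """Return the identified code, or None if no positive identification."""
    slots = {}          # stand-in for the decision word: mismatch count -> codes
    limit = cutoff
    found_first = False
    for ref in refs:
        m = mismatches(measurements, ref)
        if m >= limit:  # too many mismatches: this code is ruled out
            continue
        slots.setdefault(m, []).append(ref.code)
        if not found_first:
            # After the first candidate, widen the cut-off once by the
            # "minimum distance tolerance".
            limit += tolerance
            found_first = True

    # Scan the decision word from the least-number-of-mismatches end.
    ordered = [(pos, code) for pos in sorted(slots) for code in slots[pos]]
    if not ordered:
        return None
    best_pos, best_code = ordered[0]
    if best_pos >= threshold:  # outside the recognition threshold
        return None
    for pos, code in ordered[1:]:
        if code != best_code:  # assumption: runner-up is the next distinct code
            if pos - best_pos < min_distance:
                return None    # runner-up too close: ambiguous
            break
    return best_code


# Hypothetical 8-bit example data.
MEAS = 0b10110010
REF_A = ReferenceWord(0b10110011, "A")  # 1 mismatch
REF_B = ReferenceWord(0b10010110, "B")  # 2 mismatches
REF_C = ReferenceWord(0b01001101, "C")  # 8 mismatches, above any cut-off used here
```

With these sample values, `recognize(MEAS, [REF_A, REF_B, REF_C], cutoff=4, tolerance=1, threshold=3, min_distance=2)` returns `None`, because "B" lies only one position from "A". Relaxing `min_distance` to 1, or dropping `REF_B` from the list, yields `"A"`. In the described system a `None` result would correspond to continuing with further reference words or a rescan.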
-