Abstract:
An advanced data capture architecture is disclosed that enables the format of document forms to be freely defined and redefined without reprogramming the data processors that capture and use the data on the completed forms. The architecture encompasses the interactive operation of a host processor and one or more workstations in a data processing system. The host processor and a workstation interact to provide a list of common operand names that are meaningful to an application program running on the host. The workstation creates a new document form using the list of common operand names. The workstation performs character recognition on the filled-out form, transforming its information into coded data. The workstation assembles a field data segment for each field, containing the common operand, the coded data, and the popular name for the field. Finally, the host processor receives the assembled field data segments from the workstation and provides the coded data to the application program, which processes the information directly from the form.
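The field data segment described above can be pictured as a small record that pairs the form's own label with the name the host application expects. The following is a minimal sketch under stated assumptions, not the disclosed implementation: the class name `FieldDataSegment`, the helper functions, and the sample operand names (`LAST_NAME`, `ZIP`) are all hypothetical, and character recognition is assumed to have already produced the coded data.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FieldDataSegment:
    """One segment per form field (hypothetical layout)."""
    common_operand: str  # name meaningful to the host application
    coded_data: str      # text produced by character recognition
    popular_name: str    # label printed on the form


def assemble_segments(form_definition: Dict[str, str],
                      recognized: Dict[str, str]) -> List[FieldDataSegment]:
    """Workstation side: form_definition maps popular name -> common operand,
    recognized maps popular name -> coded data."""
    return [
        FieldDataSegment(form_definition[name], text, name)
        for name, text in recognized.items()
    ]


def deliver_to_application(segments: List[FieldDataSegment]) -> Dict[str, str]:
    """Host side: key the coded data by common operand, so the application
    never needs to know how the form was laid out or labelled."""
    return {seg.common_operand: seg.coded_data for seg in segments}


# Example: the form designer chose the labels, the host chose the operands.
form_definition = {"Family name": "LAST_NAME", "Postcode": "ZIP"}
recognized = {"Family name": "Doe", "Postcode": "20817"}
segments = assemble_segments(form_definition, recognized)
application_input = deliver_to_application(segments)
```

Because the application reads only the common operand, renaming or moving a field on the form changes `form_definition` but not the application code, which is the decoupling the abstract describes.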
Abstract:
A data processing system uses a machine-generated data structure (MGDS) to dynamically record and use the character recognition and repair histories of category fields on a document form. The MGDS includes a field data segment which has a coded data buffer portion and an error buffer portion for the extracted field image. Recognition coded data is entered into the coded data buffer portion, and recognition error data is entered into the error buffer portion of the field data segment. Subsequent repair processes can then be applied to the recognition coded data by augmenting the MGDS with a repair segment for each character string that is repaired. A sequence of repair stages can be applied to a particular character string, with each stage adding another repair segment to the MGDS. At each stage of repair, the best estimate of the character string is placed into the coded data buffer portion of the field data segment. The best estimate of the information content of the document field is therefore readily available at each stage of repair and for ultimate use in the data processing system.
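The relationship between the field data segment and the accumulating repair segments can be sketched as follows. This is an illustrative model only: the class names, the representation of error data as a list of suspect character positions, and the `apply_repair` method are assumptions made for the example and are not taken from the disclosure.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class RepairSegment:
    """History entry for one repair stage (hypothetical layout)."""
    stage: str
    before: str
    after: str


@dataclass
class FieldDataSegment:
    coded_data: str = ""                             # coded data buffer
    errors: List[int] = field(default_factory=list)  # error buffer (assumed: suspect positions)


@dataclass
class MGDS:
    field_segment: FieldDataSegment
    repairs: List[RepairSegment] = field(default_factory=list)

    def apply_repair(self, stage: str, corrected: str,
                     remaining_errors: Sequence[int] = ()) -> None:
        # Append a repair segment recording what this stage changed...
        self.repairs.append(
            RepairSegment(stage, self.field_segment.coded_data, corrected))
        # ...and keep the current best estimate in the coded data buffer.
        self.field_segment.coded_data = corrected
        self.field_segment.errors = list(remaining_errors)


# Recognition misreads two characters; two repair stages fix them in turn.
mgds = MGDS(FieldDataSegment(coded_data="J0hn Sm1th", errors=[1, 7]))
mgds.apply_repair("dictionary lookup", "John Sm1th", remaining_errors=[7])
mgds.apply_repair("operator keying", "John Smith")
```

After both stages, the coded data buffer holds the latest estimate while `mgds.repairs` preserves the full history of how it was reached.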
Abstract:
A data processing method, system, and computer program repair character recognition errors in digital images of document forms. A document form processing template is provided that specifies the identity and preferred sequence of selected, customized character recognition processes and selected, customized coded data error correction processes that are reasonably likely to be needed to automatically process a selected batch of document forms whose fields have certain anticipated, uniform characteristics.
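One way to read the processing template is as a declarative list of process names that a driver looks up and runs in order. The sketch below assumes that reading; the registry, the process names, and the toy string-based "recognition" step are hypothetical stand-ins, since the abstract does not specify how processes are identified or invoked.

```python
from typing import Callable, Dict, List

# Hypothetical registry of available processes. A real system would hold
# recognition engines and correction routines; strings stand in for images here.
REGISTRY: Dict[str, Callable[[str], str]] = {
    "recognize_digits": lambda image: image.strip(),
    "fix_letter_o": lambda text: text.replace("O", "0"),
    "fix_letter_l": lambda text: text.replace("l", "1"),
}


def process_field(image: str, template: Dict[str, object]) -> str:
    """Run the recognition process named by the template, then each
    correction process in the template's preferred sequence."""
    text = REGISTRY[str(template["recognition"])](image)
    corrections: List[str] = list(template["corrections"])  # type: ignore[arg-type]
    for name in corrections:
        text = REGISTRY[name](text)
    return text


# Template for a batch whose fields are known in advance to be numeric.
zip_code_template = {
    "recognition": "recognize_digits",
    "corrections": ["fix_letter_o", "fix_letter_l"],
}
result = process_field(" 2O8l7 ", zip_code_template)
```

Keeping the sequence in data rather than code means a different batch of forms can be handled by supplying a different template, without changing the driver.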
Abstract:
An improved forms recognition method and system are disclosed that minimize the time required to perform the forms recognition process by adaptively changing the processing sequence. In accordance with the invention, when new master forms are defined in the system, a new processing template is also defined. The processing template includes tables and indexes that give the profile of all the master forms that have been defined in the system. The processing template is then consulted during forms recognition processing to adaptively choose which forms recognition operations to perform, minimizing the time required to finish processing a particular completed form.
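The adaptive idea can be illustrated with a toy model in which each master form is profiled by a few features, and the template records only the features that actually distinguish the defined forms. This is a sketch of one possible interpretation, not the disclosed method: the feature names, the dictionary-based template, and the early-exit strategy are all assumptions.

```python
from typing import Callable, Dict, List, Set, Tuple

Profile = Dict[str, object]


def build_template(master_forms: Dict[str, Profile]) -> Dict[str, object]:
    """Rebuilt whenever master forms are defined: keep only features whose
    values differ across forms, since the others cannot tell forms apart."""
    features = sorted({f for p in master_forms.values() for f in p})
    discriminating = [
        f for f in features
        if len({p.get(f) for p in master_forms.values()}) > 1
    ]
    return {"profiles": master_forms, "features": discriminating}


def recognize_form(extract: Callable[[str], object],
                   template: Dict[str, object]) -> Tuple[Set[str], List[str]]:
    """extract(feature) measures one feature on the completed form.
    Operations stop as soon as at most one candidate form remains."""
    profiles: Dict[str, Profile] = template["profiles"]  # type: ignore[assignment]
    candidates = set(profiles)
    operations_run: List[str] = []
    for feature in template["features"]:  # type: ignore[union-attr]
        if len(candidates) <= 1:
            break
        value = extract(feature)
        operations_run.append(feature)
        candidates = {c for c in candidates if profiles[c].get(feature) == value}
    return candidates, operations_run


masters = {
    "A": {"barcode": True, "columns": 2, "logo": "x"},
    "B": {"barcode": False, "columns": 2, "logo": "x"},
    "C": {"barcode": False, "columns": 3, "logo": "x"},
}
template = build_template(masters)
completed = {"barcode": True, "columns": 2, "logo": "x"}
matches, operations = recognize_form(completed.get, template)
```

In this example the logo check is never scheduled because every master form shares the same logo, and the column count is skipped because the barcode test alone isolates form A.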