Publication No.: US5159667A
Publication date: 1992-10-27
Application No.: US359839
Filing date: 1989-05-31
Applicant: Roland G. Borrey, Daniel G. Borrey
Inventor: Roland G. Borrey, Daniel G. Borrey
IPC classification: G06K9/20
CPC classification: G06K9/00442, G06F17/30011, Y10S706/90
Abstract: This invention relates to an automatic identification method for scanned documents in an electronic document capture and storage system. The invention uses the technique of recognition of global document features compared to a knowledge base of known document types. The system first segments the digitized image of a document into physical and logical areas of significance and attempts to label these areas by determining the type of information they contain, without using OCR techniques. The system then attempts to match the segmented areas to objects described in the knowledge base. The system labels the areas successfully matched, then selects the most probable document type based on the areas found within the document. Using computer learning methods, the system is capable of improving its knowledge of the documents it is supposed to recognize by dynamically modifying the characteristics of its knowledge base, thus sharpening its decision-making capability.
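The matching and selection steps described in the abstract can be illustrated with a minimal sketch. It is not the patented method: every name here (`Area`, `KnownObject`, `DocumentType`, `matches`, `score`, `identify`, `learn`), the normalized coordinates, the fixed tolerance, and the averaging update rule are assumptions made only for illustration. Segmentation and area labeling are assumed to have already happened upstream.

```python
from dataclasses import dataclass


@dataclass
class Area:
    """A segmented area, already labeled by content type (no OCR involved)."""
    kind: str   # hypothetical labels such as "logo", "text", "table"
    x: float    # normalized centre position, 0..1
    y: float


@dataclass
class KnownObject:
    """An object a document type is expected to contain (assumed structure)."""
    kind: str
    x: float
    y: float
    tolerance: float = 0.1  # assumed maximum positional deviation


@dataclass
class DocumentType:
    name: str
    objects: list


def matches(area, obj):
    """True if the area has the same kind and lies near the expected position."""
    return (area.kind == obj.kind
            and abs(area.x - obj.x) <= obj.tolerance
            and abs(area.y - obj.y) <= obj.tolerance)


def score(areas, doc_type):
    """Fraction of the type's known objects found among the segmented areas."""
    if not doc_type.objects:
        return 0.0
    found = sum(1 for obj in doc_type.objects
                if any(matches(a, obj) for a in areas))
    return found / len(doc_type.objects)


def identify(areas, knowledge_base):
    """Return the name and score of the best-scoring document type."""
    best = max(knowledge_base, key=lambda d: score(areas, d))
    return best.name, score(areas, best)


def learn(areas, doc_type, rate=0.5):
    """Move each known object toward the first area that matched it.

    A stand-in for the 'computer learning' step; the update rule is assumed.
    """
    for obj in doc_type.objects:
        for a in areas:
            if matches(a, obj):
                obj.x += rate * (a.x - obj.x)
                obj.y += rate * (a.y - obj.y)
                break
```

In use, a knowledge base would be a list of `DocumentType` entries; `identify` picks the type whose expected objects are best covered by the areas found on the page, and `learn` can then be called on the confirmed type so that its stored positions drift toward what was actually observed.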