METHODS AND SYSTEMS FOR EXTRACTING INFORMATION FROM DOCUMENT IMAGES

    Publication No.: US20220284215A1

    Publication date: 2022-09-08

    Application No.: US17332021

    Filing date: 2021-05-27

    Abstract: This disclosure relates to a method and system for extracting information from images of one or more templatized documents. A knowledge graph with a fixed schema, based on background knowledge, is used to capture the spatial and semantic relationships of entities present in a scanned document. An adaptive lattice-based approach based on formal concept analysis (FCA) is used to determine a similarity metric that utilizes both spatial and semantic information to determine whether the structure of the scanned document image adheres to any of the known document templates. If a known document template whose structure closely matches the structure of the scanned document is detected, then an inductive rule learning based approach is used to learn symbolic rules to extract the information present in the scanned document image. If a new document template is detected, then any future scanned document images belonging to the new document template are automatically processed using the learnt rules.
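
The template-matching idea in the abstract can be illustrated with a small sketch of formal concept analysis. This is not the method claimed in the disclosure: the abstract does not specify the similarity metric, so the Jaccard overlap of concept intents used here is an assumption, and the attribute names (`sem:...` for semantic type, `pos:...` for spatial region), the function names, and the sample documents are all hypothetical.

```python
def formal_concepts(context):
    """Return all formal concepts (extent, intent) of a formal context.

    context: dict mapping an entity name to its set of attributes.
    """
    objects = list(context)
    all_attrs = set().union(*context.values())
    # The intent of the empty extent is the full attribute set.
    intents = {frozenset(all_attrs)}
    # Every intent is an intersection of object attribute sets, so we
    # close the collection under intersection one object at a time.
    for obj in objects:
        intents |= {frozenset(context[obj]) & intent for intent in intents}
    concepts = set()
    for intent in intents:
        # The extent is every object that has all attributes of the intent.
        extent = frozenset(o for o in objects if intent <= context[o])
        concepts.add((extent, intent))
    return concepts


def lattice_similarity(ctx_a, ctx_b):
    """Jaccard overlap of concept intents (an assumed metric, see lead-in)."""
    intents_a = {intent for _, intent in formal_concepts(ctx_a)}
    intents_b = {intent for _, intent in formal_concepts(ctx_b)}
    union = intents_a | intents_b
    return len(intents_a & intents_b) / len(union) if union else 1.0


# Hypothetical known template: entities with semantic and spatial attributes.
template = {
    "invoice_no": {"sem:identifier", "pos:top"},
    "date": {"sem:date", "pos:top"},
    "total": {"sem:amount", "pos:bottom"},
}
# A scan with the same layout but different entity names.
scan_same = {
    "f1": {"sem:identifier", "pos:top"},
    "f2": {"sem:date", "pos:top"},
    "f3": {"sem:amount", "pos:bottom"},
}
# A scan whose layout differs from the template.
scan_other = {
    "f1": {"sem:identifier", "pos:bottom"},
    "f2": {"sem:amount", "pos:top"},
}

THRESHOLD = 0.5  # hypothetical cut-off for "adheres to a known template"
for name, scan in [("scan_same", scan_same), ("scan_other", scan_other)]:
    score = lattice_similarity(template, scan)
    verdict = "known template" if score >= THRESHOLD else "new template"
    print(f"{name}: similarity={score:.2f} -> {verdict}")
```

Comparing intents rather than whole concepts keeps the score independent of how entities happen to be named in each scan, so only the combination of spatial and semantic attributes affects the match.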
