DETECTING TABLE INFORMATION IN ELECTRONIC DOCUMENTS

    Publication (Announcement) No.: US20220318545A1

    Publication (Announcement) Date: 2022-10-06

    Application No.: US17222412

    Application Date: 2021-04-05

    Abstract: Techniques are presented for processing electronic documents that comprise tables in order to extract and/or recreate those tables, including the information they contain. A document processing management component (DPMC) can perform a multi-stage process to extract a table from a document and recreate the table, including its structure and information, in an editable form. During a first stage, the DPMC can identify candidate cells of the table based on analysis of the document, including identifying border lines that can represent cell borders, identifying any free-floating candidate cells, and identifying the characters of the candidate cells. During a second stage, the DPMC can determine structural relationships between respective candidate cells and their respective neighbor candidate cells in all directions, based on applicable rules, and record the respective associations between those candidate cells. During a third stage, the DPMC can determine row/column placement and scaling of the candidate cells based on the respective associations and applicable rules.
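The second and third stages described in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the abstract does not state the "applicable rules", so this sketch assumes that candidate cells are already available as axis-aligned bounding boxes (the output of the first stage), treats two cells as neighbors when they share an edge within a tolerance, and derives row/column placement and spans from the cells' left and top edge coordinates rather than from the recorded associations. The names `Cell`, `link_neighbors`, `place_cells`, and the `tol` parameter are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    """A candidate cell as a bounding box (hypothetical stage 1 output)."""
    text: str
    x0: float  # left edge
    y0: float  # top edge
    x1: float  # right edge
    y1: float  # bottom edge
    neighbors: dict = field(default_factory=dict)
    row: int = 0
    col: int = 0
    row_span: int = 1
    col_span: int = 1


def _overlap(a0, a1, b0, b1):
    """True if the intervals [a0, a1] and [b0, b1] overlap by a positive amount."""
    return min(a1, b1) > max(a0, b0)


def link_neighbors(cells, tol=0.5):
    """Stage 2 (assumed rule): record, per direction, the cells sharing an edge."""
    for c in cells:
        others = [o for o in cells if o is not c]
        c.neighbors = {
            "left": [o.text for o in others
                     if abs(o.x1 - c.x0) <= tol and _overlap(c.y0, c.y1, o.y0, o.y1)],
            "right": [o.text for o in others
                      if abs(o.x0 - c.x1) <= tol and _overlap(c.y0, c.y1, o.y0, o.y1)],
            "up": [o.text for o in others
                   if abs(o.y1 - c.y0) <= tol and _overlap(c.x0, c.x1, o.x0, o.x1)],
            "down": [o.text for o in others
                     if abs(o.y0 - c.y1) <= tol and _overlap(c.x0, c.x1, o.x0, o.x1)],
        }


def _edges(values, tol):
    """Sort coordinates and merge those closer than tol into one grid line."""
    out = []
    for v in sorted(values):
        if not out or v - out[-1] > tol:
            out.append(v)
    return out


def place_cells(cells, tol=0.5):
    """Stage 3 (simplified): derive row/column index and span from grid lines."""
    xs = _edges([c.x0 for c in cells], tol)  # column start lines
    ys = _edges([c.y0 for c in cells], tol)  # row start lines
    for c in cells:
        c.col = sum(1 for x in xs if x < c.x0 - tol)
        c.row = sum(1 for y in ys if y < c.y0 - tol)
        # A cell spans every column/row whose start line falls inside it.
        c.col_span = sum(1 for x in xs if c.x0 - tol <= x < c.x1 - tol)
        c.row_span = sum(1 for y in ys if c.y0 - tol <= y < c.y1 - tol)


# Usage: a header cell merged across two columns, above two ordinary cells.
cells = [
    Cell("Header", 0, 0, 100, 20),
    Cell("A", 0, 20, 50, 40),
    Cell("B", 50, 20, 100, 40),
]
link_neighbors(cells)
place_cells(cells)
```

In this example the merged header is placed at row 0, column 0 with a column span of 2, and its recorded "down" associations are the two cells beneath it. A rule-based placement driven by the associations themselves, as the abstract describes, would replace `place_cells`.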
