TEMPLATE AGNOSTIC DOCUMENT READER
    Invention Publication

    Publication No.: US20230306771A1

    Publication Date: 2023-09-28

    Application No.: US17704611

    Application Date: 2022-03-25

    CPC classification number: G06V30/414 G06V30/412 G06V30/416

    Abstract: A method and/or system for a template-agnostic document reader for automated extraction of data from documents is disclosed. Kernel values are extracted from a document image and used to determine the horizontal and vertical lines in the image. The horizontal lines are determined by extracting pixel values representing character width in the document image. An area of the image where horizontal and vertical lines intersect is identified as a tabular section, and tabular data is extracted from it. An area with text and horizontal lines but no vertical lines is identified as a forms section, and forms data is extracted from it. An area without any horizontal or vertical lines and with similar line spacing is identified as a paragraph section, and paragraph data is extracted from it.
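
    The section rules in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the names `has_run` and `classify_region`, the `min_len` threshold, and the `uniform_spacing` flag are hypothetical, and a plain run-length check stands in for the kernel-based line detection the abstract mentions. It also treats "both line types present" as a proxy for intersecting lines and does not check for text in the forms case.

```python
from typing import List

Image = List[List[int]]  # binary image: 1 = ink, 0 = background (assumed encoding)


def has_run(rows: List[List[int]], min_len: int) -> bool:
    """Return True if any row contains at least min_len consecutive 1s."""
    for row in rows:
        run = 0
        for px in row:
            run = run + 1 if px else 0
            if run >= min_len:
                return True
    return False


def classify_region(img: Image, min_len: int, uniform_spacing: bool = False) -> str:
    """Label a region as table, form, paragraph, or unknown.

    min_len is a hypothetical threshold for what counts as a ruled line;
    uniform_spacing is assumed to be computed elsewhere from text-line gaps.
    """
    horizontal = has_run(img, min_len)
    # Transpose so columns become rows, then reuse the same run check.
    vertical = has_run([list(col) for col in zip(*img)], min_len)
    if horizontal and vertical:
        return "table"      # both line types present (proxy for intersection)
    if horizontal:
        return "form"       # horizontal rules only, no vertical lines
    if not vertical and uniform_spacing:
        return "paragraph"  # no ruled lines, evenly spaced text lines
    return "unknown"
```

    Calling `classify_region(region, min_len=4)` on a cropped binary region returns one of the four labels. A real pipeline would first binarize the page and would more likely use morphological kernels sized from the character width, as the abstract suggests.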
