Publication number: US20230306771A1
Publication date: 2023-09-28
Application number: US17704611
Filing date: 2022-03-25
Applicant: INFOSYS LIMITED
Inventor: Kalyan BOOSETTY, Kosigamaani THIRUMOORTHY
IPC: G06V30/414, G06V30/412, G06V30/416
CPC classification number: G06V30/414, G06V30/412, G06V30/416
Abstract: A method and/or system for a template-agnostic document reader for automated extraction of data from documents is disclosed. Kernel values are extracted from a document image and used to determine the horizontal lines and vertical lines in the image. The horizontal lines are determined by extracting pixel values representing character width in the document image. An area of the image where horizontal and vertical lines intersect is identified as a tabular section, and tabular data is extracted from it. An area with text and horizontal lines but no vertical lines is identified as a forms section, and forms data is extracted from it. An area without any horizontal or vertical lines and with similar line spacing is identified as a paragraph section, and paragraph data is extracted from it.
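The three section rules in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. It assumes a region is already binarized into a grid of 0/1 pixels, and it replaces kernel-based line detection with a plain search for long runs of dark pixels. All function names, the `min_h_len` and `min_v_len` thresholds (standing in for kernel sizes derived from character width), and the `spacing_tolerance` parameter are hypothetical.

```python
from typing import List

Grid = List[List[int]]  # assumption: 1 = dark (ink) pixel, 0 = background


def has_run(cells: List[int], min_len: int) -> bool:
    """Return True if `cells` contains at least `min_len` consecutive 1s."""
    run = 0
    for value in cells:
        run = run + 1 if value else 0
        if run >= min_len:
            return True
    return False


def horizontal_line_rows(grid: Grid, min_len: int) -> List[int]:
    """Rows holding a dark run long enough to count as a horizontal line."""
    return [y for y, row in enumerate(grid) if has_run(row, min_len)]


def vertical_line_cols(grid: Grid, min_len: int) -> List[int]:
    """Columns holding a dark run long enough to count as a vertical line."""
    return [x for x, col in enumerate(zip(*grid)) if has_run(list(col), min_len)]


def similar_line_spacing(grid: Grid, tolerance: int) -> bool:
    """Check that text lines start at roughly equal vertical intervals."""
    starts = [
        y for y, row in enumerate(grid)
        if any(row) and (y == 0 or not any(grid[y - 1]))
    ]
    gaps = [b - a for a, b in zip(starts, starts[1:])]
    return len(gaps) >= 2 and max(gaps) - min(gaps) <= tolerance


def classify_region(grid: Grid, min_h_len: int, min_v_len: int,
                    spacing_tolerance: int = 1) -> str:
    """Label a region as 'table', 'form', 'paragraph', or 'unknown'."""
    h_rows = horizontal_line_rows(grid, min_h_len)
    v_cols = vertical_line_cols(grid, min_v_len)
    if h_rows and v_cols:
        # Simplification: a dark pixel where a line row meets a line column
        # is treated as an intersection.
        crossing = any(grid[y][x] for y in h_rows for x in v_cols)
        return "table" if crossing else "unknown"
    if h_rows:
        # Simplification: any ink outside the line rows counts as text.
        line_rows = set(h_rows)
        has_text = any(
            any(row) for y, row in enumerate(grid) if y not in line_rows
        )
        return "form" if has_text else "unknown"
    if not v_cols and similar_line_spacing(grid, spacing_tolerance):
        return "paragraph"
    return "unknown"


# Tiny hand-made 5x5 regions, for illustration only.
TABLE = [[1, 1, 1, 1, 1],
         [1, 0, 1, 0, 1],
         [1, 1, 1, 1, 1],
         [1, 0, 1, 0, 1],
         [1, 1, 1, 1, 1]]
FORM = [[0, 1, 0, 1, 0],
        [1, 1, 1, 1, 1],
        [0, 0, 0, 0, 0],
        [1, 0, 1, 0, 0],
        [1, 1, 1, 1, 1]]
PARAGRAPH = [[1, 0, 1, 1, 0],
             [0, 0, 0, 0, 0],
             [1, 1, 0, 1, 0],
             [0, 0, 0, 0, 0],
             [0, 1, 1, 0, 1]]

for name, region in [("TABLE", TABLE), ("FORM", FORM), ("PARAGRAPH", PARAGRAPH)]:
    print(name, "->", classify_region(region, min_h_len=5, min_v_len=5))
```

With both thresholds set to 5, the three sample regions are labeled table, form, and paragraph respectively. A real document image would likely need morphological filtering with kernels sized from the measured character width, as the abstract describes, rather than this raw run-length test.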