Invention Publication
- Patent Title: EFFICIENT DOCUMENT INFORMATION EXTRACTION SYSTEM USING OPTICAL CHARACTER RECOGNITION (OCR) INFORMATION
- Application No.: US17994977
- Application Date: 2022-11-28
- Publication No.: US20240177515A1
- Publication Date: 2024-05-30
- Inventor: SOHYEONG KIM, Xiang YU
- Applicant: SAP SE
- Applicant Address: DE Walldorf
- Assignee: SAP SE
- Current Assignee: SAP SE
- Current Assignee Address: DE Walldorf
- Main IPC: G06V30/414
- IPC: G06V30/414 ; G06V30/19

Abstract:
Embodiments are described for a system comprising a memory and at least one processor coupled to the memory. The at least one processor is configured to receive optical character recognition (OCR) information of a document and to determine beginning, inside, and outside (BIO) tags and labels of one or more word boxes based on the OCR information. The at least one processor is further configured to group a first word box and a second word box based on the BIO tags of the first and second word boxes, and to merge the first and second word boxes into a combined word box when the label of the first word box matches the label of the second word box. Finally, the at least one processor is configured to output the combined word box and the label of the first word box.
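The grouping and merging step described in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation: the `WordBox` type, the `merge_pair` and `group_and_merge` functions, the bounding-box layout `(x0, y0, x1, y1)`, and the choice to let a stray "I" tag start a new group are all assumptions made for illustration. The sketch assumes the word boxes arrive in reading order with BIO tags and labels already assigned.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class WordBox:
    # Hypothetical container for one OCR word box (not from the patent text).
    text: str
    bbox: Tuple[int, int, int, int]  # assumed layout: (x0, y0, x1, y1)
    tag: str                         # "B", "I", or "O"
    label: Optional[str] = None      # e.g. "vendor"; None for "O" boxes


def merge_pair(first: WordBox, second: WordBox) -> WordBox:
    """Combine two word boxes: join the text and take the union of the boxes."""
    return WordBox(
        text=f"{first.text} {second.text}",
        bbox=(
            min(first.bbox[0], second.bbox[0]),
            min(first.bbox[1], second.bbox[1]),
            max(first.bbox[2], second.bbox[2]),
            max(first.bbox[3], second.bbox[3]),
        ),
        tag="B",
        label=first.label,  # the combined box keeps the first box's label
    )


def group_and_merge(boxes: List[WordBox]) -> List[WordBox]:
    """Group consecutive boxes by BIO tag and merge those with matching labels."""
    merged: List[WordBox] = []
    current: Optional[WordBox] = None
    for box in boxes:
        # An "I" box extends the open group only if its label matches.
        if box.tag == "I" and current is not None and box.label == current.label:
            current = merge_pair(current, box)
            continue
        # Anything else closes the open group, if there is one.
        if current is not None:
            merged.append(current)
            current = None
        # "B" starts a new group; a stray "I" is treated the same (assumption).
        # "O" boxes carry no label and are skipped.
        if box.tag != "O":
            current = box
    if current is not None:
        merged.append(current)
    return merged


if __name__ == "__main__":
    words = [
        WordBox("Invoice", (0, 0, 30, 5), "O"),
        WordBox("ACME", (40, 0, 50, 5), "B", "vendor"),
        WordBox("Corp", (52, 0, 60, 5), "I", "vendor"),
        WordBox("2022-11-28", (70, 0, 95, 5), "B", "date"),
    ]
    for combined in group_and_merge(words):
        print(combined.label, combined.text, combined.bbox)
```

In this sketch the two "vendor" boxes are merged into one combined box spanning both, while the "O" box is dropped and the "date" box is output unchanged.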