• Patent Title: EFFICIENT DOCUMENT INFORMATION EXTRACTION SYSTEM USING OPTICAL CHARACTER RECOGNITION (OCR) INFORMATION
  • Application No.: US17994977
    Application Date: 2022-11-28
  • Publication No.: US20240177515A1
    Publication Date: 2024-05-30
  • Inventor: SOHYEONG KIM, Xiang YU
  • Applicant: SAP SE
  • Applicant Address: DE Walldorf
  • Assignee: SAP SE
  • Current Assignee: SAP SE
  • Current Assignee Address: DE Walldorf
  • Main IPC: G06V30/414
  • IPC: G06V30/414 G06V30/19
Abstract:
Embodiments are described for a system comprising a memory and at least one processor coupled to the memory. The at least one processor is configured to receive optical character recognition (OCR) information of a document and to determine beginning, inside, and outside (BIO) tags and labels of one or more word boxes based on the OCR information. The at least one processor is further configured to group a first word box and a second word box based on the BIO tags of the first and second word boxes, and to merge the first and second word boxes into a combined word box based on a label of the first word box matching a label of the second word box. Finally, the at least one processor is configured to output the combined word box and the label of the first word box.
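The grouping and merging step described in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation: the `WordBox` type, its fields, the `merge_pair` and `merge_word_boxes` helpers, and the rule that merged text is joined with a single space are all assumptions made for illustration, and the word boxes are assumed to arrive already tagged and in reading order.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class WordBox:
    """Hypothetical word box: OCR text, bounding box, BIO tag, and label."""
    text: str
    bbox: Tuple[int, int, int, int]  # (x0, y0, x1, y1), assumed pixel coordinates
    tag: str                         # "B", "I", or "O"
    label: str                       # e.g. "invoice_number"; empty for "O"


def merge_pair(first: WordBox, second: WordBox) -> WordBox:
    """Combine two word boxes into one that covers both bounding boxes."""
    x0 = min(first.bbox[0], second.bbox[0])
    y0 = min(first.bbox[1], second.bbox[1])
    x1 = max(first.bbox[2], second.bbox[2])
    y1 = max(first.bbox[3], second.bbox[3])
    # The combined box keeps the label of the first word box.
    return WordBox(f"{first.text} {second.text}", (x0, y0, x1, y1), "B", first.label)


def merge_word_boxes(boxes: List[WordBox]) -> List[WordBox]:
    """Group boxes by BIO tag and merge runs whose labels match."""
    merged: List[WordBox] = []
    current: Optional[WordBox] = None
    for box in boxes:
        # An "I" box continues the open group only if the labels match.
        if box.tag == "I" and current is not None and box.label == current.label:
            current = merge_pair(current, box)
            continue
        # Otherwise close the open group, if any.
        if current is not None:
            merged.append(current)
            current = None
        # "B" (or an unmatched "I") opens a new group; "O" boxes are dropped.
        if box.tag != "O":
            current = box
    if current is not None:
        merged.append(current)
    return merged
```

Under these assumptions, a "B" box followed by an "I" box with the same label yields one combined box whose bounding box spans both inputs, while "O" boxes are left out of the output. Treating an unmatched "I" box as the start of a new group is one possible design choice; the source does not say how such cases are handled.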