Patent search ap:("SAP SE") AND inv:"Xiang YU" Page 1

1.

Invention publication
EFFICIENT DOCUMENT INFORMATION EXTRACTION SYSTEM USING OPTICAL CHARACTER RECOGNITION (OCR) INFORMATION Under examination - Published

Publication (announcement) number: US20240177515A1

Publication (announcement) date: 2024-05-30

Application number: US17994977

Application date: 2022-11-28

Applicant: SAP SE

Inventor: SOHYEONG KIM, Xiang YU

IPC: G06V30/414, G06V30/19

CPC classification number: G06V30/414, G06V30/19007

Abstract: Embodiments are described for a system comprising a memory and at least one processor coupled to the memory. The at least one processor is configured to receive optical character recognition (OCR) information of a document and determine beginning, inside, and outside (BIO) tags and labels of one or more word boxes based on the OCR information. The at least one processor is further configured to group a first word box and a second word box based on BIO tags of the first and the second word boxes and merge the first and the second word boxes into a combined word box based on a label of the first word box matching a label of the second word box. Finally, the at least one processor is configured to output the combined word box and the label of the first word box.
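
The grouping and merging step described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `WordBox` fields, the `combine` and `merge_word_boxes` helpers, and the choice to drop "O" boxes and to let a stray "I" box open a new group are all assumptions made for illustration, and the BIO tags and labels are taken as given rather than predicted from OCR information.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WordBox:
    """Hypothetical word box: OCR text, corner coordinates, BIO tag, label."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float
    tag: str    # "B" (beginning), "I" (inside), or "O" (outside)
    label: str  # e.g. "vendor"; empty for "O" boxes


def combine(a: WordBox, b: WordBox) -> WordBox:
    """Merge two boxes into one covering both, keeping the first box's label."""
    return WordBox(
        text=f"{a.text} {b.text}",
        x0=min(a.x0, b.x0),
        y0=min(a.y0, b.y0),
        x1=max(a.x1, b.x1),
        y1=max(a.y1, b.y1),
        tag="B",
        label=a.label,
    )


def merge_word_boxes(boxes: List[WordBox]) -> List[WordBox]:
    """Group consecutive boxes by BIO tag and merge those with matching labels."""
    merged: List[WordBox] = []
    current: Optional[WordBox] = None
    for box in boxes:
        # An "I" box continues the open group only if the labels match.
        if box.tag == "I" and current is not None and box.label == current.label:
            current = combine(current, box)
            continue
        # Anything else closes the open group.
        if current is not None:
            merged.append(current)
            current = None
        # "B" opens a new group; a stray "I" is treated as a new group too
        # (an assumption). "O" boxes are dropped.
        if box.tag in ("B", "I"):
            current = box
    if current is not None:
        merged.append(current)
    return merged
```

For example, a "B"-tagged box "ACME" followed by an "I"-tagged box "Corp", both labeled "vendor", would come out as one combined box with the text "ACME Corp" and the label "vendor".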

2.

Invention publication
NEURAL NETWORK WORD CLUSTERING SYSTEM Under examination - Published

Publication (announcement) number: US20240177011A1

Publication (announcement) date: 2024-05-30

Application number: US18071231

Application date: 2022-11-29

Applicant: SAP SE

Inventor: MAREK POLEWCZYK, Marco SPINACI, Xiang YU

IPC: G06N3/09

CPC classification number: G06N3/09

Abstract: Various embodiments for a neural network clustering system are described herein. An embodiment operates by detecting a plurality of bounding boxes and identifying coordinates for each of the bounding boxes. An adjacency matrix is generated based on combining a key matrix and a query matrix. The plurality of words are clustered into a plurality of clusters, each cluster corresponding to a different line on the first document. A second document is generated in which the plurality of words corresponding to a respective cluster of the plurality of clusters is arranged on a same line on the second document. The second document is provided for display.
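
The clustering pipeline in this abstract can be sketched roughly as follows. The abstract does not say how the key and query matrices are produced or combined, so everything specific here is an assumption: the matrices are hand-written stand-ins for neural network outputs, they are combined with a sigmoid over dot products, links are thresholded at 0.5, and clusters are taken as connected components. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def adjacency_from_key_query(query: Sequence[Sequence[float]],
                             key: Sequence[Sequence[float]]) -> List[List[float]]:
    """A[i][j] = sigmoid(query[i] . key[j]); one assumed way to combine them."""
    return [
        [1.0 / (1.0 + math.exp(-sum(q * k for q, k in zip(qi, kj)))) for kj in key]
        for qi in query
    ]


def cluster_lines(adj: List[List[float]], threshold: float = 0.5) -> List[int]:
    """Assign a cluster id per word via connected components of thresholded links."""
    n = len(adj)
    cluster = [-1] * n
    next_id = 0
    for start in range(n):
        if cluster[start] != -1:
            continue
        cluster[start] = next_id
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                linked = adj[i][j] > threshold or adj[j][i] > threshold
                if linked and cluster[j] == -1:
                    cluster[j] = next_id
                    stack.append(j)
        next_id += 1
    return cluster


def render_lines(words: List[str], coords: List[Tuple[float, float]],
                 cluster: List[int]) -> str:
    """Build the second document: one text line per cluster, words sorted by x."""
    groups: Dict[int, List[int]] = {}
    for idx, cid in enumerate(cluster):
        groups.setdefault(cid, []).append(idx)
    # Order lines top to bottom by the mean y coordinate of their words.
    ordered = sorted(groups.values(),
                     key=lambda g: sum(coords[i][1] for i in g) / len(g))
    return "\n".join(
        " ".join(words[i] for i in sorted(g, key=lambda i: coords[i][0]))
        for g in ordered
    )


# Toy input: (x, y) of each bounding box, in arbitrary detection order.
words = ["Total:", "Invoice", "42.00", "No."]
coords = [(10, 52), (10, 10), (80, 50), (90, 12)]
# Hand-written embeddings standing in for the network's key/query outputs.
emb = [[-4.0, 4.0], [4.0, -4.0], [-4.0, 4.0], [4.0, -4.0]]

adj = adjacency_from_key_query(emb, emb)
clusters = cluster_lines(adj)
second_document = render_lines(words, coords, clusters)
```

With these toy values, `second_document` holds two lines, "Invoice No." followed by "Total: 42.00", even though the bounding boxes on each line have slightly different y coordinates.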
