-
Publication No.: US20240177515A1
Publication Date: 2024-05-30
Application No.: US17994977
Application Date: 2022-11-28
Applicant: SAP SE
Inventors: SOHYEONG KIM, Xiang YU
IPC: G06V30/414, G06V30/19
CPC classification number: G06V30/414, G06V30/19007
Abstract: Embodiments are described for a system comprising a memory and at least one processor coupled to the memory. The at least one processor is configured to receive optical character recognition (OCR) information for a document and to determine beginning, inside, and outside (BIO) tags and labels for one or more word boxes based on the OCR information. The at least one processor is further configured to group a first word box and a second word box based on their BIO tags, and to merge the first and second word boxes into a combined word box when the label of the first word box matches the label of the second word box. Finally, the at least one processor is configured to output the combined word box and the label of the first word box.
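The grouping-and-merging step described in this abstract can be illustrated with a minimal Python sketch. It assumes the BIO tags and labels have already been predicted for each OCR word box (the abstract does not say how) and that the boxes arrive in reading order; the `WordBox` type, the function names and the sample labels are hypothetical. Grouping here uses the BIO tags alone (a "B" opens a span, consecutive "I" tags extend it), and a group is merged into one combined box only when every label agrees with the first box's label, mirroring the two conditions in the abstract.

```python
from dataclasses import dataclass
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


@dataclass
class WordBox:
    text: str
    bbox: BBox
    tag: str    # BIO tag predicted for this word box: "B", "I" or "O"
    label: str  # predicted entity label (hypothetical values, e.g. "vendor")


def group_by_bio(boxes: List[WordBox]) -> List[List[WordBox]]:
    """Group word boxes (assumed to be in reading order) using BIO tags only."""
    groups: List[List[WordBox]] = []
    current: List[WordBox] = []
    for box in boxes:
        if box.tag == "I" and current:
            current.append(box)  # "I" continues the open span
            continue
        if current:
            groups.append(current)  # a "B" or "O" closes the open span
        # "B" (or an "I" with no open span) starts a new span; "O" joins none.
        current = [box] if box.tag in ("B", "I") else []
    if current:
        groups.append(current)
    return groups


def merge_group(group: List[WordBox]) -> List[Tuple[WordBox, str]]:
    """Merge a group into one box if every label matches the first box's label."""
    first = group[0]
    if any(b.label != first.label for b in group[1:]):
        return [(b, b.label) for b in group]  # labels disagree: leave unmerged
    # Combined box = union of the member bounding boxes, texts joined by spaces.
    x0 = min(b.bbox[0] for b in group)
    y0 = min(b.bbox[1] for b in group)
    x1 = max(b.bbox[2] for b in group)
    y1 = max(b.bbox[3] for b in group)
    text = " ".join(b.text for b in group)
    return [(WordBox(text, (x0, y0, x1, y1), "B", first.label), first.label)]


def extract_entities(boxes: List[WordBox]) -> List[Tuple[WordBox, str]]:
    """Output each (combined word box, label) pair."""
    out: List[Tuple[WordBox, str]] = []
    for group in group_by_bio(boxes):
        out.extend(merge_group(group))
    return out


boxes = [
    WordBox("Invoice", (10, 10, 60, 20), "O", "O"),
    WordBox("ACME", (10, 30, 50, 40), "B", "vendor"),
    WordBox("GmbH", (55, 30, 95, 41), "I", "vendor"),
    WordBox("Total", (10, 50, 45, 60), "O", "O"),
    WordBox("42.00", (50, 50, 90, 60), "B", "amount"),
]
for box, label in extract_entities(boxes):
    print(label, repr(box.text), box.bbox)
# vendor 'ACME GmbH' (10, 30, 95, 41)
# amount '42.00' (50, 50, 90, 60)
```

Treating an orphan "I" as the start of a new span is one lenient convention among several; a stricter decoder could discard it instead.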
-
Publication No.: US20240177011A1
Publication Date: 2024-05-30
Application No.: US18071231
Application Date: 2022-11-29
Applicant: SAP SE
Inventors: MAREK POLEWCZYK, Marco SPINACI, Xiang YU
IPC: G06N3/09
CPC classification number: G06N3/09
Abstract: Various embodiments of a neural network clustering system are described herein. An embodiment operates by detecting a plurality of bounding boxes around words in a first document and identifying coordinates for each of the bounding boxes. An adjacency matrix is generated by combining a key matrix and a query matrix. The words are clustered into a plurality of clusters, each cluster corresponding to a different line of the first document. A second document is then generated in which the words belonging to each cluster are arranged on the same line. The second document is provided for display.
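A rough Python sketch of the clustering idea follows. The abstract does not say how the key and query matrices are computed, so this assumes each bounding box is encoded by a hypothetical fixed feature (an L2-normalised RBF encoding of its vertical centre) and uses identity projections in place of learned ones. The adjacency matrix is then the product Q K^T, thresholded and split into connected components, one per text line, and the "second document" is rendered as one string per line. All names (`Box`, `adjacency`, `cluster`, `render_lines`), the RBF centres and the 0.5 threshold are illustrative assumptions, not taken from the patent.

```python
import math
from dataclasses import dataclass


@dataclass
class Box:
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def features(box, centers, sigma):
    # Hypothetical per-box feature: RBF encoding of the vertical centre, L2-normalised.
    yc = (box.y0 + box.y1) / 2
    v = [math.exp(-((yc - c) ** 2) / (2 * sigma ** 2)) for c in centers]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def adjacency(boxes, centers, sigma):
    feats = [features(b, centers, sigma) for b in boxes]
    query = feats  # stand-in for F @ W_q (identity projection, untrained)
    key = feats    # stand-in for F @ W_k
    # A[i][j] = q_i . k_j, i.e. A = Q K^T
    return [[sum(q * k for q, k in zip(qi, kj)) for kj in key] for qi in query]


def cluster(adj, threshold=0.5):
    # Connected components of the graph whose edges are entries above threshold.
    n = len(adj)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and adj[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def render_lines(boxes, labels):
    # One output line per cluster: lines top to bottom, words left to right.
    groups = {}
    for box, lab in zip(boxes, labels):
        groups.setdefault(lab, []).append(box)
    ordered = sorted(
        groups.values(),
        key=lambda g: sum((b.y0 + b.y1) / 2 for b in g) / len(g),
    )
    return [" ".join(b.text for b in sorted(g, key=lambda b: b.x0)) for g in ordered]


boxes = [
    Box("world", 60, 10, 110, 20),
    Box("Total:", 0, 40, 50, 50),
    Box("Hello", 0, 11, 50, 21),
    Box("42.00", 70, 41, 120, 51),
]
centers = list(range(0, 80, 5))
labels = cluster(adjacency(boxes, centers, sigma=5.0))
print(render_lines(boxes, labels))  # ['Hello world', 'Total: 42.00']
```

Because query and key are identical here, the adjacency matrix is symmetric. With separately learned projections it need not be, and symmetrising it (for example, averaging it with its transpose) before taking connected components would be one reasonable adaptation.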
-