-
Publication No.: US20240289557A1
Publication date: 2024-08-29
Application No.: US18113903
Filing date: 2023-02-24
Applicant: SAP SE
Inventor: Eduardo Vellasques, Xiang Yu, Stefan Klaus Baur, Manuel Zeise
IPC: G06F40/40, G06F16/33, G06F40/284
CPC classification number: G06F40/40, G06F16/3347, G06F40/284
Abstract: Systems and methods are provided for automated identification of key-value pairs in documents. A document including readable text is received. The document is processed to determine, from the readable text, a plurality of tokens. Pairs of vectors corresponding to the plurality of tokens are determined, each pair of vectors comprising a query vector and a key vector. Attention scores are determined for the plurality of tokens by using the pairs of vectors. The attention scores are normalized to generate normalized attention scores. Connected tokens are identified in the plurality of tokens using the normalized attention scores.
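The pipeline in this abstract (query/key vectors, attention scores, normalization, connected tokens) can be illustrated with a minimal sketch. The abstract does not say how scores are computed or normalized, so the scaled dot product, the softmax, the 0.5 threshold, the function names, and the toy vectors below are all assumptions made for illustration, not the claimed method.

```python
import math

def softmax(row):
    """Normalize one row of raw scores so that it sums to 1."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def connected_tokens(tokens, queries, keys, threshold=0.5):
    """Return (token_i, token_j, score) for pairs whose normalized score
    reaches the threshold. Scaled dot-product scoring is an assumption."""
    dim = len(queries[0])
    links = []
    for i, q in enumerate(queries):
        # Raw attention score of token i against every token's key vector.
        raw = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        for j, score in enumerate(softmax(raw)):
            if i != j and score >= threshold:
                links.append((tokens[i], tokens[j], score))
    return links

# Hypothetical toy vectors: only "No." attends strongly to "12345".
tokens = ["Invoice", "No.", "12345"]
queries = [[0.0, 0.0], [0.0, 4.0], [0.0, 0.0]]
keys = [[0.0, 0.0], [0.0, 0.0], [0.0, 4.0]]
links = connected_tokens(tokens, queries, keys)
```

With these toy inputs the only link found joins "No." (a candidate key) to "12345" (a candidate value); in a real system the vectors would come from a trained model rather than being written by hand.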
-
Publication No.: US20250111687A1
Publication date: 2025-04-03
Application No.: US18375952
Filing date: 2023-10-02
Applicant: SAP SE
Inventor: Christoph Meyer, Xiang Yu
IPC: G06V30/146
Abstract: Systems and processes for evaluating algorithms for aligning weakly-annotated data to recognized characters in a document are provided. In a method for evaluating an algorithm for aligning annotation data to recognized characters, strong annotations and weak-to-strong annotations, which are generated by applying a weak-to-strong annotation alignment algorithm, for a document are received and matched to generate respective pairs of matched annotations. For each pair of matched annotations, respective metrics are calculated including comparisons of aspects of the strong annotations to the weak-to-strong annotations. The respective metrics are aggregated, and an indication of the aggregated metrics is output to a graphical user interface or targeted application. Aggregated metrics determined for different weak-to-strong annotation alignment algorithms may be compared in order to select or adjust an algorithm to be used for Optical Character Recognition (OCR) operations.
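The evaluation loop described here (match annotations, compute per-pair metrics, aggregate) might look roughly like the sketch below. The abstract does not name the matching rule or the metrics, so matching by field label, bounding-box intersection over union, exact text match, and averaging are assumptions, as are the function names and sample data.

```python
from statistics import mean

def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def evaluate(strong, aligned):
    """Match annotations by field label (an assumption), score each pair,
    and aggregate the per-pair metrics by averaging."""
    pairs = [(strong[k], aligned[k]) for k in strong if k in aligned]
    if not pairs:
        return {"matched": 0, "mean_iou": 0.0, "text_accuracy": 0.0}
    return {
        "matched": len(pairs),
        "mean_iou": mean(iou(s["box"], w["box"]) for s, w in pairs),
        "text_accuracy": mean(float(s["text"] == w["text"]) for s, w in pairs),
    }

# Hypothetical data: the "date" box produced by the algorithm is shifted right.
strong = {
    "total": {"text": "42.00", "box": (0, 0, 10, 10)},
    "date": {"text": "2023-10-02", "box": (0, 20, 10, 30)},
}
aligned = {
    "total": {"text": "42.00", "box": (0, 0, 10, 10)},
    "date": {"text": "2023-10-02", "box": (5, 20, 15, 30)},
}
result = evaluate(strong, aligned)
```

Running `evaluate` once per candidate alignment algorithm and comparing the returned dictionaries is one simple way to perform the algorithm comparison the abstract mentions.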
-
Publication No.: US20250054325A1
Publication date: 2025-02-13
Application No.: US18231652
Filing date: 2023-08-08
Applicant: SAP SE
Inventor: Xiang Yu, Christoph Meyer
IPC: G06V30/14, G06F40/284, G06V10/70, G06V30/19, G06V30/414
Abstract: Systems and processes for aligning weakly-annotated data to recognized characters in a document are provided. In a method for aligning annotation data to recognized characters, annotation words and character recognition tokens are received, and a search algorithm is performed to align the annotation words to the tokens in a stepwise manner. At each step, an annotation word is aligned to one or more tokens, and a cost of each respective alignment is calculated. Once all annotation words are aligned, a full set of annotation word-token pairs corresponding to the annotation is selected based on a total cost of alignment for that set. A bounding box enclosing the tokens in the selected full set is generated and output to a target application.
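A small sketch can make the stepwise alignment concrete. The abstract does not specify the search algorithm or the cost function, so the memoized exhaustive search, the string-dissimilarity cost from `difflib`, the in-order and consecutive-span constraints, the `max_span` parameter, and the sample tokens below are assumptions chosen for brevity.

```python
from difflib import SequenceMatcher
from functools import lru_cache

def align(words, tokens, max_span=2):
    """Align each annotation word, in order, to 1..max_span consecutive OCR
    tokens, then pick the full set of pairs with the lowest total cost.
    tokens is a list of (text, (x0, y0, x1, y1))."""
    texts = [text for text, _ in tokens]

    def cost(word, start, end):
        # Assumed cost: 1 minus the similarity of the word to the joined tokens.
        return 1.0 - SequenceMatcher(None, word, "".join(texts[start:end])).ratio()

    @lru_cache(maxsize=None)
    def best(w, t):
        # Cheapest alignment of words[w:] using only tokens[t:].
        if w == len(words):
            return 0.0, ()
        result = (float("inf"), ())
        for start in range(t, len(tokens)):
            for end in range(start + 1, min(start + max_span, len(tokens)) + 1):
                rest_cost, rest = best(w + 1, end)
                total = cost(words[w], start, end) + rest_cost
                if total < result[0]:
                    result = (total, ((w, tuple(range(start, end))),) + rest)
        return result

    total, chosen = best(0, 0)
    boxes = [tokens[i][1] for _, span in chosen for i in span]
    # Bounding box enclosing every token in the selected set.
    bbox = (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes)) if boxes else None
    return total, [(words[w], list(span)) for w, span in chosen], bbox

# Hypothetical OCR output in which "ACME" was split into two tokens.
tokens = [("Invoice", (0, 0, 50, 10)), ("AC", (60, 0, 75, 10)),
          ("ME", (76, 0, 90, 10)), ("GmbH", (95, 0, 130, 10)),
          ("Total", (0, 20, 40, 30))]
total, pairs, bbox = align(["ACME", "GmbH"], tokens)
```

Here "ACME" is matched to the two tokens "AC" and "ME", "GmbH" to a single token, and the resulting box spans those three tokens. A production search would likely prune candidates (for example with a beam) rather than enumerate every span.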
-