Self-Attentive Key-Value Extraction
    Invention publication

    Publication No.: US20240289557A1

    Publication date: 2024-08-29

    Application No.: US18113903

    Filing date: 2023-02-24

    Applicant: SAP SE

    CPC classification number: G06F40/40 G06F16/3347 G06F40/284

    Abstract: Systems and methods are provided for automated identification of key-value pairs in documents. A document including readable text is received. The document is processed to determine, from the readable text, a plurality of tokens. Pairs of vectors corresponding to the plurality of tokens are determined, each pair of vectors comprising a query vector and a key vector. Attention scores are determined for the plurality of tokens by using the pairs of vectors. The attention scores are normalized to generate normalized attention scores. Connected tokens are identified in the plurality of tokens using the normalized attention scores.
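The abstract's pipeline (query/key vectors per token, attention scores, normalization, connected tokens) can be illustrated with a minimal pure-Python sketch. It assumes scaled dot-product scores, row-wise softmax as the normalization, and a hypothetical fixed threshold for deciding that two tokens are connected; the embeddings and projection weights are hand-picked toy values standing in for a trained model, not anything disclosed in the publication.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def attention_scores(queries: Matrix, keys: Matrix) -> Matrix:
    """Scaled dot-product scores: score[i][j] = q_i . k_j / sqrt(d)."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    return [[scale * sum(q * k for q, k in zip(qv, kv)) for kv in keys] for qv in queries]


def softmax_rows(scores: Matrix) -> Matrix:
    """Normalize each row into a probability distribution (numerically stable)."""
    normalized = []
    for row in scores:
        top = max(row)
        exps = [math.exp(s - top) for s in row]
        total = sum(exps)
        normalized.append([e / total for e in exps])
    return normalized


def connected_tokens(tokens: List[str], normalized: Matrix, threshold: float = 0.5) -> List[Tuple[str, str]]:
    """Link tokens i and j (i != j) when either direction's normalized score >= threshold."""
    links = set()
    for i, row in enumerate(normalized):
        for j, p in enumerate(row):
            if i != j and p >= threshold:
                links.add((min(i, j), max(i, j)))
    return [(tokens[i], tokens[j]) for i, j in sorted(links)]


# Toy example with hypothetical embeddings and projection weights chosen so
# that each label token's query points at its value token's key.
tokens = ["Invoice", "12345", "Total", "99.50"]
embeddings = [[4.0, 0, 0, 0], [0, 4.0, 0, 0], [0, 0, 4.0, 0], [0, 0, 0, 4.0]]
w_query = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]  # swaps pair partners
w_key = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]    # identity

queries = matmul(embeddings, w_query)
keys = matmul(embeddings, w_key)
normalized = softmax_rows(attention_scores(queries, keys))
print(connected_tokens(tokens, normalized))  # → [('Invoice', '12345'), ('Total', '99.50')]
```

Treating a link as undirected (either row may exceed the threshold) is one simple design choice; a real system might instead require mutual attention or decode pairs with a learned classifier.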

    ANNOTATION ALIGNMENT ALGORITHM ANALYSIS FOR CHARACTER RECOGNITION

    Publication No.: US20250111687A1

    Publication date: 2025-04-03

    Application No.: US18375952

    Filing date: 2023-10-02

    Applicant: SAP SE

    Abstract: Systems and processes for evaluating algorithms for aligning weakly-annotated data to recognized characters in a document are provided. In a method for evaluating an algorithm for aligning annotation data to recognized characters, strong annotations and weak-to-strong annotations, which are generated by applying a weak-to-strong annotation alignment algorithm, for a document are received and matched to generate respective pairs of matched annotations. For each pair of matched annotations, respective metrics are calculated including comparisons of aspects of the strong annotations to the weak-to-strong annotations. The respective metrics are aggregated, and an indication of the aggregated metrics is output to a graphical user interface or targeted application. Aggregated metrics determined for different weak-to-strong annotation alignment algorithms may be compared in order to select or adjust an algorithm to be used for Optical Character Recognition (OCR) operations.
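The evaluation loop in this abstract (match strong and weak-to-strong annotations, compute per-pair metrics, aggregate, compare algorithms) can be sketched as follows. This is a minimal illustration under stated assumptions: annotations are matched by a hypothetical `field` label, the per-pair metrics are bounding-box IoU and exact text match, aggregation is a plain mean, and the algorithm names `algo_a`/`algo_b` and all coordinates are made up for the example.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


@dataclass
class Annotation:
    field: str  # hypothetical field label used to match annotations
    text: str
    box: Box


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match(strong: List[Annotation], aligned: List[Annotation]) -> List[Tuple[Annotation, Annotation]]:
    """Pair each strong annotation with the weak-to-strong annotation of the same field."""
    by_field = {a.field: a for a in aligned}
    return [(s, by_field[s.field]) for s in strong if s.field in by_field]


def evaluate(strong: List[Annotation], aligned: List[Annotation]) -> Dict[str, float]:
    """Compute per-pair metrics, then aggregate them into document-level scores."""
    pairs = match(strong, aligned)
    ious = [iou(s.box, a.box) for s, a in pairs]
    text_hits = [1.0 if s.text == a.text else 0.0 for s, a in pairs]
    return {
        "matched_ratio": len(pairs) / len(strong) if strong else 0.0,
        "mean_iou": mean(ious) if ious else 0.0,
        "text_accuracy": mean(text_hits) if text_hits else 0.0,
    }


strong = [
    Annotation("invoice_no", "12345", (10, 10, 50, 20)),
    Annotation("total", "99.50", (10, 40, 40, 50)),
]
candidates = {  # outputs of two hypothetical alignment algorithms
    "algo_a": [
        Annotation("invoice_no", "12345", (10, 10, 50, 20)),
        Annotation("total", "99.50", (10, 40, 30, 50)),
    ],
    "algo_b": [Annotation("invoice_no", "1234", (10, 10, 40, 20))],
}
results = {name: evaluate(strong, aligned) for name, aligned in candidates.items()}
best = max(results, key=lambda name: results[name]["mean_iou"])  # select by one aggregated metric
print(best)  # → algo_a
```

Selecting on `mean_iou` alone is only one possible policy; the abstract leaves open which aggregated metrics drive the comparison.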

    ANNOTATION ALIGNMENT FOR CHARACTER RECOGNITION IN DOCUMENTS

    Publication No.: US20250054325A1

    Publication date: 2025-02-13

    Application No.: US18231652

    Filing date: 2023-08-08

    Applicant: SAP SE

    Abstract: Systems and processes for aligning weakly-annotated data to recognized characters in a document are provided. In a method for aligning annotation data to recognized characters, annotation words and character recognition tokens are received, and a search algorithm is performed to align the annotation words to the tokens in a stepwise manner. At each step, an annotation word is aligned to one or more tokens, and a cost of each respective alignment is calculated. Once all annotation words are aligned, a full set of annotation word-token pairs corresponding to the annotation is selected based on a total cost of alignment for that set. A bounding box enclosing the tokens in the selected full set is generated and output to a target application.
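A stepwise alignment search of the kind described above can be sketched as a uniform-cost search: each step aligns the next annotation word to one or more consecutive OCR tokens, step costs accumulate, and the first complete alignment popped from the priority queue has the lowest total cost. The specific choices here are assumptions rather than details from the abstract: edit distance as the step cost, words mapping to contiguous token runs, and a hypothetical `max_span` limit on tokens per word.

```python
import heapq
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)
Step = Tuple[int, int, int]  # (word index, first token, end token exclusive)


@dataclass(frozen=True)
class Token:
    text: str
    box: Box


def levenshtein(a: str, b: str) -> int:
    """Edit distance, used here as a hypothetical per-step alignment cost."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def align(words: List[str], tokens: List[Token], max_span: int = 3) -> Optional[Tuple[int, List[Step]]]:
    """Uniform-cost search: each step aligns the next word to 1..max_span contiguous tokens."""
    heap = [(0, 0, j, ()) for j in range(len(tokens))]  # the annotation may start at any token
    heapq.heapify(heap)
    expanded = set()
    while heap:
        cost, i, j, path = heapq.heappop(heap)
        if i == len(words):
            return cost, list(path)  # first complete set popped has the lowest total cost
        if (i, j) in expanded:
            continue
        expanded.add((i, j))
        for k in range(j + 1, min(j + max_span, len(tokens)) + 1):
            merged = "".join(t.text for t in tokens[j:k])
            step_cost = levenshtein(words[i], merged)
            heapq.heappush(heap, (cost + step_cost, i + 1, k, path + ((i, j, k),)))
    return None


def bounding_box(tokens: List[Token], path: List[Step]) -> Box:
    """Smallest box enclosing every token used in the selected alignment."""
    chosen = [t for _, j, k in path for t in tokens[j:k]]
    return (
        min(t.box[0] for t in chosen),
        min(t.box[1] for t in chosen),
        max(t.box[2] for t in chosen),
        max(t.box[3] for t in chosen),
    )


tokens = [
    Token("Invoice", (0, 0, 40, 10)),
    Token("Ac", (50, 0, 60, 10)),
    Token("me", (61, 0, 72, 10)),
    Token("Corp.", (75, 0, 100, 10)),
    Token("Total", (0, 20, 30, 30)),
]
result = align(["Acme", "Corp."], tokens)
if result is not None:
    total_cost, path = result
    print(total_cost, path, bounding_box(tokens, path))  # → 0 [(0, 1, 3), (1, 3, 4)] (50, 0, 100, 10)
```

Because all step costs are non-negative, uniform-cost search guarantees the returned word-token pairing minimizes total cost; a beam search would trade that guarantee for speed on long documents.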
