ANNOTATION ALIGNMENT ALGORITHM ANALYSIS FOR CHARACTER RECOGNITION

    Publication No.: US20250111687A1

    Publication Date: 2025-04-03

    Application No.: US18375952

    Filing Date: 2023-10-02

    Applicant: SAP SE

    Abstract: Systems and processes for evaluating algorithms for aligning weakly-annotated data to recognized characters in a document are provided. In a method for evaluating an algorithm for aligning annotation data to recognized characters, strong annotations for a document and weak-to-strong annotations, which are generated by applying a weak-to-strong annotation alignment algorithm, are received and matched to generate respective pairs of matched annotations. For each pair of matched annotations, respective metrics are calculated, including comparisons of aspects of the strong annotations to the weak-to-strong annotations. The respective metrics are aggregated, and an indication of the aggregated metrics is output to a graphical user interface or a targeted application. Aggregated metrics determined for different weak-to-strong annotation alignment algorithms may be compared in order to select or adjust an algorithm to be used for Optical Character Recognition (OCR) operations.
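
The evaluation flow in this abstract (match, compute per-pair metrics, aggregate) might be sketched as follows. This is an illustrative sketch only, not the method claimed in the application: the abstract does not say how annotations are matched or which metrics are used, so matching by field label, bounding-box intersection-over-union, and exact text match are all assumptions, as are the names `Annotation`, `iou`, `match_by_label`, and `evaluate`.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Annotation:
    """Hypothetical annotation record; the field names are assumptions."""
    label: str                               # field name, e.g. "total"
    text: str                                # annotated value
    box: tuple[float, float, float, float]   # (x0, y0, x1, y1)


def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_by_label(strong, aligned):
    """Pair each strong annotation with the aligned one sharing its label
    (assumed matching rule; unmatched annotations are skipped)."""
    by_label = {a.label: a for a in aligned}
    return [(s, by_label[s.label]) for s in strong if s.label in by_label]


def evaluate(strong, aligned):
    """Compute per-pair metrics and aggregate them by averaging."""
    pairs = match_by_label(strong, aligned)
    if not pairs:
        return {"pairs": 0, "mean_iou": 0.0, "text_match_rate": 0.0}
    return {
        "pairs": len(pairs),
        "mean_iou": mean(iou(s.box, w.box) for s, w in pairs),
        "text_match_rate": mean(float(s.text == w.text) for s, w in pairs),
    }


strong = [
    Annotation("total", "42.00", (0, 0, 10, 10)),
    Annotation("date", "2023-10-02", (0, 20, 10, 30)),
]
weak_to_strong = [
    Annotation("total", "42.00", (0, 0, 10, 10)),
    Annotation("date", "2023-10-02", (0, 20, 5, 30)),
]
result = evaluate(strong, weak_to_strong)
```

Running `evaluate` once per candidate alignment algorithm and comparing the resulting dictionaries would be one way to support the selection step the abstract mentions.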

    ANNOTATION ALIGNMENT FOR CHARACTER RECOGNITION IN DOCUMENTS

    Publication No.: US20250054325A1

    Publication Date: 2025-02-13

    Application No.: US18231652

    Filing Date: 2023-08-08

    Applicant: SAP SE

    Abstract: Systems and processes for aligning weakly-annotated data to recognized characters in a document are provided. In a method for aligning annotation data to recognized characters, annotation words and character recognition tokens are received, and a search algorithm is performed to align the annotation words to the tokens in a stepwise manner. At each step, an annotation word is aligned to one or more tokens, and a cost of each respective alignment is calculated. Once all annotation words are aligned, a full set of annotation word-token pairs corresponding to the annotation is selected based on a total cost of alignment for that set. A bounding box enclosing the tokens in the selected full set is generated and output to a target application.
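
A stepwise alignment of this kind might be sketched as below. This is a rough illustration under stated assumptions, not the search algorithm claimed in the application: the abstract names neither the search strategy nor the cost function, so the beam search, the edit-distance cost, the restriction to contiguous in-order token spans, and the parameters `max_span` and `beam_width` are all assumptions, as are the names `Token`, `align`, and `enclosing_box`.

```python
from dataclasses import dataclass
import heapq


@dataclass(frozen=True)
class Token:
    """Hypothetical OCR token: recognized text plus its bounding box."""
    text: str
    box: tuple[float, float, float, float]   # (x0, y0, x1, y1)


def edit_distance(a, b):
    """Levenshtein distance, used here as the assumed alignment cost."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def align(words, tokens, max_span=3, beam_width=5):
    """Align each annotation word, in order, to a span of 1..max_span tokens.

    Each hypothesis is (total_cost, next_token_index, spans). Returns the
    lowest-cost (total_cost, spans) or None if no full alignment exists.
    """
    beam = [(0, 0, ())]
    for word in words:
        candidates = []
        for cost, first, spans in beam:
            for start in range(first, len(tokens)):
                last = min(start + max_span, len(tokens))
                for end in range(start + 1, last + 1):
                    joined = "".join(t.text for t in tokens[start:end])
                    step = edit_distance(word, joined)
                    candidates.append((cost + step, end, spans + ((start, end),)))
        # Keep only the cheapest partial alignments for the next step.
        beam = heapq.nsmallest(beam_width, candidates)
    if not beam:
        return None
    total_cost, _, spans = min(beam)
    return total_cost, spans


def enclosing_box(tokens, spans):
    """Smallest box enclosing every token in the selected spans."""
    boxes = [tokens[i].box for start, end in spans for i in range(start, end)]
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


tokens = [
    Token("Invoice", (0, 0, 8, 5)),
    Token("No", (10, 0, 18, 5)),
    Token("INV-", (20, 0, 28, 5)),
    Token("001", (30, 0, 38, 5)),
    Token("Total", (40, 0, 48, 5)),
]
total_cost, spans = align(["INV-001"], tokens)
box = enclosing_box(tokens, spans)
```

In this example the annotation word `INV-001` was split by OCR into two tokens, which is why each word is allowed to cover more than one token. A wider beam trades speed for a lower risk of pruning the best full alignment.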
