Contextualized Character Recognition System

    Publication No.: US20210209301A1

    Publication Date: 2021-07-08

    Application No.: US16734880

    Application Date: 2020-01-06

    Applicant: SAP SE

    Abstract: System, method, and various embodiments for providing a contextualized character recognition system are described herein. An embodiment operates by determining a plurality of predicted words from an image. An accuracy measure for each of the plurality of predicted words is identified, and a replaceable word whose accuracy measure is below a threshold is identified. A plurality of candidate words associated with the replaceable word are identified, and a probability for each of the candidate words is calculated based on a contextual analysis. The candidate word with the highest probability is selected. The plurality of predicted words is then output, with the selected candidate word replacing the replaceable word.
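    The correction loop in the abstract above can be illustrated with a minimal Python sketch. It is not the patent's implementation. The vocabulary, the bigram counts, the one-edit candidate generator, and the names `PredictedWord`, `one_edit_candidates`, `bigram_score`, and `correct_words` are all hypothetical stand-ins. The sketch keeps every word whose accuracy meets the threshold. For each word below it, the sketch scores candidate replacements against the neighbouring words, normalises those scores into probabilities, and substitutes the most probable candidate.

```python
from dataclasses import dataclass

@dataclass
class PredictedWord:
    text: str
    accuracy: float  # OCR confidence for this word, assumed to lie in [0, 1]

# Toy vocabulary and bigram counts standing in for a real language model (hypothetical data).
VOCAB = {"total", "amount", "account", "due", "mount"}
BIGRAMS = {("total", "amount"): 5, ("amount", "due"): 4, ("total", "account"): 1}

def one_edit_candidates(word):
    """Hypothetical candidate generator: vocabulary words one edit away from `word`."""
    letters = "abcdefghijklmnopqrstuvwxyz0123456789"
    edits = set()
    for i in range(len(word) + 1):
        head, tail = word[:i], word[i:]
        for ch in letters:
            edits.add(head + ch + tail)  # insertion
        if tail:
            edits.add(head + tail[1:])  # deletion
            for ch in letters:
                edits.add(head + ch + tail[1:])  # substitution
    return sorted(e for e in edits if e in VOCAB)

def bigram_score(left, candidate, right):
    """Contextual score: bigram counts with the left and right neighbours."""
    score = 0
    if left is not None:
        score += BIGRAMS.get((left, candidate), 0)
    if right is not None:
        score += BIGRAMS.get((candidate, right), 0)
    return score

def correct_words(words, threshold, candidates_for, context_score):
    """Replace each word whose accuracy is below `threshold` with its most probable candidate."""
    output = [w.text for w in words]
    for i, word in enumerate(words):
        if word.accuracy >= threshold:
            continue  # accurate enough: keep the predicted word
        candidates = candidates_for(word.text)
        if not candidates:
            continue  # nothing to substitute: keep the predicted word
        left = output[i - 1] if i > 0 else None
        right = output[i + 1] if i + 1 < len(output) else None
        scores = {c: context_score(left, c, right) for c in candidates}
        total = sum(scores.values())
        # Normalise scores into probabilities; fall back to uniform if there is no evidence.
        probs = {c: (s / total if total else 1 / len(scores)) for c, s in scores.items()}
        output[i] = max(probs, key=probs.get)
    return output

words = [PredictedWord("total", 0.98), PredictedWord("acount", 0.41), PredictedWord("due", 0.95)]
print(correct_words(words, 0.6, one_edit_candidates, bigram_score))
# → ['total', 'amount', 'due']
```

    In this example, "acount" falls below the 0.6 threshold. Its two candidates are "account" and "amount". Given the neighbours "total" and "due", "amount" scores 9 and "account" scores 1, so "amount" is selected with probability 0.9. A production system would replace the toy bigram table with a trained language model.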

    RECOGNIZING TYPEWRITTEN AND HANDWRITTEN CHARACTERS USING END-TO-END DEEP LEARNING

    Publication No.: US20200302208A1

    Publication Date: 2020-09-24

    Application No.: US16359012

    Application Date: 2019-03-20

    Applicant: SAP SE

    Abstract: Disclosed herein are system, method, and computer program product embodiments for optical character recognition using end-to-end deep learning. In an embodiment, an optical character recognition system may train a neural network to identify characters in pixel images, assign index values to the characters, and recognize different formatting of the characters, such as distinguishing between handwritten and typewritten characters. The neural network may also be trained to identify groups of characters and to generate bounding boxes that group these characters. The optical character recognition system may then analyze documents to identify character information based on the pixel data and produce segmentation masks, such as a type grid segmentation mask, and one or more bounding box masks. The optical character recognition system may supply these masks as an output or may combine the masks to generate a version of the received document having optically recognized characters.
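    The abstract above describes combining the network's output masks into recognized text. The following is a minimal Python sketch of that combination step under heavy simplifying assumptions. It is not the patented method. It assumes each grid cell holds exactly one character, whereas real masks span many pixels per character. It also assumes 0 marks background in every mask, and that the character-index table `CHARSET` and the type labels `TYPES` are hypothetical encodings. The hypothetical function `combine_masks` groups character cells by bounding-box id and reads them in row-major order. It then labels each box typewritten or handwritten by majority vote over the type grid.

```python
from collections import Counter, defaultdict

# Hypothetical encodings: 0 means background in every mask.
CHARSET = {1: "S", 2: "A", 3: "P", 4: "4", 5: "2"}  # character index -> character
TYPES = {1: "typewritten", 2: "handwritten"}        # type-grid label -> formatting type

def combine_masks(char_grid, type_grid, box_mask):
    """Group recognised characters by bounding box and label each box with its majority type."""
    cells = defaultdict(list)  # box id -> [(row, col, char index, type label)]
    for r, row in enumerate(char_grid):
        for c, idx in enumerate(row):
            box = box_mask[r][c]
            if idx == 0 or box == 0:
                continue  # background cell, or a cell outside every bounding box
            cells[box].append((r, c, idx, type_grid[r][c]))
    result = {}
    for box, items in sorted(cells.items()):
        items.sort(key=lambda cell: (cell[0], cell[1]))  # reading order: top-to-bottom, left-to-right
        text = "".join(CHARSET[cell[2]] for cell in items)
        majority = Counter(cell[3] for cell in items).most_common(1)[0][0]
        result[box] = (text, TYPES.get(majority, "unknown"))
    return result

char_grid = [[1, 2, 3, 0, 4, 5]]
type_grid = [[1, 1, 1, 0, 2, 2]]
box_mask = [[1, 1, 1, 0, 2, 2]]
print(combine_masks(char_grid, type_grid, box_mask))
# → {1: ('SAP', 'typewritten'), 2: ('42', 'handwritten')}
```

    The majority vote is one simple way to assign a single formatting type to each bounding box. Labelling each character cell individually would preserve mixed handwritten and typewritten content within one box.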
