-
Publication No.: US20210209301A1
Publication Date: 2021-07-08
Application No.: US16734880
Application Date: 2020-01-06
Applicant: SAP SE
Inventor: Rohit Kumar Gupta , Johannes HOEHNE , Anoop Raveendra KATTI
IPC: G06F40/274 , G06F40/30 , G06K9/32 , G06F40/289
Abstract: A system, method, and various embodiments for providing a contextualized character recognition system are described herein. An embodiment operates by determining a plurality of predicted words of an image. An accuracy measure for each of the plurality of predicted words is identified, and a replaceable word with an accuracy measure below a threshold is identified. A plurality of candidate words associated with the replaceable word are identified, and a probability for each of the candidate words is calculated based on a contextual analysis. The candidate word with the highest probability is selected. The plurality of predicted words is then output, with the selected candidate word replacing the replaceable word.
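The flow in this abstract (threshold on accuracy, gather candidates, score them in context, substitute the best one) can be sketched roughly as follows. This is a minimal illustration only, not the patented implementation: the function `correct_words`, the `candidates_for` and `context_score` callbacks, the 0.8 threshold, and the toy bigram table are all hypothetical names and values introduced here.

```python
from typing import Callable, List, Tuple


def correct_words(
    predictions: List[Tuple[str, float]],
    candidates_for: Callable[[str], List[str]],
    context_score: Callable[[List[str], int, str], float],
    threshold: float = 0.8,  # assumed cutoff; the abstract gives no value
) -> List[str]:
    """Replace low-accuracy predicted words with the best-scoring candidate."""
    words = [word for word, _ in predictions]
    for i, (word, accuracy) in enumerate(predictions):
        if accuracy >= threshold:
            continue  # confident prediction, keep it unchanged
        candidates = candidates_for(word)
        if not candidates:
            continue  # nothing to substitute, keep the original word
        # Pick the candidate with the highest contextual probability.
        words[i] = max(candidates, key=lambda c: context_score(words, i, c))
    return words


# Toy stand-ins for the candidate generator and the contextual model.
CANDIDATES = {"invoicc": ["invoice", "invoked"]}
BIGRAMS = {("the", "invoice"): 0.9, ("the", "invoked"): 0.1}


def toy_candidates(word: str) -> List[str]:
    return CANDIDATES.get(word, [])


def toy_context(words: List[str], i: int, candidate: str) -> float:
    # Score a candidate by how likely it is to follow the previous word.
    previous = words[i - 1] if i > 0 else ""
    return BIGRAMS.get((previous, candidate), 0.0)


predictions = [("pay", 0.99), ("the", 0.98), ("invoicc", 0.41)]
result = correct_words(predictions, toy_candidates, toy_context)
print(result)
```

In practice the contextual scorer would more plausibly be a language model than a bigram table; the callback signature keeps that choice open.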
-
Publication No.: US20200302208A1
Publication Date: 2020-09-24
Application No.: US16359012
Application Date: 2019-03-20
Applicant: SAP SE
Inventor: Johannes HOEHNE , Christian REISSWIG , Anoop Raveendra KATTI , Marco SPINACI
Abstract: Disclosed herein are system, method, and computer program product embodiments for optical character recognition using end-to-end deep learning. In an embodiment, an optical character recognition system may train a neural network to identify characters of pixel images, assign index values to the characters, and recognize different formatting of the characters, such as distinguishing between handwritten and typewritten characters. The neural network may also be trained to identify groups of characters and to generate bounding boxes to group these characters. The optical character recognition system may then analyze documents to identify character information based on the pixel data and produce segmentation masks, such as a type grid segmentation mask, and one or more bounding box masks. The optical character recognition system may supply these masks as an output or may combine the masks to generate a version of the received document having optically recognized characters.
-