-
Publication No.: US11763094B2
Publication date: 2023-09-19
Application No.: US17319427
Application date: 2021-05-13
Applicant: SAP SE
Inventor: Christian Reisswig
CPC classification number: G06F40/30, G06F40/126, G06N3/08, G06V10/225, G06V10/40, G06V30/10
Abstract: Natural language processing systems and methods are disclosed herein. In some embodiments, digital document information comprising text is received. The digital document information may be processed through word and character encoding operations to generate word and character vectors while retaining document location information for the words and characters. The data may then be processed by a series of convolution and maximum pooling operations to obtain maximum-valued elements from the data. The document location information as well as the maximum-valued element data may be further processed for semantic classification of the data using a semantic classifier and bounding box regression.
-
Publication No.: US11514489B2
Publication date: 2022-11-29
Application No.: US17142865
Application date: 2021-01-06
Applicant: SAP SE
Inventor: Ying Jiang, Christian Reisswig
IPC: G06Q30/04, G06F40/186, G06F40/169, G06Q30/06, G06F3/0481
Abstract: Disclosed herein are various embodiments for targeted document information extraction. An embodiment operates by receiving a document associated with a particular customer of a plurality of customers. It is determined whether to use a global processor or template processor to analyze the document based on whether one or more customer templates are associated with the particular customer. Which of the one or more templates associated with the particular customer correspond to the document is identified. The document is compared to the identified template associated with the customer. Information is extracted from the document based on the identified template and the identified plurality of variations. The extracted information for the document is output.
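The routing step in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the patented method: templates are modelled as sets of expected words, similarity is a Jaccard overlap, and the names `route_document` and `min_score` as well as the fallback to the global processor on a weak match are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Word-set overlap, used here as a stand-in similarity measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def route_document(
    customer_id: str,
    doc_words: Set[str],
    templates_by_customer: Dict[str, Dict[str, Set[str]]],
    min_score: float = 0.5,  # hypothetical threshold
) -> Tuple[str, Optional[str]]:
    """Return ("template", name) or ("global", None) for one document."""
    templates = templates_by_customer.get(customer_id, {})
    if not templates:
        # No customer templates registered: use the global processor.
        return ("global", None)

    # Identify which of the customer's templates best matches the document.
    best_name, best_score = None, 0.0
    for name, words in templates.items():
        score = jaccard(doc_words, words)
        if score > best_score:
            best_name, best_score = name, score

    # Assumption: a weak match falls back to the global processor.
    if best_name is None or best_score < min_score:
        return ("global", None)
    return ("template", best_name)


templates = {
    "acme": {
        "invoice_v1": {"invoice", "total", "vat"},
        "order_v1": {"order", "qty"},
    }
}
print(route_document("acme", {"invoice", "total", "vat", "date"}, templates))
print(route_document("unknown", {"invoice", "total"}, templates))
```

Field extraction against the chosen template is deliberately left out, since the abstract does not say how variations between a document and its template are represented.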
-
Publication No.: US10540579B2
Publication date: 2020-01-21
Application No.: US15983489
Application date: 2018-05-18
Applicant: SAP SE
Inventor: Christian Reisswig, Anoop Raveendra Katti, Steffen Bickel, Johannes Hoehne, Jean Baptiste Faddoul
Abstract: Disclosed herein are system, method, and computer program product embodiments for processing a document. In an embodiment, a document processing system may receive a document. The document processing system may perform optical character recognition to obtain character information and positioning information for the characters. The document processing system may generate a down-sampled two-dimensional character grid for the document. The document processing system may apply a convolutional neural network to the character grid to obtain semantic meaning for the document. The convolutional neural network may produce a segmentation mask and bounding boxes to correspond to the document.
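As a rough illustration of the down-sampled two-dimensional character grid described here, the sketch below rasterizes OCR character boxes onto a coarse grid of integer character indices. The input tuple layout, the `cell` down-sampling factor, and the function name `build_char_grid` are assumptions for the example; the abstract does not specify them, and the convolutional network that would consume the grid is omitted.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical OCR output: (character, x0, y0, x1, y1) in page pixels,
# with (x1, y1) exclusive.
OcrChar = Tuple[str, int, int, int, int]


def build_char_grid(
    chars: List[OcrChar],
    page_w: int,
    page_h: int,
    cell: int = 8,  # assumed down-sampling factor (pixels per grid cell)
    vocab: Optional[Dict[str, int]] = None,
) -> Tuple[List[List[int]], Dict[str, int]]:
    """Map each character's box onto a coarse grid of character indices."""
    vocab = vocab if vocab is not None else {}
    grid_w = (page_w + cell - 1) // cell
    grid_h = (page_h + cell - 1) // cell
    grid = [[0] * grid_w for _ in range(grid_h)]  # 0 marks background

    for ch, x0, y0, x1, y1 in chars:
        # Assign the next free index to characters not seen before.
        idx = vocab.setdefault(ch, len(vocab) + 1)
        # Fill every grid cell that the character box touches.
        for gy in range(y0 // cell, min(grid_h, (y1 - 1) // cell + 1)):
            for gx in range(x0 // cell, min(grid_w, (x1 - 1) // cell + 1)):
                grid[gy][gx] = idx
    return grid, vocab


grid, vocab = build_char_grid(
    [("A", 0, 0, 8, 16), ("B", 8, 0, 16, 16)], page_w=16, page_h=16
)
print(grid)   # [[1, 2], [1, 2]]
print(vocab)  # {'A': 1, 'B': 2}
```

Storing an index per cell keeps both the text and its spatial layout in one array, which is what lets an image-style network operate on it.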
-
Publication No.: US20220366144A1
Publication date: 2022-11-17
Application No.: US17319427
Application date: 2021-05-13
Applicant: SAP SE
Inventor: Christian Reisswig
IPC: G06F40/30, G06K9/46, G06F40/126, G06K9/20, G06N3/08
Abstract: Natural language processing systems and methods are disclosed herein. In some embodiments, digital document information comprising text is received. The digital document information may be processed through word and character encoding operations to generate word and character vectors while retaining document location information for the words and characters. The data may then be processed by a series of convolution and maximum pooling operations to obtain maximum-valued elements from the data. The document location information as well as the maximum-valued element data may be further processed for semantic classification of the data using a semantic classifier and bounding box regression.
-
Publication No.: US11488020B2
Publication date: 2022-11-01
Application No.: US16890977
Application date: 2020-06-02
Applicant: SAP SE
Inventor: Christian Reisswig, Shachar Klaiman
Abstract: Technologies are described for performing adaptive high-resolution digital image processing using neural networks. For example, a number of different regions can be defined representing portions of a digital image. One of the regions covers the entire digital image at a reduced resolution. The other regions cover less than the entire digital image at resolutions higher than the region covering the entire digital image. Neural networks are then used to process each of the regions. The neural networks share information using prolongation and restriction operations. Prolongation operations propagate activations from a neural network operating on a lower resolution region to context zones of a neural network operating on a higher resolution region. Restriction operations propagate activations from the neural network operating on the higher resolution region back to the neural network operating on the lower resolution region.
-
Publication No.: US11275934B2
Publication date: 2022-03-15
Application No.: US16689498
Application date: 2019-11-20
Applicant: SAP SE
Inventor: Christian Reisswig, Stefan Klaus Baur
Abstract: Disclosed herein are system, method, and computer program product embodiments for generating document labels using positional embeddings. In an embodiment, a label system may identify tokens, such as words, of a document image. The label system may apply a position vector neural network to the document image to analyze the pixels and determine positional embedding vectors corresponding to the words. The label system may then combine the positional embedding vectors with the corresponding word vectors for use as an input to a neural network trained to generate document labels. This combination may embed the positional information with the corresponding word information in a serialized manner for processing by the document label neural network. Using this formatting, the label system may generate document labels in a lightweight and fast manner while still preserving spatial relationships between words.
-
Publication No.: US10915786B2
Publication date: 2021-02-09
Application No.: US16288357
Application date: 2019-02-28
Applicant: SAP SE
Inventor: Johannes Hoehne, Anoop Raveendra Katti, Christian Reisswig, Marco Spinaci
IPC: G06K9/62
Abstract: Disclosed herein are system, method, and computer program product embodiments for providing object detection and filtering operations. An embodiment operates by receiving an image comprising a plurality of pixels and pixel information for each pixel. The pixel information indicates a bounding box corresponding to an object within the image associated with a respective pixel and a confidence score associated with the bounding box for the respective pixel. Pixels that do not correspond to a center of at least one of the bounding boxes are iteratively removed from the plurality of pixels until a subset of pixels, each of which corresponds to a center of at least one of the bounding boxes, remains. Based on the subset, a final bounding box associated with each object of the image is determined based on an overlapping of the bounding boxes of the subset of pixels and the corresponding confidence scores.
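The filtering idea can be sketched in a simplified, single-pass form. This is not the iterative procedure the abstract describes: it assumes each pixel carries one predicted box and a positive confidence, keeps only pixels that sit within a tolerance `tol` of their own box centre, groups the surviving boxes by IoU, and merges each group with a confidence-weighted average. The names `is_center`, `final_boxes`, `tol`, and `iou_thr` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # x0, y0, x1, y1
Pred = Tuple[float, float, Box, float]    # pixel x, pixel y, box, confidence


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def is_center(px: float, py: float, box: Box, tol: float = 1.0) -> bool:
    """True if the pixel lies within tol of the centre of its predicted box."""
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    return abs(px - cx) <= tol and abs(py - cy) <= tol


def final_boxes(preds: List[Pred], tol: float = 1.0, iou_thr: float = 0.5) -> List[Box]:
    # Step 1: drop pixels that are not (near) the centre of their box.
    centers = [p for p in preds if p[3] > 0 and is_center(p[0], p[1], p[2], tol)]
    centers.sort(key=lambda p: p[3], reverse=True)

    # Step 2: group overlapping boxes, seeding groups with confident ones.
    groups: List[List[Pred]] = []
    for p in centers:
        for g in groups:
            if iou(g[0][2], p[2]) >= iou_thr:
                g.append(p)
                break
        else:
            groups.append([p])

    # Step 3: one confidence-weighted box per group.
    result: List[Box] = []
    for g in groups:
        total = sum(p[3] for p in g)
        result.append(tuple(sum(p[2][i] * p[3] for p in g) / total for i in range(4)))
    return result


preds = [
    (5, 5, (0, 0, 10, 10), 0.5),   # centre pixel of its box
    (6, 6, (0, 0, 12, 12), 0.5),   # centre pixel of an overlapping box
    (1, 1, (0, 0, 10, 10), 0.9),   # off-centre pixel, removed
]
print(final_boxes(preds))
```

Using the pixel's distance to its own predicted centre as the removal criterion is a design shortcut; an iterative variant would re-evaluate the remaining pixels after each removal round.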
-
Publication No.: US20230334309A1
Publication date: 2023-10-19
Application No.: US17720658
Application date: 2022-04-14
Applicant: SAP SE
Inventor: Alexey Streltsov, Monit Shah Singh, Dhananjay Tomar, Christian Reisswig, Minh Duc Bui
IPC: G06N3/08
CPC classification number: G06N3/08
Abstract: Systems, methods, and computer-readable media for generating a synthetic training data set from an original unstructured electronic document are disclosed. The synthetic training data set may be used to train a deep learning model to extract data from the original electronic document. The original electronic document may comprise annotated data fields. Each annotated data field may comprise a bounding box and a label. The original electronic document may comprise a header, a table, and a footer. Macro augmentation operations may be applied to the original electronic document to create sub-templates representative of distinct page layouts in the original electronic document. The synthetic training data set may be generated by applying geometric and semantic data augmentations to the sub-templates and the original electronic documents. The synthetic training data set may then be provided to the deep learning model for training.
-
Publication No.: US11557140B2
Publication date: 2023-01-17
Application No.: US17107223
Application date: 2020-11-30
Applicant: SAP SE
Inventor: Christian Reisswig
IPC: G06V30/416, G06F40/30
Abstract: Disclosed herein are system, method, and computer program product embodiments for correcting extracted document information based on generated confidence and correctness scores. In an embodiment, a document correcting system may receive a document and document information that represents information extracted from the document. The document correcting system may determine the correctness of the document information by processing the document to generate a character grid representing textual information and spatial arrangements for the text within the document. The document correcting system may apply a convolutional neural network to the character grid and the document information. The convolutional neural network may output corrected document information, a correctness value indicating the possible errors in the document information, and a confidence value indicating a likelihood of the possible errors.
-
Publication No.: US11281928B1
Publication date: 2022-03-22
Application No.: US17029180
Application date: 2020-09-23
Applicant: SAP SE
Inventor: Johannes Hoehne, Christian Reisswig
IPC: G06K9/34, G06K9/46, G06F16/903, G06T7/11
Abstract: Disclosed herein are system, method, and computer program product embodiments for querying document terms and identifying target data from documents. In an embodiment, a document processing system may receive a document and a query string. The document processing system may perform optical character recognition to obtain character information and positioning information for the characters of the document. The document processing system may generate a two-dimensional character grid for the document. The document processing system may apply a convolutional neural network to the character grid and the query string to identify target data from the document corresponding to the query string. The convolutional neural network may then produce a segmentation mask and/or bounding boxes to identify the targeted data.