-
Publication number: US20230053109A1
Publication date: 2023-02-16
Application number: US17402940
Application date: 2021-08-16
Applicant: SAP SE
Inventor: Igor Schukovets, Alexey Streltsov
IPC: G06F16/93
Abstract: The present disclosure involves systems, software, and computer-implemented methods for generating data regulation-compliant data from application interface data. One example method includes receiving a request for creation of document data. The request includes personal data of a user. Document data, including at least some of the personal data, is created based on the request. The document data is encoded into an encoded document that does not include any personal data of the user and includes structural information that describes the structure of the document data. A request to use the encoded document is received and the encoded document is decoded. A synthetic document is generated using the structural information included in the encoded document. Generation of the synthetic document includes insertion of synthetic user data into the synthetic document at positions in the synthetic document that correspond to positions of personal data within the document data.
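The encode-then-synthesize flow described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the nested-dictionary document model, the `PERSONAL_FIELDS` set, the `SYNTHETIC_VALUES` pools, and the `__placeholder__` marker are all assumptions introduced here for illustration.

```python
import random

# Hypothetical field names treated as personal data (assumption, not from the source).
PERSONAL_FIELDS = {"name", "email", "address"}

# Hypothetical pools of synthetic replacement values (assumption).
SYNTHETIC_VALUES = {
    "name": ["Alex Example", "Sam Sample"],
    "email": ["user@example.com", "test@example.org"],
    "address": ["1 Example Street", "2 Sample Road"],
}


def encode(document):
    """Return a copy that keeps the structure but replaces personal values with typed placeholders."""
    encoded = {}
    for key, value in document.items():
        if isinstance(value, dict):
            encoded[key] = encode(value)  # recurse to preserve nesting
        elif key in PERSONAL_FIELDS:
            encoded[key] = {"__placeholder__": key}  # position is kept, value is dropped
        else:
            encoded[key] = value
    return encoded


def generate_synthetic(encoded, rng):
    """Rebuild a document from the encoded form, filling each placeholder with synthetic data."""
    synthetic = {}
    for key, value in encoded.items():
        if isinstance(value, dict) and "__placeholder__" in value:
            synthetic[key] = rng.choice(SYNTHETIC_VALUES[value["__placeholder__"]])
        elif isinstance(value, dict):
            synthetic[key] = generate_synthetic(value, rng)
        else:
            synthetic[key] = value
    return synthetic


original = {
    "invoice_id": "INV-1",
    "customer": {"name": "Jane Doe", "email": "jane@corp.test"},
    "total": 10,
}
encoded = encode(original)
synthetic = generate_synthetic(encoded, random.Random(0))
```

In this sketch the synthetic values land at the same keys that held personal data in the original, which mirrors the position correspondence the abstract describes.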
-
Publication number: US20230334309A1
Publication date: 2023-10-19
Application number: US17720658
Application date: 2022-04-14
Applicant: SAP SE
Inventor: Alexey Streltsov, Monit Shah Singh, Dhananjay Tomar, Christian Reisswig, Minh Duc Bui
IPC: G06N3/08
CPC classification number: G06N3/08
Abstract: Systems, methods, and computer-readable media for generating a synthetic training data set from an original unstructured electronic document are disclosed. The synthetic training data set may be used to train a deep learning model to extract data from the original electronic document. The original electronic document may comprise annotated data fields. Each annotated data field may comprise a bounding box and a label. The original electronic document may comprise a header, a table, and a footer. Macro augmentation operations may be applied to the original electronic document to create sub-templates representative of distinct page layouts in the original electronic document. The synthetic training data set may be generated by applying geometric and semantic data augmentations to the sub-templates and the original electronic documents. The synthetic training data set may then be provided to the deep learning model for training.
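As a rough illustration of the geometric and semantic augmentations named in this abstract, the following Python sketch operates on annotated fields that each carry a label and a bounding box. The `Field` class, the scale-and-shift transform, and the per-label vocabulary are assumptions made here; the abstract does not specify which concrete augmentations are used.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Field:
    """Hypothetical annotated data field: a label, its text, and a bounding box."""
    label: str
    text: str
    box: tuple  # (x0, y0, x1, y1) in page coordinates


def geometric_augment(fields, dx, dy, scale):
    """Scale every bounding box about the origin, then shift it by (dx, dy)."""
    out = []
    for f in fields:
        x0, y0, x1, y1 = f.box
        box = (x0 * scale + dx, y0 * scale + dy, x1 * scale + dx, y1 * scale + dy)
        out.append(replace(f, box=box))
    return out


def semantic_augment(fields, vocab, rng):
    """Swap each field's text for another value with the same label, when one is available."""
    return [
        replace(f, text=rng.choice(vocab[f.label])) if f.label in vocab else f
        for f in fields
    ]


fields = [
    Field("total", "10.00", (10, 20, 30, 40)),
    Field("vendor", "ACME", (0, 0, 50, 10)),
]
vocab = {"total": ["99.95", "12.50"]}  # hypothetical replacement values

moved = geometric_augment(fields, dx=5, dy=-5, scale=2.0)
reworded = semantic_augment(fields, vocab, random.Random(0))
```

Both functions leave the labels untouched, so each augmented sample remains a valid annotated example for training.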
-
Publication number: US20220366301A1
Publication date: 2022-11-17
Application number: US17354202
Application date: 2021-06-22
Applicant: SAP SE
Inventor: Nurzat Rakhmanberdieva, Alexey Streltsov, Christian Reisswig
Abstract: In an example embodiment, a confidence score is computed for a predicted label (from a first model) for information extracted from a document. The confidence score is computed using a machine-learned model, different from the first model, that is based on a Sliding-Window method. The Sliding-Window method may be based on convolutional neural network classification using sliding windows. It receives as input (1) the string of extracted information from an independent previous information extraction step (the “input text”), (2) the string's predicted class label, (3) the string's coordinate location in the document, and (4) the text of the document (for additional context information). The Sliding-Window method's task is to predict the confidence score to determine the correctness of the predicted label for the information.
-
Publication number: US12079284B2
Publication date: 2024-09-03
Application number: US17402940
Application date: 2021-08-16
Applicant: SAP SE
Inventor: Igor Schukovets, Alexey Streltsov
IPC: G06F16/93
CPC classification number: G06F16/93
Abstract: The present disclosure involves systems, software, and computer-implemented methods for generating data regulation-compliant data from application interface data. One example method includes receiving a request for creation of document data. The request includes personal data of a user. Document data, including at least some of the personal data, is created based on the request. The document data is encoded into an encoded document that does not include any personal data of the user and includes structural information that describes the structure of the document data. A request to use the encoded document is received and the encoded document is decoded. A synthetic document is generated using the structural information included in the encoded document. Generation of the synthetic document includes insertion of synthetic user data into the synthetic document at positions in the synthetic document that correspond to positions of personal data within the document data.
-
Publication number: US20220092405A1
Publication date: 2022-03-24
Application number: US17025845
Application date: 2020-09-18
Applicant: SAP SE
Inventor: Matthias Frank, Hoang-Vu Nguyen, Stefan Klaus Baur, Alexey Streltsov, Jasmin Mankad, Cordula Guder, Konrad Schenk, Philipp Lukas Jamscikov, Rohit Kumar Gupta
Abstract: In an example embodiment, a deep neural network may be utilized to determine matches between candidate pairs of entities, as well as confidence scores that reflect how certain the deep neural network is about the corresponding match. The deep neural network is also able to find these matches without requiring domain knowledge that would be required if features for a machine-learned model were handcrafted, which is a drawback of prior-art machine-learned models used to match entities in multiple tables. Thus, the deep neural network improves on the functioning of prior-art machine-learned models designed to perform the same tasks. Specifically, the deep neural network learns the relationships of tabular fields and the patterns that define a match from historical data alone, making this approach generic and applicable independent of the context.