GENERATING DATA REGULATION COMPLIANT DATA FROM APPLICATION INTERFACE DATA

    Publication number: US20230053109A1

    Publication date: 2023-02-16

    Application number: US17402940

    Application date: 2021-08-16

    Applicant: SAP SE

    Abstract: The present disclosure involves systems, software, and computer-implemented methods for generating data regulation-compliant data from application interface data. One example method includes receiving a request for creation of document data. The request includes personal data of a user. Document data, including at least some of the personal data, is created based on the request. The document data is encoded into an encoded document that does not include any personal data of the user and includes structural information that describes the structure of the document data. A request to use the encoded document is received and the encoded document is decoded. A synthetic document is generated using the structural information included in the encoded document. Generation of the synthetic document includes insertion of synthetic user data into the synthetic document at positions in the synthetic document that correspond to positions of personal data within the document data.
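The encode, decode, and regenerate flow described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the flat dictionary document model, the JSON encoding, the `PERSONAL_FIELDS` set, the `SYNTHETIC_VALUES` pools, and both function names are assumptions introduced here.

```python
import json
import random

# Hypothetical: which fields count as personal data (an assumption).
PERSONAL_FIELDS = {"name", "email"}

# Hypothetical pools of synthetic replacement values (illustrative only).
SYNTHETIC_VALUES = {
    "name": ["Alex Example", "Sam Sample"],
    "email": ["alex@example.com", "sam@example.com"],
}


def encode_document(document: dict) -> str:
    """Encode a document so that structure survives but personal values do not."""
    entries = []
    for field, value in document.items():
        if field in PERSONAL_FIELDS:
            # Record only the position and kind of the personal field.
            entries.append({"field": field, "personal": True})
        else:
            entries.append({"field": field, "personal": False, "value": value})
    return json.dumps(entries)


def generate_synthetic(encoded: str, rng: random.Random) -> dict:
    """Decode the structure and insert synthetic data at the personal positions."""
    synthetic = {}
    for entry in json.loads(encoded):
        if entry["personal"]:
            synthetic[entry["field"]] = rng.choice(SYNTHETIC_VALUES[entry["field"]])
        else:
            synthetic[entry["field"]] = entry["value"]
    return synthetic


if __name__ == "__main__":
    doc = {"name": "Jane Doe", "email": "jane@corp.test", "amount": 120}
    encoded = encode_document(doc)
    print(encoded)
    print(generate_synthetic(encoded, random.Random(0)))
```

In this sketch the synthetic document keeps the field order of the original, so downstream consumers see the same layout without ever receiving the user's personal data.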

AUGMENTING ELECTRONIC DOCUMENTS TO GENERATE SYNTHETIC TRAINING DATA SETS

    Publication number: US20230334309A1

    Publication date: 2023-10-19

    Application number: US17720658

    Application date: 2022-04-14

    Applicant: SAP SE

    CPC classification number: G06N3/08

    Abstract: Systems, methods, and computer-readable media for generating a synthetic training data set from an original unstructured electronic document are disclosed. The synthetic training data set may be used to train a deep learning model to extract data from the original electronic document. The original electronic document may comprise annotated data fields. Each annotated data field may comprise a bounding box and a label. The original electronic document may comprise a header, a table, and a footer. Macro augmentation operations may be applied to the original electronic document to create sub-templates representative of distinct page layouts in the original electronic document. The synthetic training data set may be generated by applying geometric and semantic data augmentations to the sub-templates and the original electronic document. The synthetic training data set may then be provided to the deep learning model for training.
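As a rough illustration of the geometric and semantic augmentations mentioned in this abstract, the sketch below shifts the bounding boxes of annotated fields and swaps their text for another value with the same label. The `Field` class, the `augment` function, the uniform page-level shift, and the replacement dictionary are all assumptions made for this example; the abstract does not specify these operations in detail.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Field:
    """Hypothetical annotated data field: a label, its text, and a bounding box."""
    label: str
    text: str
    box: tuple  # (x0, y0, x1, y1) in page coordinates


def shift_box(box: tuple, dx: int, dy: int) -> tuple:
    """Geometric augmentation: translate a box without changing its size."""
    x0, y0, x1, y1 = box
    return (x0 + dx, y0 + dy, x1 + dx, y1 + dy)


def augment(fields: list, replacements: dict, rng: random.Random, max_shift: int = 5) -> list:
    """Return one synthetic variant of the annotated fields."""
    # One shift for the whole page keeps the relative layout intact.
    dx = rng.randint(-max_shift, max_shift)
    dy = rng.randint(-max_shift, max_shift)
    augmented = []
    for field in fields:
        # Semantic augmentation: pick another value that fits the same label.
        candidates = replacements.get(field.label, [field.text])
        augmented.append(
            replace(field, text=rng.choice(candidates), box=shift_box(field.box, dx, dy))
        )
    return augmented


if __name__ == "__main__":
    original = [
        Field("invoice_number", "INV-001", (10, 10, 60, 20)),
        Field("total", "120.00", (10, 200, 50, 210)),
    ]
    values = {"invoice_number": ["INV-417", "INV-902"], "total": ["87.50", "1430.00"]}
    for variant in augment(original, values, random.Random(1)):
        print(variant)
```

Calling `augment` repeatedly with different random states would yield a set of labeled variants, each preserving the labels and box sizes of the original annotations.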

MODEL-INDEPENDENT CONFIDENCE VALUE PREDICTION MACHINE LEARNED MODEL

    Publication number: US20220366301A1

    Publication date: 2022-11-17

    Application number: US17354202

    Application date: 2021-06-22

    Applicant: SAP SE

    Abstract: In an example embodiment, a confidence score is computed for a predicted label (from a first model) for information extracted from a document. The confidence score is computed using a machine learned model that is different from the first model and is based on a Sliding-Window method. The Sliding-Window method may be based on convolutional neural network classification using sliding windows. The method receives as input (1) the string of extracted information from an independent, previous information extraction step (the “input text”), (2) the string's predicted class label, (3) the string's coordinate location in the document, and (4) the text of the document (for additional context information). The Sliding-Window method's task is to predict the confidence score to determine the correctness of the predicted label for the information.
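To make the four inputs and the windowing idea concrete, here is a minimal sketch that bundles the inputs listed in this abstract and cuts the document text into overlapping token windows. It covers only the windowing step; the convolutional classifier that would score each window is not shown. The `ConfidenceInput` class, the `sliding_windows` function, whitespace tokenization, and the window size and stride are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ConfidenceInput:
    """Hypothetical container for the four inputs named in the abstract."""
    input_text: str        # (1) extracted string from the earlier extraction step
    predicted_label: str   # (2) class label predicted by the first model
    location: tuple        # (3) coordinate location of the string in the document
    document_text: str     # (4) full document text, used as context


def sliding_windows(tokens: list, size: int, stride: int = 1) -> list:
    """Split a token sequence into overlapping fixed-size windows."""
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    if len(tokens) <= size:
        # A short sequence yields a single window holding everything.
        return [list(tokens)]
    return [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, stride)]


if __name__ == "__main__":
    sample = ConfidenceInput(
        input_text="INV-001",
        predicted_label="invoice_number",
        location=(10, 10, 60, 20),
        document_text="Invoice number INV-001 issued to ACME",
    )
    # Whitespace tokenization is a simplification chosen for this sketch.
    for window in sliding_windows(sample.document_text.split(), size=3):
        print(window)
```

Each window could then be passed, together with the predicted label and location, to whatever classifier produces the confidence score.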

    Generating data regulation compliant data from application interface data

    Publication number: US12079284B2

    Publication date: 2024-09-03

    Application number: US17402940

    Application date: 2021-08-16

    Applicant: SAP SE

    CPC classification number: G06F16/93

    Abstract: The present disclosure involves systems, software, and computer-implemented methods for generating data regulation-compliant data from application interface data. One example method includes receiving a request for creation of document data. The request includes personal data of a user. Document data, including at least some of the personal data, is created based on the request. The document data is encoded into an encoded document that does not include any personal data of the user and includes structural information that describes the structure of the document data. A request to use the encoded document is received and the encoded document is decoded. A synthetic document is generated using the structural information included in the encoded document. Generation of the synthetic document includes insertion of synthetic user data into the synthetic document at positions in the synthetic document that correspond to positions of personal data within the document data.
