KNOWLEDGE GRAPH ENTITIES FROM TEXT
    Invention Publication

    Publication No.: US20230359825A1

    Publication Date: 2023-11-09

    Application No.: US17738898

    Filing Date: 2022-05-06

    Applicant: SAP SE

    CPC classification number: G06F40/295 G06N3/08 G06N3/0427 G06N5/022 G06F40/253

    Abstract: Example methods and systems are directed to generating knowledge graph entities from text. Natural language text is received as input and processed using named entity recognition (NER), part of speech (POS) recognition, and business object recognition (BOR). The outputs of the NER, POS, and BOR processes are combined to generate knowledge entity triples comprising two entities and a relationship between them. Keywords are extracted from the text using NER to generate a set of entities. A node in a knowledge graph is created for at least some of the entities. A POS tagger identifies verbs in the text, generating a set of verbs. Relational verbs (e.g., “talk to” or “communicated with”) are detected and used to create edges in the knowledge graph. The knowledge graph may be converted back to natural language text using a trained machine learning model.
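    A minimal sketch of the triple-extraction pipeline this abstract describes, in Python. spaCy is used here only as a stand-in for the NER and POS components (no specific library or business object recognizer is named in the patent), and extract_triples with its nearest-entity pairing rule is a hypothetical helper.

```python
# Sketch of the NER + POS steps described above; spaCy is an assumption,
# and extract_triples is a hypothetical helper, not the patented method.
import spacy

nlp = spacy.load("en_core_web_sm")

def extract_triples(text: str) -> list[tuple[str, str, str]]:
    """Return (entity, relational verb, entity) triples from raw text."""
    doc = nlp(text)
    entities = list(doc.ents)                      # NER: candidate graph nodes
    verbs = [t for t in doc if t.pos_ == "VERB"]   # POS: candidate relations
    triples = []
    for verb in verbs:
        # Pair each verb with the nearest entity on either side of it.
        left = [e for e in entities if e.end <= verb.i]
        right = [e for e in entities if e.start > verb.i]
        if left and right:
            triples.append((left[-1].text, verb.lemma_, right[0].text))
    return triples

print(extract_triples("Alice from SAP talked to Bob about the invoice."))
```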

    SEMANTIC DUPLICATE NORMALIZATION AND STANDARDIZATION

    Publication No.: US20230139644A1

    Publication Date: 2023-05-04

    Application No.: US17513188

    Filing Date: 2021-10-28

    Applicant: SAP SE

    Abstract: Systems, methods, and computer-readable media are disclosed for list attribute normalization and standardization for creation of a controlled vocabulary. A vocabulary set comprising a plurality of vocabulary terms may be received. For each vocabulary term, semantic duplicates may be identified by analyzing the semantics, syntactics, or phonetics of the vocabulary terms. Semantic chains may be formed from each vocabulary term and its corresponding semantic duplicates. The terms in each semantic chain may be ranked to determine a most probable vocabulary term, which then replaces the semantic chain. The most probable vocabulary terms across all semantic chains from the vocabulary set may form the controlled vocabulary.
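    A simplified sketch of the chain-and-rank idea, assuming plain string similarity and term frequency as stand-ins for the semantic, syntactic, and phonetic analysis the abstract describes; the threshold and helper name are illustrative.

```python
# Simplified sketch of semantic-chain formation and ranking; string similarity
# and frequency are stand-ins for the patent's semantic/syntactic/phonetic
# comparisons and its ranking of the "most probable" term.
from collections import Counter
from difflib import SequenceMatcher

def build_controlled_vocabulary(terms: list[str], threshold: float = 0.85) -> list[str]:
    counts = Counter(terms)
    chains: list[list[str]] = []
    for term in counts:
        for chain in chains:
            if SequenceMatcher(None, term.lower(), chain[0].lower()).ratio() >= threshold:
                chain.append(term)     # term is a likely duplicate of this chain
                break
        else:
            chains.append([term])      # start a new semantic chain
    # Replace each chain with its most probable (here: most frequent) term.
    return [max(chain, key=lambda t: counts[t]) for chain in chains]

print(build_controlled_vocabulary(["colour", "color", "color", "hue"]))
```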

    VISUAL LABELING FOR MACHINE LEARNING TRAINING

    Publication No.: US20230133030A1

    Publication Date: 2023-05-04

    Application No.: US17516948

    Filing Date: 2021-11-02

    Applicant: SAP SE

    Abstract: Systems, methods, and computer-readable media are disclosed for visual labeling of training data items for training a machine learning model. Training data items may be generated for training the machine learning model. Visual labels, such as QR codes, may be created for the training data items. The creation of the training data item and the visual label may be automated. The visual labels and the training data items may be combined to obtain a labeled training data item. The labeled training data item may comprise a separator to distinguish the training data item from the visual label. The labeled training data item may be used for training and validation of the machine learning model. The machine learning model may analyze the training data item, attempt to identify the training data item, and compare the identification against the embedded label.
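    A rough sketch of combining a training data item with a QR-code visual label, assuming the qrcode and Pillow libraries (neither is named in the patent); the fixed-width black stripe is just one possible separator.

```python
# Sketch of attaching a visual label (QR code) to a training image with a
# separator stripe; qrcode and Pillow are assumptions, and the fixed-width
# stripe is only one way the separator could be realized.
import qrcode
from PIL import Image

def label_training_image(item: Image.Image, label_text: str,
                         separator_px: int = 10) -> Image.Image:
    qr = qrcode.make(label_text).convert("RGB")
    width = max(item.width, qr.width)
    height = item.height + separator_px + qr.height
    labeled = Image.new("RGB", (width, height), "white")
    labeled.paste(item, (0, 0))
    # The black stripe separates the training data item from its visual label.
    labeled.paste(Image.new("RGB", (width, separator_px), "black"),
                  (0, item.height))
    labeled.paste(qr, (0, item.height + separator_px))
    return labeled

sample = Image.new("RGB", (200, 120), "lightgray")   # stand-in training item
label_training_image(sample, "class=invoice;id=42").save("labeled_item.png")
```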

    Measuring documentation completeness in multiple languages

    Publication No.: US11620127B2

    Publication Date: 2023-04-04

    Application No.: US17317340

    Filing Date: 2021-05-11

    Applicant: SAP SE

    Abstract: Source code is analyzed to identify components. The components are each assigned a complexity score. Documentation for the source code is identified, related to the components, and given a score based on the quantity of documentation for each component and that component's complexity score. To determine the semantic meaning of the documentation, vector embeddings for the documentation languages may be generated and aligned. Alignment causes the different machine learning models to generate similar vectors for semantically similar words in the different languages. Because words in the other languages receive vectors similar to those of words with the same meaning in the primary language, the vector representation of documentation in another language will match the vector representation of the source code whenever the documentation is substantially on the same topic.
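    A small sketch of how aligned documentation embeddings could be compared against a source-code embedding; the vectors are assumed to come from already-aligned per-language models, and the scoring formula is a simplified placeholder rather than the patented measure.

```python
# Sketch of comparing documentation to source code through aligned embeddings;
# the vectors are assumed to be produced by aligned per-language models, and
# the completeness formula below is a simplified placeholder.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def completeness_score(code_vec: np.ndarray, doc_vecs: dict[str, np.ndarray],
                       complexity: float) -> float:
    """Average topical match of all language versions, weighted by complexity."""
    if not doc_vecs:
        return 0.0
    topical_match = np.mean([cosine(code_vec, v) for v in doc_vecs.values()])
    return float(topical_match * len(doc_vecs) / complexity)

rng = np.random.default_rng(0)
code_vec = rng.normal(size=300)
docs = {"en": code_vec + rng.normal(scale=0.1, size=300),   # on-topic English doc
        "de": code_vec + rng.normal(scale=0.1, size=300)}   # aligned German doc
print(round(completeness_score(code_vec, docs, complexity=2.0), 3))
```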

    SMART DATASET COLLECTION SYSTEM
    Invention Application

    Publication No.: US20230096118A1

    Publication Date: 2023-03-30

    Application No.: US17486554

    Filing Date: 2021-09-27

    Applicant: SAP SE

    Abstract: Datasets are available from different dataset servers and often lack well-defined metadata, so comparing datasets is difficult. Additionally, there might be different versions of the same dataset, which makes the search even more difficult. Using systems and methods described herein, quality scores, dataset versions, topic identifications, and semantic relatedness metadata are stored about datasets stored on dataset servers. A user interface is provided to allow a user to search for datasets by specifying search criteria (e.g., a topic and a minimum quality score) and to be informed of responsive datasets. The user interface may further inform the user of the quality scores of the responsive datasets, the versions of the responsive datasets, or other metadata. From the search results, the user may select and download one or more of the responsive datasets.
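    A minimal sketch of the search step, with hypothetical metadata fields; the filter criteria (topic and minimum quality score) follow the example given in the abstract.

```python
# Sketch of dataset search over stored metadata; the record fields and the
# ranking by quality score are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class DatasetRecord:
    name: str
    server: str
    version: str
    topic: str
    quality_score: float      # 0.0 - 1.0

def search(catalog: list[DatasetRecord], topic: str,
           min_quality: float) -> list[DatasetRecord]:
    hits = [d for d in catalog if d.topic == topic and d.quality_score >= min_quality]
    # Present the best-scoring responsive datasets first.
    return sorted(hits, key=lambda d: d.quality_score, reverse=True)

catalog = [
    DatasetRecord("sales_2020", "server-a", "1.0", "sales", 0.72),
    DatasetRecord("sales_2020", "server-b", "2.1", "sales", 0.91),
    DatasetRecord("hr_survey", "server-a", "1.3", "hr", 0.85),
]
for record in search(catalog, topic="sales", min_quality=0.8):
    print(record.name, record.version, record.quality_score)
```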

    Knowledge graph entities from text

    Publication No.: US12242808B2

    Publication Date: 2025-03-04

    Application No.: US17738898

    Filing Date: 2022-05-06

    Applicant: SAP SE

    Abstract: Example methods and systems are directed to generating knowledge graph entities from text. Natural language text is received as input and processed using named entity recognition (NER), part of speech (POS) recognition, and business object recognition (BOR). The outputs of the NER, POS, and BOR processes are combined to generate knowledge entity triples comprising two entities and a relationship between them. Keywords are extracted from the text using NER to generate a set of entities. A node in a knowledge graph is created for at least some of the entities. A POS tagger identifies verbs in the text, generating a set of verbs. Relational verbs (e.g., “talk to” or “communicated with”) are detected and used to create edges in the knowledge graph. The knowledge graph may be converted back to natural language text using a trained machine learning model.
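    As a complement to the extraction sketch shown under the earlier publication of this same application, the following hypothetical sketch turns extracted triples into graph nodes and labeled edges; networkx is an assumption and is not named in the patent.

```python
# Complementary sketch: turning (entity, relation, entity) triples into graph
# nodes and labeled edges; networkx is an assumed library choice.
import networkx as nx

def build_knowledge_graph(triples: list[tuple[str, str, str]]) -> nx.DiGraph:
    graph = nx.DiGraph()
    for head, relation, tail in triples:
        graph.add_node(head)                           # one node per extracted entity
        graph.add_node(tail)
        graph.add_edge(head, tail, relation=relation)  # relational verb as edge label
    return graph

kg = build_knowledge_graph([("Alice", "talk to", "Bob"),
                            ("Bob", "communicate with", "SAP SE")])
for head, tail, data in kg.edges(data=True):
    print(f"{head} --[{data['relation']}]--> {tail}")
```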

    IMAGE SEGMENTATION FOR ANONYMIZATION FOR IMAGE PROCESSING

    Publication No.: US20240012936A1

    Publication Date: 2024-01-11

    Application No.: US17862091

    Filing Date: 2022-07-11

    Applicant: SAP SE

    Abstract: An input image is divided into segments. The segments may be reassembled to reform the input image. The order of the segments may be stored in an encrypted database for which approved applications have the decryption key but users do not. This allows the approved applications to determine the order and reform the input image without allowing users to do the same. To further increase the difficulty of reforming the input image, the segments may be transformed. Example transformations include rotation and mirroring. The encrypted database may store an indication of the transformation applied to each segment. The effort of reforming the input image without access to the database is increased substantially. The reformed input image may be stored in transient memory only, without being stored to long-term storage. Thus, the reformed image cannot be accessed from a file system by unauthorized users.
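    An illustrative sketch of segment-and-scramble anonymization; the 4x4 grid, the rotation set, and the in-memory secret standing in for the encrypted database are all assumptions.

```python
# Sketch of splitting an image into segments, transforming and shuffling them,
# and keeping the order + rotations as a secret needed for reassembly. The
# grid size and transformations are illustrative choices.
import random
from PIL import Image

TILES = 4                        # 4x4 grid of segments
ANGLES = [0, 90, 180, 270]       # example per-segment transformations

def split_and_scramble(img: Image.Image):
    """Return scrambled segments plus the secret (positions + rotations)."""
    w, h = img.width // TILES, img.height // TILES
    tiles, secret = [], []
    for row in range(TILES):
        for col in range(TILES):
            tile = img.crop((col * w, row * h, (col + 1) * w, (row + 1) * h))
            angle = random.choice(ANGLES)           # per-segment transformation
            tiles.append(tile.rotate(angle))
            secret.append((row, col, angle))        # kept encrypted in practice
    order = list(range(len(tiles)))
    random.shuffle(order)                           # scramble the segment order
    return [tiles[i] for i in order], [secret[i] for i in order]

def reassemble(scrambled, secret, segment_size):
    w, h = segment_size
    out = Image.new("RGB", (w * TILES, h * TILES))
    for tile, (row, col, angle) in zip(scrambled, secret):
        out.paste(tile.rotate(-angle), (col * w, row * h))   # undo the transform
    return out

original = Image.new("RGB", (400, 400), "steelblue")
scrambled, secret = split_and_scramble(original)
restored = reassemble(scrambled, secret, (100, 100))   # held in memory only
```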

    MULTI-LANGUAGE SOURCE CODE SEARCH ENGINE

    Publication No.: US20230040412A1

    Publication Date: 2023-02-09

    Application No.: US17395213

    Filing Date: 2021-08-05

    Applicant: SAP SE

    Abstract: A machine learning model is trained to translate source code from one or more programming languages into a common programming language. The machine learning model translates source code from the other languages into the common programming language. A language embedder generates a vector for each function in the source code, all of which is now in the common programming language. A user provides a text search query, which a language embedder converts to a vector. Based on the vector of the text search query and the vectors for the source code, search results are generated and presented in a user interface. Additional machine learning models may be trained and used to measure function complexity, test coverage, documentation quantity and complexity, or any suitable combination thereof. These measures may be used to determine which search results to present, an order in which to present search results, or both.
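    A toy sketch of vector-based search over a function corpus assumed to be already translated into one common language; the character-count embedder is a deliberate stand-in for the trained language embedder.

```python
# Sketch of vector search over source code; the embed() function is a trivial
# stand-in for the trained language embedder the abstract describes.
import numpy as np

def embed(text: str) -> np.ndarray:
    vec = np.zeros(26)
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0          # bag-of-letters placeholder
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def search(query: str, functions: dict[str, str], top_k: int = 2) -> list[str]:
    q = embed(query)
    scored = [(float(np.dot(q, embed(body))), name) for name, body in functions.items()]
    return [name for _, name in sorted(scored, reverse=True)[:top_k]]

corpus = {
    "parse_invoice": "def parse_invoice(path): ...",
    "send_email": "def send_email(recipient, body): ...",
    "load_dataset": "def load_dataset(url): ...",
}
print(search("read an invoice file", corpus))
```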

    MACHINE LEARNING FOR DOCUMENT COMPRESSION

    Publication No.: US20220067364A1

    Publication Date: 2022-03-03

    Application No.: US17009526

    Filing Date: 2020-09-01

    Applicant: SAP SE

    Abstract: In an example embodiment, machine learning is used to intelligently compress documents to reduce the overall footprint of storing large amounts of files for an organization. Specifically, a document is split into parts, with each part representing a grouping of text or an image. Optical character recognition is performed to identify the text in images. Machine learning techniques are then applied to a part of a document in order to determine how relevant the document is for the organization. The parts that are deemed to be not relevant may then be reduced in size, either by omitting them completely or by summarizing them. This allows for the compression to be tailored specifically to the organization, resulting in the ability to compress or eliminate parts of documents that other organizations might have found relevant (and thus would not have been compressed or eliminated through traditional means).
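    A small sketch of relevance-driven compression in which a keyword heuristic and simple truncation stand in for the trained relevance and summarization models; the term list and thresholds are invented for illustration.

```python
# Sketch of relevance-driven document compression; the keyword heuristic and
# truncation below stand in for the machine learning models in the abstract.
RELEVANT_TERMS = {"invoice", "contract", "payment"}   # organization-specific

def relevance(part: str) -> float:
    words = part.lower().split()
    return sum(w in RELEVANT_TERMS for w in words) / max(len(words), 1)

def compress(parts: list[str], keep_above: float = 0.05) -> list[str]:
    compressed = []
    for part in parts:
        score = relevance(part)
        if score >= keep_above:
            compressed.append(part)                    # keep relevant parts intact
        elif score > 0:
            compressed.append(part[:60] + "...")       # "summarize" borderline parts
        # parts with score 0 are omitted entirely
    return compressed

document = [
    "Invoice 4711: payment due within 30 days of receipt.",
    "The office party will take place on Friday afternoon in building 3.",
]
print(compress(document))
```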

    Model-based analysis in a relational database

    Publication No.: US11157780B2

    Publication Date: 2021-10-26

    Application No.: US16054242

    Filing Date: 2018-08-03

    Applicant: SAP SE

    Abstract: A system includes a model repository comprising a plurality of models, each adapted to perform a computational task when used by an analytical program. A first database table is created in a database, the first database table having a predefined table structure that corresponds to the analytical program, and a best model of the plurality of models is stored in the first database table. A request of a client device to perform the computational task, comprising input data, is received. If the received request does not comprise a model-ID, the analytical program reads the model currently stored in the first table and uses the read model for performing the computational task on the input data. If the received request comprises a model-ID, the analytical program creates a second database table having the predefined table structure in the database, reads the model associated with the model-ID from the model repository, stores that model in the second table, and uses the model read from the second table for performing the computational task on the input data.
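    A compact sketch of the request handling described above, using sqlite3 and pickled model objects as stand-ins; the table names, the in-memory repository, and the toy models are illustrative assumptions.

```python
# Sketch of the model-ID branching described above; sqlite3, pickled dicts,
# and the table/repository names are stand-ins, not the patented design.
import pickle
import sqlite3

MODEL_REPOSITORY = {"m1": {"kind": "mean"}, "m2": {"kind": "max"}}  # hypothetical

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE model_table_1 (model BLOB)")             # first table
conn.execute("INSERT INTO model_table_1 VALUES (?)",
             (pickle.dumps(MODEL_REPOSITORY["m1"]),))                # best model

def handle_request(input_data: list[float], model_id: str | None = None) -> float:
    if model_id is None:
        # No model-ID: use the model currently stored in the first table.
        blob = conn.execute("SELECT model FROM model_table_1").fetchone()[0]
    else:
        # Model-ID given: create a second table with the same structure, copy
        # the requested model from the repository into it, then use it.
        conn.execute("CREATE TABLE IF NOT EXISTS model_table_2 (model BLOB)")
        conn.execute("DELETE FROM model_table_2")
        conn.execute("INSERT INTO model_table_2 VALUES (?)",
                     (pickle.dumps(MODEL_REPOSITORY[model_id]),))
        blob = conn.execute("SELECT model FROM model_table_2").fetchone()[0]
    model = pickle.loads(blob)
    return max(input_data) if model["kind"] == "max" else sum(input_data) / len(input_data)

print(handle_request([1.0, 2.0, 3.0]))            # uses the stored best model
print(handle_request([1.0, 2.0, 3.0], "m2"))      # uses the requested model
```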
