-
Publication No.: US12223278B2
Publication Date: 2025-02-11
Application No.: US17860912
Application Date: 2022-07-08
Applicant: SAP SE
Inventor: Hans-Martin Ramsl
IPC: G06F40/30 , G06F16/2458 , G06F16/31
Abstract: Example methods and systems are directed to automatic data card generation for datasets. A data card is a summary that describes quantitative aspects of a dataset, qualitative aspects of a dataset, or both. The data samples and documentation of a dataset are analyzed automatically to determine a number of samples, a primary data type, a license, or any suitable combination thereof. Data formats for data and documentation of the dataset may be automatically recognized. The language of text data may be automatically recognized. The most frequent language for the text data may be identified as the primary language of the dataset. A data card may be created for the dataset. The data card may indicate the number of samples, the data formats used in the dataset, the language of text data in the dataset, or any suitable combination thereof.
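The sketch below shows how such a data card could be assembled. It counts samples, collects data formats, and takes the most frequent detected language as the primary one. A few parts are assumptions not taken from the patent: the sample schema (`{"format": ..., "text": ...}`), the toy stopword-based `guess_language` (standing in for a real language identifier), and the `DataCard` fields.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical stopword lists; a stand-in for a real language identifier.
STOPWORDS: Dict[str, set] = {
    "en": {"the", "and", "is", "of", "to"},
    "de": {"der", "die", "und", "ist", "zu"},
    "fr": {"le", "la", "et", "est", "de"},
}


def guess_language(text: str) -> str:
    """Pick the language whose stopwords overlap most with the text (toy heuristic)."""
    words = set(text.lower().split())
    scores = {lang: len(words & stops) for lang, stops in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


@dataclass
class DataCard:
    num_samples: int
    data_formats: List[str]
    primary_language: str
    license: Optional[str] = None


def build_data_card(samples: List[dict], license_text: Optional[str] = None) -> DataCard:
    """Summarize samples shaped like {"format": "json", "text": "..."} (assumed schema)."""
    formats = sorted({s["format"] for s in samples})
    # The most frequent language across text samples becomes the primary language.
    langs = Counter(guess_language(s["text"]) for s in samples if s.get("text"))
    primary = langs.most_common(1)[0][0] if langs else "unknown"
    return DataCard(len(samples), formats, primary, license_text)
```

Suppose three samples are passed in: two English JSON samples and one German CSV sample. The resulting card would then report 3 samples, the formats `["csv", "json"]`, and `"en"` as the primary language.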
-
Publication No.: US12050873B2
Publication Date: 2024-07-30
Application No.: US17513188
Application Date: 2021-10-28
Applicant: SAP SE
Inventor: Hans-Martin Ramsl
IPC: G06F40/30 , G06F16/33 , G06F16/36 , G06F40/247 , G06N20/00
CPC classification number: G06F40/30 , G06F16/3343 , G06F16/374 , G06F40/247 , G06N20/00
Abstract: Systems, methods, and computer-readable media are disclosed for list attribute normalization and standardization for creation of a controlled vocabulary. A vocabulary set comprising a plurality of vocabulary terms may be received. For each vocabulary term, semantic duplicates may be identified. The semantic duplicates may be identified by analyzing the semantics, syntax, or phonetics of the vocabulary terms. Semantic chains may be formed from each vocabulary term and its corresponding semantic duplicates. The terms in each semantic chain may be ranked to determine a most probable vocabulary term, which may then replace the semantic chain. The most probable vocabulary terms across all semantic chains from the vocabulary set may form the controlled vocabulary.
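A minimal sketch of semantic chaining follows. It groups near-duplicate terms into chains, ranks each chain, and keeps the top term. The sketch makes several assumptions. Duplicates are detected only syntactically, using `difflib.SequenceMatcher`, rather than through the semantic or phonetic analysis the abstract mentions. The 0.8 similarity threshold is arbitrary. Ranking by raw frequency is a hypothetical stand-in for the patent's ranking step.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import List


def is_duplicate(a: str, b: str, threshold: float = 0.8) -> bool:
    """Syntactic stand-in for semantic duplicate detection (assumption)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def build_controlled_vocabulary(terms: List[str], threshold: float = 0.8) -> List[str]:
    counts = Counter(terms)
    chains: List[List[str]] = []
    # Iterate over unique terms in first-seen order and attach each to a matching chain.
    for term in counts:
        for chain in chains:
            if is_duplicate(term, chain[0], threshold):
                chain.append(term)
                break
        else:
            chains.append([term])
    # The most frequent term in each chain replaces the whole chain.
    return [max(chain, key=lambda t: counts[t]) for chain in chains]
```

For the input `["color", "colour", "color", "size"]`, "color" and "colour" form one chain and "size" forms another. The resulting vocabulary is `["color", "size"]`.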
-
Publication No.: US11907336B2
Publication Date: 2024-02-20
Application No.: US17516948
Application Date: 2021-11-02
Applicant: SAP SE
Inventor: Ran M. Bittmann , Hans-Martin Ramsl
IPC: G06K9/62 , G06F18/21 , G06F18/214 , G06F18/2411 , G06K19/06 , G06V30/19
CPC classification number: G06F18/217 , G06F18/214 , G06F18/2411 , G06K19/06028 , G06K19/06037 , G06V30/19153
Abstract: Systems, methods, and computer-readable media are disclosed for visual labeling of training data items for training a machine learning model. Training data items may be generated for training the machine learning model. Visual labels, such as QR codes, may be created for the training data items. The creation of the training data item and the visual label may be automated. The visual labels and the training data items may be combined to obtain a labeled training data item. The labeled training data item may comprise a separator to distinguish the training data item from the visual label. The labeled training data item may be used for training and validation of the machine learning model. The machine learning model may analyze the training data item, attempt to identify the training data item, and compare the identification against the embedded label.
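The sketch below covers the combine-then-validate flow at the byte level. It appends a label to a training data item behind a separator, then checks a model's identification against the embedded label. Two parts are assumptions. The label is stored as plain UTF-8 rather than rendered visually (for example, as a QR code). The separator value `SEPARATOR` and the helper names are hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical separator marker distinguishing the data item from its label.
SEPARATOR = b"\x00--LABEL--\x00"


def make_labeled_item(item: bytes, label: str) -> bytes:
    # In the patent the label is visual (e.g., a QR code); here it is raw UTF-8 (assumption).
    return item + SEPARATOR + label.encode("utf-8")


def split_labeled_item(labeled: bytes) -> Tuple[bytes, str]:
    """Split on the last separator occurrence into (data item, label)."""
    item, _, label = labeled.rpartition(SEPARATOR)
    return item, label.decode("utf-8")


def validate(model: Callable[[bytes], str], labeled_items: List[bytes]) -> float:
    """Fraction of items whose model identification matches the embedded label."""
    correct = 0
    for labeled in labeled_items:
        item, label = split_labeled_item(labeled)
        correct += model(item) == label
    return correct / len(labeled_items)
```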
-
Publication No.: US11893990B2
Publication Date: 2024-02-06
Application No.: US17486661
Application Date: 2021-09-27
Applicant: SAP SE
Inventor: Hans-Martin Ramsl
IPC: G10L15/22 , G06F40/295 , G10L15/26
CPC classification number: G10L15/22 , G06F40/295 , G10L15/26 , G10L2015/223
Abstract: Speech-to-text translation is used to generate a transcript for an audio file. Text segments are associated with time segments in the transcript. A trained machine learning model determines, based on the text in the transcript, one or more topics for the audio file. The transcript is modified to include the determined one or more topics. A user interface may be presented that allows a user to search for portions of an audio file that relate to a particular topic. In response to a selected or entered topic, the user interface presents segments having a matching topic. The user may use voice or other user interface commands to modify the annotation of the audio file. User commands may also be used to extract data from the transcript and copy the data to a clipboard or to another application.
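A small sketch of the topic-search step follows. Each transcript segment carries a time range and a set of topics, and a query returns the time ranges of segments whose topics match. The keyword table `TOPIC_KEYWORDS` is a hypothetical stand-in for the trained topic model described in the abstract.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# Hypothetical keyword table standing in for the trained topic model.
TOPIC_KEYWORDS: Dict[str, Set[str]] = {
    "budget": {"budget", "cost", "price"},
    "schedule": {"deadline", "date", "schedule"},
}


@dataclass
class Segment:
    start_s: float  # segment start time in seconds
    end_s: float    # segment end time in seconds
    text: str       # transcribed text for this time segment
    topics: Set[str] = field(default_factory=set)


def annotate(segments: List[Segment]) -> List[Segment]:
    """Attach topics to each transcript segment based on its words."""
    for seg in segments:
        words = set(re.findall(r"[a-z]+", seg.text.lower()))
        seg.topics = {topic for topic, kws in TOPIC_KEYWORDS.items() if words & kws}
    return segments


def find_segments(segments: List[Segment], topic: str) -> List[Tuple[float, float]]:
    """Return (start, end) times of segments annotated with the given topic."""
    return [(s.start_s, s.end_s) for s in segments if topic in s.topics]
```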
-
Publication No.: US20230359819A1
Publication Date: 2023-11-09
Application No.: US17738910
Application Date: 2022-05-06
Applicant: SAP SE
Inventor: Hans-Martin Ramsl
IPC: G06F40/274 , G06F40/289 , G06F40/30 , G06K19/06
CPC classification number: G06F40/274 , G06F40/289 , G06F40/30 , G06K19/06037
Abstract: Example methods and systems are directed to intelligent quick response (QR) code compression. Different versions of QR codes comprise different numbers of modules and represent different amounts of text. To ensure that a QR code can be read correctly, the minimum printed size of the QR code varies with the version. As described herein, intelligent QR code compression involves converting a QR code to text, compressing the text, and generating a smaller QR code that represents the compressed text. The resulting QR code may be printed at a smaller size or stored using less memory than the original QR code. Text processing may include sentence splitting, sentence ranking, and key phrase detection. The compressed text comprises one or more detected key phrases. The amount of compression may be configurable, such that greater compression results in less original information being included in the resulting QR code.
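The text-compression step between decoding and re-encoding the QR code might look roughly like the sketch below. It splits the text into sentences, ranks them, and keeps a configurable number. This is an extractive, frequency-based stand-in for the sentence ranking and key-phrase detection in the abstract: each sentence is scored by the average corpus frequency of its words. The `keep` parameter plays the role of the configurable compression level. Decoding and generating the QR codes themselves is omitted.

```python
import re
from collections import Counter
from typing import List


def split_sentences(text: str) -> List[str]:
    """Split after sentence-ending punctuation followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def compress_text(text: str, keep: int = 1) -> str:
    """Keep the `keep` highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    freq = Counter(re.findall(r"[a-z]+", text.lower()))

    def score(sentence: str) -> float:
        words = re.findall(r"[a-z]+", sentence.lower())
        return sum(freq[w] for w in words) / max(len(words), 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:keep])  # restore original sentence order
    return " ".join(sentences[i] for i in chosen)
```

A smaller `keep` yields shorter text, and therefore a lower QR version, at the cost of dropping more of the original information.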
-
Publication No.: US10977031B2
Publication Date: 2021-04-13
Application No.: US16657118
Application Date: 2019-10-18
Applicant: SAP SE
Inventor: Hans-Martin Ramsl
Abstract: The present disclosure relates to a method for a software development system, the software development system comprising a code repository storing source code. The method comprises: receiving additional code at the code repository; receiving, at one or more documentation repositories, documentation information for documenting the source code; generating corpus-based semantic word embeddings for code words and documentation words of the source code and the documentation information; using the word embeddings, mapping, by the software development system, the source code to corresponding documentation; and storing the mapping of the source code to the corresponding documentation.
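The sketch below shows the code-to-documentation mapping idea. Code and documentation are tokenized into shared words, with camelCase and snake_case identifiers split apart, and each source file is mapped to its most similar document. For simplicity it substitutes bag-of-words cosine similarity for the corpus-based word embeddings named in the abstract. The file names in the test are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict


def tokens(text: str) -> Counter:
    """Lowercase word counts, splitting camelCase and snake_case identifiers."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return Counter(w.lower() for w in re.findall(r"[A-Za-z]+", spaced))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def map_code_to_docs(code: Dict[str, str], docs: Dict[str, str]) -> Dict[str, str]:
    """Map each source file to the documentation entry with the highest similarity."""
    doc_vecs = {name: tokens(body) for name, body in docs.items()}
    mapping = {}
    for name, src in code.items():
        vec = tokens(src)
        mapping[name] = max(doc_vecs, key=lambda d: cosine(vec, doc_vecs[d]))
    return mapping
```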
-
Publication No.: US20250156640A1
Publication Date: 2025-05-15
Application No.: US19028157
Application Date: 2025-01-17
Applicant: SAP SE
Inventor: Hans-Martin Ramsl
IPC: G06F40/295 , G06F40/253 , G06N3/042 , G06N3/08 , G06N5/022
Abstract: Example methods and systems are directed to generating knowledge graph entities from text. Natural language text is received as input and processed using named entity recognition (NER), part of speech (POS) recognition, and business object recognition (BOR). The outputs of the NER, POS, and BOR processes are combined to generate knowledge entity triples comprising two entities and a relationship between them. Keywords are extracted from the text using NER to generate a set of entities. A node in a knowledge graph is created for at least some of the entities. A POS tagger identifies verbs in the text, generating a set of verbs. Relational verbs (e.g., “talk to” or “communicated with”) are detected and used to create edges in the knowledge graph. The knowledge graph may be converted back to natural language text using a trained machine learning model.
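A toy version of the triple-building step follows. Entities become graph nodes, and a relational verb found between two entities becomes a labeled edge. Several parts are assumptions. The entity list is passed in directly instead of coming from NER. The fixed `RELATIONAL_VERBS` list replaces POS tagging. Business object recognition and the model that converts the graph back to text are not shown.

```python
import re
from typing import List, Set, Tuple

# Hypothetical relational verbs; the abstract describes detecting these with a POS tagger.
RELATIONAL_VERBS = ["talked to", "communicated with", "approved", "ordered"]

Triple = Tuple[str, str, str]


def extract_triples(sentence: str, entities: List[str]) -> List[Triple]:
    """Build (subject, relation, object) triples from pre-recognized entities."""
    triples: List[Triple] = []
    for verb in RELATIONAL_VERBS:
        m = re.search(rf"\b{re.escape(verb)}\b", sentence)
        if not m:
            continue
        prefix, suffix = sentence[: m.start()], sentence[m.end():]
        # Nearest entity before the verb is the subject; nearest after it is the object.
        left = [(prefix.rfind(e), e) for e in entities if e in prefix]
        right = [(suffix.find(e), e) for e in entities if e in suffix]
        if left and right:
            triples.append((max(left)[1], verb, min(right)[1]))
    return triples


def build_graph(triples: List[Triple]) -> Tuple[Set[str], List[Triple]]:
    """Entities become nodes; each relation becomes an edge (subject, object, relation)."""
    nodes = {s for s, _, _ in triples} | {o for _, _, o in triples}
    edges = [(s, o, r) for s, r, o in triples]
    return nodes, edges
```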
-
Publication No.: US20240013004A1
Publication Date: 2024-01-11
Application No.: US17860912
Application Date: 2022-07-08
Applicant: SAP SE
Inventor: Hans-Martin Ramsl
IPC: G06F40/30 , G06F16/2458 , G06F16/31
CPC classification number: G06F40/30 , G06F16/2458 , G06F16/31
Abstract: Example methods and systems are directed to automatic data card generation for datasets. A data card is a summary that describes quantitative aspects of a dataset, qualitative aspects of a dataset, or both. The data samples and documentation of a dataset are analyzed automatically to determine a number of samples, a primary data type, a license, or any suitable combination thereof. Data formats for data and documentation of the dataset may be automatically recognized. The language of text data may be automatically recognized. The most frequent language for the text data may be identified as the primary language of the dataset. A data card may be created for the dataset. The data card may indicate the number of samples, the data formats used in the dataset, the language of text data in the dataset, or any suitable combination thereof.
-
Publication No.: US11797281B2
Publication Date: 2023-10-24
Application No.: US17395213
Application Date: 2021-08-05
Applicant: SAP SE
Inventor: Hans-Martin Ramsl
Abstract: A machine learning model is trained to translate source code from one or more programming languages into a common programming language. The machine learning model translates source code from the other languages into the common programming language. A language embedder generates a vector for each function in the source code, all of which is now in the common programming language. A user provides a text search query which is converted by a language embedder to a vector. Based on the vector of the text search query and the vectors for the source code, search results are generated and presented in a user interface. Additional machine learning models may be trained and used to measure function complexity, test coverage, documentation quantity and complexity, or any suitable combination thereof. These measures may be used to determine which search results to present, an order in which to present search results, or both.
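Once functions and the query are embedded, the search-and-order step could be sketched as follows. Results below a similarity cutoff are filtered out, and the rest are ordered by similarity plus a weighted quality measure. Several parts are hypothetical. Vectors are assumed precomputed by some language embedder. Test coverage is the only quality measure used. The values of `min_sim` and `coverage_weight` are illustrative, not taken from the patent.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class FunctionEntry:
    name: str
    vector: Sequence[float]  # embedding from a language embedder (assumed precomputed)
    test_coverage: float     # 0..1, hypothetical quality measure


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def search(query_vec: Sequence[float], entries: List[FunctionEntry], top_k: int = 3,
           min_sim: float = 0.5, coverage_weight: float = 0.2) -> List[str]:
    """Filter by similarity, then order by similarity plus weighted test coverage."""
    scored = []
    for e in entries:
        sim = cosine(query_vec, e.vector)
        if sim >= min_sim:
            scored.append((sim + coverage_weight * e.test_coverage, e.name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]
```

In this design a slightly less similar but well-tested function can outrank an exact but untested match. Setting `coverage_weight=0` reduces the ranking to pure vector similarity.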
-
Publication No.: US11783611B2
Publication Date: 2023-10-10
Application No.: US17009526
Application Date: 2020-09-01
Applicant: SAP SE
Inventor: Hans-Martin Ramsl
IPC: G06K9/00 , G06N20/00 , G06K9/34 , G06N3/04 , G06V30/414 , G06V30/148 , G06V30/10
CPC classification number: G06V30/414 , G06N3/04 , G06N20/00 , G06V30/153 , G06V30/10
Abstract: In an example embodiment, machine learning is used to intelligently compress documents to reduce the overall footprint of storing large amounts of files for an organization. Specifically, a document is split into parts, with each part representing a grouping of text or an image. Optical character recognition is performed to identify the text in images. Machine learning techniques are then applied to each part of a document to determine how relevant that part is for the organization. The parts that are deemed not relevant may then be reduced in size, either by omitting them completely or by summarizing them. This allows the compression to be tailored specifically to the organization, making it possible to compress or eliminate parts of documents that other organizations might have found relevant (and that therefore would not have been compressed or eliminated through traditional means).
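The sketch below illustrates the per-part decision. Each part is scored for relevance, then dropped, summarized, or kept in full. Three stand-ins replace the real components. Relevance is a simple share of organization-specific terms rather than a learned model. The "summary" is just the first sentence. Image parts are assumed to have already been converted to text by OCR. The two thresholds are illustrative.

```python
import re
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Part:
    text: str  # for image parts, assume OCR has already produced this text


def relevance(part: Part, org_terms: Set[str]) -> float:
    """Share of words in the part that are organization-specific terms (stand-in scorer)."""
    words = re.findall(r"[a-z]+", part.text.lower())
    return sum(w in org_terms for w in words) / len(words) if words else 0.0


def summarize(text: str) -> str:
    """Naive summary: keep only the first sentence."""
    return re.split(r"(?<=[.!?])\s+", text.strip())[0]


def compress(parts: List[Part], org_terms: Set[str],
             drop_below: float = 0.05, summarize_below: float = 0.3) -> List[str]:
    out = []
    for p in parts:
        r = relevance(p, org_terms)
        if r < drop_below:
            continue  # not relevant: omit the part entirely
        out.append(p.text if r >= summarize_below else summarize(p.text))
    return out
```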
-