-
Publication number: US20240176778A1
Publication date: 2024-05-30
Application number: US18059237
Filing date: 2022-11-28
Applicant: SAP SE
Inventor: Tetyana Chernenko, Benjamin Schork, Marcus Danei
IPC: G06F16/242, G06F40/242, G06F40/30
CPC classification number: G06F16/243, G06F40/242, G06F40/30
Abstract: Term ambiguity is resolved by referencing a terminology database. An input is received comprising the term designated as ambiguous, and a string including the term. The term is posed as a query to the terminology database containing metadata of at least one type. Query results are returned including at least two possible meanings. Sequence(s) are extracted from the query results, each sequence including at least two pieces of metadata of a same type, one for each possible meaning of the ambiguous term. The metadata of each entry of a sequence is compared with the query result and corresponding scores are calculated. The scores are compared to determine a final meaning of the ambiguous term. Simpler embodiments considering one type of metadata (one sequence) may calculate and compare a listing of scores. Complex embodiments considering more than one type of metadata (multiple sequences) may calculate and compare a matrix of scores.
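The scoring idea in this abstract can be illustrated with a minimal Python sketch. It is not the patented method: the abstract does not name a similarity measure, so a simple Jaccard token overlap is assumed here, and the comparison is assumed to be made against the input string that contains the term. The function names (`tokens`, `overlap_score`, `resolve`), the metadata types (`domain`, `definition`), and the sample entries are all hypothetical.

```python
import re

def tokens(text):
    """Return the set of lowercase word tokens in a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def overlap_score(metadata, context):
    """Jaccard overlap between a metadata value and the context string (assumed metric)."""
    a, b = tokens(metadata), tokens(context)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def resolve(context, candidates, metadata_types):
    """Pick the candidate meaning whose metadata best matches the context."""
    # One sequence (row) per metadata type; each row holds one score per meaning.
    matrix = [
        [overlap_score(c[m], context) for c in candidates]
        for m in metadata_types
    ]
    # Sum each column to get one total per candidate meaning.
    totals = [sum(column) for column in zip(*matrix)]
    best = max(range(len(candidates)), key=totals.__getitem__)
    return candidates[best]["meaning"], matrix

# Hypothetical query results for the ambiguous term "bank".
candidates = [
    {"meaning": "financial institution",
     "domain": "finance",
     "definition": "institution that keeps an account for money"},
    {"meaning": "river edge",
     "domain": "geography",
     "definition": "land alongside a river"},
]
context = "Transfer the amount to the bank account"

meaning, matrix = resolve(context, candidates, ["domain", "definition"])
print(meaning)  # financial institution
```

With a single metadata type the matrix collapses to one row, which corresponds to the "listing of scores" mentioned for the simpler embodiments.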
-
Publication number: US20240354511A1
Publication date: 2024-10-24
Application number: US18304640
Filing date: 2023-04-21
Applicant: SAP SE
Inventor: Tetyana Chernenko, Benjamin Schork, Marcus Danei
Abstract: Embodiments relate to systems and methods that improve the definition of semantic domains within incoming data, and accurately distribute data over those defined domains. In a particular embodiment, company-specific terminology and data governance (d.g.) domains are used to define “highly semantically loaded” terms within an incoming linguistic data corpus having existing semantic domains assigned thereto. Analyzing distribution patterns of such highly semantically loaded terms across the incoming linguistic data (and/or across the d.g. domains) enhances the accuracy of assignment of semantic domains and distribution of the data across these domains. Such improved semantic domains can improve operation of computers tasked with downstream processing of the linguistic data, e.g., by Natural Language Processing (NLP).
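As a rough illustration only, the following Python sketch shows one way a distribution analysis of this kind could look. The abstract does not specify how the patterns are analyzed, so everything here is an assumption: a hypothetical `LOADED_TERMS` table maps company terms to data governance domains, `term_distribution` counts where those terms occur across the existing domains, and `suggest_domain` applies a simple majority vote.

```python
from collections import Counter

# Hypothetical company terminology: term -> data governance domain.
LOADED_TERMS = {
    "invoice": "Finance",
    "ledger": "Finance",
    "payroll": "HR",
    "onboarding": "HR",
}

def term_distribution(corpus):
    """For each existing domain, count occurrences of loaded terms per governance domain."""
    dist = {}
    for text, domain in corpus:
        counts = dist.setdefault(domain, Counter())
        for word in text.lower().split():
            if word in LOADED_TERMS:
                counts[LOADED_TERMS[word]] += 1
    return dist

def suggest_domain(text, current):
    """Suggest the domain most loaded terms point to; keep the current one if none occur."""
    votes = Counter(
        LOADED_TERMS[w] for w in text.lower().split() if w in LOADED_TERMS
    )
    return votes.most_common(1)[0][0] if votes else current

# Hypothetical incoming corpus: (text, existing semantic domain).
corpus = [
    ("post invoice to ledger", "General"),
    ("start payroll run", "HR"),
    ("approve invoice", "Finance"),
]

dist = term_distribution(corpus)
suggestion = suggest_domain("post invoice to ledger", "General")
print(suggestion)  # Finance
```

In this toy corpus the "General" domain contains only Finance-related loaded terms, which is the kind of signal that might prompt reassigning its texts.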
-
Publication number: US12067370B2
Publication date: 2024-08-20
Application number: US17342114
Filing date: 2021-06-08
Applicant: SAP SE
Inventor: Tetyana Chernenko, Anton Snitko, Jens Scharnbacher, Michail Vasiltschenko
IPC: G06F40/30, G06F16/2458, G06F40/274, G06F40/44, G06F40/55, G06F40/58
CPC classification number: G06F40/58, G06F16/2468, G06F40/274, G06F40/55
Abstract: Translation capability for language processing determines an existence of an abbreviation, followed by non-exact matching to map the abbreviation to the original full term. A received string in a source language is provided as input to a translation service. Translation proposals in a different target language are received back. A ruleset (considering factors, e.g., camel case format, the presence of a concluding period, and/or consecutive consonants) is applied to generate abbreviation candidates from the translation proposals. Non-exact matching (referencing e.g., a comparison metric) may then be used to map the abbreviation candidates to text strings of their original full terms. A mapping of the abbreviation to the text string of the original full term is stored in a translation database comprising linguistic data. Embodiments leverage existing resources (e.g., translation service, non-exact matching) to reduce effort and expense of accurately identifying abbreviations and then mapping them to their full original terms.
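The two stages named in this abstract, rule-based candidate generation followed by non-exact matching, can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the rule thresholds (for example, four or more consecutive consonants) are guesses, `difflib.SequenceMatcher` stands in for the unspecified comparison metric, and the function names and sample translation proposals are hypothetical.

```python
import re
from difflib import SequenceMatcher

CAMEL = re.compile(r"[a-z][A-Z]")  # lowercase followed by uppercase, e.g. "BestMng"
CONSONANTS = re.compile(r"[bcdfghjklmnpqrstvwxyz]{4,}", re.IGNORECASE)  # assumed threshold

def is_abbreviation_candidate(token):
    """Apply the ruleset: concluding period, camel case, or a long consonant run."""
    return bool(
        token.endswith(".") or CAMEL.search(token) or CONSONANTS.search(token)
    )

def similarity(abbr, full):
    """Non-exact match score between an abbreviation and a full term (assumed metric)."""
    return SequenceMatcher(None, abbr.rstrip(".").lower(), full.lower()).ratio()

def map_abbreviations(proposals, threshold=0.5):
    """Map each abbreviation candidate to its most similar full term, if similar enough."""
    words = {w for p in proposals for w in p.split()}
    candidates = {w for w in words if is_abbreviation_candidate(w)}
    full_terms = words - candidates
    mapping = {}
    for abbr in candidates:
        best = max(full_terms, key=lambda t: similarity(abbr, t), default=None)
        if best is not None and similarity(abbr, best) >= threshold:
            mapping[abbr] = best
    return mapping

# Hypothetical German translation proposals returned for the source term "quantity".
proposals = ["Menge", "Mng.", "Bestellmenge"]
mapping = map_abbreviations(proposals)
print(mapping)  # {'Mng.': 'Menge'}
```

The resulting mapping is what would then be stored in the translation database. Note that the concluding-period rule would also flag sentence-final words, so this sketch assumes the proposals are isolated terms rather than full sentences.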
-
Publication number: US20220391601A1
Publication date: 2022-12-08
Application number: US17342114
Filing date: 2021-06-08
Applicant: SAP SE
Inventor: Tetyana Chernenko, Anton Snitko, Jens Scharnbacher, Michail Vasiltschenko
IPC: G06F40/58, G06F40/55, G06F40/274, G06F16/2458
Abstract: Translation capability for language processing determines an existence of an abbreviation, followed by non-exact matching to map the abbreviation to the original full term. A received string in a source language is provided as input to a translation service. Translation proposals in a different target language are received back. A ruleset (considering factors, e.g., camel case format, the presence of a concluding period, and/or consecutive consonants) is applied to generate abbreviation candidates from the translation proposals. Non-exact matching (referencing e.g., a comparison metric) may then be used to map the abbreviation candidates to text strings of their original full terms. A mapping of the abbreviation to the text string of the original full term is stored in a translation database comprising linguistic data. Embodiments leverage existing resources (e.g., translation service, non-exact matching) to reduce effort and expense of accurately identifying abbreviations and then mapping them to their full original terms.