System and Method Performing Terminology Disambiguation

    Publication (Announcement) No.: US20240176778A1

    Publication (Announcement) Date: 2024-05-30

    Application No.: US18059237

    Application Date: 2022-11-28

    Applicant: SAP SE

    CPC classification number: G06F16/243 G06F40/242 G06F40/30

    Abstract: Term ambiguity is resolved by referencing a terminology database. An input is received comprising a term designated as ambiguous and a string including the term. The term is posed as a query to the terminology database, which contains metadata of at least one type. Query results are returned including at least two possible meanings. One or more sequences are extracted from the query results, each sequence including at least two pieces of metadata of the same type, one for each possible meaning of the ambiguous term. The metadata of each entry of a sequence is compared with the query result and corresponding scores are calculated. The scores are compared to determine a final meaning of the ambiguous term. Simpler embodiments that consider one type of metadata (one sequence) may calculate and compare a list of scores. More complex embodiments that consider more than one type of metadata (multiple sequences) may calculate and compare a matrix of scores.
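The score-matrix idea in this abstract can be illustrated with a minimal Python sketch. It is not the patented method: the Jaccard token overlap, the choice to score each metadata piece against the input string, the summing of scores per meaning, and all names (`disambiguate`, `overlap_score`, the `label` key) are assumptions made for illustration only.

```python
import re


def tokens(text):
    """Return the set of lowercase word tokens in a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(metadata, context):
    """Jaccard overlap between a metadata text and the context string.

    Jaccard is an assumed stand-in; the abstract does not name a metric.
    """
    a, b = tokens(metadata), tokens(context)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def disambiguate(context, meanings):
    """Pick the most plausible meaning for an ambiguous term.

    meanings: list of dicts, each with a 'label' plus one text per
    metadata type (e.g. 'definition', 'domain'). Each metadata type
    yields one sequence (one row of the matrix), with one entry per meaning.
    Returns (best label, score matrix keyed by metadata type).
    """
    types = sorted({k for m in meanings for k in m if k != "label"})
    matrix = {
        t: [overlap_score(m.get(t, ""), context) for m in meanings]
        for t in types
    }
    # Assumed aggregation: sum each meaning's column across metadata types.
    totals = [sum(matrix[t][i] for t in types) for i in range(len(meanings))]
    best = max(range(len(meanings)), key=totals.__getitem__)
    return meanings[best]["label"], matrix


if __name__ == "__main__":
    candidates = [
        {"label": "financial institution",
         "definition": "institution that holds customer account deposits and grants a loan",
         "domain": "finance loan"},
        {"label": "river edge",
         "definition": "sloping land beside a river",
         "domain": "geography river"},
    ]
    label, scores = disambiguate(
        "The bank approved the loan for the customer account", candidates)
    print(label, scores)
```

With a single metadata type the matrix collapses to one row, which corresponds to the simpler "list of scores" case the abstract mentions.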

    Semantic Domain Assignment Referencing Governance Domains and Term Databases

    Publication (Announcement) No.: US20240354511A1

    Publication (Announcement) Date: 2024-10-24

    Application No.: US18304640

    Application Date: 2023-04-21

    Applicant: SAP SE

    CPC classification number: G06F40/30 G06F40/58

    Abstract: Embodiments relate to systems and methods that improve the definition of semantic domains within incoming data, and accurately distribute data over those defined domains. In a particular embodiment, company-specific terminology and data governance (d.g.) domains are used to define “highly semantically loaded” terms within an incoming linguistic data corpus having existing semantic domains assigned thereto. Analyzing distribution patterns of such highly semantically loaded terms across the incoming linguistic data (and/or across the d.g. domains) enhances the accuracy of semantic domain assignment and of the distribution of the data across these domains. Such improved semantic domains can improve the operation of computers tasked with downstream processing of the linguistic data, e.g., by Natural Language Processing (NLP).
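As a rough illustration of analyzing how loaded terms distribute across existing domains, the Python sketch below counts term occurrences per domain and lets each term vote for a domain in proportion to its share there. The abstract does not specify the analysis, so the voting scheme, the whitespace tokenization (single-word terms only), and the names `term_distribution` and `suggest_domain` are all hypothetical.

```python
from collections import Counter, defaultdict


def term_distribution(corpus, loaded_terms):
    """Count, for each loaded term, its occurrences under each existing domain.

    corpus: list of (text, domain) pairs with already-assigned domains.
    loaded_terms: set of lowercase single-word terms (assumed to come from
    company terminology and data governance domains).
    """
    dist = defaultdict(Counter)
    for text, domain in corpus:
        words = text.lower().split()
        for term in loaded_terms:
            n = words.count(term)
            if n:
                dist[term][domain] += n
    return dist


def suggest_domain(text, dist):
    """Suggest a domain for a text from the distribution of its loaded terms.

    Each known term votes for every domain with a weight equal to the
    fraction of that term's occurrences seen in the domain (assumed scheme).
    Returns None when the text contains no known loaded term.
    """
    votes = Counter()
    for word in text.lower().split():
        counts = dist.get(word)
        if counts:
            total = sum(counts.values())
            for domain, n in counts.items():
                votes[domain] += n / total
    return votes.most_common(1)[0][0] if votes else None


if __name__ == "__main__":
    corpus = [
        ("invoice posted to ledger", "finance"),
        ("ledger balance and invoice total", "finance"),
        ("employee payroll run", "hr"),
        ("invoice for employee travel", "hr"),
    ]
    dist = term_distribution(corpus, {"invoice", "ledger", "payroll", "employee"})
    print(suggest_domain("ledger invoice mismatch", dist))
```

A term that occurs under only one domain contributes its full weight there, while a term spread over several domains contributes less to each, which is one simple way to let concentrated terms dominate the assignment.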

    Detection of abbreviation and mapping to full original term

    Publication (Announcement) No.: US12067370B2

    Publication (Announcement) Date: 2024-08-20

    Application No.: US17342114

    Application Date: 2021-06-08

    Applicant: SAP SE

    CPC classification number: G06F40/58 G06F16/2468 G06F40/274 G06F40/55

    Abstract: Translation capability for language processing determines the existence of an abbreviation, followed by non-exact matching to map the abbreviation to the original full term. A received string in a source language is provided as input to a translation service. Translation proposals in a different target language are received back. A ruleset (considering factors such as camel case format, the presence of a concluding period, and/or consecutive consonants) is applied to generate abbreviation candidates from the translation proposals. Non-exact matching (referencing, e.g., a comparison metric) may then be used to map the abbreviation candidates to text strings of their original full terms. A mapping of the abbreviation to the text string of the original full term is stored in a translation database comprising linguistic data. Embodiments leverage existing resources (e.g., translation service, non-exact matching) to reduce the effort and expense of accurately identifying abbreviations and then mapping them to their full original terms.
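The ruleset and non-exact matching steps can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the threshold of three consecutive consonants, the subsequence filter, the use of `difflib.SequenceMatcher` as the comparison metric, and the function names are all assumptions, and the translation service call is left out entirely.

```python
import re
from difflib import SequenceMatcher

# Assumed rule parameters: a run of three or more consonants, and a
# lowercase letter directly followed by an uppercase one (camel case).
CONSONANT_RUN = re.compile(r"[bcdfghjklmnpqrstvwxyz]{3,}", re.IGNORECASE)
CAMEL_CASE = re.compile(r"[a-z][A-Z]")


def is_abbreviation_candidate(token):
    """Apply the three rules named in the abstract to a single token.

    The rules only propose candidates; ordinary words with consonant
    clusters (e.g. 'strength') will also be flagged.
    """
    return (
        (len(token) > 1 and token.endswith("."))
        or bool(CAMEL_CASE.search(token))
        or bool(CONSONANT_RUN.search(token))
    )


def is_subsequence(short, full):
    """True if the characters of short appear in full in the same order."""
    remaining = iter(full)
    return all(ch in remaining for ch in short)


def map_to_full_term(candidate, full_terms):
    """Map an abbreviation candidate to the best-matching full term, or None.

    Assumed matching: same first letter, letters form a subsequence of the
    full term, ties broken by SequenceMatcher similarity ratio.
    """
    key = re.sub(r"[^a-z]", "", candidate.lower())
    best, best_score = None, 0.0
    for term in full_terms:
        target = re.sub(r"[^a-z]", "", term.lower())
        if not key or len(key) >= len(target) or key[0] != target[0]:
            continue
        if not is_subsequence(key, target):
            continue
        score = SequenceMatcher(None, key, target).ratio()
        if score > best_score:
            best, best_score = term, score
    return best


if __name__ == "__main__":
    full_terms = ["Quantity", "Purchase Order", "Purchasing Group", "Amount"]
    for token in ["Qty.", "PurchOrd", "order"]:
        if is_abbreviation_candidate(token):
            print(token, "->", map_to_full_term(token, full_terms))
```

In a fuller pipeline, each resulting pair would then be stored as an abbreviation-to-full-term mapping; when several full terms pass the filter (for example "Quantity" and "Quality" for "Qty."), the similarity ratio alone may not pick the intended one, so a real system would likely need additional context.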

    DETECTION OF ABBREVIATION AND MAPPING TO FULL ORIGINAL TERM

    Publication (Announcement) No.: US20220391601A1

    Publication (Announcement) Date: 2022-12-08

    Application No.: US17342114

    Application Date: 2021-06-08

    Applicant: SAP SE

    Abstract: Translation capability for language processing determines the existence of an abbreviation, followed by non-exact matching to map the abbreviation to the original full term. A received string in a source language is provided as input to a translation service. Translation proposals in a different target language are received back. A ruleset (considering factors such as camel case format, the presence of a concluding period, and/or consecutive consonants) is applied to generate abbreviation candidates from the translation proposals. Non-exact matching (referencing, e.g., a comparison metric) may then be used to map the abbreviation candidates to text strings of their original full terms. A mapping of the abbreviation to the text string of the original full term is stored in a translation database comprising linguistic data. Embodiments leverage existing resources (e.g., translation service, non-exact matching) to reduce the effort and expense of accurately identifying abbreviations and then mapping them to their full original terms.
