Efficient search for combinations of matching entities given constraints

    Publication No.: US11687575B1

    Publication date: 2023-06-27

    Application No.: US17647477

    Filing date: 2022-01-10

    Applicant: SAP SE

    IPC classification: G06F16/33 G06F16/35 G06F16/31

    Abstract: Methods, systems, and computer-readable storage media for receiving a set of inference results generated by an ML model, the inference results including a set of query entities and a set of target entities, each query entity having one or more target entities matched thereto by the ML model, processing the set of inference results to generate a set of matched sub-sets of target entities by executing a search over target entities in the set of target entities based on constraints, for each problem in a set of problems, providing the problem as a tuple including an index value representative of a target entity in the set of target entities and a value associated with the query entity, the value including a constraint relative to the query entity, and executing at least one task in response to one or more matched sub-sets in the set of matched sub-sets.
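
    The constrained search can be illustrated with a minimal sketch that rests on several assumptions. The constraint is taken to be that the amounts of the selected target entities must sum exactly to the query entity's amount, as when one payment settles several invoices. Amounts are assumed to be positive integers, such as cents. Each sub-problem is the tuple (index, remaining): the index of the next candidate target entity and the part of the query value still uncovered. This mirrors the (index value, value) tuple in the abstract. The names find_matching_subsets, feasible, and max_results are hypothetical, and the memoized depth-first search is a generic subset-sum technique, not the patented algorithm.

```python
from functools import lru_cache
from typing import List, Tuple


def find_matching_subsets(query_amount: int, target_amounts: List[int],
                          max_results: int = 10) -> List[Tuple[int, ...]]:
    """Return index tuples of target entities whose amounts sum exactly to query_amount.

    Assumes positive integer amounts (e.g. cents). Each sub-problem is the tuple
    (index, remaining): the next candidate target entity and the still-uncovered amount.
    """
    n = len(target_amounts)

    @lru_cache(maxsize=None)
    def feasible(index: int, remaining: int) -> bool:
        # Can some subset of target_amounts[index:] cover `remaining` exactly?
        if remaining == 0:
            return True
        if index == n or remaining < 0:
            return False
        return (feasible(index + 1, remaining - target_amounts[index])
                or feasible(index + 1, remaining))

    results: List[Tuple[int, ...]] = []

    def collect(index: int, remaining: int, chosen: List[int]) -> None:
        if len(results) >= max_results:
            return
        if remaining == 0:
            results.append(tuple(chosen))
            return
        if not feasible(index, remaining):
            return  # prune: no completion exists from this sub-problem
        chosen.append(index)  # branch 1: include target entity `index`
        collect(index + 1, remaining - target_amounts[index], chosen)
        chosen.pop()          # branch 2: skip it
        collect(index + 1, remaining, chosen)

    collect(0, query_amount, [])
    return results
```

    Memoizing feasibility on (index, remaining) means each distinct sub-problem is evaluated once, so the pruning check stays cheap. The number of matching sub-sets can itself grow exponentially, however, which is why the sketch caps the output with max_results.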

    EFFICIENT SEARCH FOR COMBINATIONS OF MATCHING ENTITIES GIVEN CONSTRAINTS

    Publication No.: US20230222147A1

    Publication date: 2023-07-13

    Application No.: US17647477

    Filing date: 2022-01-10

    Applicant: SAP SE

    IPC classification: G06F16/33 G06F16/31 G06F16/35

    Abstract: Identical to the abstract of US11687575B1 (same application, US17647477).

    ENHANCED MODEL EXPLANATIONS USING DYNAMIC TOKENIZATION FOR ENTITY MATCHING MODELS

    Publication No.: US20240177053A1

    Publication date: 2024-05-30

    Application No.: US18070598

    Filing date: 2022-11-29

    Applicant: SAP SE

    IPC classification: G06N20/00

    CPC classification: G06N20/00

    Abstract: Methods, systems, and computer-readable storage media for receiving query data representative of query entities and target data representative of target entities, determining, by an attention ML model, a set of character-level embeddings, providing, by a sub-word-level tokenizer, a set of sub-word-level tokens, each sub-word-level token including a string of multiple characters, generating, by the attention ML model, a set of sub-word-level embeddings based on the set of sub-word-level tokens, providing, by the attention ML model, at least one attention matrix including attention scores, each attention score representative of a relative importance of a respective sub-word-level token in a predicted match, the predicted match including a match between a query entity and a target entity, and outputting an explanation based on the at least one attention matrix.
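
    The next sketch only illustrates how sub-word tokens can make attention-based explanations more readable. It swaps in a simpler technique than the one claimed. In the abstract, the model itself produces attention scores over sub-word tokens. Here, a greedy longest-match tokenizer splits a string into sub-word tokens, and per-character attention scores are then summed over each token's span and ranked post hoc. The per-character scores could be, for instance, column sums of a character-level attention matrix; that input is an assumption. The vocabulary, the summation pooling, and the names greedy_subword_tokenize and token_importance are hypothetical.

```python
from typing import List, Set, Tuple

Span = Tuple[str, int, int]  # (token text, start offset, end offset)


def greedy_subword_tokenize(text: str, vocab: Set[str], max_len: int = 6) -> List[Span]:
    """Greedy longest-match tokenization; unknown characters become 1-char tokens."""
    spans: List[Span] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; a single character always matches.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if length == 1 or piece in vocab:
                spans.append((piece, i, i + length))
                i += length
                break
    return spans


def token_importance(char_attention: List[float], spans: List[Span]) -> List[Tuple[str, float]]:
    """Sum per-character attention over each token span, normalize, rank descending."""
    scores = [(tok, sum(char_attention[start:end])) for tok, start, end in spans]
    total = sum(score for _, score in scores) or 1.0  # avoid division by zero
    ranked = [(tok, score / total) for tok, score in scores]
    return sorted(ranked, key=lambda item: item[1], reverse=True)
```

    For example, with the vocabulary {"INV", "2041"}, the reference "INV-2041" is split into "INV", "-", and "2041". An explanation can then name whole tokens such as "2041" rather than individual digits.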

    DYNAMIC CALIBRATION OF CONFIDENCE-ACCURACY MAPPINGS IN ENTITY MATCHING MODELS

    Publication No.: US20230214456A1

    Publication date: 2023-07-06

    Application No.: US17646886

    Filing date: 2022-01-04

    Applicant: SAP SE

    IPC classification: G06K9/62

    CPC classification: G06K9/6265

    Abstract: Methods, systems, and computer-readable storage media for receiving a first set of predictions generated by an ML model during execution of a training pipeline to train the ML model, each prediction in the first set of predictions being associated with a confidence, determining a set of confidence bins based on confidences of the first set of predictions, for each confidence bin in the set of confidence bins, providing an accuracy, processing the set of confidence bins and accuracies through a regression model to provide one or more regressions, each regression representing a confidence-to-accuracy relationship, defining a set of confidence thresholds based on at least one regression of the one or more regressions, and during an inference phase, applying the set of confidence thresholds to selectively filter predictions from a second set of predictions generated by the ML model.
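
    The calibration steps can be sketched under stated assumptions. The steps are: bin predictions by confidence, measure per-bin accuracy, fit a regression, derive thresholds, and filter at inference. The abstract does not specify the regression form, so an ordinary least-squares line (accuracy = a * confidence + b) stands in as one possible choice. Equal-width bins and the rule of discarding every prediction when a target accuracy is unreachable are also assumptions. All function names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def bin_accuracies(confidences: Sequence[float], correct: Sequence[bool],
                   n_bins: int = 10) -> List[Tuple[float, float]]:
    """Group predictions into equal-width confidence bins; return (bin center, accuracy)."""
    totals = [0] * n_bins
    hits = [0] * n_bins
    for conf, ok in zip(confidences, correct):
        b = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 falls in the last bin
        totals[b] += 1
        hits[b] += int(ok)
    return [((b + 0.5) / n_bins, hits[b] / totals[b]) for b in range(n_bins) if totals[b]]


def fit_line(points: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Ordinary least-squares fit of accuracy = a * confidence + b."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx if sxx else 0.0
    return a, mean_y - a * mean_x


def thresholds_from_line(a: float, b: float,
                         target_accuracies: Sequence[float]) -> Dict[float, Optional[float]]:
    """Smallest confidence whose predicted accuracy reaches each target (None if unreachable)."""
    out: Dict[float, Optional[float]] = {}
    for target in target_accuracies:
        t = (target - b) / a if a > 0 else None
        out[target] = None if t is None or t > 1.0 else max(0.0, t)
    return out


def filter_predictions(predictions: List[Tuple[str, float]],
                       threshold: Optional[float]) -> List[Tuple[str, float]]:
    """Keep (label, confidence) pairs at or above the threshold; keep none if unreachable."""
    if threshold is None:
        return []
    return [p for p in predictions if p[1] >= threshold]
```

    In this sketch, bin_accuracies and fit_line would run on predictions from the training pipeline, and filter_predictions would run on predictions made during inference. Fitting a line to per-bin accuracies is the simplest choice. A monotone or piecewise regression may suit confidence distributions that are far from linear.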

    INCREMENTAL TRAINING FOR REAL-TIME MODEL PERFORMANCE ENHANCEMENT

    Publication No.: US20230128485A1

    Publication date: 2023-04-27

    Application No.: US17452441

    Filing date: 2021-10-27

    Applicant: SAP SE

    IPC classification: G06N20/00

    Abstract: Methods, systems, and computer-readable storage media for receiving IRF data sets, the IRF data sets including a set of records including inference results determined by the ML model during production use of the ML model and at least one correction to an inference result, executing incremental training of the ML model to provide an updated ML model by selectively filtering one or more records of the set of records to adjust a negative sample to positive sample proportion of a sub-set of records based on a negative sample to positive sample proportion of initial training of the ML model, for each record in the sub-set of records, determining a weight, and during incremental training, applying the weight of a respective record in a loss function in determining an accuracy of the ML model based on the respective record, and deploying the updated ML model for production use.
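
    The data-preparation side of incremental training can be sketched under the following assumptions. Each record is taken to be a dict with a binary "label" and an optional "corrected" flag; both field names are hypothetical. The filtering step downsamples whichever class is over-represented until the negative-to-positive ratio approximates the ratio used in initial training. The weighting scheme, in which corrected records count correction_weight times as much, is purely illustrative, since the abstract does not say how weights are determined. The weights then enter a weighted binary cross-entropy loss.

```python
import math
import random
from typing import Dict, List, Sequence


def rebalance(records: List[Dict], neg_to_pos: float, seed: int = 0) -> List[Dict]:
    """Downsample so that #negatives / #positives approximates neg_to_pos (assumed > 0)."""
    rng = random.Random(seed)
    pos = [r for r in records if r["label"] == 1]
    neg = [r for r in records if r["label"] == 0]
    if not pos or not neg:
        return pos + neg  # nothing to balance against
    if len(neg) > neg_to_pos * len(pos):
        neg = rng.sample(neg, min(len(neg), round(neg_to_pos * len(pos))))
    else:
        pos = rng.sample(pos, min(len(pos), round(len(neg) / neg_to_pos)))
    return pos + neg


def record_weights(records: List[Dict], correction_weight: float = 2.0) -> List[float]:
    """Hypothetical scheme: user-corrected records count more than uncorrected ones."""
    return [correction_weight if r.get("corrected") else 1.0 for r in records]


def weighted_bce(probs: Sequence[float], labels: Sequence[int],
                 weights: Sequence[float], eps: float = 1e-12) -> float:
    """Weighted binary cross-entropy, normalized by the total weight."""
    total = sum(weights)
    loss = 0.0
    for p, y, w in zip(probs, labels, weights):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
        loss -= w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return loss / total
```

    Matching the initial training ratio keeps a small batch of production feedback from shifting the model's prior toward whichever class happens to dominate that batch. The per-record weights let the corrections themselves have a stronger effect on the loss.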