ADAPTIVE TRAINING COMPLETION TIME AND STATUS FOR MACHINE LEARNING MODELS

    Publication No.: US20230229961A1

    Publication Date: 2023-07-20

    Application No.: US17646889

    Filing Date: 2022-01-04

    Applicant: SAP SE

    IPC Classification: G06N20/00

    CPC Classification: G06N20/00

    Abstract: Methods, systems, and computer-readable storage media for providing a set of heuristics representative of training data that is to be used to process an ML model through a training pipeline, the training pipeline including multiple phases, determining a set of time estimates by providing the set of heuristics as input to a training heuristics model that provides the set of time estimates as output, each time estimate in the set of time estimates indicating an estimated duration of a respective phase of the training pipeline, receiving, during processing of the ML model through the training pipeline, progress data representative of a progress of processing of the ML model, determining a set of status estimates including a status estimate for each phase of the training pipeline based on the progress data, and transmitting the set of time estimates and the set of status estimates for display.
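
The flow in this abstract (heuristics in, per-phase time estimates out, then per-phase status from progress data) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the abstract does not name the phases, the heuristics, or the form of the training heuristics model, so the phase names, the `Heuristics` fields, and the linear cost coefficients below are hypothetical stand-ins.

```python
from dataclasses import dataclass

# Hypothetical phase names; the abstract only says "multiple phases".
PHASES = ("preprocessing", "training", "evaluation")

# Hypothetical coefficients standing in for a learned training heuristics model:
# (fixed seconds, seconds per row, seconds per row-column cell).
COEFFS = {
    "preprocessing": (5.0, 0.001, 0.0),
    "training": (30.0, 0.0, 0.0005),
    "evaluation": (2.0, 0.0005, 0.0),
}


@dataclass
class Heuristics:
    """Assumed summary statistics of the training data."""
    n_rows: int
    n_columns: int


def estimate_times(h: Heuristics) -> dict:
    """Return an estimated duration in seconds for each pipeline phase."""
    estimates = {}
    for phase in PHASES:
        fixed, per_row, per_cell = COEFFS[phase]
        estimates[phase] = fixed + per_row * h.n_rows + per_cell * h.n_rows * h.n_columns
    return estimates


def estimate_status(current_phase: str, fraction_done: float) -> dict:
    """Map progress data to a completion fraction (0.0 to 1.0) for each phase."""
    fraction_done = min(max(fraction_done, 0.0), 1.0)
    current = PHASES.index(current_phase)
    status = {}
    for i, phase in enumerate(PHASES):
        if i < current:
            status[phase] = 1.0            # phase already finished
        elif i == current:
            status[phase] = fraction_done  # phase in progress
        else:
            status[phase] = 0.0            # phase not started
    return status


if __name__ == "__main__":
    print(estimate_times(Heuristics(n_rows=10_000, n_columns=10)))
    print(estimate_status("training", 0.4))
```

Both dictionaries are keyed by phase, so a display layer could show the estimated duration and the current status side by side for each phase.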

GREEDY INFERENCE FOR RESOURCE-EFFICIENT MATCHING OF ENTITIES

    Publication No.: US20230153382A1

    Publication Date: 2023-05-18

    Application No.: US17455046

    Filing Date: 2021-11-16

    Applicant: SAP SE

    Inventor: Sundeep Gullapudi

    IPC Classification: G06K9/62 G06N7/00

    Abstract: Methods, systems, and computer-readable storage media for determining a set of potential probability thresholds based on a set of inference results provided by processing testing data through the ML model, for each potential probability threshold in the set of potential probability thresholds, determining an accuracy, selecting a probability threshold from the set of potential probability thresholds, processing an inference job including sets of entity pairs through the ML model to assign a label to each entity pair in the sets of entity pairs, each label being associated with a probability and including a type of multiple types, and for each entity pair having a label of one or more specified types, selectively removing an entity of the entity pair from further processing of the inference job by the ML model based on whether the probability associated with the label meets or exceeds the probability threshold.
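
A rough Python sketch of the two stages described in this abstract follows. It is one reading of the abstract and not the claimed implementation. The selection rule (lowest candidate threshold whose accuracy reaches a target), the `"match"` label type, the choice to remove the query entity of a pair, and the `predict` callable standing in for the ML model are all assumptions.

```python
def select_threshold(test_results, target_accuracy):
    """Pick the lowest candidate threshold whose accuracy meets the target.

    test_results: list of (probability, is_correct) pairs from testing data.
    Candidate thresholds are the distinct probabilities seen in the results.
    Returns None when no candidate reaches the target accuracy.
    """
    for threshold in sorted({p for p, _ in test_results}):
        kept = [ok for p, ok in test_results if p >= threshold]
        accuracy = sum(kept) / len(kept)  # kept is never empty here
        if accuracy >= target_accuracy:
            return threshold
    return None


def greedy_inference(batches, predict, threshold, greedy_labels=("match",)):
    """Label entity pairs, skipping query entities that are already settled.

    batches: iterable of lists of (query_entity, target_entity) pairs.
    predict: callable returning (label, probability) for one pair.
    """
    settled = set()  # query entities removed from further processing
    labels = {}
    for batch in batches:
        for query, target in batch:
            if query in settled:
                continue  # save the model call for this pair
            label, probability = predict(query, target)
            labels[(query, target)] = (label, probability)
            if label in greedy_labels and probability >= threshold:
                settled.add(query)
    return labels
```

The resource saving comes from the `continue` branch: once a query entity has a sufficiently confident label of a specified type, the remaining pairs that contain it are never sent to the model.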

ENHANCED MODEL EXPLANATIONS USING DYNAMIC TOKENIZATION FOR ENTITY MATCHING MODELS

    Publication No.: US20240177053A1

    Publication Date: 2024-05-30

    Application No.: US18070598

    Filing Date: 2022-11-29

    Applicant: SAP SE

    IPC Classification: G06N20/00

    CPC Classification: G06N20/00

    Abstract: Methods, systems, and computer-readable storage media for receiving query data representative of query entities and target data representative of target entities, determining, by an attention ML model, a set of character-level embeddings, providing, by a sub-word-level tokenizer, a set of sub-word-level tokens, each sub-word-level token including a string of multiple characters, generating, by the attention ML model, a set of sub-word-level embeddings based on the set of sub-word-level tokens, providing, by the attention ML model, at least one attention matrix including attention scores, each attention score representative of a relative importance of a respective sub-word-level token in a predicted match, the predicted match including a match between a query entity and a target entity, and outputting an explanation based on the at least one attention matrix.
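
Only the final step of this abstract, turning an attention matrix into an explanation, lends itself to a short sketch. The Python below assumes the attention matrix has one row per query sub-word token and one column per target sub-word token, and that an explanation is simply the highest-scoring token pairs. Neither assumption comes from the abstract, and the tokenizer and the attention model themselves are not reproduced. The function name `explain` and the sample tokens and scores are hypothetical.

```python
def explain(query_tokens, target_tokens, attention, top_k=3):
    """Return the top_k (query_token, target_token, score) triples.

    attention[i][j] is assumed to score how much query token i and
    target token j contributed to the predicted match.
    """
    if len(attention) != len(query_tokens) or any(
        len(row) != len(target_tokens) for row in attention
    ):
        raise ValueError("attention matrix shape must match the token lists")
    scored = [
        (attention[i][j], q, t)
        for i, q in enumerate(query_tokens)
        for j, t in enumerate(target_tokens)
    ]
    # Highest attention first; only the score is used as the sort key.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(q, t, score) for score, q, t in scored[:top_k]]


if __name__ == "__main__":
    # Made-up tokens and scores, for illustration only.
    query = ["SAP", "SE"]
    target = ["SAP", "AG", "SE"]
    scores = [[0.7, 0.1, 0.2], [0.05, 0.15, 0.8]]
    for q, t, s in explain(query, target, scores, top_k=2):
        print(f"{q!r} <-> {t!r}: {s}")
```

Because the scores are attached to multi-character sub-word tokens, the output reads as pairs of recognizable string fragments, which is easier to interpret than scores over single characters.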

DYNAMIC CALIBRATION OF CONFIDENCE-ACCURACY MAPPINGS IN ENTITY MATCHING MODELS

    Publication No.: US20230214456A1

    Publication Date: 2023-07-06

    Application No.: US17646886

    Filing Date: 2022-01-04

    Applicant: SAP SE

    IPC Classification: G06K9/62

    CPC Classification: G06K9/6265

    Abstract: Methods, systems, and computer-readable storage media for receiving a first set of predictions generated by an ML model during execution of a training pipeline to train the ML model, each prediction in the first set of predictions being associated with a confidence, determining a set of confidence bins based on confidences of the first set of predictions, for each confidence bin in the set of confidence bins, providing an accuracy, processing the set of confidence bins and accuracies through a regression model to provide one or more regressions, each regression representing a confidence-to-accuracy relationship, defining a set of confidence thresholds based on at least one regression of the one or more regressions, and during an inference phase, applying the set of confidence thresholds to selectively filter predictions from a second set of predictions generated by the ML model.
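
The calibration steps in this abstract (bin by confidence, compute per-bin accuracy, fit a regression, derive thresholds, filter at inference) can be sketched in plain Python. This is a simplified illustration under stated assumptions: equal-width bins, bin centers as the regression input, a single ordinary least-squares line as the regression model, and thresholds obtained by inverting that line for chosen target accuracies. The abstract specifies none of these choices, and all function names are hypothetical.

```python
def bin_accuracies(predictions, n_bins=5):
    """Group (confidence, is_correct) pairs into equal-width confidence bins.

    Returns a list of (bin_center, accuracy) points for the non-empty bins.
    """
    bins = [[] for _ in range(n_bins)]
    for confidence, is_correct in predictions:
        index = min(int(confidence * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[index].append(is_correct)
    return [
        ((i + 0.5) / n_bins, sum(members) / len(members))
        for i, members in enumerate(bins)
        if members
    ]


def fit_line(points):
    """Ordinary least squares for accuracy = slope * confidence + intercept.

    Needs at least two points with distinct confidence values.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def thresholds_for(target_accuracies, slope, intercept):
    """Invert the fitted line: confidence needed for each target accuracy.

    Assumes a non-zero slope.
    """
    return {target: (target - intercept) / slope for target in target_accuracies}


def filter_predictions(predictions, threshold):
    """Keep inference-time (confidence, payload) pairs at or above the threshold."""
    return [p for p in predictions if p[0] >= threshold]
```

A typical call sequence would be `bin_accuracies` on the training-time predictions, `fit_line` on the resulting points, `thresholds_for` with the accuracy levels of interest, and then `filter_predictions` on each batch of inference results.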