AUTOMATIC DETECTION OF CHANGES IN DATA SET RELATIONS

    Publication No.: US20230102152A1

    Publication date: 2023-03-30

    Application No.: US17484104

    Filing date: 2021-09-24

    Abstract: A system, program product, and method for automatic detection of data drift in a data set are presented. The method includes determining changes to relations in the data set through generating baseline and production data sets. The method further includes generating a production data set with some inserted data distortion, and defining, for a plurality of features in the baseline data set, potential relations for participant features. The method also includes determining a first likelihood and a second likelihood of each potential relation in the baseline and production data sets, respectively, for the participant features. The method further includes comparing each first likelihood with the corresponding second likelihood, generating a comparison value that is compared with a threshold value, and determining, subject to the comparison value exceeding the threshold value, that the potential relation in the baseline data set does not describe a relation in the production data set.
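
    The abstract does not say how a relation or its likelihood is represented. The sketch below is a minimal illustration under stated assumptions, not the patented method: features are categorical; a potential relation is a rule "if feature A equals a, then feature B equals b" that holds with at least `min_confidence` in the baseline; its likelihood is the conditional frequency P(B = b | A = a); and the comparison value is the absolute difference between the baseline (first) and production (second) likelihoods. The names `relation_likelihoods`, `distort`, `broken_relations`, `min_confidence`, and the 0.2 threshold are all hypothetical.

```python
from __future__ import annotations

from collections import Counter
from itertools import permutations


def relation_likelihoods(rows: list[dict[str, str]], features: list[str]) -> dict[tuple, float]:
    """Estimate P(target == t | source == s) for every ordered pair of features."""
    likelihoods: dict[tuple, float] = {}
    for source, target in permutations(features, 2):
        source_counts = Counter(row[source] for row in rows)
        pair_counts = Counter((row[source], row[target]) for row in rows)
        for (s, t), n in pair_counts.items():
            likelihoods[(source, s, target, t)] = n / source_counts[s]
    return likelihoods


def distort(rows: list[dict[str, str]], feature: str, value: str, every: int = 2) -> list[dict[str, str]]:
    """Hypothetical distortion: overwrite `feature` with `value` in every `every`-th row."""
    return [{**row, feature: value} if i % every == 0 else dict(row)
            for i, row in enumerate(rows)]


def broken_relations(baseline, production, features, min_confidence=0.9, threshold=0.2):
    """Return baseline relations whose likelihood shifts by more than `threshold`."""
    first = relation_likelihoods(baseline, features)     # first likelihoods (baseline)
    second = relation_likelihoods(production, features)  # second likelihoods (production)
    # Potential relations: value pairs that hold with high confidence in the baseline.
    potential = {rel: p for rel, p in first.items() if p >= min_confidence}
    broken = []
    for rel, p_base in potential.items():
        # A value pair that never co-occurs in production gets likelihood 0.0.
        comparison = abs(p_base - second.get(rel, 0.0))
        if comparison > threshold:  # comparison value exceeds the threshold
            broken.append(rel)
    return broken


if __name__ == "__main__":
    baseline = [{"city": "Paris", "country": "FR"},
                {"city": "Lyon", "country": "FR"},
                {"city": "Berlin", "country": "DE"}] * 4
    production = distort(baseline, "country", "XX")
    print(broken_relations(baseline, production, ["city", "country"]))
```

    In this toy run, half of each city's rows have their country overwritten, so every city-to-country rule drops from likelihood 1.0 to 0.5 and is flagged. The rule "country DE implies city Berlin" is not flagged, because the undistorted DE rows are still all Berlin. Using an absolute difference is only one possible comparison value; a ratio or a statistical test would fit the same outline.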

    Labeling a dataset
    3.
    Granted invention patent

    Publication No.: US11710068B2

    Publication date: 2023-07-25

    Application No.: US16693303

    Filing date: 2019-11-24

    CPC classification: G06N20/00 G06F7/523 G06N5/04

    Abstract: A method, system and computer program product, the method comprising: obtaining a first model trained upon cases and labels, the first model providing a prediction in response to an input case; obtaining a second model trained using the cases and indications of whether the predictions of the first model are correct, the second model providing a correctness prediction for the first model; determining a case for which the second model predicts that the first model provides an incorrect prediction; further training the first model also on a first corpus including the case and a label, thereby improving performance of the first model; providing the case to the first model to obtain a first prediction; and further training the second model also on a second corpus including the case and a correctness label, the correctness label being “correct” if the first prediction is equal to the label, thereby improving performance of the second model.
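
    The abstract leaves the model type and the source of labels open. The following is a minimal sketch, assuming a tiny k-nearest-neighbour classifier as a stand-in for both models and a hypothetical `oracle` callable (for example, a human annotator) that supplies the true label of a selected case. The names `KNN`, `label_dataset`, `oracle`, and `budget` are illustrative and are not taken from the patent.

```python
from __future__ import annotations

import math
from collections import Counter


class KNN:
    """Tiny k-nearest-neighbour classifier standing in for any trainable model."""

    def __init__(self, k: int = 3):
        self.k = k
        self.cases: list[tuple[float, ...]] = []
        self.labels: list = []

    def fit(self, cases, labels) -> KNN:
        # "Training" a k-NN model just stores the corpus.
        self.cases, self.labels = list(cases), list(labels)
        return self

    def predict(self, case):
        # Majority vote among the k stored cases closest to `case`.
        nearest = sorted(range(len(self.cases)),
                         key=lambda i: math.dist(case, self.cases[i]))[: self.k]
        return Counter(self.labels[i] for i in nearest).most_common(1)[0][0]


def label_dataset(first_corpus, second_corpus, pool, oracle, budget, k=3):
    """Label up to `budget` pool cases that the second model expects the first to get wrong."""
    first = KNN(k).fit(*zip(*first_corpus))    # cases -> labels
    second = KNN(k).fit(*zip(*second_corpus))  # cases -> True/False (is the first model correct?)
    labeled = []
    while len(labeled) < budget:
        # Cases for which the second model predicts an incorrect first-model prediction.
        flagged = [c for c in pool if not second.predict(c)]
        if not flagged:
            break
        case = flagged[0]
        pool.remove(case)
        label = oracle(case)                       # hypothetical annotator supplies the label
        first_corpus.append((case, label))
        first = KNN(k).fit(*zip(*first_corpus))    # further train the first model
        correct = first.predict(case) == label     # correctness label for the second model
        second_corpus.append((case, correct))
        second = KNN(k).fit(*zip(*second_corpus))  # further train the second model
        labeled.append(case)
    return first, second, labeled


if __name__ == "__main__":
    first_corpus = [((0.0,), "low"), ((1.0,), "low"), ((2.0,), "low"), ((9.0,), "high")]
    second_corpus = [((0.0,), True), ((1.0,), True), ((6.0,), False), ((7.0,), False), ((8.0,), False)]
    pool = [(0.5,), (7.5,)]
    first, second, labeled = label_dataset(
        first_corpus, second_corpus, pool, lambda c: "high" if c[0] >= 5 else "low", budget=5)
    print(labeled, first.predict((6.0,)))
```

    The point of the design is that annotation effort goes only to cases the second model flags as likely errors. In the toy run, only `(7.5,)` is flagged and labeled, and afterwards the first model's prediction for `(6.0,)` changes from "low" to "high".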

    LABELING A DATASET
    4.
    Invention patent application

    Publication No.: US20210158205A1

    Publication date: 2021-05-27

    Application No.: US16693303

    Filing date: 2019-11-24

    IPC classification: G06N20/00 G06N5/04 G06F7/523

    Abstract: A method, system and computer program product, the method comprising: obtaining a first model trained upon cases and labels, the first model providing a prediction in response to an input case; obtaining a second model trained using the cases and indications of whether the predictions of the first model are correct, the second model providing a correctness prediction for the first model; determining a case for which the second model predicts that the first model provides an incorrect prediction; further training the first model also on a first corpus including the case and a label, thereby improving performance of the first model; providing the case to the first model to obtain a first prediction; and further training the second model also on a second corpus including the case and a correctness label, the correctness label being “correct” if the first prediction is equal to the label, thereby improving performance of the second model.