SYSTEM AND METHOD FOR LABEL ERROR DETECTION VIA CLUSTERING TRAINING LOSSES

    Publication No.: US20240419966A1

    Publication Date: 2024-12-19

    Application No.: US18743768

    Filing Date: 2024-06-14

    Abstract: Systems and methods are provided for tackling a significant problem in data analytics: inaccurate dataset labeling. Such inaccuracies can compromise machine learning model performance. To counter this, a label error detection algorithm is provided that efficiently identifies and removes samples with corrupted labels. The provided framework, CTRL, detects label errors in two steps, based on the observation that models learn clean and noisy labels in different ways. First, a neural network is trained on the noisy training dataset, and the loss curve of each sample is recorded. Then, a clustering algorithm is applied to the training losses to group the samples into two categories: cleanly labeled and noisily labeled. After label error detection, the samples with noisy labels are removed and the model is retrained.
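    The detection step can be illustrated with a minimal sketch. The abstract does not say which clustering algorithm or loss features CTRL uses, so everything below is an assumption for illustration: each sample's loss curve (one loss value per epoch) is treated as a feature vector, plain k-means with k = 2 is applied, centroids start at the samples with the lowest and highest mean loss, and the cluster with the higher average loss is taken to be the noisily labeled one. The helper names `two_means` and `detect_label_errors`, and the synthetic loss curves in the demo, are hypothetical and are not data or results from the patent.

```python
import numpy as np


def two_means(features, n_iter=100):
    """Lloyd's k-means with k=2 on the rows of `features` (hypothetical helper).

    Centroids start at the rows with the lowest and highest mean value,
    a deterministic initialization chosen here for illustration only.
    """
    features = np.asarray(features, dtype=float)
    row_means = features.mean(axis=1)
    centroids = features[[row_means.argmin(), row_means.argmax()]].copy()
    labels = None
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to each centroid.
        dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing: converged
        labels = new_labels
        for k in range(2):
            members = features[labels == k]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[k] = members.mean(axis=0)
    return labels


def detect_label_errors(loss_curves):
    """Return a boolean mask, True where a sample is flagged as noisily labeled.

    loss_curves: array of shape (n_samples, n_epochs) holding each sample's
    training loss at every epoch. Assumption: the cluster whose curves have
    the higher average loss contains the mislabeled samples.
    """
    curves = np.asarray(loss_curves, dtype=float)
    labels = two_means(curves)
    cluster_means = [
        curves[labels == k].mean() if np.any(labels == k) else -np.inf
        for k in (0, 1)
    ]
    noisy_cluster = int(np.argmax(cluster_means))
    return labels == noisy_cluster


# Demo on synthetic curves: clean samples' losses drop quickly,
# noisy samples' losses stay high for longer.
rng = np.random.default_rng(1)
epochs = np.arange(10)
clean = 2.0 * np.exp(-0.8 * epochs) + 0.05 * rng.random((80, 10))
noisy = 2.0 * np.exp(-0.1 * epochs) + 0.05 * rng.random((20, 10))
curves = np.vstack([clean, noisy])

mask = detect_label_errors(curves)
keep = np.flatnonzero(~mask)  # indices of samples retained for retraining
print("flagged among clean:", mask[:80].sum(), "flagged among noisy:", mask[80:].sum())
```

    In this sketch, the final step of the method would retrain the model on only the samples indexed by `keep`. Using the whole loss curve rather than a single final-epoch loss reflects the abstract's premise that clean and noisy labels are learned differently over the course of training; a real system might use other clustering algorithms or curve features.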
