AUTOMATED DATASET REDUCTION BASED ON USE OF EXPLAINABILITY TECHNIQUES

    Publication number: US20240256955A1

    Publication date: 2024-08-01

    Application number: US18163162

    Application date: 2023-02-01

    CPC classification number: G06N20/00

    Abstract: Systems, methods, and apparatuses for automatically generating reduced training datasets are described. A training dataset may be input into a machine learning model to train the model to output a label. The machine learning model may comprise a plurality of nodes, each associated with a weight. Based on the datapoints of the training dataset, changes to the weight associated with each node may be determined. Using model explainability techniques, and based on those weight changes, pathways that decrease the accuracy of the machine learning model may be identified. A first set of the datapoints that correlate with those accuracy-decreasing pathways may then be determined and removed from the training dataset to generate a reduced training dataset.
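
The reduction loop described in the abstract can be illustrated with a minimal sketch. It is not the claimed method: the abstract does not say which explainability technique is used or how pathways are traced, so the sketch swaps in a much simpler stand-in. It assumes a one-layer logistic model in which each weight plays the role of a node weight, and it flags a datapoint when the weight change that datapoint induces raises the loss on a separate validation set. The function names (predict, weight_change, train, log_loss, reduce_dataset), the validation set, and the toy data are all hypothetical.

```python
import math

def predict(weights, x):
    """Sigmoid output of a one-layer model; weights[i] stands in for node i's weight."""
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def weight_change(weights, x, y, lr):
    """Change to each weight induced by one gradient step on a single datapoint."""
    err = y - predict(weights, x)
    return [lr * err * xi for xi in x]

def train(data, lr=0.5, epochs=500):
    """Full-batch gradient descent: average the per-datapoint weight changes each epoch."""
    weights = [0.0] * len(data[0][0])
    for _ in range(epochs):
        changes = [weight_change(weights, x, y, lr) for x, y in data]
        weights = [w + sum(c[i] for c in changes) / len(data)
                   for i, w in enumerate(weights)]
    return weights

def log_loss(weights, data):
    """Mean binary cross-entropy, with predictions clipped away from 0 and 1."""
    total = 0.0
    for x, y in data:
        p = min(max(predict(weights, x), 1e-12), 1.0 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(data)

def reduce_dataset(train_data, val_data, lr=0.5):
    """Assumed stand-in for the pathway analysis: drop every datapoint whose
    own weight change, applied to the trained model, raises validation loss."""
    weights = train(train_data, lr=lr)
    baseline = log_loss(weights, val_data)
    kept, removed = [], []
    for x, y in train_data:
        change = weight_change(weights, x, y, lr)
        nudged = [w + c for w, c in zip(weights, change)]
        (removed if log_loss(nudged, val_data) > baseline else kept).append((x, y))
    return kept, removed

# Toy data: features are [bias, value]; the label is 1 when value > 0.
# The last training point is deliberately mislabeled.
train_data = [([1.0, -2.0], 0), ([1.0, -1.0], 0), ([1.0, 1.0], 1),
              ([1.0, 2.0], 1), ([1.0, 1.5], 0)]
val_data = [([1.0, -1.5], 0), ([1.0, -0.5], 0), ([1.0, 0.5], 1), ([1.0, 1.5], 1)]

kept, removed = reduce_dataset(train_data, val_data)
print("kept:", kept)
print("removed:", removed)
```

In this toy run the deliberately mislabeled point lands in `removed`, and `kept` is the reduced training dataset. A multi-layer network would need an actual attribution method to relate weight changes to pathways; the validation-loss check here is only a placeholder for that step.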
