Method and system for meaningful counterfactual explanations

    Publication (announcement) number: US11961287B2

    Publication (announcement) date: 2024-04-16

    Application number: US17449872

    Filing date: 2021-10-04

    Abstract: A computer-implemented method for explaining an image classifier, the method comprising: receiving an initial image, the initial image having been wrongly classified by the image classifier; receiving an initial gradient of a function executed by the image classifier generated while classifying the initial image, the function being indicative of a probability for the initial image to belong to an initial class; converting the initial image into a latent vector, the latent vector being a representation of the initial image in a latent space; generating a plurality of perturbation vectors using the initial gradient of the function executed by the image classifier; combining the latent vector with each one of the plurality of perturbation vectors, thereby obtaining a plurality of modified vectors; for each one of the plurality of modified vectors, reconstructing a respective image, thereby obtaining a plurality of reconstructed images; transmitting the reconstructed images to the image classifier; for each one of the plurality of reconstructed images, receiving a respective updated gradient of the function executed by the image classifier; using the respective updated gradients, determining amongst the reconstructed images at least one given reconstructed image for which the respective updated gradient is indicative that a new class different from the initial class has been assigned by the image classifier; and outputting the at least one given reconstructed image.
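
The sequence of steps in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the patent: the linear `encode`/`decode` pair stands in for an autoencoder, the logistic `prob_initial_class` stands in for the image classifier, and the names `counterfactuals`, `step`, and `noise` are hypothetical. As a further simplification, the class change is read from the classifier's output probability rather than from the updated gradient.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins (not from the patent): a linear "autoencoder"
# with orthonormal columns and a logistic "classifier" on flattened images.
D, K = 16, 4                                   # image size, latent size
Q, _ = np.linalg.qr(rng.normal(size=(D, K)))   # Q has shape (D, K)
w = rng.normal(size=D)                         # classifier weights
b = 0.0                                        # classifier bias

def encode(x):
    """Convert an image into a latent vector."""
    return Q.T @ x

def decode(z):
    """Reconstruct an image from a latent vector."""
    return Q @ z

def prob_initial_class(x):
    """Probability that image x belongs to the initial class."""
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))

def grad_wrt_image(x):
    """Gradient of prob_initial_class with respect to the image x."""
    p = prob_initial_class(x)
    return p * (1.0 - p) * w

def counterfactuals(x0, n_perturb=8, step=0.5, noise=0.1):
    """Return reconstructed images that no longer get the initial class."""
    z0 = encode(x0)
    # Map the initial gradient into latent space (chain rule through the
    # linear decoder) and move against it to lower the class probability.
    g_latent = Q.T @ grad_wrt_image(x0)
    direction = -g_latent / (np.linalg.norm(g_latent) + 1e-12)
    found = []
    for i in range(1, n_perturb + 1):
        # One perturbation vector per step size, with optional random jitter.
        delta = step * i * direction + noise * rng.normal(size=K)
        x_new = decode(z0 + delta)             # reconstructed image
        # Simplification: test the predicted probability directly.
        if prob_initial_class(x_new) < 0.5:
            found.append(x_new)
    return found

# Usage: build an image the toy classifier assigns to the initial class,
# then search for reconstructions assigned to a different class.
u = Q.T @ w
x0 = decode(0.7 * u / np.linalg.norm(u))
results = counterfactuals(x0)
print(len(results), "counterfactual image(s) found")
```

In this sketch the perturbations differ only in magnitude and jitter along a single gradient-derived direction; the abstract itself does not say how the plurality of perturbation vectors is derived from the initial gradient.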