REINFORCEMENT LEARNING AGENT TO MEASURE ROBUSTNESS OF BLACK-BOX IMAGE CLASSIFICATION MODELS
Abstract:
Systems and methods are provided for a reinforcement learning (RL) agent that performs adversarial black-box attacks to measure and improve the robustness of a machine learning (ML) model. Examples include receiving an image corresponding to a ground truth and computing the sensitivity of an ML model's classification of the image as the ground truth to added and removed distortions. Based on these sensitivities, an RL agent determines distortions to add to and remove from the image. The ML model classifies the distorted image, and the process repeats until the model misclassifies the image. Based on the misclassification, a measure of robustness is determined and/or the ML model can be retrained.
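The loop described in the abstract can be sketched as follows. This is a minimal illustration, not the patented method: the names (`sensitivity_guided_attack`, `score`, `classify`), the greedy per-pixel probing used in place of a trained RL policy, and the toy mean-intensity model are all assumptions introduced for the example. It shows the shape of the process: probe the model's sensitivity to added (+eps) and removed (-eps) distortions, apply the most damaging one, query the black-box classifier, and repeat until misclassification, using the step count as a crude robustness measure.

```python
def sensitivity_guided_attack(score, classify, image, truth, eps=0.1, max_steps=200):
    """Repeatedly distort `image` until `classify` no longer returns `truth`.

    `score(img)` is assumed to return the model's confidence in the
    ground-truth label; `classify(img)` returns the predicted label.
    Each step probes adding (+eps) and removing (-eps) a distortion at
    every pixel and keeps the single change that lowers the ground-truth
    confidence the most (a greedy stand-in for an RL agent's policy).
    """
    img = list(image)
    for step in range(1, max_steps + 1):
        best = None
        for i in range(len(img)):
            for delta in (eps, -eps):
                trial = list(img)
                # Keep pixel values in [0, 1] after the distortion.
                trial[i] = min(1.0, max(0.0, trial[i] + delta))
                s = score(trial)
                if best is None or s < best[0]:
                    best = (s, i, trial[i])
        _, i, value = best
        img[i] = value                      # apply the most damaging distortion
        if classify(img) != truth:          # black-box query
            return step, img                # fewer steps => lower robustness
    return None, img                        # attack failed within the budget


# Toy black-box "model" (hypothetical): label and confidence depend on
# mean pixel intensity, so the attack can provably succeed.
mean = lambda img: sum(img) / len(img)
label = lambda img: "bright" if mean(img) > 0.5 else "dark"

steps, adversarial = sensitivity_guided_attack(mean, label, [0.55] * 8, "bright")
```

In a real deployment the greedy probe would be replaced by a learned policy, and the step count (or total perturbation magnitude) at the point of misclassification would serve as the robustness measure used to decide whether to retrain the model.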