Securing machine learning models against adversarial samples through backdoor misclassification
Abstract:
A method for securing a genuine machine learning model against adversarial samples includes the steps of attaching a trigger to a sample to be classified and classifying the sample with the trigger attached using a backdoored model that has been backdoored using the trigger. In a further step, it is determined whether an output of the backdoored model is the same as a backdoor class of the backdoored model, and/or an outlier detection method is applied to the resulting logits, comparing them against honest logits that were computed using a genuine sample. These steps are repeated using different triggers and the backdoored models respectively associated therewith. The number of times that an output of the backdoored models is not the same as the respective backdoor class, and/or a difference determined by applying the outlier detection method, is then compared against one or more thresholds so as to determine whether the sample is adversarial.
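The following is a minimal illustrative sketch of one way the claimed detection loop could be realized, not the patented implementation itself. All names are hypothetical: each backdoored model is assumed to be represented by a trigger-attachment function, a logit-prediction function, and its backdoor class, and a plain Euclidean distance stands in for the unspecified outlier detection method.

    # Hypothetical sketch of the detection procedure described in the abstract.
    import numpy as np

    def is_adversarial(sample, backdoored_models, honest_logits,
                       mismatch_threshold=1, outlier_threshold=5.0):
        """Flag `sample` as adversarial based on repeated trigger checks.

        backdoored_models: list of (apply_trigger, predict_logits, backdoor_class)
                           tuples, one entry per trigger/backdoored model pair.
        honest_logits:     logit vectors previously computed on a genuine sample
                           with the same triggers attached (assumed available).
        """
        mismatches = 0
        outlier_scores = []
        for (apply_trigger, predict_logits, backdoor_class), ref in zip(
                backdoored_models, honest_logits):
            # Attach the trigger associated with this backdoored model.
            triggered = apply_trigger(sample)
            # Classify the triggered sample with the backdoored model.
            logits = predict_logits(triggered)
            # Check whether the output matches the model's backdoor class.
            if int(np.argmax(logits)) != backdoor_class:
                mismatches += 1
            # Outlier score of the logits relative to the honest logits
            # (Euclidean distance used here purely as a stand-in).
            outlier_scores.append(float(np.linalg.norm(np.asarray(logits) - np.asarray(ref))))
        # Compare the mismatch count and/or the outlier scores to thresholds.
        return (mismatches >= mismatch_threshold
                or max(outlier_scores) > outlier_threshold)

Under this reading, a genuine sample tends to be pushed into the backdoor class by every trigger, so a high mismatch count or an unusually large logit deviation indicates an adversarial sample.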