Comparing attack effectiveness is done incorrectly #6
Comments
Comparing different attacks against one model, and determining how powerful each attack is, is evaluated and discussed in Section IV.A, where all attacks target the same model using the same natural examples. According to the results in Table III, LLC is clearly not a stronger attack than ILLC. Table V, however, shows the classification accuracy of defense-enhanced models on only those adversarial examples that were misclassified by the raw model. That is, 100% minus the classification accuracy does not represent the success rate of the attacks. Therefore, the numbers in Table V should be interpreted as the effectiveness of defenses (classification accuracy) against successful adversarial examples.
Right, that's the correct way to interpret these numbers. My concern is that we are going to see someone say "LLC was found to be a stronger attack against defended models than ILLC [cite to this paper]". There's a nice saying that you shouldn't write just to be understood, but write so you can't be misunderstood. The current presentation of the paper encourages this type of misunderstanding. I do agree the data is there to see that LLC is weaker than ILLC on a baseline model, but because this is the only figure that tries to show how well LLC/ILLC work against defended models (the other figure just shows how well LLC/ILLC work on an undefended model), people will take it as such.
Using the data provided, it is not possible to compare the efficacy of different attacks across models. Imagine we would like to decide whether LLC or ILLC was the stronger attack on the CIFAR-10 dataset.
Superficially, I might look at the “Average” column and see that the average model accuracy under LLC is 39.4% compared to 58.7% accuracy under ILLC. While in general averages in security can be misleading, fortunately, for all models except one, LLC reduces the model accuracy more than ILLC does, often by over twenty percentage points.
A reasonable reader might therefore conclude (incorrectly!) that LLC is the stronger attack. Why is this conclusion incorrect? The LLC attack only succeeded 134 times out of 1000 on the baseline CIFAR-10 model. Therefore, when the paper writes that the accuracy of PGD adversarial training under LLC is 61.2%, what this number means is that 38.8% of the adversarial examples that are effective on the baseline model are also effective on the adversarially trained model. How the model would perform on the other 866 examples is not reported. In contrast, when the base model is evaluated on the ILLC attack, the attack succeeded on all 1000 examples. The 83.7% accuracy obtained by adversarial training under ILLC is inherently incomparable to the 61.2% value.
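To make the arithmetic concrete, here is a minimal sketch that puts both numbers on the same denominator of 1000 natural examples. It assumes, as described in this thread, that the reported accuracies are computed only over adversarial examples that fooled the baseline model. The function name `transfer_bounds` and its parameters are made up for illustration and do not come from the paper or its code.

```python
def transfer_bounds(n_total, n_success_on_base, defended_accuracy):
    """Bound the fraction of all n_total examples that fool the defended model.

    Assumption: defended_accuracy was measured only on the n_success_on_base
    adversarial examples that already fooled the baseline model.
    """
    # Adversarial examples known to fool both the baseline and defended model.
    known = round((1 - defended_accuracy) * n_success_on_base)
    # Examples that did not fool the baseline were never evaluated on the
    # defended model, so they could go either way.
    unreported = n_total - n_success_on_base
    lower = known / n_total
    upper = (known + unreported) / n_total
    return known, lower, upper


# LLC: 134 of 1000 succeeded on the baseline, 61.2% defended accuracy.
print(transfer_bounds(1000, 134, 0.612))
# ILLC: 1000 of 1000 succeeded on the baseline, 83.7% defended accuracy.
print(transfer_bounds(1000, 1000, 0.837))
```

Under that assumption, LLC is known to fool the adversarially trained model on about 52 of the 1000 examples, with the remaining 866 unreported, so its success rate over all examples lies somewhere between 5.2% and 91.8%. ILLC fools it on 163 of 1000, with nothing left unreported. The reported data cannot rank the two attacks against the defended model.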