
Comparing attack effectiveness is done incorrectly #6

Open

carlini opened this issue Feb 26, 2019 · 2 comments

carlini commented Feb 26, 2019

Using the data provided, it is not possible to compare the efficacy of different attacks across models. Imagine we would like to decide whether LLC or ILLC was the stronger attack on the CIFAR-10 dataset.

Superficially, I might look at the “Average” column and see that the average model accuracy under LLC is 39.4%, compared to 58.7% under ILLC. While averages in security can in general be misleading, here the trend is consistent: for all models except one, LLC reduces the model accuracy more than ILLC does, often by over twenty percentage points.

A reasonable reader might therefore conclude (incorrectly!) that LLC is the stronger attack. Why is this conclusion incorrect? The LLC attack only succeeded 134 times out of 1000 on the baseline CIFAR-10 model. Therefore, when the paper writes that the accuracy of PGD adversarial training under LLC is 61.2%, what this number means is that 38.8% of the adversarial examples that are effective on the baseline model are also effective on the adversarially trained model. How the model would perform on the other 866 examples is not reported. In contrast, when the base model is evaluated on the ILLC attack, the attack succeeded on all 1000 examples. The 83.7% accuracy obtained by adversarial training is therefore inherently incomparable to the 61.2% value.
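To make the denominator problem concrete, here is a minimal sketch in Python (synthetic stand-in data and variable names of my own, not the paper's code or numbers):

```python
import numpy as np

# Synthetic stand-in data (NOT the paper's results): per-example outcomes
# for 1000 CIFAR-10 test points under one attack.
rng = np.random.default_rng(0)
n = 1000
fools_base = rng.random(n) < 0.134                     # ~134/1000, like LLC
fools_defended = fools_base & (rng.random(n) < 0.388)  # survivors on defense

# What Table V reports (under the interpretation in this thread): accuracy
# of the defended model measured ONLY over the examples that already fooled
# the base model. The denominator varies per attack (~134 for LLC, 1000 for
# ILLC), so these values cannot be compared across attacks.
acc_conditional = 1.0 - fools_defended[fools_base].mean()

# A comparable metric would fix the denominator: defended-model accuracy
# over all 1000 attacked examples, the same set for every attack.
acc_unconditional = 1.0 - fools_defended.mean()

print(f"conditional accuracy (Table V style): {acc_conditional:.1%}")
print(f"accuracy over all attacked examples:  {acc_unconditional:.1%}")
```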

ryderling (Owner) commented

Comparing different attacks against one model, to determine how powerful each attack is, is evaluated and discussed in Section IV.A, where all attacks target the same model on the same natural examples. According to the results in Table III, LLC is clearly not a stronger attack than ILLC.

However, Table V shows the classification accuracy of defense-enhanced models against those adversarial examples that were misclassified by the raw model. That is, 100% minus the classification accuracy does not represent the success rate of an attack. The numbers in Table V should therefore be interpreted as the effectiveness of defenses (classification accuracy) against adversarial examples that were already successful against the raw model.
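As a small arithmetic sketch of that interpretation, using the LLC numbers quoted earlier in this thread (since the remaining 866 examples are unreported, only a lower bound on the attack's true success rate follows):

```python
# Arithmetic sketch using LLC figures from this thread (not new data).
n_total = 1000        # natural examples attacked
n_raw_fooled = 134    # LLC successes on the raw CIFAR-10 model
acc_defended = 0.612  # Table V: PGD adversarial training accuracy under LLC

# Table V's 61.2% is measured only over the 134 already-successful examples,
# so (1 - acc_defended) counts examples that keep fooling the defended model:
still_fooled = (1 - acc_defended) * n_raw_fooled       # ~52 examples

# The other 866 examples are unreported, so 100% - 61.2% = 38.8% is NOT the
# attack's success rate against the defended model; we only get a lower bound:
success_rate_lower_bound = still_fooled / n_total      # ~5.2%

print(f"still fooled: {still_fooled:.0f} of {n_total}")
print(f"LLC success rate vs. defended model >= {success_rate_lower_bound:.1%}")
```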

carlini commented Mar 16, 2019

Right, that's the correct way to interpret these numbers. My concern is that we are going to see someone say "LLC was found to be a stronger attack against defended models than ILLC [cite to this paper]". There's a nice saying that you shouldn't write just to be understood, but so that you can't be misunderstood, and the current presentation of the paper encourages exactly this misunderstanding. I do agree the data is there to see that LLC is weaker than ILLC on the baseline model, but because Table V is the only table that tries to show how well LLC/ILLC work against defended models (Table III only shows how well they work on an undefended model), readers will take its numbers as a comparison of attack strength.
