
Attack success rate decreases with distortion bound #9

Open
carlini opened this issue Feb 26, 2019 · 2 comments

carlini commented Feb 26, 2019

It is a basic observation that when given strictly more power, the adversary should never do worse. However, in Table VII the paper reports that MNIST adversarial examples with their l_infinity norm constrained to be less than 0.2 are harder to detect than when constrained to be within 0.5. The reason this table shows this effect is that FGSM, a single-step method, is used to generate these adversarial examples. The table should be re-generated with an optimization-based attack (that actually targets the defense; not a transfer attack) to give meaningful numbers.
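A toy 1-D sketch (purely illustrative, not from the paper) makes the point concrete: an attack that actually optimizes within the eps-ball can only do better as eps grows, because every smaller ball is contained in the larger one. A single-step method like FGSM, which always moves by exactly eps, can overshoot the adversarial region at large eps. Here `loss`, `fgsm`, and `pgd` are hypothetical stand-ins, not the defense or attacks evaluated in the paper.

```python
import numpy as np

# Toy 1-D "loss" the attacker maximizes; call the attack successful
# when loss(x) > -0.25, i.e. when x lands within 0.5 of the optimum x = 1.
def loss(x):
    return -(x - 1.0) ** 2

def grad(x):
    return -2.0 * (x - 1.0)

def success(x):
    return loss(x) > -0.25

def fgsm(x0, eps):
    # Single step of size exactly eps in the gradient-sign direction:
    # at large eps this jumps straight past the adversarial region.
    return x0 + eps * np.sign(grad(x0))

def pgd(x0, eps, step=0.05, iters=200):
    # Many small sign steps, each projected back into the eps-ball
    # around x0, so a larger eps never hurts.
    x = x0
    for _ in range(iters):
        x = x + step * np.sign(grad(x))
        x = np.clip(x, x0 - eps, x0 + eps)
    return x

for eps in (0.4, 1.0, 2.0):
    print(f"eps={eps}: FGSM success={success(fgsm(0.0, eps))}, "
          f"PGD success={success(pgd(0.0, eps))}")
```

In this sketch FGSM succeeds at eps=1.0 but fails at eps=2.0 (it overshoots to x=2), while the iterative projected attack is monotone: once eps is large enough to reach the adversarial region, making it larger never reduces success.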

ryderling (Owner) commented:


Again, our observation was that there is no clear relationship between the perturbation magnitude of adversarial examples and the detection AUC in non-adaptive scenarios, where the detector does not need to know which type of attack it faces (gradient-based or optimization-based). The experiments with FGSM at least demonstrate that the correlation between the two is not always strictly positive.


carlini commented Mar 16, 2019

I agree that's the observation you make. But the way you evaluate it is flawed. Rather than repeat my argument, I refer you to Section 5.2 of https://arxiv.org/abs/1902.06705.
