I have implemented three types of adversarial attacks against a trained CNN model, along with a defense algorithm to counter them. The dataset used is MNIST.
Adversarial examples are inputs to machine learning models that an attacker has intentionally designed to cause the model to make a mistake. They’re like optical illusions for machines.
I have implemented three white-box attacks:
- Fast Gradient Sign Method
- Iterative Fast Gradient Sign Method
- Momentum Iterative Fast Gradient Sign Method
Below is the training and validation loss across all epochs.
Test Accuracy after FGSM attack
Examples of some adversarial images:
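For reference, the core of FGSM is a single step along the sign of the input gradient: x_adv = x + eps * sign(∇x J(x, y)). Below is a minimal PyTorch sketch of such a step; the model handle, the epsilon value, and the [0, 1] pixel range are assumptions and may differ from the actual code in this repository.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, images, labels, eps=0.25):
    """One-step FGSM: perturb each pixel by eps in the direction of the loss gradient's sign."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    adv = images + eps * images.grad.sign()
    # Keep pixels inside the valid [0, 1] range assumed for normalized MNIST images
    return adv.clamp(0, 1).detach()
```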
Test Accuracy after I-FGSM attack
Examples of some adversarial images:
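I-FGSM applies the same gradient-sign step repeatedly with a small step size, projecting the result back into an eps-ball around the original image after every iteration. A minimal PyTorch sketch, with the step size, iteration count, and eps chosen only for illustration:

```python
import torch
import torch.nn.functional as F

def ifgsm_attack(model, images, labels, eps=0.25, alpha=0.01, iters=40):
    """Iterative FGSM: repeated small gradient-sign steps, clipped to an eps-ball around the input."""
    orig = images.clone().detach()
    adv = images.clone().detach()
    for _ in range(iters):
        adv.requires_grad_(True)
        loss = F.cross_entropy(model(adv), labels)
        grad = torch.autograd.grad(loss, adv)[0]
        adv = adv.detach() + alpha * grad.sign()
        # Project back into the eps-ball and the valid pixel range
        adv = torch.max(torch.min(adv, orig + eps), orig - eps).clamp(0, 1)
    return adv
```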
Test Accuracy after MI-FGSM attack
Examples of some adversarial images:
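MI-FGSM extends the iterative attack with a momentum term: at each iteration the L1-normalized gradient is accumulated into a velocity vector g, and the step is taken along sign(g). Again a minimal PyTorch sketch; the hyperparameters (including the decay factor mu) are illustrative, not the repository's actual settings.

```python
import torch
import torch.nn.functional as F

def mifgsm_attack(model, images, labels, eps=0.25, alpha=0.01, iters=40, mu=1.0):
    """Momentum Iterative FGSM: accumulate L1-normalized gradients into a momentum buffer."""
    orig = images.clone().detach()
    adv = images.clone().detach()
    g = torch.zeros_like(images)
    for _ in range(iters):
        adv.requires_grad_(True)
        loss = F.cross_entropy(model(adv), labels)
        grad = torch.autograd.grad(loss, adv)[0]
        # Normalize the gradient by its per-image L1 norm before updating the momentum term
        g = mu * g + grad / grad.abs().sum(dim=(1, 2, 3), keepdim=True).clamp_min(1e-12)
        adv = adv.detach() + alpha * g.sign()
        # Project back into the eps-ball and the valid pixel range
        adv = torch.max(torch.min(adv, orig + eps), orig - eps).clamp(0, 1)
    return adv
```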
To counter the above attacks, defensive distillation was implemented.
Below are the training and validation losses for networkf and networkf1.
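For context, defensive distillation first trains networkf on the hard MNIST labels with its softmax run at a high temperature T, and then trains networkf1 (same architecture) on networkf's softened probabilities at the same temperature; at test time the temperature is set back to 1. The sketch below shows one training step of the distilled network in PyTorch, assuming both networks return raw logits and an illustrative T = 20; the actual temperature and training loop in this repository may differ.

```python
import torch
import torch.nn.functional as F

def distillation_step(networkf, networkf1, images, optimizer, T=20.0):
    """One training step of networkf1 against networkf's temperature-softened outputs."""
    with torch.no_grad():
        # Soft labels: the trained network's logits passed through a softmax at temperature T
        soft_labels = F.softmax(networkf(images) / T, dim=1)
    optimizer.zero_grad()
    # The distilled network is trained at the same temperature, using cross-entropy
    # against the soft labels rather than the original hard labels
    log_probs = F.log_softmax(networkf1(images) / T, dim=1)
    loss = -(soft_labels * log_probs).sum(dim=1).mean()
    loss.backward()
    optimizer.step()
    return loss.item()
```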
Below is the test accuracy after defending against the FGSM attack
Examples of the predictions after defense:
Below is the test accuracy after defending against the I-FGSM attack
Examples of the predictions after defense:
Below is the test accuracy after defending against the MI-FGSM attack
Examples of the predictions after defense:
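For completeness, the accuracies above can be reproduced with a loop like the one below, which crafts adversarial examples against the distilled network and reports the fraction it still classifies correctly. This is only a sketch; the attack function and data loader names (e.g. `fgsm_attack`, `test_loader`) are assumptions tied to the sketches above.

```python
import torch

def adversarial_accuracy(network, attack_fn, test_loader, device="cpu"):
    """Fraction of adversarial test images that the network still classifies correctly."""
    network.eval()
    correct, total = 0, 0
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        adv = attack_fn(network, images, labels)   # craft adversarial examples
        with torch.no_grad():
            preds = network(adv).argmax(dim=1)     # classify the perturbed images
        correct += (preds == labels).sum().item()
        total += labels.size(0)
    return correct / total

# e.g. adversarial_accuracy(networkf1, fgsm_attack, test_loader)
```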