The aim of this project was to explore adversarial attacks and defenses in single-agent as well as multi-agent Reinforcement Learning. In the single-agent domain, we focus on pixel-based attacks in Atari games from the Gym environments. In the multi-agent setting, we concentrate on attacks that train adversarial policies in 1-vs-1 zero-sum continuous-control robotic environments from the MuJoCo simulator. We also studied potential defense procedures to counter such attacks.
A detailed article about the methods and approaches studied during the project can be found here. We have also implemented some of these in this repository.
We also have a blog with articles on the several concepts involved in the project.
- `LearningPhaseAssignments` contains the Reinforcement Learning algorithms implemented during the learning phase of the project. This includes:
  - Tabular SARSA & Q-Learning
  - Deep Q-Networks (DQN)
  - Vanilla Policy Gradients (VPG/REINFORCE)
- `Adversarial-policies` contains a Tensorflow implementation of the attack by training adversarial policies. The implementation in this folder is structured as follows:
  - `agent-zoo`: Contains the pre-trained agent parameters for the environments described in Bansal et al., 2018a. Source
  - `abstraction.py`: A wrapper over the multi-agent environment (a two-player Markov game) so it can be used as a single-agent environment. It embeds the victim into the environment, with the adversarial agent taking actions and receiving observations and reward signals.
  - `policy.py`: Contains the implementation of the MLP and LSTM network policies of the agents.
  - `train.py`: Contains the code for training the adversarial policy using Proximal Policy Optimization (PPO).
  - `show.py`: Contains the testing and video-making code.
  - `finallog.txt`: Output logs from the training procedure.
  - `knd_results.txt`: Attack accuracy (win percentage of the adversary) in the Kick-and-Defend environment.
  - `knd3.zip`: Trained parameters for the adversarial policy in Kick-and-Defend.
  - `videos`: Videos displaying the adversarial attack in Kick-and-Defend.
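The single-agent abstraction described above (embedding a fixed victim into a two-player environment) can be sketched roughly like this; the class and method names are hypothetical and do not reflect the actual interface of `abstraction.py`:

```python
# Sketch: wrap a two-player Markov game as a single-agent environment by
# freezing the victim's policy inside the wrapper (hypothetical interface).

class SingleAgentWrapper:
    def __init__(self, two_player_env, victim_policy):
        self.env = two_player_env      # gym-style reset()/step(), paired obs/rewards
        self.victim = victim_policy    # fixed pre-trained opponent

    def reset(self):
        obs_adv, obs_victim = self.env.reset()
        self._victim_obs = obs_victim
        return obs_adv                 # the adversary only sees its own observation

    def step(self, adv_action):
        # The victim acts with its frozen policy; only the adversary's
        # action comes from the policy being trained.
        victim_action = self.victim(self._victim_obs)
        (obs_adv, obs_victim), (rew_adv, _), done, info = self.env.step(
            (adv_action, victim_action))
        self._victim_obs = obs_victim
        return obs_adv, rew_adv, done, info
```

From the adversary's point of view, the wrapped object behaves like an ordinary single-agent environment, so standard RL algorithms such as PPO can be applied unchanged.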
- `FGSM-on-Images` contains a PyTorch implementation of pixel-based attacks on images, along with output plots and images with varying perturbations.
  - `fast_gradient_sign_method.py`: Contains the implementation of the Fast Gradient Sign Method (FGSM) on the MNIST dataset.
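The core FGSM step, x' = x + ε · sign(∇ₓL(x, y)), can be illustrated with a tiny NumPy sketch on a logistic model with an analytically computed gradient (the repo's implementation uses PyTorch on MNIST; everything below is an illustrative stand-in):

```python
import numpy as np

def fgsm(x, grad_x, eps):
    """FGSM: move each input coordinate eps in the direction that increases the loss."""
    return x + eps * np.sign(grad_x)

# Toy example: logistic model p = sigmoid(w . x), cross-entropy loss, true label y = 1.
# The gradient of the loss with respect to the input is (p - y) * w.
w = np.array([1.0, -2.0, 0.5])
x = np.array([0.2, 0.1, 0.4])
p = 1.0 / (1.0 + np.exp(-(w @ x)))
grad_x = (p - 1.0) * w                  # dL/dx for y = 1
x_adv = fgsm(x, grad_x, eps=0.1)        # -> [0.1, 0.2, 0.3]
```

After the attack, the model's logit `w @ x_adv` is lower than `w @ x`, i.e. the perturbation increased the loss for the true label, as intended.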
- `Adversarial-attacks-on-DNN-policies` contains a PyTorch implementation of the FGSM attack on neural-network policies in the Atari Pong environment.
  - `Adversarial-Attack`: Contains the code, stats, and videos for L1, L2, and Linf norm adversarial attacks on Pong agents in `WhiteBox` as well as `BlackBox` conditions.
  - `Test`: Code, stats, and videos for the Pong agent before the adversarial attack.
  - `Train`: Code and videos for training a Pong agent using PPO.
  - `policy-zoo`: Pre-trained policies used for the attacks.
  - `ppo2_pong.zip`: Trained parameters for the Pong agent trained using PPO.
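The L1, L2, and Linf variants of such attacks differ mainly in how the gradient is turned into a perturbation under a budget ε. A hedged sketch of the three normalizations (illustrative only, not the repo's code):

```python
import numpy as np

def normalize_perturbation(grad, eps, norm):
    """Turn a loss gradient into a perturbation whose given norm equals eps."""
    if norm == "linf":
        return eps * np.sign(grad)                       # eps in every coordinate
    if norm == "l2":
        return eps * grad / (np.linalg.norm(grad) + 1e-12)  # rescale to length eps
    if norm == "l1":
        # Concentrate the whole budget on the largest-magnitude coordinate.
        delta = np.zeros_like(grad)
        i = np.argmax(np.abs(grad))
        delta[i] = eps * np.sign(grad[i])
        return delta
    raise ValueError(f"unknown norm: {norm}")

g = np.array([0.5, -2.0, 1.0])
d_inf = normalize_perturbation(g, 0.1, "linf")   # [0.1, -0.1, 0.1]
d_l2  = normalize_perturbation(g, 0.1, "l2")     # L2 length exactly 0.1
d_l1  = normalize_perturbation(g, 0.1, "l1")     # [0.0, -0.1, 0.0]
```

The Linf version is exactly the FGSM step; the L1 and L2 versions trade breadth of the perturbation for concentration along the most sensitive directions.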
- PyTorch
- Tensorflow (version 1.x)
- Stable-Baselines (version 2.10.1a1)
- MuJoCo 131
- Madhuparna Bhowmik
- Akash Nair
- Saurabh Agarwala
- Videh Raj Nema
- Kinshuk Kashyap
- Manav Singhal
Mentor: Moksh Jain