This repo catalogs publications in the field of Adversarial Machine Learning, covering Adversarial Attacks, Defences, Robustness Verification, and Analysis. Papers are grouped by conference and year. Feel free to open a PR if I have missed a paper, or if you'd like to add papers from a conference not yet covered.
NeurIPS 2019

Attacks
- Subspace Attack: Exploiting Promising Subspaces for Query-Efficient Black-box Attacks
- Functional Adversarial Attacks
- Cross-Modal Learning with Adversarial Samples
- Improving Black-box Adversarial Attacks with a Transfer-based Prior
- Adversarial Music: Real world Audio Adversary against Wake-word Detection System
- Cross-Domain Transferability of Adversarial Perturbations
- Fooling Neural Network Interpretations via Adversarial Model Manipulation
Defences
- Metric Learning for Adversarial Robustness
- Defense Against Adversarial Attacks Using Feature Scattering-based Adversarial Training
- Adversarial training for free!
- On Single Source Robustness in Deep Fusion Models
- Certified Adversarial Robustness with Additive Noise
- Certifiable Robustness to Graph Perturbations
- Unlabeled Data Improves Adversarial Robustness
- Provably Robust Deep Learning via Adversarially Trained Smoothed Classifiers
- Provably robust boosted decision stumps and trees against adversarial attacks
- Adversarial Robustness through Local Linearization
- Using Self-Supervised Learning Can Improve Model Robustness and Uncertainty
- A New Defense Against Adversarial Images: Turning a Weakness into a Strength
Verification
- Tight Certificates of Adversarial Robustness for Randomly Smoothed Classifiers
- A Convex Relaxation Barrier to Tight Robustness Verification of Neural Networks
- Robustness Verification of Tree-based Models
- Accurate, reliable and fast robustness evaluation
- Provable Certificates for Adversarial Examples: Fitting a Ball in the Union of Polytopes
Analysis
- Adversarial Examples Are Not Bugs, They Are Features
- Image Synthesis with a Single (Robust) Classifier
- Model Compression with Adversarial Robustness: A Unified Optimization Framework
- Robustness to Adversarial Perturbations in Learning from Incomplete Data
- Adversarial Training and Robustness for Multiple Perturbations
- On the Hardness of Robust Classification
- Theoretical evidence for adversarial robustness through randomization
- Are Labels Required for Improving Adversarial Robustness?
- Theoretical Analysis of Adversarial Learning: A Minimax Approach
- Convergence of Adversarial Training in Overparametrized Neural Networks
- A Fourier Perspective on Model Robustness in Computer Vision
- On Robustness to Adversarial Examples and Polynomial Optimization
- On Relating Explanations and Adversarial Examples
ICML 2019

Attacks
- Adversarial Attacks on Node Embeddings via Graph Poisoning
- Adversarial camera stickers: A physical camera-based attack on deep learning systems
- NATTACK: Learning the Distributions of Adversarial Examples for an Improved Black-Box Attack on Deep Neural Networks
- Wasserstein Adversarial Examples via Projected Sinkhorn Iterations
- Imperceptible, Robust, and Targeted Adversarial Examples for Automatic Speech Recognition
- Parsimonious Black-Box Adversarial Attacks via Efficient Combinatorial Optimization
- Simple Black-box Adversarial Attacks
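The core loop of SimBA from "Simple Black-box Adversarial Attacks" above is compact enough to sketch. This is a minimal illustration, assuming a PyTorch model that returns logits and a single image in [0, 1]; the function and argument names are mine, not the authors':

```python
import torch

def simba(model, x, y, eps=0.2, n_steps=1000):
    """Try random single-coordinate perturbations; keep those that lower
    the probability the model assigns to the true class y."""
    x = x.clone()
    perm = torch.randperm(x.numel())  # visit pixel coordinates in random order
    with torch.no_grad():
        p = model(x.unsqueeze(0)).softmax(dim=1)[0, y]
        for i in range(min(n_steps, x.numel())):
            for sign in (1.0, -1.0):
                delta = torch.zeros_like(x).view(-1)
                delta[perm[i]] = sign * eps
                x_try = (x + delta.view_as(x)).clamp(0, 1)
                p_try = model(x_try.unsqueeze(0)).softmax(dim=1)[0, y]
                if p_try < p:  # keep the step only if it hurts the true class
                    x, p = x_try, p_try
                    break
    return x
```

The paper also runs the same loop over DCT basis vectors instead of raw pixels, which tends to need fewer queries.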
Defences
- Improving Adversarial Robustness via Promoting Ensemble Diversity
- Robust Decision Trees Against Adversarial Examples
- The Odds are Odd: A Statistical Test for Detecting Adversarial Examples
- Using Pre-Training Can Improve Model Robustness and Uncertainty
- ME-Net: Towards Effective Adversarial Robustness with Matrix Estimation
- Certified Adversarial Robustness via Randomized Smoothing
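"Certified Adversarial Robustness via Randomized Smoothing" above predicts with the smoothed classifier g(x) = argmax_c P(f(x + d) = c), d ~ N(0, sigma^2 I). A minimal sketch of the prediction step, assuming a PyTorch base classifier that returns logits; the paper's certified L2 radius, derived from the vote statistics with a binomial confidence bound, is omitted here:

```python
import torch

def smoothed_predict(base_model, x, sigma=0.25, n=100):
    """Majority vote of the base classifier over n Gaussian-perturbed
    copies of a single input x."""
    with torch.no_grad():
        noisy = x.unsqueeze(0) + sigma * torch.randn(n, *x.shape)
        votes = base_model(noisy).argmax(dim=1)
    return torch.bincount(votes).argmax().item()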
Verification
- On Certifying Non-uniform Bounds against Adversarial Attacks
- PROVEN: Verifying Robustness of Neural Networks with a Probabilistic Approach
Analysis
- First-order Adversarial Vulnerability of Neural Networks and Input Dimension
- On the Convergence and Robustness of Adversarial Training
- On the Connection Between Adversarial Robustness and Saliency Map Interpretability
- Adversarial examples from computational constraints
- Limitations of Adversarial Robustness: Strong No Free Lunch Theorem
- Rademacher Complexity for Adversarially Robust Generalization
- POPQORN: Quantifying Robustness of Recurrent Neural Networks
- Are Generative Classifiers More Robust to Adversarial Attacks?
- Theoretically Principled Trade-off between Robustness and Accuracy (see the sketch after this list)
- Adversarial Examples Are a Natural Consequence of Test Error in Noise
- Exploring the Landscape of Spatial Robustness
- Interpreting Adversarially Trained Convolutional Neural Networks
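For reference, the TRADES objective from "Theoretically Principled Trade-off between Robustness and Accuracy" above is natural cross-entropy plus a KL term that pulls adversarial predictions toward clean ones. A sketch, assuming x_adv has already been generated by maximizing the same KL term with PGD; beta is the trade-off hyperparameter:

```python
import torch.nn.functional as F

def trades_loss(model, x, y, x_adv, beta=6.0):
    """Natural loss plus beta times the robust KL term."""
    logits = model(x)
    natural = F.cross_entropy(logits, y)
    robust = F.kl_div(F.log_softmax(model(x_adv), dim=1),
                      F.softmax(logits, dim=1), reduction="batchmean")
    return natural + beta * robust
```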
ICLR 2019

Attacks
- Adversarial Attacks on Graph Neural Networks via Meta Learning
- Prior Convictions: Black-box Adversarial Attacks with Bandits and Priors
- Beyond Pixel Norm-Balls: Parametric Adversaries using an Analytically Differentiable Renderer
- ADef: an Iterative Algorithm to Construct Adversarial Deformations
- Structured Adversarial Attack: Towards General Implementation and Better Interpretability
- The Limitations of Adversarial Training and the Blind-Spot Attack
- CAMOU: Learning Physical Vehicle Camouflages to Adversarially Attack Detectors in the Wild
- Adversarial Reprogramming of Neural Networks
Defences
- Cost-Sensitive Robustness against Adversarial Examples
- Generalizable Adversarial Training via Spectral Normalization
- Towards the first adversarially robust neural network model on MNIST
- PeerNets: Exploiting Peer Wisdom Against Adversarial Attacks
- Characterizing Audio Adversarial Examples Using Temporal Dependency
- Improving the Generalization of Adversarial Training with Domain Adaptation
- Adv-BNN: Improved Adversarial Defense through Robust Bayesian Neural Network
- Defensive Quantization: When Efficiency Meets Robustness
Verification
- Rigorous Agent Evaluation: An Adversarial Approach to Uncover Catastrophic Failures
- Training for Faster Adversarial Robustness Verification via Inducing ReLU Stability
- Benchmarking Neural Network Robustness to Common Corruptions and Perturbations
- Evaluating Robustness of Neural Networks with Mixed Integer Programming
- A Statistical Approach to Assessing Neural Network Robustness
- Robustness Certification with Refinement
Analysis
- Excessive Invariance Causes Adversarial Vulnerability
- On the Sensitivity of Adversarial Robustness to Input Data Distributions
- Robustness May Be at Odds with Accuracy
- Are adversarial examples inevitable?
NeurIPS 2018

Attacks
- Adversarial Examples that Fool both Computer Vision and Time-Limited Humans
- Adversarial Attacks on Stochastic Bandits
- Constructing Unrestricted Adversarial Examples with Generative Models
Defences
- Deep Defense: Training DNNs with Improved Adversarial Robustness
- Scaling provable adversarial defenses
- Thwarting Adversarial Examples: An L_0-Robust Sparse Fourier Transform
- Bayesian Adversarial Learning
- Towards Robust Detection of Adversarial Examples
- Attacks Meet Interpretability: Attribute-steered Detection of Adversarial Samples
- Robust Detection of Adversarial Attacks by Modeling the Intrinsic Properties of Deep Neural Networks
- A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks
Verification
Analysis
- Adversarially Robust Generalization Requires More Data
- A Spectral View of Adversarially Robust Features
- Adversarial vulnerability for any classifier
- Adversarial Risk and Robustness: General Definitions and Implications for the Uniform Distribution
ICML 2018

Attacks
- Synthesizing Robust Adversarial Examples
- Adversarial Risk and the Dangers of Evaluating Against Weak Attacks
- Black-box Adversarial Attacks with Limited Queries and Information (see the sketch after this list)
- Adversarial Attack on Graph Structured Data
- Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples
- LaVAN: Localized and Visible Adversarial Noise
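The query-limited setting in "Black-box Adversarial Attacks with Limited Queries and Information" above rests on an NES-style finite-difference gradient estimate, which is easy to sketch. The interface here is an assumption: loss_at is any black-box function mapping an input tensor to a scalar loss:

```python
import torch

def nes_grad(loss_at, x, sigma=0.01, n=50):
    """Estimate the gradient of a black-box loss at x by antithetic
    Gaussian sampling (2*n queries); the estimate then drives
    PGD-style attack steps."""
    g = torch.zeros_like(x)
    for _ in range(n):
        u = torch.randn_like(x)
        g = g + (loss_at(x + sigma * u) - loss_at(x - sigma * u)) * u
    return g / (2 * n * sigma)
```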
Defences
- Provable Defenses against Adversarial Examples via the Convex Outer Adversarial Polytope
- Reinforcing Adversarial Robustness using Model Confidence Induced by Adversarial Training
- Differentiable Abstract Interpretation for Provably Robust Neural Networks
Verification
Analysis
- Adversarial Regression with Multiple Learners
- Learning Adversarially Fair and Transferable Representations
- Analyzing the Robustness of Nearest Neighbors to Adversarial Examples
ICLR 2018

Attacks
- Decision-Based Adversarial Attacks: Reliable Attacks Against Black-Box Machine Learning Models
- Generating Natural Adversarial Examples
- Spatially Transformed Adversarial Examples
Defences
- Towards Deep Learning Models Resistant to Adversarial Attacks (see the sketch after this list)
- Countering Adversarial Images using Input Transformations
- PixelDefend: Leveraging Generative Models to Understand and Defend against Adversarial Examples
- Stochastic Activation Pruning for Robust Adversarial Defense
- Thermometer Encoding: One Hot Way To Resist Adversarial Examples
- Certified Defenses against Adversarial Examples
- Defense-GAN: Protecting Classifiers Against Adversarial Attacks Using Generative Models
- Ensemble Adversarial Training: Attacks and Defenses
- Mitigating Adversarial Effects Through Randomization
- Certifying Some Distributional Robustness with Principled Adversarial Training
- Cascade Adversarial Machine Learning Regularized with a Unified Embedding
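"Towards Deep Learning Models Resistant to Adversarial Attacks" above trains against an L_inf projected gradient descent adversary. A minimal PyTorch sketch of that inner attack, assuming batched inputs in [0, 1]; adversarial training then simply minimizes loss_fn(model(pgd_attack(...)), y) on each batch:

```python
import torch

def pgd_attack(model, loss_fn, x, y, eps=8/255, alpha=2/255, steps=10):
    """L_inf PGD: random start in the eps-ball, signed-gradient ascent,
    projection back onto the ball after every step."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()            # ascent step
        x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0, 1)  # project
    return x_adv.detach()
```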
Analysis
- Decision Boundary Analysis of Adversarial Examples
- Characterizing Adversarial Subspaces Using Local Intrinsic Dimensionality
- Houdini: Fooling Deep Structured Visual and Speech Recognition Models with Adversarial Examples
- Formal Guarantees on the Robustness of a Classifier against Adversarial Manipulation
- Lower bounds on the robustness to adversarial perturbations
ICLR 2017 and earlier

Attacks
Defences
- Adversarial Machine Learning at Scale
- DeepCloak: Masking Deep Neural Network Models for Robustness Against Adversarial Samples
- Adversarial Training Methods for Semi-Supervised Text Classification
- Early Methods for Detecting Adversarial Images
- Robustness to Adversarial Examples through an Ensemble of Specialists
Analysis
- Robustness of classifiers: from adversarial to random noise
- Measuring Neural Net Robustness with Constraints
- Distributional Smoothing with Virtual Adversarial Training (see the sketch after this list)
- Adversarial Manipulation of Deep Representations
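"Distributional Smoothing with Virtual Adversarial Training" above regularizes the model in the direction where its own predictions are most sensitive, so no labels are needed. A rough PyTorch sketch with a single power-iteration step, assuming a batch of inputs and a classifier that returns logits; xi and eps play the paper's roles (finite-difference scale and perturbation radius), though the names are mine:

```python
import torch
import torch.nn.functional as F

def vat_loss(model, x, xi=1e-6, eps=2.0):
    """KL divergence between predictions at x and at x + r_adv, where
    r_adv approximates the most prediction-changing direction."""
    with torch.no_grad():
        p = F.softmax(model(x), dim=1)          # current predictions as target
    d = torch.randn_like(x)                     # random probe direction
    d = xi * F.normalize(d.flatten(1), dim=1).view_as(x)
    d.requires_grad_(True)
    kl = F.kl_div(F.log_softmax(model(x + d), dim=1), p, reduction="batchmean")
    d = torch.autograd.grad(kl, d)[0]           # one power-iteration step
    r_adv = eps * F.normalize(d.flatten(1), dim=1).view_as(x)
    kl = F.kl_div(F.log_softmax(model(x + r_adv), dim=1), p, reduction="batchmean")
    return kl
```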