Deep learning, and in particular recurrent neural networks, has in recent years attracted sustained interest thanks to successful applications in a broad range of areas, including handwriting recognition, natural language processing, and speech recognition. However, even as the use of such models expands, their interpretability — the mechanism behind their decision-making process — remains understudied. Interpretation can not only help users trust models and their predictions, but also provide valuable insights into areas such as genetic modeling and linguistics, and inform model design.
Here, we organize papers and articles from different sources to provide a reasonably comprehensive overview of developments in this area.
- Towards A Rigorous Science of Interpretable Machine Learning (arXiv, 2017)
keywords: definition of interpretability, incompleteness, taxonomy of evaluation, latent dimensions
- “Why Should I Trust You?” Explaining the Predictions of Any Classifier (SIGKDD, 2016)
keywords: model agnostic, text and image, local approximation, LIME
- A Unified Approach to Interpreting Model Predictions (NIPS, 2017)
keywords: model agnostic, data agnostic, kernel SHAP, linear regression
- Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation (EMNLP, 2014)
keywords: RNN encoder-decoder, natural language, novel hidden unit
- Explainable Artificial Intelligence (XAI) on Time Series Data: A Survey (arXiv, 2021)
keywords: CNN, RNN, explainable AI methods, time series, natural language, backpropagation-based methods, perturbation-based methods, attention, Symbolic Aggregate Approximation (SAX), Fuzzy Logic
- Towards a Rigorous Evaluation of XAI Methods on Time Series (ICCVW, 2019)
keywords: model-agnostic, time series, perturbation and sequence based evaluation, SHAP, DeepLIFT, LRP, Saliency Map, LIME
- Visualizing and understanding recurrent networks (ICLR, 2016)
keywords: LSTM, natural language, revealed cells that identify interpretable and high-level patterns, long-range dependency, error analysis
- Techniques for Interpretable Machine Learning (CACM, 2019)
keywords: overview of different models and interpretation techniques
- On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation (PLoS ONE, 2015)
keywords: nonlinear classifiers, image, Layer-Wise Relevance Propagation
- Benchmarking Deep Learning Interpretability in Time Series Predictions (NIPS, 2020)
keywords: RNN, Temporal Convolutional Networks, Transformers, synthetic time series data, saliency-based interpretability methods, two-step temporal saliency rescaling (TSR)
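Several of the entries above (LIME, kernel SHAP) rest on the same mechanic: sample perturbations around one instance, query the black box, and fit a proximity-weighted linear surrogate whose coefficients serve as the local explanation. A minimal sketch of that idea, assuming a generic `black_box` scoring function; the Gaussian perturbations and exponential kernel here are simplifications of what LIME/SHAP actually use:

```python
import numpy as np

def local_surrogate_weights(black_box, x, n_samples=500, width=0.75, seed=0):
    """LIME-style sketch: fit a proximity-weighted linear surrogate
    around instance x and return its per-feature coefficients."""
    rng = np.random.default_rng(seed)
    # Perturb the instance with Gaussian noise and query the black box.
    Z = x + rng.normal(scale=0.5, size=(n_samples, x.size))
    y = np.array([black_box(z) for z in Z])
    # Weight each sample by its proximity to x (exponential kernel).
    d2 = ((Z - x) ** 2).sum(axis=1)
    w = np.exp(-d2 / width ** 2)
    # Weighted least squares with an intercept column.
    A = np.hstack([Z, np.ones((n_samples, 1))])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(sw[:, None] * A, sw * y, rcond=None)
    return coef[:-1]  # drop the intercept

# Toy black box: only feature 0 matters, so the surrogate should recover it.
black_box = lambda z: 3.0 * z[0]
coefs = local_surrogate_weights(black_box, np.zeros(4))
```

Because the toy black box is already linear, the surrogate recovers its coefficients exactly; on a real model the coefficients only describe behavior in the neighborhood of `x`.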
- Interpreting recurrent neural networks on multivariate time series
keywords: RNN, multivariate time series, SHAP, instance importance, efficiency
- Attention is not Explanation (arXiv, 2019)
keywords: RNN, BiLSTM, binary text classification, question answering, feature importance, Kendall τ correlation, counterfactual attention weights, adversarial attention
- Attention is not not Explanation (arXiv, 2019)
keywords: LSTM, binary text classification, uniform attention weights, model variance, MLP diagnostic tool, model-consistent adversarial training, TVD/JSD plots
- Interpreting a Recurrent Neural Network's Predictions of ICU Mortality Risk (Journal of Biomedical Informatics, 2021)
keywords: LSTM, dt-patient-matrix, Learned Binary Masks (LBM), KernelSHAP
- TimeSHAP: Explaining recurrent models through sequence perturbations (SIGKDD, 2021)
keywords: model-agnostic recurrent explainer based on KernelSHAP, feature/event/cell wise explanation, pruning method by grouping older events
- On Attribution of Recurrent Neural Network Predictions via Additive Decomposition (WWW, 2019)
keywords: LSTM, GRU, Bidirectional GRU, sentiment text, Stanford Sentiment Treebank 2 (SST2), Yelp Polarity (Yelp), decomposition, REAT
- Visualizing and Understanding Neural Models in NLP (NAACL-HLT, 2016)
keywords: RNN, LSTM, Bidirectional LSTM, sentiment text, Stanford Sentiment Treebank, compositionality, unit salience
- Interpretation of Prediction Models Using the Input Gradient (arXiv, 2016)
keywords: model agnostic, Bag of Words, gradient
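The input-gradient approach above scores each feature by the derivative of the model output with respect to that input. A minimal sketch for a logistic model, where the gradient has a closed form via the chain rule (the model and weights here are illustrative, not from the paper):

```python
import numpy as np

def input_gradient(w, b, x):
    """Gradient of p = sigmoid(w.x + b) with respect to the input x.
    The magnitude of each component is a simple per-feature saliency."""
    p = 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))
    return p * (1.0 - p) * w  # chain rule: dp/dx = p (1 - p) w

# At x = 0, b = 0 we have p = 0.5, so the gradient is 0.25 * w.
w = np.array([2.0, 0.0, -1.0])
g = input_gradient(w, 0.0, np.zeros(3))
```

For deep networks the same quantity is obtained by backpropagation rather than a closed form, but the interpretation of the result is identical.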
- Explaining Recurrent Neural Network Predictions in Sentiment Analysis (EMNLP, 2017)
keywords: LSTM, bidirectional LSTM, sentiment text, Stanford Sentiment Treebank, Layer-wise Relevance Propagation (LRP)
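Layer-wise Relevance Propagation, used in the entry above, redistributes the relevance of each output neuron back to its inputs in proportion to their contributions. A minimal sketch of one epsilon-stabilized LRP step through a single linear layer (the full LSTM rules in the paper also handle gating and multiplicative interactions, which this sketch omits):

```python
import numpy as np

def lrp_linear(x, W, R_out, eps=1e-9):
    """One LRP step through a linear layer: redistribute output relevance
    R_out to inputs in proportion to contributions z_ij = x_i * W_ij."""
    Z = x[:, None] * W                               # contributions z_ij
    denom = Z.sum(axis=0)
    denom = denom + eps * np.sign(denom)             # epsilon stabilizer
    return (Z / denom) @ R_out                       # R_i = sum_j z_ij / z_j * R_j

x = np.array([1.0, 2.0])
W = np.array([[1.0, 1.0],
              [1.0, 1.0]])
R = lrp_linear(x, W, np.array([3.0, 0.0]))
```

A useful sanity check is conservation: the input relevances sum (up to the stabilizer) to the output relevance that was propagated in.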
- Interpretability of time-series deep learning models: A study in cardiovascular patients admitted to Intensive care unit (Journal of Biomedical Informatics, 2021)
keywords: LSTM, EHRs data-stream, attention, activation maps
- Show Me What You’re Looking For: Visualizing Abstracted Transformer Attention for Enhancing Their Local Interpretability on Time Series Data (FLAIRS, 2021)
keywords: Transformer, Synthetic Control Chart, ECG5000, attention, data abstraction, Symbolic Aggregate Approximation (SAX), corresponding visualization
- Focusing on What is Relevant: Time-Series Learning and Understanding using Attention (ICPR, 2018)
keywords: temporal contextual layer, time series, motion capture, key frame detection, action classification
- Spatiotemporal Attention for Multivariate Time Series Prediction and Interpretation (ICASSP, 2021)
keywords: spatial interpretation, spatiotemporal attention mechanism
- Uncertainty-Aware Attention for Reliable Interpretation and Prediction (NIPS, 2018)
keywords: RNN, risk prediction, attention, variational inference
- Topological Attention for Time Series Forecasting (NIPS, 2021)
keywords: N-BEATS, univariate time series data, M4 dataset, topological attention
- RETAIN: An Interpretable Predictive Model for Healthcare using Reverse Time Attention Mechanism (NIPS, 2016)
keywords: RNN, Electronic Health Records, reverse time order
- Preserving Dynamic Attention for Long-Term Spatial-Temporal Prediction (KDD, 2020)
keywords: CNN, crowd flow prediction, service utilization prediction, Dynamic Switch-Attention Network (DSAN), Multi-Space Attention (MSA)
- Attention based multi-modal new product sales time-series forecasting (KDD, 2020)
keywords: multi-modal encoder-decoder, sales, self-attention
- Non-stationary Time-aware Kernelized Attention for Temporal Event Prediction (KDD, 2022)
keywords: Kernelized attention, Electricity Transformer Temperature, PM2.5, Generalized Spectral Mixture Kernel (GSMK)
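The attention-based entries above all expose the same interpretable artifact: a softmax distribution over timesteps that is both used to pool the hidden states and visualized as an importance profile. A minimal sketch of dot-product attention pooling over an RNN's hidden states (the states and query vector here are toy values, not any particular model's):

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())          # subtract max for numerical stability
    return e / e.sum()

def attention_pool(H, v):
    """Score each timestep's hidden state h_t by v . h_t, normalize with
    softmax, and return the weighted context vector and the weights.
    The weights alpha are what attention-based interpretations visualize."""
    alpha = softmax(H @ v)
    return alpha @ H, alpha

H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [10.0, 0.0]])          # the last timestep dominates the score
v = np.array([1.0, 0.0])
context, alpha = attention_pool(H, v)
```

The "Attention is not (not) Explanation" debate above is precisely about whether these `alpha` values can be read as faithful feature importance, so treat such plots as evidence, not proof.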
- Two Birds with One Stone: Series Saliency for Accurate and Interpretable Multivariate Time Series Forecasting (IJCAI, 2021)
keywords: model agnostic, time series, electricity, air quality, industry data, series saliency
- Series Saliency: Temporal Interpretation for Multivariate Time Series Forecasting (arXiv, 2020)
keywords: model agnostic, series saliency, multivariate time series, temporal feature importance, heatmap visualization
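A generic way to get the temporal-importance heatmaps that the saliency entries above produce is occlusion: slide a window over the series, replace it with an uninformative baseline, and record how much the prediction moves. This is a simplified sketch of that family of methods, not the exact series-saliency algorithm; the mean-value baseline and window size are arbitrary choices:

```python
import numpy as np

def occlusion_saliency(predict, x, window=3):
    """Perturbation saliency for a univariate series: occlude a sliding
    window with the series mean and accumulate the prediction change."""
    base = predict(x)
    sal = np.zeros_like(x)
    for t in range(len(x) - window + 1):
        z = x.copy()
        z[t:t + window] = x.mean()            # uninformative baseline
        sal[t:t + window] += abs(predict(z) - base)
    return sal

# Toy model that only reads timestep 5: saliency should peak there.
predict = lambda s: s[5]
sal = occlusion_saliency(predict, np.arange(10.0))
```

For multivariate series the same loop runs per variable as well as per window, which is what yields the two-dimensional (feature × time) heatmaps described above.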
- Understanding Neural Networks through Representation Erasure (arXiv, 2016)
keywords: Bi-LSTM, Uni-LSTM, RNN, natural language, lexical, sentiment, document, computing impact of erasure on evaluation metrics, reinforcement learning, erase minimum set of input words to flip a decision
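The erasure idea above has a very direct minimal form: the importance of a token is the drop in the model's score when that token is deleted. A sketch with a toy lexicon-count scorer (the `positive` lexicon and scoring function are hypothetical stand-ins for a trained model):

```python
def erasure_importance(score, tokens):
    """Representation-erasure sketch: importance of token i is the drop
    in model score when token i is removed from the input."""
    base = score(tokens)
    return [base - score(tokens[:i] + tokens[i + 1:])
            for i in range(len(tokens))]

# Toy scorer: fraction of tokens found in a (hypothetical) positive lexicon.
positive = {"great", "fun"}
score = lambda toks: sum(t in positive for t in toks) / max(len(toks), 1)
imp = erasure_importance(score, ["the", "movie", "was", "great"])
```

Note that erasing a filler word can *raise* the score (negative importance), which is exactly the kind of signal the paper aggregates across a corpus; the reinforcement-learning part of the paper then searches for the minimum set of words whose erasure flips the decision.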
- Interpretable and steerable sequence learning via prototypes (SIGKDD, 2019)
keywords: prototype sequence network, criteria for explainable prototypes, refining with user knowledge by creating/updating/deleting prototypes
- Electric Energy Consumption Prediction by Deep Learning with State Explainable Autoencoder (Energies, 2019)
keywords: LSTM, projector and predictor, energy consumption prediction, state transition, t-SNE algorithm
- Explaining Deep Classification of Time-Series Data with Learned Prototypes (CEUR, 2019)
keywords: autoencoder-prototype, 2-D time series, ECG or respiration or speech waveforms, prototype diversity and robustness
- Explainable Tensorized Neural Ordinary Differential Equations for Arbitrary-step Time Series Prediction (IEEE Transactions on Knowledge and Data Engineering, 2022)
keywords: ETN-ODE, tensorized GRU, multivariate time series, tandem attention, arbitrary-step prediction, multi-step prediction
- TSXplain: Demystification of DNN Decisions for Time-Series using Natural Language and Statistical Features (ICANN, 2019)
keywords: model-agnostic, time series, textual explanation, statistical feature extraction, anomaly detection
- Multilevel wavelet decomposition network for interpretable time series analysis (SIGKDD, 2018)
keywords: time series forecasting, multi-frequency LSTM, decomposition into small sub-series, importance score of middle layer
- N-BEATS: Neural basis expansion analysis for interpretable time series forecasting (ICLR, 2020)
keywords: fully-connected layers with doubly residual stacking, interpretable architecture with trend or seasonality model
- Exploring interpretable LSTM neural networks over multi-variable data (ICML, 2019)
keywords: interpretable multi-variable LSTM, mixture attention mechanism, training method to learn network parameter and variable/temporal importance
- Explainability and Adversarial Robustness for RNNs (BigDataService, 2020)
keywords: LSTM, network packet flows, adversarial robustness, feature sensitivity, Partial Dependence Plot (PDP), adversarial training
- Adversarial Detection with Model Interpretation (SIGKDD, 2018)
keywords: model-agnostic, Twitter/YelpReview dataset, evasion-prone sample selection, local interpretation, defensive distillation, adversarial training
- Counterfactual Explanations for Machine Learning on Multivariate Time Series Data (ICAPAI, 2021)
keywords: model-agnostic, multivariate time series, HPC system telemetry datasets, heuristic algorithm, measuring good explanation
- Instance-based Counterfactual Explanations for Time Series Classification (ICCBR, 2021)
keywords: model-agnostic, time series, UCR archive, properties of good counterfactuals, Native Guide method, w-counterfactual, NUN-CF
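The Native Guide / NUN-CF entry above seeds its counterfactuals from the nearest unlike neighbour: the closest training instance that already has the desired class. A minimal sketch of that retrieval step (the full method then adapts the query toward this neighbour; distances here are plain Euclidean for brevity):

```python
import numpy as np

def nearest_unlike_neighbour(X, y, x, target_label):
    """Return the training series with label target_label that is closest
    (Euclidean) to the query x — the seed for a NUN-style counterfactual."""
    candidates = X[y == target_label]
    d = np.linalg.norm(candidates - x, axis=1)
    return candidates[d.argmin()]

X = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
y = np.array([0, 1, 1])
cf = nearest_unlike_neighbour(X, y, np.array([0.2, 0.1]), target_label=1)
```

Because the result is a real training instance, it is plausible and in-distribution by construction — two of the "properties of good counterfactuals" the paper discusses.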
- Don’t Get Me Wrong: How to apply Deep Visual Interpretations to Time Series (arXiv, 2022)
keywords: gradient- or perturbation-based post-hoc visual interpretation, sanity, faithfulness, sensitivity, robustness, stability, localization
- Evaluation of interpretability methods for multivariate time series forecasting (Applied Intelligence, 2021)
keywords: time series forecasting, Area Over the Perturbation Curve, Ablation Percentage Threshold, local fidelity, local explanation
- Validation of XAI explanations for multivariate time series classification in the maritime domain (Journal of Computational Science, 2022)
keywords: LIME, time-slice mapping, SHAP, Path Integrated Gradient, heatmap, perturbation, sequence analysis, novel evaluation technique
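The evaluation metrics in this last group (e.g. Area Over the Perturbation Curve) share one recipe: ablate features from most to least important according to the attribution being tested, and watch how fast the model output collapses. A minimal sketch, assuming a generic `predict` function and a zero baseline (both illustrative choices):

```python
import numpy as np

def perturbation_curve(predict, x, importance, baseline=0.0):
    """Ablate features in decreasing order of attributed importance and
    record the model output after each step. A faithful attribution makes
    the output drop quickly, i.e. a large area over the curve."""
    order = np.argsort(importance)[::-1]   # most important first
    z = x.copy()
    curve = [predict(z)]
    for i in order:
        z[i] = baseline
        curve.append(predict(z))
    return np.array(curve)

# Toy additive model: each feature's true importance equals its value.
predict = lambda s: s.sum()
x = np.array([3.0, 1.0, 2.0])
curve = perturbation_curve(predict, x, importance=x.copy())
```

Comparing this curve against one produced with a random feature ordering gives a simple, model-agnostic faithfulness test for any of the attribution methods listed above.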