Reading List
Tutorial/review papers
Methods for DNN interpretability
- K. Simonyan, A. Vedaldi, A. Zisserman: Deep inside convolutional networks: Visualising image classification models and saliency maps. ICLR 2014
- A. Mahendran, A. Vedaldi: Visualizing Deep Convolutional Neural Networks Using Natural Pre-images. International Journal of Computer Vision 120(3): 233-255 (2016)
- A. Nguyen, A. Dosovitskiy, J. Yosinski, T. Brox, J. Clune: Synthesizing the preferred inputs for neurons in neural networks via deep generator networks. NIPS 2016: 3387-3395
- C. Olah, A. Mordvintsev, L. Schubert: Feature Visualization. Distill, 2017
- M.T. Ribeiro, S. Singh, C. Guestrin: "Why Should I Trust You?": Explaining the Predictions of Any Classifier. KDD 2016: 1135-1144
- S. Bach, A. Binder, G. Montavon, F. Klauschen, K.-R. Müller, W. Samek: On Pixel-wise Explanations for Non-Linear Classifier Decisions by Layer-wise Relevance Propagation. PLOS ONE 10(7): e0130140 (2015)
- M. Zeiler, R. Fergus: Visualizing and understanding convolutional networks. ECCV 2014: 818-833
- G. Montavon, S. Lapuschkin, A. Binder, W. Samek, K.-R. Müller: Explaining nonlinear classification decisions with deep Taylor decomposition. Pattern Recognition 65: 211-222 (2017)
- B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, A. Torralba: Learning Deep Features for Discriminative Localization. CVPR 2016: 2921-2929
- K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. Zemel, Y. Bengio: Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. ICML 2015: 2048-2057
- J. Donahue, L. Hendricks, M. Rohrbach, S. Venugopalan, S. Guadarrama, K. Saenko, T. Darrell: Long-Term Recurrent Convolutional Networks for Visual Recognition and Description. IEEE Trans. Pattern Anal. Mach. Intell. 39(4): 677-691 (2017)
- S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, H. Lee: Generative Adversarial Text to Image Synthesis. ICML 2016: 1060-1069
Evaluating interpretability techniques
Model validation / understanding computer reasoning
- R. Caruana, Y. Lou, J. Gehrke, P. Koch, M. Sturm, N. Elhadad: Intelligible Models for HealthCare: Predicting Pneumonia Risk and Hospital 30-day Readmission. KDD 2015: 1721-1730
- M. Bojarski, P. Yeres, A. Choromanska, K. Choromanski, B. Firner, L. Jackel, U. Muller: Explaining How a Deep Neural Network Trained with End-to-End Learning Steers a Car. CoRR abs/1704.07911 (2017)
Interpretability for the Sciences
- D. Baehrens, T. Schroeter, S. Harmeling, M. Kawanabe, K. Hansen, K.-R. Müller: How to Explain Individual Classification Decisions. Journal of Machine Learning Research 11: 1803-1831 (2010)
- P.M. Rasmussen, L.K. Hansen, K.H. Madsen, N.W. Churchill, S. Strother: Model sparsity and brain pattern interpretation of classification models in neuroimaging. Pattern Recognition 45(6): 2085-2100 (2012)
- K.T. Schütt, F. Arbabzadah, S. Chmiela, K.-R. Müller, A. Tkatchenko: Quantum-Chemical Insights from Deep Tensor Neural Networks. Nature Communications 8: 13890 (2017)