Explainable Artificial Intelligence: How to Evaluate Explanations of Deep Neural Network Predictions

Abstract. With a surging appetite to leverage deep learning models as a means to enhance decision-making, new requirements for interpretability arise. The machine learning community has thus shown renewed interest in developing explainability methods that estimate the influence of a given input feature on the prediction made by a model. Current explainability methods for deep neural networks have nonetheless been shown to be far from fault-proof, and the question of how to properly evaluate explanations has largely remained unresolved.

In this thesis work, we have taken a close look at how popular, state-of-the-art explainability methods are evaluated, with a specific focus on one evaluation technique that has proved particularly challenging: the continuity test. The test captures the sensitivity of explanations by measuring how an explanation changes, relative to the model prediction, when the input is perturbed (see the sketch below). While it might sound like a reasonable expectation that the methods behave consistently across their input domain, we show, in experiments on image classification tasks with both toy and real-world data, that there is little to no empirical association between how explanations and networks respond to perturbed inputs. As such, we challenge the proposition that explanations and model outcomes can, and should, be compared.
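As a minimal illustration of what such a test involves, the sketch below perturbs an input with increasing Gaussian noise and records how far a plain gradient-saliency explanation and the model prediction each drift from their unperturbed values. The model, the choice of saliency as the explanation method, the noise scale, and all names are illustrative assumptions, not the exact protocol used in the thesis.

```python
import torch

def saliency(model, x):
    """Vanilla gradient saliency: gradient of the top-class logit w.r.t. the input."""
    x = x.detach().clone().requires_grad_(True)
    logits = model(x)
    logits.max(dim=1).values.sum().backward()
    return x.grad.detach()

def continuity_curve(model, x, sigma=0.05, n_steps=20):
    """Perturb x with growing Gaussian noise; record how far the explanation
    and the prediction each drift from their unperturbed values."""
    base_expl, base_pred = saliency(model, x), model(x).detach()
    expl_drift, pred_drift = [], []
    for k in range(1, n_steps + 1):
        x_pert = x + sigma * k * torch.randn_like(x)  # increasingly strong noise
        expl_drift.append(torch.norm(saliency(model, x_pert) - base_expl).item())
        pred_drift.append(torch.norm(model(x_pert).detach() - base_pred).item())
    return expl_drift, pred_drift
```

Under the finding reported above, one would expect the two drift sequences to show little rank correlation, which could be checked with, for example, scipy.stats.spearmanr(expl_drift, pred_drift).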

As a second line of work, we also point out how and why it is problematic that commonly applied, ad-hoc perturbation techniques tend to produce samples that lie far from the data distribution. In the pursuit of better, more plausible image perturbations for the continuity test, we therefore present an alternative approach that relies on sampling in a latent space learned by a probabilistic generative model (sketched below). We hope that the work presented in this thesis will not only help identify limitations of current evaluation techniques, but also contribute ideas for how to improve them.
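A minimal sketch of the latent-space idea, assuming a trained variational autoencoder with hypothetical encode and decode methods (the interface is an assumption; the thesis relies on a probabilistic generative model in the same spirit):

```python
import torch

def latent_perturb(vae, x, sigma=0.1):
    """Encode x, jitter the latent code with Gaussian noise, and decode back.
    Assumes a hypothetical VAE whose encode() returns the mean and log-variance
    of q(z | x) and whose decode() maps latent codes back to images."""
    with torch.no_grad():
        mu, logvar = vae.encode(x)             # posterior parameters q(z | x)
        z = mu + sigma * torch.randn_like(mu)  # small step in latent space
        return vae.decode(z)
```

Because the noise is added to the latent code rather than to raw pixels, the decoded perturbations tend to remain near the learned data manifold, which is the plausibility property motivating the approach.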